Posts Tagged "RegEx"

Better to be lazy or greedy?

Joe Strout

Last week, we gave a sneak peek of BOSS, a new approach to string searching. We mentioned a "bit of magic" with regard to the repetition modifiers, * (zero or more) and + (one or more): these would do a lazy match, except at the end of the search pattern, in which case they would be greedy.

We expect this to be the most controversial feature of the whole BOSS design, so it's worth some time to explain all the considerations behind it, why we made this decision, and what the heck "lazy" and "greedy" mean when it comes to string searching anyway.
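As a quick illustration of the two terms, here is a minimal sketch using Python's standard `re` module. It is ordinary RegEx, not BOSS syntax, and the sample text and variable names are made up for the example. The last two lines hint at one plausible reason for treating the end of a pattern specially: a lazy modifier with nothing after it matches as little as it can, which is usually nothing at all.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "<b>one</b> and <b>two</b>"

# Greedy: .* takes as much as it can, so the match runs to the LAST </b>.
greedy = re.search(r"<b>.*</b>", text).group()

# Lazy: .*? takes as little as it can, so the match stops at the FIRST </b>.
lazy = re.search(r"<b>.*?</b>", text).group()

# At the very end of a pattern, a lazy modifier has nothing forcing it to
# grow, so it matches zero characters; the greedy form keeps going.
trailing_lazy = re.search(r"<b>\w*?", text).group()
trailing_greedy = re.search(r"<b>\w*", text).group()

print(greedy)           # <b>one</b> and <b>two</b>
print(lazy)             # <b>one</b>
print(trailing_lazy)    # <b>
print(trailing_greedy)  # <b>one
```

In standard RegEx the author has to pick `*` or `*?` explicitly each time; the BOSS rule described above makes that choice based on where the modifier sits in the pattern.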

read full post

BOSS: Sneak Peek

Joe Strout

In the last couple of blog posts, we first reviewed some of the shortcomings of regular expressions (RegEx). We then took a look at parsing expression grammars (PEGs), which are a new formalism that has a lot of advantages for defining (and more importantly, parsing) computer languages. But while they're great for that, using them directly for string searching is a bit of a square-peg-round-hole situation.

So, we at Luminary Apps have begun work on a string matching library that combines the best features of PEG and RegEx. This blog post is the first public discussion of that library. It's called BOSS, and I think you're going to love it.

read full post

Better Text Searching with PEG

Joe Strout

In our last entry, I bemoaned the shortcomings of regular expressions for complex tasks. (This was after spending a day wrestling with a three-page-long RegEx pattern for finding functions in a C# TextWrangler language module.) I sketched out what I thought an ideal string-matching system would look like.

Well, that was three weeks ago. I've had time to do some more serious research, and it turns out that there is some modern work that is very relevant. It fits almost exactly what we were looking for, but not quite. It's a new construct called Parsing Expression Grammar.

read full post

Time for an alternative to RegEx?

Joe Strout

RegEx is handy. I use it all the time. For simple tasks, it's quite pleasant to use. For intermediate-sized tasks, it's acceptable. But for complex tasks, it is a nightmare to write, read, and maintain.

So, I'd like to suggest that it's time to design an alternative: something that works just as well on complex tasks as it does for simple ones, and stays readable and maintainable. I agree with not reinventing the wheel... except when our current wheel is square and lumpy.

read full post
 
