Posts Tagged "RegEx"
Joe Strout — Thu, Sep 08 2011
Last week, we gave a sneak peek of BOSS, a new approach to string searching. We mentioned a "bit of magic" in how the repetition modifiers * (zero or more) and + (one or more) behave: they do a lazy match everywhere except at the end of the search pattern, where they are greedy.
We expect this to be the most controversial feature of the whole BOSS design, so it's worth some time to explain all the considerations behind it, why we made this decision, and what the heck "lazy" and "greedy" mean when it comes to string searching anyway.
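For readers who haven't met the terms before, here is a minimal sketch of the difference. It uses Python's standard `re` module, not BOSS (no BOSS syntax appears in this excerpt), and the sample text and variable names are made up for illustration. In Python's RegEx dialect, `*` is greedy by default and `*?` is its lazy form.

```python
import re

# Hypothetical sample text, chosen only to make the difference visible.
text = "<b>one</b> and <b>two</b>"

# Greedy: .* takes as much as it can, so the match runs to the LAST </b>.
greedy = re.search(r"<b>.*</b>", text).group(0)

# Lazy: .*? takes as little as it can, so the match stops at the FIRST </b>.
lazy = re.search(r"<b>.*?</b>", text).group(0)

# At the very end of a pattern there is nothing left to stop at:
# a lazy * matches zero characters, while a greedy * takes the rest of the line.
lazy_at_end = re.search(r"<b>.*?", text).group(0)
greedy_at_end = re.search(r"<b>.*", text).group(0)

print(greedy)         # <b>one</b> and <b>two</b>
print(lazy)           # <b>one</b>
print(lazy_at_end)    # <b>
print(greedy_at_end)  # <b>one</b> and <b>two</b>
```

The last two lines show what a lazy modifier does when it is the final thing in a pattern: it matches as little as possible, which for * is nothing at all.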
Joe Strout — Fri, Sep 02 2011
In the last couple of blog posts, we first reviewed some of the shortcomings of regular expressions (RegEx). We then took a look at parsing expression grammars (PEGs), which are a new formalism that has a lot of advantages for defining (and more importantly, parsing) computer languages. But while they're great for that, using them directly for string searching is a bit of a square-peg-round-hole situation.
So, we at Luminary Apps have begun work on a string matching library that combines the best features of PEG and RegEx. This blog post is the first public discussion of that library. It's called BOSS, and I think you're going to love it.
Joe Strout — Fri, Aug 19 2011
In our last entry, I bemoaned the shortcomings of regular expressions for complex tasks. (This was after spending a day wrestling with a three-page-long RegEx pattern for finding functions in a C# TextWrangler language module.) I sketched out what I thought an ideal string-matching system would look like.
Well, that was three weeks ago. I've had time to do some more serious research, and it turns out that there is some modern work that is very relevant. It fits almost exactly what we were looking for, but not quite. It's a new construct called a Parsing Expression Grammar.
Joe Strout — Fri, Jul 29 2011
RegEx is handy. I use it all the time. For simple tasks, it's quite pleasant to use. For intermediate-sized tasks, it's acceptable. But for complex tasks, it is a nightmare to write, read, and maintain.
So, I'd like to suggest that it's time to design an alternative: something that works just as well on complex tasks as it does on simple ones, and stays readable and maintainable. I agree with not reinventing the wheel... except when our current wheel is square and lumpy.