With a simple regexp, the split command will consume the character you're splitting on. In some cases, this is desirable, like parsing a simple comma-delimited string:
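A minimal sketch of the comma case, assuming this is Ruby's String#split; the method name split_fields and the sample string are made up for illustration:

```ruby
# Hypothetical helper: split a comma-delimited line into its fields.
# The commas are consumed, which is exactly what we want here.
def split_fields(line)
  line.split(/,/)
end

p split_fields("apple,banana,cherry")
# => ["apple", "banana", "cherry"]
```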
In parsing my DSL, I'm finding cases where I need to not consume the piece I'm splitting on. For example:
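Something along these lines, again assuming Ruby; the "foo = bar" input and the method name split_on_equals are stand-ins, not the actual DSL:

```ruby
# Hypothetical helper: split an assignment-like string on the operator.
# The "=" is consumed by the split and does not appear in the result.
def split_on_equals(expr)
  expr.split(/=/)
end

p split_on_equals("foo = bar")
# => ["foo ", " bar"]
```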
That's great: I got the foo and bar, but I've lost the operator. Obviously, in this case I know what the operator is, since I just split on it, but I'd really like the = to remain in the resulting array. A simple tweak to the regexp, wrapping the delimiter in parens, will get it:
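A sketch of that tweak, assuming Ruby's String#split, which includes the text matched by capture groups in the returned array; split_keeping_equals is a made-up name:

```ruby
# Hypothetical helper: the parens turn the delimiter into a capture group,
# so split returns the "=" as its own element between the two sides.
def split_keeping_equals(expr)
  expr.split(/(=)/)
end

p split_keeping_equals("foo = bar")
# => ["foo ", "=", " bar"]
```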
Putting the spaces before and after the operator into the regexp will even get me a bit of trimming:
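Roughly like this, assuming Ruby; only the = sits inside the capture group, so the surrounding whitespace is consumed while the operator is kept. The name split_trimmed is hypothetical:

```ruby
# Hypothetical helper: \s* on either side of the group eats the spaces,
# while the captured "=" is still returned in the array.
def split_trimmed(expr)
  expr.split(/\s*(=)\s*/)
end

p split_trimmed("foo = bar")
# => ["foo", "=", "bar"]
```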
In another case, I want to split up paragraphs, splitting each time a new line starts without any indentation. For example, with this input:
I want this output:
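As a made-up sample (the text and the method names sample_text and desired_paragraphs are assumptions for illustration), the input and the result I'm after might look like this in Ruby:

```ruby
# Hypothetical sample input: unindented lines start a paragraph,
# indented lines continue the one before.
def sample_text
  [
    "Paragraph one starts here",
    "  and continues, indented",
    "Paragraph two starts here",
    "  and also continues"
  ].join("\n")
end

# The desired result: one string per paragraph, with nothing lost.
def desired_paragraphs
  [
    "Paragraph one starts here\n  and continues, indented",
    "Paragraph two starts here\n  and also continues"
  ]
end

puts sample_text
p desired_paragraphs
```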
Not having this newfound split awareness, I started dealing with scan and a regexp like so:
Well ... I actually got closer than that regexp gets me; I can't remember the exact one now, but it wasn't working well. I wanted to use split, but I knew I'd lose the thing I was splitting on:
Gives me:
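A sketch of the problem, assuming Ruby and a made-up sample text; splitting on a newline followed by a non-whitespace character eats both the newline and the first letter of the next paragraph:

```ruby
# Hypothetical sample input: indented lines continue the previous paragraph.
def sample_text
  "Paragraph one starts here\n  and continues, indented\n" \
  "Paragraph two starts here\n  and also continues"
end

# Naive split: the "\n" AND the following non-space character are consumed.
def naive_paragraphs(text)
  text.split(/\n\S/)
end

p naive_paragraphs(sample_text)
# => ["Paragraph one starts here\n  and continues, indented",
#     "aragraph two starts here\n  and also continues"]
```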
...and using the parens for grouping doesn't give me what I need either:
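With the same made-up sample and Ruby assumed, the captured character comes back, but as a separate element rather than attached to its paragraph:

```ruby
# Hypothetical sample input: indented lines continue the previous paragraph.
def sample_text
  "Paragraph one starts here\n  and continues, indented\n" \
  "Paragraph two starts here\n  and also continues"
end

# Capturing the first character keeps it, but as an element of its own.
def grouped_paragraphs(text)
  text.split(/\n(\S)/)
end

p grouped_paragraphs(sample_text)
# => ["Paragraph one starts here\n  and continues, indented",
#     "P",
#     "aragraph two starts here\n  and also continues"]
```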
But in the process of re-reading some more advanced material on regexps, I re-learned about 'zero-width positive lookahead'. Using it, my split works perfectly and the regexp is nice and tidy:
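A minimal sketch of the lookahead version, assuming Ruby and the same made-up sample text; (?=\S) checks that a non-whitespace character follows the newline without consuming it, so only the newline itself is removed:

```ruby
# Hypothetical sample input: indented lines continue the previous paragraph.
def sample_text
  "Paragraph one starts here\n  and continues, indented\n" \
  "Paragraph two starts here\n  and also continues"
end

# Split on a newline only when the next character is non-whitespace.
# The lookahead is zero-width, so that character stays with its paragraph.
def paragraphs(text)
  text.split(/\n(?=\S)/)
end

p paragraphs(sample_text)
# => ["Paragraph one starts here\n  and continues, indented",
#     "Paragraph two starts here\n  and also continues"]
```

The indented continuation lines stay inside their paragraph because the newline before them is followed by whitespace, so the lookahead fails there.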
Last update: Mon Jul 31 2006 04:27 AM