<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Eureka Man &#187; Regular Expressions</title>
	<atom:link href="http://eurekaman.com/category/regular-expressions/feed" rel="self" type="application/rss+xml" />
	<link>http://eurekaman.com</link>
	<description>Pure Gold</description>
	<lastBuildDate>Mon, 28 Jun 2010 05:38:49 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Where&#8217;s my negation?</title>
		<link>http://eurekaman.com/wheres-my-negation</link>
		<comments>http://eurekaman.com/wheres-my-negation#comments</comments>
		<pubDate>Sun, 26 Nov 2006 05:35:48 +0000</pubDate>
		<dc:creator>Eureka Man</dc:creator>
				<category><![CDATA[Coding]]></category>
		<category><![CDATA[Questions]]></category>
		<category><![CDATA[Regular Expressions]]></category>

		<guid isPermaLink="false">http://eurekaman.com/wheres-my-negation</guid>
		<description><![CDATA[Anybody interested in a programming puzzle? Des? Warning: This post is full of seemingly incomprehensible strings of symbols that programmers call regular expressions, which actually do something very useful.  If that doesn&#8217;t turn you on, better to quit now.  If you&#8217;re still here, consider this&#8230;
Have you ever wondered why programming languages that have [...]]]></description>
			<content:encoded><![CDATA[<p>Anybody interested in a programming puzzle? <a href="http://www.destraynor.com/serendipity/index.php?/archives/104-Programming-Puzzle-2-Steve-Returns.html">Des</a>? Warning: This post is full of seemingly incomprehensible strings of symbols that programmers call regular expressions, which actually do something very useful.  If that doesn&#8217;t turn you on, better to quit now.  If you&#8217;re still here, consider this&#8230;</p>
<p>Have you ever wondered why programming languages that have good regular expression support (Perl, JavaScript, Ruby, etc.) nevertheless omit syntax for negation or conjunction within regexes?  I&#8217;m sure there must be a good reason for this.  Has it been found that people just don&#8217;t intuitively get what negation does to a regex?  Are language designers unwilling to complicate their syntax?  Are they avoiding the processing it takes to complement or intersect finite-state machines?  Regexes do compile to finite-state machines, right?</p>
<p>Let me give you an example of where negation would be useful.  Say I&#8217;m scanning through some text for sub-sections which have been explicitly quoted.  If the delimiting characters are angle brackets then I can write <code>/<([^>]*)>/</code>.  The bit in the middle, <code>[^>]</code>, matches any character which is not a &#8216;<code>></code>&#8217;.  This is a limited form of negation on a single character.  Taken as a whole the expression says: Find me the strings of non-&#8216;<code>></code>&#8217; characters which are surrounded by a &#8216;<code><</code>&#8217; on the left and a &#8216;<code>></code>&#8217; on the right.  So far so good.</p>
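<p>For anyone who wants to try the angle-bracket pattern, here is a minimal sketch in Python (my choice for illustration; the pattern itself is the one given above).  The sample string and the variable names are made up.</p>

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "ids: <17> and <42>"

# [^>]* matches a run of characters none of which is '>',
# so each match stops at the first closing bracket.
angle_quoted = re.compile(r"<([^>]*)>")

matches = angle_quoted.findall(text)
print(matches)
```

<p>With this input the captured groups should be the two bracketed numbers, without the brackets themselves.</p>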
<p>But what if the sub-sections I'm interested in are bounded by multi-character strings?  Say '<code><<<</code>' and '<code>>>></code>', for argument's sake.  By analogy with the example above I want to be able to write <code>/<<<(?^.*>>>.*)>>>/</code>.  I've invented some syntax for negation here: <code>(?^R)</code> means match anything that the regex R doesn't match.  The R in my example (<code>.*>>>.*</code>) matches any string that contains the sequence '<code>>>></code>', so the expression as a whole says: Find me the strings that do not contain '<code>>>></code>' that are bordered by '<code><<<</code>' on the left and '<code>>>></code>' on the right.  Now, in this example, the same can be accomplished with a "lazy star" (if your programming language supports that), which just makes the regex non-greedy within the bounding strings: <code>/<<<(.*?)>>>/</code>.  But I would much rather state the patterns I'm looking for declaratively than change the ambiguity-resolution mechanism of the processor.  Why can't I?</p>
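<p>Here is a rough Python sketch of the difference, under some loud assumptions.  Python has no <code>(?^R)</code>; the closest thing I know of is negative lookahead, <code>(?!...)</code>, used as <code>(?:(?!>>>).)*</code> to mean "a run of characters, none of which starts a '<code>>>></code>'".  That is a stand-in for the invented negation syntax, not an implementation of it, and the sample strings are made up.</p>

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "x <<<1 > 2>>> y <<<3 >> 4>>> z"

# Lazy star: take as little as possible before the closing delimiter.
lazy = re.compile(r"<<<(.*?)>>>")
# Negative lookahead stand-in for negation: no character in the
# group may be the start of '>>>'.
tempered = re.compile(r"<<<((?:(?!>>>).)*)>>>")

lazy_matches = lazy.findall(text)
tempered_matches = tempered.findall(text)

# The two differ once something must follow the closing delimiter.
text2 = "<<<1>>> <<<2>>>!"
lazy_bang = re.search(r"<<<(.*?)>>>!", text2).group(1)
tempered_bang = re.search(r"<<<((?:(?!>>>).)*)>>>!", text2).group(1)

print(lazy_matches, tempered_matches)
print(repr(lazy_bang), repr(tempered_bang))
```

<p>On the first string both patterns capture the same two sub-sections.  On the second, the lazy version is allowed to stretch across a '<code>>>></code>' to make the overall match succeed, while the lookahead version cannot, which is the sort of thing I mean by stating the pattern rather than tuning the matcher.</p>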
<p>Any <a href="http://www.xrce.xerox.com/competencies/content-analysis/fsCompiler/fssyntax.html#tilde">finite-state toolkit</a> with regex support has syntax for negation and conjunction.  Do you know of any programming languages that do?  If not, why not?</p>
]]></content:encoded>
			<wfw:commentRss>http://eurekaman.com/wheres-my-negation/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
