Commit 711131f49556057027c989e1be780895b07b0423

Lea Verou 2012-07-29T00:57:42

Added FAQ about regex-powered highlighters

diff --git a/faq.html b/faq.html
index 47de6f6..f80090f 100644
--- a/faq.html
+++ b/faq.html
@@ -22,36 +22,59 @@
 	<div class="intro" data-src="templates/header-main.html" data-type="text/html"></div>
 	
 	<h2>FAQ</h2>
-	<p>Frequently Asked Questions, or Questions I want people to Frequently Ask.</p>
+	<p>Frequently Asked Questions, with a few Questions I want people to Frequently Ask.</p>
 </header>
 
 <section>
+	<h1>Isn’t it bad to do syntax highlighting with regular expressions?</h1>
+	
+	<p>It is true that to correctly handle every possible case of syntax found in the wild, one would need to write a full-blown parser. 
+	However, in most web applications and websites a small error margin is usually acceptable and a rare highlighting failure is not the end of the world.
+	A syntax highlighter based on regular expressions might only be accurate 99% of the time (the actual percentage is just a guess),
+	but in exchange for the small error margin, it offers some very important benefits:
+	<ul>
+		<li>Smaller filesize. Proper parsers are very big.</li>
+		<li>Extensibility. Authors can define new languages simply by knowing how to code regular expressions. 
+			Writing a correct, unambiguous BNF grammar is a task at least an order of magnitude harder.</li>
+		<li>Graceful error recovery. Parsers fail on incorrect syntax, whereas regular expressions keep matching.</li>
+	</ul>
+	
+	<p>For this reason, most syntax highlighters on the web and on the desktop are powered by regular expressions. This includes the internal syntax
+	highlighters used by popular native applications like Espresso and Sublime Text, at the time of writing.</p>
+</section>
+
+<section>
 	<h1>Why is asynchronous highlighting disabled by default?</h1>
+	
 	<p>Web Workers are good for preventing syntax highlighting of really large code blocks from blocking the main UI thread.
 	In most cases, you will want to highlight reasonably sized chunks of code, and this will not be needed.
-	Furthermore, using Web Workers is actually <strong>slower</strong> than synchronously highlighting, it just appears faster because it doesn’t block the main thread.
-	Also, Web Workers cannot interact with the DOM and most other APIs, so they are notoriously hard to debug.</p>
+	Furthermore, using Web Workers is actually <strong>slower</strong> than synchronous highlighting; it just appears faster 
+	in some cases because it doesn’t block the main thread.
+	Also, Web Workers cannot interact with the DOM and most other APIs (e.g. the console), so they are notoriously hard to debug.</p>
 </section>
 
 <section>
 	<h1>Why is pre-existing HTML stripped off?</h1>
+	
 	<p>Because it would complicate the code a lot, although it’s not a crucial feature for most people. 
 	If it’s very important to you, there are sufficient hooks to allow you to <a href="extending.html#writing-plugins">write a plugin</a> that retains it.</p>
 </section>
 
 <section>
 	<h1>If pre-existing HTML is stripped off, how can I highlight certain parts of the code?</h1>
+	
 	<p>There is a number of ways around it. You can always break the block of code into multiple parts, and wrap the HTML around it (or just use a <code>.highlight</code> class).
-	You can see an example of this in the “<a href="#basic-usage">Basic usage</a>” section of the homepage.</p>
+	You can see this in action in the “<a href="#basic-usage">Basic usage</a>” section of the homepage.</p>
 	<p>Another way around the limitation is to use the <a href="plugins/line-highlight/">Line Highlght plugin</a>, to highlight and link to specific lines and/or line ranges.
 </section>
 
 <section>
 	<h1>How do I know which tokens I can style for every language?</h1>
+	
 	<p>Every token that is highlighted gets two classes: <code>token</code> and a class with the token type (e.g. <code>comment</code>).
 	You can find the different types of tokens either by looking at the keys of the object defining the language or by running this snippet in the console:
 	<pre><code class="language-javascript">function printTokens(o, prefix) { for (var i in o) { console.log((prefix? prefix + ' > ' : '') + i); if (o[i].inside) printTokens(o[i].inside, (prefix? prefix + ' > ' : '') + i); } };</code></pre>
-	<p>Then you can use the function for every language you want to print its tokens. For example, markup:</p>
+	<p>Then you can use the function for every language you want to examine. For example, markup:</p>
 	<pre><code class="language-javascript">printTokens(Prism.languages.markup);</code></pre>
 	<p>which outputs:</p>
 	<pre>comment
@@ -97,15 +120,17 @@ entity</pre>
 	<p>Just use a descendant selector, that includes the language class. The default <code>prism.css</code> does this, to have different colors for 
 	JavaScript strings (which are very common) and CSS strings (which are relatively rare). Here’s that code, simplified to illustrate the technique:
 	<pre><code class="language-css">
-.language-javascript .token.string {
+.token.string {
 	color: #690;
 }
 
-.language-css .token.string {
+.language-css .token.string,
+.style .token.string {
 	color: #a67f59;
 }</code></pre>
+
 	<p>Abbreviated language classes (e.g. <code>lang-css</code>) will be converted to their extended forms, so you don’t need to account for them.</p>
-	<p>The same technique is used to differentiate XML tag namespaces from attribute namespaces:</p>
+	<p>The same technique can be used to differentiate XML tag namespaces from attribute namespaces:</p>
 	<pre><code class="language-css">.tag > .token.namespace {
 	color: #b37298;
 }
@@ -120,11 +145,14 @@ entity</pre>
 <script src="utopia.js"></script>
 <script>
 	$$('section > h1').forEach(function (h1) {
-		$u.element.create('a', {
-			properties: {
-				href: '#toc'
+		$u.element.create('p', {
+			contents: {
+				tag: 'a',
+				properties: {
+					href: '#toc'
+				},
+				contents: '↑ Back to top'
 			},
-			contents: '↑ Back to top',
 			inside: h1.parentNode
 		});
 	});