String Interpolation, the "Hello $name_of_planet!"
style of generating strings, familiar to Perl, PHP, and Ruby
programmers, provides a simple and intuitive way of specifying content
in many languages from HTML to SQL to URLS.
It also makes it very easy to introduce serious security problems. SQL Injection, Script Injection, XML External Entity Injection (XXE), and Cross Site Scripting (XSS) attacks are results of a failure to properly escape user-supplied data before splicing it into a web-page, a SQL query, or a shell command.
These problems are not inherent to String Interpolation though. The strings themselves contain more than enough information to figure out how to safely incorporate user data. Consider the code snippet:
<html>
<head>
<title>Hello $name_of_planet!</title>
</head>
<body>
<h1>We come in peace.</h1>
<p>Take us to your
<abbr title="May (s)he live for many $time_units!">
$title_of_potentate
</abbr>.
</p>
</body>
</html>
It should be clear that the programmer intended the substitutions
$name_of_planet and $title_of_potentate to
be text in an html page, and that $time_units is part of
an HTML attribute. Obviously, the browser can handle all those
distinctions when the time comes to render a page, but at the time the
string interpolation is executed, it has no way to tell that one
string is HTML, and another is not. The problem is one of early
binding — acting before enough information is available to make
good decisions.
Below I discuss the benefits of string interpolation, outline some of the alternatives to string interpolation, show how interpolation can benefit from late binding, and explain how it can be bolted onto languages like Javascript. I use Javascript throughout the examples since Javascript examples can be run in browser, but the security problems discussed herein occur in code written in many languages.
Mitre.org's "Unforgivable Vulnerabilities" lists XSS, and SQL injection vulnerabilities as the 2nd and 3rd most frequent security vulnerabilities in applications they audited. Since web applications aren't vulnerable to the number 1 vulnerability, buffer overflows, XSS problems seem the most important security hole to address. Google Gears and the HTML 5 Spec includes in-browser database APIs, so I believe SQL Injection will become increasingly common in web applications.
XSS, Shell/SQL/Script/External-Entity Injection, etc. are all serious security problems, but they all work because of essentially the same problem — programs that compose portions of another language by concatenating strings can be easily fooled. Below is an example of a SQL Injection vulnerability:
query = "SELECT x, y, z FROM Table WHERE id='" + id + "'"
If a malicious user can cause id to be "';
DELETE FROM Table WHERE '' = '", then you'd better hope you
have good backups. And there are many ways id could be
manipulated — the user can enter strange inputs in an online
form, manipulate CGI parameters, send crafted inputs to an SMS robot,
or exploit any of the myriad of ways that information gets into a
computer system. The results can be pretty catastrophic —
they can be used to steal information from databases, insert false
records, steal accounts and money, and ruin peoples' and companies'
reputations.
Languages tend to distinguish a string from a core piece of the
program by using escaping conventions. In the SQL example
above, id could be rendered safe by converting 's to
\'s, and any \ to \\. And the
programmer could use a function to do that conversion, but that
approach is error prone, and requires the programmer to know a lot
about the language. For HTML and XML, the programmer has to
consistently correctly choose between at least 4 different escaping
schemes:
<[!CDATA[…]]> sections
<title> and
<textarea> elements
<script>, <style>,
<xmp> elements
And frequently, there is more than one type of escaping. If composing a URL
the programmer has to URI-encode the parameters and page, and then encode the whole URL as an HTML attribute, so that<a href="$page?$a=$b&$c=$d">
$c will
be interpreted correctly even if it happens to be gt.
So when composing the usual soup of HTML, CSS, Javascript, and URLs you have to contend with a viper's nest of subtly different escaping schemes which have to be applied in the correct order. And these escaping schemes are not simple. Libraries can provide the escaping functions but they cannot help the programmer decide which escaping convention is appropriate when.
Any API that requires the developer to know as much as the implementor does not solve any problems. Since choosing the appropriate escaping function requires the developer to be a language expert, simply providing libraries of escaping functions will not address injection as a class of vulnerabilities.
Some library writers write template libraries to provide a safe alternative. The SQL example above could be rewritten safely in JDBC as
db.execute("SELECT x, y, z FROM Table WHERE id='?'", id)
This solves the early-binding problem, by forcing binding to occur where control leaves the programmers code, and enters the SQL library's code.
But thus far, none of these library-specific schemes have completely displaced string interpolation, even in their specific domain, and they tend to suffer from a few common flaws:
First, by forcing binding to occur in a specific place, they break functional decomposition. Functional decomposition is, simply put, a way of breaking large problems into smaller ones by decomposing them into functions. So in the SQL example, I might decompose the problem into two subproblems : one to specify the table and columns, and one to specify which rows I need:
sub makeQuery() {
my $columnsToFetch = chooseColumns();
my $rowsToFetch = chooseRows();
return "SELECT $columnsToFetch WHERE $rowsToFetch";
}
sub $columnsToFetch() { return "x, y, z FROM Table"; }
sub $rowsToFetch() { return "id='$id'";
Functional decomposition only works if a piece of a program can be replaced by a function call that does the same thing. APIs that break functional decomposition suck, and will not scale to large applications.
Second, library specific solutions are library specific. A
developer who learns JDBC's ? syntax cannot transfer
that knowledge to deal with translated message strings that use
%1 style placeholders.
Third, library specific tend to allow no exceptions to the rule, and so force programmers to work around the system. Never, ever, force your developers to work around your security checks because your system does not allow them to do what they need to do. JDBC, for example, does not handle the following common situation well
"SELECT x, y, z FROM Table WHERE id IN ($set_of_ids)"
If a programmer wants to specify multiple ids, they are out of luck. The system is incomplete, and so the programmer falls back on – string interpolation or outright hackery.
Finally, library specific solutions often use positional parameters. Positional parameters don't scale to large or complicated templates. The following example might have started off nice and small, but as code has been added over the years (where 1 year ~ 1 caffeine-fueled weekend before launch) it has grown unmaintainable
print 'I met a ? in the ? who had a ? full of ?. He said'
' "Hello, ?, How are you today\?" "Fine," I replied'
' "but I can't seem to find my ?." "Hmm," he said.'
' "I Can\'t help you with that." "How about a ? to'
' make you feel ?\?"',
(profession, landmark, container, species_of_monkey,
proper_name, common_household_item,
'string that used to be tied to a red balloon',
state_of_mind);
If you found yourself looking back and forth to make sense of that, then you know why positional parameters simply don't scale — they require the programmer to keep O(n) things in their limited brains. String interpolation solves this by putting the expressions in-line, so that each part of the string interpolation has meaning in isolation.
A language template is a whole language, complete with its own version of
if-else and for, and stored in files separate from
the rest of the code. For example, JSP is a templating language frequently
used with Java programs. JSPs have their own looping constructs for iterating
over elements of a list, and JSP code is stored in .jsp files
instead of being near the java code that invokes it.
Full blown templating languages are the right tool for many problems, but they still can't compete with string interpolation on several fronts.
First, full blown templating languages, with a few exceptions, do next to nothing to solve escaping problems. XSL does quite well on this front with HTML (but has no support for other content types); but JSP and PHP, which are two of the most popular means of generating dynamic HTML; do not escape by default — they push the responsibility onto the programmer to understand the myriad escaping conventions and to choose the right one consistently and correctly.
Second, templating languages tend to introduce a lot of boilerplate. There's a certain amount of structure that always has to be there. For example, in the below XSL example, all the grey text is inconsequential to the actual functioning of the template and serves only to hide the one salient (bold) bit.
<?xml version="1.0" standalone="no" encoding="UTF-8"?> <!DOCTYPE xsl:stylesheet [ <!ENTITY % xhtml PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> %xhtml; ]> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <xsl:text>Hello World</xsl:text> </xsl:template> </xsl:stylesheet>
Third, templating languages don't make simple things simple. If I
want to make a piece of text bold, I would much prefer to write
"$text" next to where I need it than to write and
comment a separate multi-dozen-line file to hold a template.
DOM manipulation is familiar to many Javascript hackers:
var myBoldTag = document.createElement('B');
myBoldTag.appendChild(document.createText(text));
This is more verbose than the "$text", but it
is inline and is safe.
It's an excellent choice in many circumstances, but it is much
slower than injecting markup into innerHTML. According
to Peter-Paul
Koch of quirksmode.org, DOM manipulation takes 30 times
as long as innerHTML on IE 6.0, the most commonly used browser.
Tainting attempts to avoid security breaches by marking strings that come from untrusted sources, such as CGI parameters, in a way that sensitive operations can check. When strings are concatenated, the result is tainted if any of the inputs were tainted. A SQL library can then check whether the query string is tainted and refuse to execute it.
Tainting mitigates some security problems but does not address correctness — it may make it impossible for an attacker to tweak a CGI parameter to cause a SQL injection attack. It also does not eliminate security problems because tainted inputs can be "laundered." E.g., if your code looks like:
"INSERT INTO Table VALUES ('$name')"
# Result is tainted because $name is tainted
then a programmer might reasonably rewrite it as
"INSERT INTO Table VALUES ('${sql_escape(name)}')"
# sql_escape removes the tainting
Now, the SQL is correct and safe, but a tainted input is in the
database. If another page retrieves that from the database and
displays it, then you have a Script Injection vulnerability. Tainting
does not encode the "type" of the string (E.g.,
text/plain vs text/html) so does not, by
itself, contain enough information to provide safety and correctness,
and the taint is not serialized with the string in the database, so
can be effectively laundered.
name = getUserInput('name'); // Pretend getUserInput returns a tainted value.
name = sql_escape(name); // Now name is untainted
execute("INSERT INTO Table (name) VALUES ('$name'");
Now the value is in the database. name is not
tainted, but it's not safe to inject it into innerHTML
since SQL's escaping conventions are very different from HTML's.
names = query("SELECT name FROM Table");
var stringBuffer = [];
for (var i = 0; i < names.length; ++i) {
stringBuffer.push('<tr>', '<td>', names[i]);
}
div.innerHTML = '<table>' + stringBuffer.join('') + '<\/table>';
Regardless of whether the names are tainted, they're unsafe to use in innerHTML. Maybe this distinction between HTML and SQL seems irrelevant to web applications, but web applications already combine three languages, CSS/JS/HTML, each with different escaping conventions.
The E-language, and a few others, provide "quasi-literals" which are used to produce and manipulate abstract syntax trees (AST), and to convert them back to strings, in a way that makes the fact that ASTs are being manipulated transparent to the developer.
An AST for
might look like<a href="http://foo.com/?bar=$baz">
Element : a
Attribute : href
Value
Url
Protocol : http
Authority
Domain : foo.com
Path : /
Query
Pair
Key : bar
Value
QuasiLiteralReference baz
The QuasiReferences can be evaluated to produce a new parse tree which can then be rendered to a string. The renderer has enough information to maintain a stack of escaping conventions to apply, and so can guarantee that it only produces well-formed output.
QuasiLiterals are a very elegant solution to many problems, but I find quasi-literals poorly suited to Javascript because they require generation, at runtime, of an abstract syntax tree, and object creation is expensive in the most frequently used Javascript interpreters.
Quasi-literals are also poorly suited to HTML because HTML's tags
do not form a tree structure — formatting tags mark ranges of
text and do not need to nest, and all the input elements inside a form
are grouped together, so it is not unusual to see a
</form> tag at a different nesting level from the
open <form> tag. For example, the following HTML,
which is valid according to the HTML 5 draft,
is parsed to the same DOM Tree as the HTML below:<b>A <span style="color:blue">B</b> C</span>
<b>A <span style="color:blue">B</span></b><span style="color:blue"> C</span>
Implementing quasi-literals for HTML in Javascript would require building a full HTML parser, which the browser already has, just to do inference that can be done purely at the lexical level. The resulting implementation would be large and slow given the current state of Javascript interpreters.
If the exemptions defined below are not used, then a string interpolation should produce the same tokenization for the text between substitutions regardless of the values substituted.
Developers should not need to understand which escaping conventions are appropriate when.
String interpolations are often used as a poor man's domain specific language. The same scheme should allow for at least SQL query generation, HTML and XML generation, URI composition, and shell command composition. Anecdotally, these domains seem to cover the majority of injection attacks.
Developers should never need to work around the system. If a developer knows a priori that a string is safe in context, then they should be able to exempt it from escaping conventions. These exemptions should be easy to find with grep and other code searching tools.
Content-type marking uses the runtime type system to convey information about the content of a string. A DHTML rich-text editor control might return a string like object encapsulating known-safe HTML,new HtmlSnippet(content), instead of just returning a string.
This also addresses the problems identified when discussing tainting by distinguishing a string that is safe to inject unescaped into HTML from one that is safe to inject into SQL unescaped.
O(length_of_output + number_of_substitutions).
If a string interpolation includes human language text visible to the end-user, the application needs to be able to replace messages with messages in another language, and substitutions might move or be reordered within a message.
Libraries should be able to operate in an unsafe mode where they issue warnings if the input is not a string interpolation, to help developers find string concatenation and replace it with string interpolation.
The first thing we need to do is distinguish literal text from substituted values. Since our definition of security is to "ensure the same tokenization of literal text regardless of substitution values", we need to identify literals and substitutions, and once we've done that, we can examine literals to figure out how to escape substitutions.
Original Interpolation: "SELECT * FROM Table WHERE name='$name' AND modified > ${new Date(d)}"↓ Split around substitutions: Literal "SELECT * FROM TABLE WHERE name="Substitution nameLiteral "' AND modified > "Substitution new Date(d)↓ JS w/out Syntactic Sugar: new StringInterpolation(["SELECT * FROM TABLE WHERE name='", name, "' AND modified > ", new Date(d)])↓ With substitutions escaped: "SELECT * FROM Table WHERE name='O\'Reilly' AND modified > '2007-12-31 12:30:00'"
We split the interpolation into literals and substitutions so that the SQL
library then has enough information to figure out how to escape
name and
new Date(d). We bundle the parts together in
a StringInterpolation object which the SQL library can manipulate.
Now that we have a StringInterpolation object, we need
to figure out how to add a bit of syntactic sugar to allow developers
to produce a StringInterpolation instance by writing
something like "Hello
$name_of_planet!".
In many languages, we could write a macro to convert from
MY_MACRO("Hello $name_of_planet!") to
new StringInterpolation(['Hello ', name_of_planet, '!']),
but Javascript has no built-in macro system.
Luckily, it does have an eval which resolves
references in the calling scope: function ()
{ var x = 4; return eval('x'); } does return the number 4. We
can use this odd property to bolt macro-expansion onto Javascript.
eval(MY_MACRO("Hello $name_of_planet!"))MY_MACRO returns a string of JS code ↓ eval('new StringInterpolation(["Hello ", name_of_planet, "!"])')Which eval effectively inlines ↓ new StringInterpolation(["Hello ", name_of_planet, "!"])
So we can use eval to do macro expansion in
Javascript. I show below how this can be made nicer syntactically, so
that eval need not be explicitly part of the API, and how
to speed this up by expanding macros early.
But first, let's nail down the API.
Let's come up with some examples of what we want to do. If we want it to run in a language without string-interpolation built-in, we need some way of building a string interpolation using existing language constructs such as function calls and string literals. Let's call these open templates and use the following syntax.
sql = open(Template("SELECT * FROM Table WHERE id=$id"));
url = open(Template("http://$myDomain/path?q=$paramValue"));
html = open(Template('<div style="color: #$fg">$name</div>'));
// where $id, $paramValue, etc. are bound to variables in the local
// scope at the time open(Template(…)) is invoked.
// Ruby's embedded expressions are popular, so let's define an alternative
// to the $ followed by a variable name syntax, like ruby
html = open(Template(
'<tr><td>${cells[0]}<td>${cells[1]}</tr>'));
// where the expression inside the ${…} is an expression in the
// surrounding language, with balanced curly-brackets. References evaluate
// in the local scope, and exceptions propagate out of the interpolation.
// To handle text L10N, we'll allow a run of text to be annotated.
// A parser can find all string interpolations and extract messages.
html = open(Template(
'<p>${{{msg="Greeting":Hi ${{{ph="User":$user}}},'
+ ' how are you today?}}}</p>'));
// where 'msg' brackets a message, and 'ph' identifies a placeholder
// and tells a translator what to expect.
// ${{{…}}} blocks contain literal strings and they nest,
// and ${…} blocks contain code and don't nest.
The open(Template(…)) syntax identifies a string
interpolation, but how do we know how to interpret substitutions?
We could let the user specify the "type" of the interpolation, a la
sql = sql(Template("SELECT * FROM Table WHERE id=$id"));
url = url(Template("http://$myDomain/path?q=${params.q}"));
html = html(Template('<div style="color: #$fg">$name</div>'));
But that expands the size of the API, and doesn't give us enough
information. Consider the HTML example above. If that content is
inserted into a <title> or
<textarea> tag, then the angle brackets should be
interpreted literally, and if it were inserted into a
<style> tag, then the content should be interpreted
as CSS, so that scheme doesn't give us enough information to make
correct decisions. We could require more specific information, but
that would require the developer to know a lot about the language,
commit in advance to how the interpolation is going to be used, and,
for safety's sake, the code that injects into innerHTML
would still need to check that it's appropriate in context.
Finally, we want to allow an interpolation to be composed from smaller parts:
var tableHtml = open(Template("<table"));
if (data.length) {
tableHtml.append(open(Template(" cols=${data[0].length}")));
}
tableHtml = tableHtml.append(open(Template(">")));
for (var j = 0; j < data.length; ++j) {
var row = data[j];
tableHtml.append(open(Template("<tr>"));
for (var i = 0; i < row.length; ++i) {
tableHtml.append(open(Template("<td>${row[i]}")));
}
}
tableHtml.append(open(Template("</table>")));
and if we require the developer to supply fine grained context at every step, then they need to know the HTML standard in detail, and get it correct at every step.
Let's take advantage of late binding by letting the library that
uses the string value supply the context and the knowledge about how
to determine which escaping scheme to use for each substitution. When
open(Template(…)) is called, all we have is a
series of literal pieces of text, interspersed with substitution
expressions. Since the substitutions need to be evaluated in local
scope, they need to be evaluated immediately, so the URL example above
url = open(Template("http://$myDomain/path?q=${params.q}"));
yields
| Literal: | "http://"
|
| Substitution: | myDomain
|
| Literal: | "/path?q="
|
| Substitution: | params.q
|
/**
* A string like object that encapsulates a string interpolation's template and
* substitutions, and provides a mechanism by which substitutions can be
* re-escaped using information available at the strings final output point.
*
* @param {Array} args an array whose even elements are literal strings, and
* whose odd parts are substitutions.
* @constructor
*/
function StringInterpolation(args) {
this.args = args.slice(0);
}
and let's make it behave like a string when viewed in a debugger, or
put into a sorted collection, or used as a key in a hashmap:
// Next, let's make it string-like. If used in a string context, it should
// coerce to a string for debugging.
StringInterpolation.prototype.toString = function () {
return this.args.join('');
};
StringInterpolation.prototype.valueOf = function (type) {
if (type === 'string') { return this.toString(); }
if (type === 'boolean') { return this.toString() !== ''; }
return this;
};
Which suggests a way to bolt string interpolation onto Javascript.
If the open function behaves like eval, then
we can easily evaluate the substitutions in the local scope, which
would be really convenient if Template returns Javascript
that creates a StringInterpolation:
// open is eval, so that we can evaluate substitutions in the local scope.
var open = eval;
// To make sure that
(Template("http://$myDomain/path?q=${params.q}") ===
'new StringInterpolation(["http://", myDomain, "/path?q=", params.q])'
// we can define Template as
function Template(interpolation) {
var parts = interpSplitStringIntoParts(interpolation);
return makeStringInterpolationConstructor(parts);
}
function interpSplitStringIntoParts(interpolation) {
// Generate an array where even elements are literals and odd ones are
// substitutions. This may involve putting empty literals in between
// contiguous substitutions.
…
}
function makeStringInterpolationConstructor(parts) {
// Make a fake String Interpolation constructor, so that interp (aka eval)
// will evaluate it in the context of the local variables it appears with.
var javascript = ['new StringInterpolation(['];
for (var i = 0, n = parts.length; i < n; ++i) {
if (i) { javascript.push(','); }
if (i & 1) {
javascript.push('(', parts[i], ')');
} else {
javascript.push(javascriptStringLiteral(parts[i]));
}
}
javascript.push('])');
return javascript.join('');
}
But how will a library developer turn the interpolation into a string. There's a couple cases we need to consider. First, to support functional decomposition string interpolations need to nest. So string interpolation processing will need to recurse if a substitution value is a string interpolation. This recursion could be tricky, and we still need to figure out how to do exemptions, so we shouldn't rely on all library implementors to do it correctly. A library writer needs to know the following:
The last is a bit complicated since the escaping might change the
context. In <a href=$url>
we'd like to be able to surround $url with quotes which
would be different from <a
href="http://$domain/"> where $domain
is already in a quoted string. So as a library writer handles each
literal or substitution in turn, it needs to keep track of context.
The context will likely correspond to a non-terminal on the left hand
side of a context-free grammar rule.
So let's try to let the library writer handle computing the context
and escaping substitutions, and abstract away the problem of iterating
over pieces and recursing into substitutions that are interpolations.
Let's define one operation to keep track of the context for a literal
chunk, and one to process a substitution. The literal handler needs
to take the context that the start of the literal appears in, and
returns the context at the end of the string. The substitution
handler needs to take a context, perform escaping, and return the
context at the end. Let's add a method to our
StringInterpolation class that takes the literal handler
(which we'll call contextScanner) and the substitution
handler (called escaper):
/**
* Perform custom interpolation.
* @param {Function} contextScanner takes (string, context, outputBuffer) and
* returns an updated context. The type of context is up to the caller.
* @param {Function} escaper takes (substitution, context, outputBuffer),
* appends an appropriate string-form of substitution onto outputBuffer,
* and returns the context.
* Substitution may be of any type. Zero or more strings may be pushed onto
* outputBuffer which will be joined on the empty string to produce the
* interpolated string.
* @param {Array.<string>} outputBuffer the buffer that output is written to.
* An Array that can be joined on the empty string to produce the
* interpolated string.
* @return the context. This is returned so that the caller can examine it and
* rror out if it is not a valid ending state.
* E.g., an XML templater could throw an exception if tags weren't closed.
*/
StringInterpolation.prototype.interpolate = function (
contextScanner, escaper, initialContext, outputBuffer) {
var context = initialContext;
for (var i = 0, n = this.args.length; i < n; ++i) {
var arg = this.args[i];
if (i & 1) { // args is a substitution
if (arg instanceof StringInterpolation) {
// Recurse. This allows, e.g., production of a WHERE clause as one
// interpolation which can then be substituted into a SELECT stmt.
context = arg.interpolate(
contextScanner, escaper, context, outputBuffer);
} else {
context = escaper(arg, context, outputBuffer);
}
} else {
outputBuffer.push(arg);
context = contextScanner(arg, context, outputBuffer);
}
}
return context;
};
We don't specify a type for initialContext since we
never need to analyze it. The library writer can give us a primitive
or object, and we'll pass it back and forth between the
escaper and contextScanner that they supply.
Then when the the whole interpolation is done, the library writer can
inspect the context and determine whether it's in a valid end
state.
The whole point of auditable exemptions is to allow a developer to substitute a value that they know is suitable in context without working around the system. The use of this feature breaks security, but I believe it's necessary because letting developers intentionally do unsafe things within the system is better than the alternatives: developers hastily and incorrectly reimplementing parts of the system from scratch, or unintentionally doing unsafe things by reverting to simple string concatenation.
There are a few ways we could let a user specify an exemption:
// A special raw flag that appears in the interpolation.
html = open(Template("$raw{foo}");
// A special type that specifies the context in which something is meant to
// appear.
safeFoo = new Html("foo");
html = open(Template("$safeFoo");
// A special type that does not specify context.
rawFoo = new RawStringSubstitution("foo");
html = open(Template("$rawFoo");
The first option violates the principle of late binding, and
decomposability. Maybe sometimes foo will come from a
source that is known to be good, and sometimes it won't and needs to
be escaped.
The second option is a great idea, and solves the problems
identified with tainting above by allowing a knowledgeable developer
to assert the type. A library writer in their escaper
could check the type and elect not to escape certain classes of
substitution values, so this requires no work from us.
But the second doesn't work well for developers who don't know details of the language being generated, it puts a burden on every library writers to meet our goal of auditable exemptions, it violates our goal of providing primitives that work in many domains, and there is no single string that we can search for to audit all exemptions.
I like the second approach and library writers can support it, but I think we still need something like the third option which is not language specific.
Since we implemented the interpolate method on
StringInterpolation, and the interpolate
recurses when it sees a nested StringInterpolation
we can implement RawStringSubstitution a subclass.
/**
* A string interpolation that always outputs the same content. This allows
* exemptions to the normal escaping scheme, so should be used sparingly.
* To audit all uses, run
*
* $ egrep -nH '\bRawStringSubstitution\b' <file0.js> …
*
* @param {string} s raw text that should not be escaped.
*/
function RawStringSubstitution(s) {
this.s = String(s);
}
// Subclass StringInterpolation so that StringInterpolation.interpolate will
// delegate to this class when its used as a substitution value.
RawStringSubstitution.prototype = new StringInterpolation([]);
RawStringSubstitution.prototype.constructor = RawStringSubstitution;
// Now override interpolate to not invoke the escaper.
RawStringSubstitution.prototype.interpolate = function (
contextScanner, escaper, initialContext, outputBuffer) {
outputBuffer.push(this.s);
return initialContext;
};
RawStringSubstitution.prototype.toString = function () { return this.s; };
Now that we have an idea of what a string interpolation is, and how context scanners give escapers information cooperate to produce safe & correct output, how do we design a useful escaper? Let's consider the different ways to escape a string.
var ids = [1, 2, 3, 4];
var column = 'value';
var foo = 'foo';
open(Template("SELECT $column FROM Table WHERE id IN $ids AND foo LIKE $foo"))
=== "SELECT `value` FROM Table WHERE id IN (1, 2, 3, 4) AND foo LIKE 'foo'"
My friend, J0hn <Q> D'oe was named by his techie parents during the middle of the internet boom, but despite his geeky pedigree, he has frequent trouble with web forms. Here's how an application might use safe string interpolation to render a form that works for J0hn:
userName = "J0hn <Q> D'oe"
|
|
<a href="logout">
<img src="logout.png" title="Logout from $userName's account">
</a>
<form onsubmit="
if (!/\S/.test(this.elements.fullname.value)) {
alert("$userName, please enter your full name");
this.elements.fullname.focus();
return false;
}">
<input type=hidden name=userName value=$userName>
Welcome $userName,
<p>
Full Name: <textarea name=fullname>$userName</textarea>
<p>
DOB: …
|
<a href="logout">
<img src="logout.png" title="Logout from J0hn <Q> D'oe's account">
</a>
<form onsubmit="
if (!/\S/.test(this.elements.fullname.value)) {
alert("J0hn \74Q\76 D\47oe, please enter your full name");
this.elements.fullname.focus();
return false;
}">
<input type=hidden name=userName value="J0hn <Q> D'oe">
Welcome J0hn <Q> D'oe,
<p>
Full Name: <textarea name=fullname>J0hn <Q> D'oe</textarea>
<p>
DOB: …
|
Since we can differentiate attributes and elements whose content is Javascript from those that contain plain text, we can substitute in numeric, boolean, and null values literally to make it easy to compose event handlers.
Our implementation of StringInterpolation.interpolate
works on a single left to right scan, which meets our goals for
efficiency, O(length_of_output + number_of_substitutions),
and it allows us to handle cases like the below without backtracking:
// Output <h1>, <h2>, … as appropriate.
headerHtml = open(Template("<h$n>$sectionTitle</h$n>"));
Since we're working at the lexical level instead of manipulating parse trees, most of the grammars we're going to deal with are regular and so can be implemented in terms of a FSM which requires zero lookahead and zero lookbehind, LA(0) & LB(0). But we'd like to use the state variable as an input to the escaper to choose an appropriate escaping convention, so we want to avoid the potentially exponential number of states required by regular expressions, since it makes the mapping from states to escaping conventions complicated.
State machines are nice though since the entire state is representable by an integer. In many languages, object creation is more expensive than computing primitives, and especially so in many Javascript interpreters, so it'd be great if we could get the benefits of a simple state representation without an exponential explosion of states.
Our StringInterpolation.interpolate function has
access to the output buffer — all the content written so far, so
we effectively have access to infinite lookbehind: LA(0) &
LB(∞). This means that we can avoid the exponential explosion
in number of states, by keeping track of key pieces of data.
In open(Template('<textarea onchange="alert($foo)">'));, we need to keep
track of the tag name and attribute names since the body of a
textarea tag has different escaping conventions for
content than other tags, and the content of the onchange
attribute is Javascript and so needs to layer html escaping
conventions on top of Javascript escaping conventions.
To track the current tag name and attribute name as we walk over a string of html is to place a bookmark when we see the beginning and grab the value from the output buffer when we see the end. Then, when we see the beginning of the attribute value or the end of the tag, we can choose a state from a limited set.
So, ignoring html attributes, we might specify a state machine for html as as:
# Specify a state that handles text nodes
pcdata:
'<' : tag_start
# Once we've seen a <, look for a tag name
tag_start:
# Treat alpha as matching any alphabetic character.
# The "record" below is a side effect of the transition that places a
# bookmark at the current location, so we can "store" the value into
# the variable tag_name.
alpha : tag_name, record(tag_name)
'/' : close_tag_start
# If we don't see something that starts a tag, reinterpret the character
# as if we were in pcdata. This can be done by copying states into this
# state.
else as pcdata
close_tag_start:
alpha : tag_name # no need to record name for close tags
else : error("expected letter after </")
tag_name:
alpha, num : tag_name
space, newline : tag_body, store(tag_name, 'i')
'>' as end_tag, store(tag_name, 'i')
else : error
…
and then when we need to decide what to do when we reach the end of a tag:
def end_tag(tag_name): 'xmp': cdata 'style': cdata_css, store(attrib_delim, '') 'script': cdata_js, store(attrib_delim, '') 'title': rcdata 'textarea': rcdata 'plaintext': plain_text else: pcdata
which switches on the tag_name variable to decide how to interpret the content following the tag. The HTML5 spec explains the distinctions between CDATA, PCDATA, and RCDATA.
Now our HTML escaper can use a lookup table that maps states (pcdata, tag_start, tag_name, cdata, etc.) to escaping functions. But our state has to include bookmarks and variables, so how have we addressed the problem. Since we have mappings like end_tag above that lists the values of tag_name, we can inspect the state machine specification to determine the set of all values we're interested in, and make a table mapping those to integers, and bitpack the variable state onto the end of the state machine state. If we constrain ourselves to keeping one open bookmark at a time, we can also bitpack the bookmark.
If we have a string interpolation in a tight loop, we'd like to avoid recomputing context over and over again.
Once we've figured out how to represent our state variable as a primitive or immutable object, we can memoize a lot of the output of our context scanners. In the above example, our context scanner takes a literal string, an input state, an output buffer, and produces an output state. In the frequent case where we're not recording a variable, our output state is specified entirely by the literal string and the input state.
The below wraps a context scanner that doesn't require the output buffer to memoize results. The same scheme can be extended to our state machine based context scanners if we can determine whether a state has a bookmark or not.
/**
* Given a context scanner that takes immutable contexts, and that consistently
* returns the same context for any (literalString, context) pair, cache results
* to improve string interpolation performance.
*
* @param {Function} contextScanner a context scanner as specified in the
* StringInterpolation.interpolate method's API.
* @return {Function} a contextScanner that is equivalent to the input if the
* conditions in this method's description hold.
*/
function memoizingContextScanner(contextScanner) {
var cache = {};
var size = 0;
return function (literalString, context, outputBuffer) {
// If context is a number it won't contain \0, so this is unambiguous.
var cacheKey = literalString + '\0' + context;
if (cacheKey in cache) {
return cache[cacheKey];
}
// Flush the cache if it gets too large to bound memory consumption.
if (++size === 100) {
cache = {};
size = 0;
}
return cache[cacheKey] = contextScanner(literalString, context, undefined);
};
}
I believe that the number of times a context scanner will end with a live bookmark will be tiny in practice. In the HTML scanner there are two cases where it might happen:
// When we have a substitution in the middle of a tag name.
open(Template("<h$n>"))
// Or a substitution in the middle of an attribute name.
open(Template("<a on$handlerName=foo()>"))
I benchmarked the Javascript
open(Template(…)) implementation,
and when the string interpolation is statically optimized to remove
the eval call, it performs competitively with string concatenation.
The optimized call to open(Template(…)) is about
2 to 4 times as expensive as straight string concatenation, but is a fraction of
the cost of building the DOM using appendChild, so should be fast
enough for most applications.
- # rows
- The number of table rows being generated
- string +=
- How long does it take to compose html using string concatenation? This is quick, but unsafe.
- Array.join
- How long does it take to compose the string using
Array.pushandArray.join('')? This is also quick, scales to large chunks of html on IE, but is unsafe.- open(Template(…))
- How long does it take unoptimized
open(Template(…))to compose html. This is safe, but is slow due to frequently calling eval.- open(Template(…)) 2
- How long does it take optimized
open(Template(…))to compose html. This is safe, and reasonably fast. I optimized out the call to eval by rewritingopen(Template("foo $bar"))withnew StringInterpolation(['foo ', (bar)]).- DOM
- How long does it take to compose a table using
document.createElementandElement.appendChild? This is safe, and so probably the best comparison toopen(Template(…))but is quite slow.- render time
- How long does the browser take to parse, layout, and render the html generated by any of the first 4 methods?
On Firefox on Linux:
| # rows | string += | Array.join | open(Template(…)) | open(Template(…)) 2 | DOM | render time |
|---|---|---|---|---|---|---|
| 1000 | 54 ms | 68 ms | 1043 ms | 204 ms | 508 ms | 586 ms |
| 5000 | 267 ms | 332 ms | 5255 ms | 1159 ms | 2528 ms | 1458 ms |
On IE6 on Windows XP running under VMWare:
| # rows | string += | Array.join | open(Template(…)) | open(Template(…)) 2 | DOM | render time |
|---|---|---|---|---|---|---|
| 1000 | 141 ms | 156 ms | 1328 ms | 328 ms | 1782 ms | 281 ms |
| 5000 | 1531 ms | 1297 ms | 12359 ms | 2969 ms | 25703 ms | 1594 ms |
Most Javascript heavy applications have quite a lot of code that
produces html that looks like that on the left below. It can easily
be migrated piecemeal to use string interpolation by replacing the
yellow highlighted bits with calls to
open(Template(…)) and .append as done
on the right.
// Generate a select list like // <select name=state> // <option value=AL>Alabama</option> // … // </select> var states = { AL: "Alabama", AK: "Alaska", AZ: "Arizona", … }; var abbreviations = []; for (var k in states) { abbreviations.push(k); } abbreviations.sort(); var selectListHtml = '<select name=state>'; for (var i = 0, n = states.length; i < n; ++i) { var abbrev = abbreviations[i]; selectListHtml += '<option value=' + abbrev + '>' + states[abbrev] + '</option>'; } selectListHtml += '</select>';// Generate a select list like // <select name=state> // <option value=AL>Alabama</option> // … // </select> var states = { AL: "Alabama", AK: "Alaska", AZ: "Arizona", … }; var abbreviations = []; for (var k in states) { abbreviations.push(k); } abbreviations.sort(); var selectListHtml = open(Template('<select name=state>')); for (var i = 0, n = states.length; i < n; ++i) { var abbrev = abbreviations[i]; selectListHtml.append(open(Template( '<option value=$abbrev>${states[abbrev]}</option>'))); } selectListHtml.append(open(Template('</select>')));
We can bolt string interpolation onto a language that has an
eval operator that evaluates in local scope, but as the
benchmarks above show, this implementation comes at a significant
runtime cost.
The form below connects to a script that can be used to precompute
the StringInterpolation constructor so that
eval isn't called.