Opera parser monster eats unicode
Thursday, 7 April 2011
Whilst writing my own parser I found weird things in Opera’s JavaScript parser. I was testing what the various browsers allowed with unicode escapes and it turns out Opera seems more lax than others. My discovery began with the following code:
try {eval("\\u0066\\u0061\\u006c\\u0073\\u0065");} catch(e) {alert(e);}
What do you expect the undefined variable to be? It’s a unicode-encoded “false”, hehe, so on Firefox we can have a variable called “false” if we use unicode escapes. But what about Opera? Well, it’s actually looking for a variable called “false5”. Why? Because the JavaScript parser seems to be off by one when using eval with unicode escapes: it decodes the final \u0065 but then steps back a character, so the trailing “5” of that escape is read again and added onto the name.
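For comparison, here is a minimal sketch of what a correct decoder should make of that string. The helper decodeUnicodeEscapes is my own hypothetical name, not anything from Opera or Firefox, and it only handles the four-digit \uXXXX form:

```javascript
// Hypothetical helper (an illustration, not any browser's real code):
// decodes every \uXXXX escape in a piece of identifier source text.
function decodeUnicodeEscapes(source) {
  // An escape is a backslash, "u", and exactly four hex digits.
  return source.replace(/\\u([0-9a-fA-F]{4})/g, function (match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
}

// Same string literal that was passed to eval() above. After string-literal
// processing it holds single backslashes: \u0066\u0061\u006c\u0073\u0065
var escaped = "\\u0066\\u0061\\u006c\\u0073\\u0065";
var decoded = decodeUnicodeEscapes(escaped);

console.log(decoded);        // prints "false"
console.log(decoded.length); // prints 5
```

Five escapes in, five characters out, so the name Opera reports is one character longer than the input can account for.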
Pretty cool, so what else can we do? Well, Opera seems a bit more lax than the other browsers when it comes to unicode escapes. For example, this is perfectly legal:
\u=alert,u(1)
Pretty nuts, right? You can use an incomplete unicode escape and the backslash simply gets ignored, so \u is treated as the identifier u. Another example:
\u000x=alert;u000x(1)
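If you want to spot this kind of input in your own parser tests, a rough sketch like the following flags escapes that don't have four hex digits. findMalformedEscapes is a hypothetical name of mine, and it is only a plain text scan, not a real tokenizer: it doesn't know about strings or comments, and it would also flag the newer \u{...} form.

```javascript
// Hypothetical checker (illustration only): returns every "\u" in the
// source text that is not followed by exactly four hex digits.
function findMalformedEscapes(source) {
  var malformed = [];
  // Grab the backslash, the "u", and up to four hex digits after it.
  var pattern = /\\u([0-9a-fA-F]{0,4})/g;
  var match;
  while ((match = pattern.exec(source)) !== null) {
    if (match[1].length !== 4) {
      malformed.push(match[0]);
    }
  }
  return malformed;
}

// The two snippets above, written as string literals.
console.log(findMalformedEscapes("\\u=alert,u(1)").length);         // prints 1
console.log(findMalformedEscapes("\\u000x=alert;u000x(1)").length); // prints 1
console.log(findMalformedEscapes("\\u0066oo=1").length);            // prints 0
```

The first two inputs each contain one incomplete escape (\u and \u000), which is why a stricter parser should refuse them instead of dropping the backslash.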
And finally I leave you with this: inside an eval call, you can make \u become uu:
window.__defineGetter__("uu",function() { alert(1) });eval("\\u");
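The getter is just a tripwire: it fires whenever something reads the global uu, which is how you can tell which name the parser ended up with. __defineGetter__ is a legacy method, so here is a sketch of the same tripwire using Object.defineProperty, assuming an environment that has globalThis. It evals the plain identifier uu so it runs anywhere, and the calls counter is my own addition in place of alert:

```javascript
// Count reads of the global "uu" instead of calling alert(), so the
// sketch also runs outside a browser.
var calls = 0;

Object.defineProperty(globalThis, "uu", {
  configurable: true, // lets the property be deleted or redefined later
  get: function () {
    calls += 1;
    return undefined;
  }
});

// Evaluating the bare identifier looks it up on the global object,
// which triggers the getter.
eval("uu");

console.log(calls); // prints 1
```

Swap the eval argument for "\\u" and a parser that follows the rules should throw a syntax error without ever touching the getter.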
No. 1 — May 24th, 2011 at 8:34 pm
Gaz I don’t follow – If Opera thinks \\u0066\\u0061\\u006c\\u0073\\u0065 is really ‘false5’ – where does the ‘5’ come from? I see what you’re saying, but that would mean that Opera sees the last code point as \\u0065 + 5. I mean, it found the ‘e’ right, which is the \\u0065, but it sounds like you’re saying it found a \\u006 followed by an extra 5.
No. 2 — May 24th, 2011 at 8:34 pm
BTW I added a nice little to my comment and got a nice PHP error from spambot or something – try it out 🙂
No. 3 — May 24th, 2011 at 8:44 pm
Hey Chris, yeah, so Opera does parse the \u0065, but their decoder puts the parser back a character, so the “5” comes from the previous unicode escape (I spend way too much time studying JS 🙂 )