Setters using VBS and constant hacks

I wasn’t gonna blog this because I couldn’t be bothered, but Mario asked me if I had it documented anywhere and I guess it’s nice to have it somewhere. I was looking to create setters in legacy browsers like IE7, and it would also be nice to use them on custom objects in IE8. I came up with the following VBS hack:-

execScript("Class c\nProperty Let x(y)\nalert(y)\nEnd Property\nEnd Class\nSet obj=new c","VBScript");obj.x=1;//yeah js calls vbs ;)

Pretty cool calling VBS from JS then using a VBS object inside JS :)

OK, and now some constant stuff I tweeted before but quite like, so I’ll post it here too. Constants are weird in JavaScript. When you think about them you think of a value that cannot change, but a const only fixes the binding: the object it points to can still be modified. I dunno why, as constant objects could be implemented quite easily by creating a getter-only object, but anyway:-

//example1
const x={};x.y=123;alert(x.y)//Objects aren't constants!

//example2 (need to run separately of course)
const x={};x.toString=x.valueOf=function() { return 1; };alert(x)
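
The "getter-only object" idea mentioned above can be sketched roughly as follows. This is only an illustration, assuming an ES5 engine with Object.defineProperty; constantObject and tryAssign are made-up helper names, not anything from the tweets.

```javascript
// Hypothetical helper: expose each value through a getter with no setter,
// then stop new properties from being added to the object.
function constantObject(values) {
  var obj = {};
  Object.keys(values).forEach(function (key) {
    var value = values[key]; // captured copy, later changes to `values` don't leak in
    Object.defineProperty(obj, key, {
      get: function () { return value; },
      enumerable: true,
      configurable: false
    });
  });
  return Object.preventExtensions(obj);
}

// Assignments are ignored in sloppy mode and throw a TypeError in strict mode,
// so this wrapper swallows the error to behave the same in both.
function tryAssign(obj, key, value) {
  try { obj[key] = value; } catch (e) { /* strict mode TypeError */ }
}

var constantish = constantObject({ y: 123 });
tryAssign(constantish, 'y', 456); // has no effect
tryAssign(constantish, 'z', 1);   // has no effect
```

Nested objects stored as values would still be mutable, so this only covers the top level.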

Finally I’ll leave you with a quiz, what is “x” equal to without running?:-

var x=1; function y() { x=3; alert(x); return; const x=2;}y()// x == ?

Sandboxed DOM API

Description

I finally sat down and started work on a sandboxed DOM API. Originally I was just going to develop a new framework because the DOM is messy but instead I decided it would be cool to have a safe simulated DOM instead and build a framework on top of that.

It isn’t complete yet and there’s still a lot of work to do but it’s working pretty good. I still need to run some tests on it and try to break it but I don’t have time at the moment as I need to do other stuff.

One of the problems with making a DOM API is that IE doesn’t have proper setter support: even IE8 doesn’t allow you to define setters on normal objects. Because I spend most of my time hacking stuff, it was a fun challenge to make IE support setters on DOM objects and keep my sandboxed whitelists.

It’s quite complicated and quite ugly in parts but it works and I think it’s the only way to support legacy browsers like IE7.

How it works

I have to test for the various kinds of setter support, including __defineSetter__ and Object.defineProperty, and fall back to the legacy onpropertychange. Object.defineProperty works fine in IE8 when using a DOM object, but I encountered problems when I needed to assign to a sandboxed normal object. Here it gets ugly: I had to create a DOM object for any styles used by a node. This way both Object.defineProperty and onpropertychange allow me to monitor any assignments to the fake style object.

var styles = document.createElement('span'); // a real DOM node stands in for the style object
node.$style$ = styles;
Object.defineProperty(node.$style$, '$'+cssProp+'$', {}); // descriptor left empty in this sample
document.getElementById('styleObjs').appendChild(styles); // must be in the document for onpropertychange
node.$style$.onpropertychange = function(){};

As you can see with the code sample above, I have to append the fake style DOM object to the document for onpropertychange, otherwise it won’t be called on assignments.

You can see this working by using the following test code:-

document.getElementById('x').style.color='#ccc';
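
The detection order described in this section could be sketched roughly like this. It is a guess at the shape of the logic, not the actual sandbox code: watchProperty and onSet are hypothetical names, and the onpropertychange branch assumes legacy IE's window.event.propertyName.

```javascript
// Hypothetical helper: route every assignment to target[prop] through onSet,
// using whichever setter mechanism the environment supports.
function watchProperty(target, prop, onSet) {
  var stored;
  if (typeof target.__defineSetter__ === 'function') {
    target.__defineGetter__(prop, function () { return stored; });
    target.__defineSetter__(prop, function (v) { stored = onSet(v); });
    return '__defineSetter__';
  }
  if (typeof Object.defineProperty === 'function') {
    try {
      Object.defineProperty(target, prop, {
        get: function () { return stored; },
        set: function (v) { stored = onSet(v); }
      });
      return 'defineProperty';
    } catch (e) {
      // IE8 only accepts DOM objects here, so fall through for normal objects
    }
  }
  if ('onpropertychange' in target) {
    // Legacy IE: only fires for DOM nodes that are attached to the document
    target.onpropertychange = function () {
      var e = window.event;
      if (e && e.propertyName === prop) { onSet(target[prop]); }
    };
    return 'onpropertychange';
  }
  return 'none';
}

var seen = [];
var fakeStyle = {};
var strategy = watchProperty(fakeStyle, 'color', function (v) {
  seen.push(v); // a real sandbox would validate the value here
  return v;
});
fakeStyle.color = '#ccc';
```

In a modern engine the first branch is taken; the later branches only matter for the legacy browsers discussed above.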

So I proxy off all these functions and make the root node any HTML object, using CSSReg and HTMLReg to sandbox each modification to a property. Where it finally got complicated was supporting events. Currently I only support “onclick” as I’m still testing. Because the event code is already sandboxed I don’t need to perform another rewrite, so I pass it to JSReg marked as already converted. I supply the HTML element as the “this” object, which allows the triggered event to refer to the current element as “this”.
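
Supplying the element as “this” can be illustrated with plain Function.prototype.call. This is only a stand-in: runHandler is a hypothetical name, the element is a fake object, and the real code goes through JSReg rather than new Function.

```javascript
// Hypothetical stand-in for running already-converted handler code
// with the element bound as `this`.
function runHandler(convertedCode, element) {
  var handler = new Function(convertedCode);
  return handler.call(element);
}

var fakeElement = { id: 'x' }; // stands in for a real HTML element
var handlerResult = runHandler('return this.id;', fakeElement);
```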

That’s it! I’ve donated the code to OWASP and it will be free to use in your projects, any help testing or suggestions are most welcome, enjoy the demo!

Sandboxed DOM API

Astalanumerator 0.7

Just a quick post to let you know I’ve updated Astalanumerator in case you use it somewhere. I use CodePlex to host it; I thought I’d give it a whirl as I’ve seen other people host their projects there and it looks decent.

This version contains various CSS fixes and tracks each object within links and via the astalanumerator object. This was quite tricky because I allow stuff like “/a/” etc, and because you can click many different properties it’s hard to keep track of each of them. Each click now jumps you straight to the list of props for that object instead of just showing you in the url. The objects are colour coded now and the inspector fills the screen. You should notice it’s much faster than the original one, and it inspects properties using three methods: 1. a normal for..in loop, 2. a manual list of props I’ve collected, 3. the IE Enumerator object. Enjoy!

Source code
New version
Old version
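
The first two inspection methods mentioned above (for..in plus a manual list of names) can be combined along these lines. This is a rough sketch, not Astalanumerator's code: listProps and the sample name list are made up, and the IE Enumerator step is left out.

```javascript
// Hypothetical helper: collect property names from for..in, then probe a
// manual list of known names to catch non-enumerable properties.
function listProps(obj, knownNames) {
  var seen = {};
  var out = [];
  function add(name) {
    var key = '$' + name; // prefix avoids clashes with names like __proto__
    if (!Object.prototype.hasOwnProperty.call(seen, key)) {
      seen[key] = true;
      out.push(name);
    }
  }
  for (var name in obj) { add(name); }
  var boxed = Object(obj); // lets the `in` operator work on primitives too
  for (var i = 0; i < knownNames.length; i++) {
    try {
      if (knownNames[i] in boxed) { add(knownNames[i]); }
    } catch (e) { /* some host objects throw when probed */ }
  }
  return out;
}

var props = listProps([1], ['length', 'push', 'nope']);
```

Here for..in only finds the index "0", while the manual list adds "length" and "push", which for..in never reports.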

Can all mozilla people look away now please

Custom setter syntax is being removed from Firefox in the next version.. boo I hear you say, well at least some of you. If you don’t know, Firefox decided it would create its own setter syntax (I love it when you do that you know) ages ago and it looked something like this:-

a setter=alert,a=1//calls alert(1)

Whacky indeed. They decided to remove it. So I was messing with JavaScript like I do near enough every day and I stumbled upon this:-

Object.prototype.__noSuchMethod__=function(s){ alert(s); };
1..*(1)

What was surprising was that the alert showed “*”, not 1 as you would expect. The craziness then continued:-

Object.prototype.__noSuchMethod__=function(s){ eval(s); };1.['alert(1)']()

I hadn’t looked at MDC and still didn’t understand why this was happening, then Mario pointed out “oh it’s sending the name of the function via the noSuchMethod”. Big doh moment, oh yeah. But then that means…..we have a new setter syntax!!!!

//existing code
function x(s) {
  eval(s);
}
//our evil injection
Object.prototype.__noSuchMethod__=x;new/a/['alert(1)']

If you work at Mozilla please look away now because I like this crazy syntax so don’t fix it.
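
For anyone without a Firefox build that still has __noSuchMethod__, the mechanism (the missing method's name arrives as a string) can be approximated with a Proxy. This is a swapped-in technique for illustration only, not the Firefox feature itself, and withNoSuchMethod is a made-up name.

```javascript
// Hypothetical helper: calls to missing methods are routed to `handler`
// together with the method name, roughly like __noSuchMethod__(id, args).
function withNoSuchMethod(target, handler) {
  return new Proxy(target, {
    get: function (obj, name) {
      if (name in obj) { return obj[name]; }
      return function () {
        return handler.call(obj, name, Array.prototype.slice.call(arguments));
      };
    }
  });
}

var receivedNames = [];
var trapped = withNoSuchMethod({}, function (name, args) {
  receivedNames.push(name); // an eval(name) here would be the injection
  return name;
});
var returnedName = trapped['alert(1)']();
```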

Hackvertor Ajax applications

I hate to use the word Ajax because there’s no XML involved, just nice JSON, but Hackvertor now has Ajax applications! At the moment it’s very rough around the edges but it will improve when I get more spare time to work on them. What does it mean? Well, you can now share actual HTML/JS based applications on Hackvertor based on your own code and store/load data. Each time you create a tag inside the applications category it automatically gets added to the applications section. The app then appears in the list when you click on the navigation.

How to create a HV app

It’s slightly different from creating a tag: apps use an object to contain the mini site. This way you can write a forum in less than 100 lines :) The best practice I came up with was to use an anon closure, which contains a user variable to get info about the current user. Then each section is split into a different property of the site object. The object doesn’t need to be called site but I thought it was handy to think of it that way.

There are two special properties of the “site” object. The first is “home”, which is required in order to run your application and returns the first page the user sees. The second is “styles”, which allows you to style all of your app in one place. All HTML/CSS is controlled by HTMLReg and CSSReg; JavaScript is handled by JSReg.
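
A skeleton of that layout might look something like the following. Treat it as a guess at the shape only: the getUser stub, the topic section, and the assumption that home and styles are functions returning strings are all mine, so check the forum sample for the real conventions.

```javascript
// Stub standing in for Hackvertor's getUser(); the real one is provided by the site.
function getUser() {
  return { userID: 1, username: 'guest' };
}

// Anon closure holding the user, returning the "site" object.
var site = (function () {
  var user = getUser();
  return {
    // Required: the first page the user sees (assumed to return HTML)
    home: function () {
      return '<h1>Welcome ' + user.username + '</h1><a href="#topic?0">First topic</a>';
    },
    // One place for all the app's styling (assumed to return CSS)
    styles: function () {
      return '.title { color: #333; }';
    },
    // Any other property is a section of the app (hypothetical example)
    topic: function (id) {
      return '<p>Topic ' + id + '</p>';
    }
  };
})();
```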

To see how it works you can see a very simple/bad app I wrote quickly:-
Forum sample code
Forum demo

You can call each section of your app by using # urls. For example, let’s say you wanted to call the “Cancel” function of your “site” object. All you need to do is create a url with #Cancel. Arguments can be sent to your functions in links by using “?”. For example, if you look at the forum, I use #topic?0 to pass the id number to the topic function. This also works nicely with forms: you just give a submit button a value of your site function and it will automatically gather all the form values and send an object with them all referenced. To see how this works in the forum app, just take a look at the Save section. Here is how the submit button calls it:-

<input type="submit" value="Save" />

Hackvertor automatically scans all inputs/links and adds an onclick handler to call your function named in the value attribute or the location.hash of the link. The above HTML actually does something like this:-

<input type="submit" value="Save"  onclick="site.Save(this.form)" />
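
The # url convention described above could be dispatched along these lines. This is a sketch under assumptions, not Hackvertor's implementation: dispatchHash and demoSite are hypothetical, and the real code also handles forms.

```javascript
// Hypothetical helper: split "#name?arg" and call the matching site function.
function dispatchHash(siteObject, hash) {
  var m = /^#([A-Za-z_$][\w$]*)(?:\?(.*))?$/.exec(hash);
  if (!m) { return undefined; }
  var name = m[1];
  // Only call the app's own functions, never inherited ones like toString
  if (!Object.prototype.hasOwnProperty.call(siteObject, name) ||
      typeof siteObject[name] !== 'function') {
    return undefined;
  }
  return siteObject[name](m[2]);
}

var demoSite = {
  topic: function (id) { return 'topic ' + id; },
  Cancel: function () { return 'cancelled'; }
};
var topicPage = dispatchHash(demoSite, '#topic?0');
var cancelPage = dispatchHash(demoSite, '#Cancel');
var inherited = dispatchHash(demoSite, '#toString');
```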

Limitations

I need to write a DOM layer to allow sandboxed assignments to CSS and HTML attributes. Until then stuff like document.getElementById or custom event handlers won’t work. At the moment this makes apps pretty restrictive, but you could write an IM application or shared messaging app. You are also limited in what kind of HTML/CSS can be executed using the sandboxes. There are browser bugs too :( I’ve not tested it fully yet.

App API functions

getUser() - Returns an object of the current user. The properties include: userID, username, image. Image will only be returned if a user has a picture uploaded.

loadStorage((string) appID, (int) id) - Loads a JSON string. AppID is the application name, id is a number used to refer to a section such as a forum topic.

saveStorage((string) appID, (int) id, (string) JSON, insert|update) - Saves a JSON string. AppID is the application name, id is a number used to refer to a section such as a forum topic. JSON is a string of JSON data you want to store. The last param inserts or updates the JSON data, it uses the userID of the user to check permissions.

Regular expression sandboxing

Birth of the regex sandbox

I decided today to do a proper blog post to explain my reasons for creating regex sandboxes. I don’t often write a lot of words on this blog, partly because I’m not very good at making long meaningful sentences and partly because I think the point can often be made in fewer words. Hopefully this will be useful for someone writing filters.

First off a quote: “You can’t parse [X]HTML with regex. Because HTML can’t be parsed by regex. Regex is not a tool that can be used to correctly parse HTML” (from stackoverflow). I agree with the comment, it isn’t possible to fully parse HTML with regexes, but my goal wasn’t to do that; I wanted to parse a safe form of HTML. I also have an uncontrollable urge to do something that people say can’t be done.

Now we have that out of the way, how did this all begin? Well, I was building a char by char JavaScript parser inside JavaScript to allow untrusted code to be executed. Every time I wrote a simple string matching function I found myself taking shortcuts and using regexes instead. For example, why loop through all characters when you can whitelist the desired ones? I soon found that I had a great advantage in using regexes instead of parsing every character, because I could use the native JavaScript engine to help me.

This led me to develop JSReg [1]. At first it seemed very easy to match JavaScript; numbers and strings were pretty easy, but I then encountered one of the first problems of regex sandboxes. It is very difficult to match something that can contain itself. For example, an array can contain pretty much any JavaScript statement, including another array, so how can you match it while you are still defining the pattern for it? I didn’t really have an answer to this. One of my solutions was to create a recursive regex that created a second compiler to match inside the first match and so on. But this was slow, and because JavaScript doesn’t have lookbehind, previous matches would eat characters needed by the next match (I’ll talk more about this in the design). My other idea was to use backreferences, but these are very difficult to track when using multiple regexes and they only return a successful match; in my tests it wasn’t possible to produce a perfect array match using backreferences. I could be wrong of course, I know I’m not perfect.
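
The self-containing problem is easy to see with a deliberately naive pattern. This is just a toy demonstration, not a regex from JSReg:

```javascript
// Naive attempt: "[" followed by anything that isn't "]" and then "]"
var naiveArray = /\[[^\]]*\]/;

var flatMatch = naiveArray.exec('[1,2,3]')[0];       // whole array matched
var nestedMatch = naiveArray.exec('[1,[2,3],4]')[0]; // stops at the inner "]"
```

The nested case comes back as `[1,[2,3]`, cut off at the first closing bracket, because the pattern has no way to count how many arrays are open.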

The design

The basis of my design was to not rely on 3rd party code where possible, which means no jQuery etc. In addition I should employ multiple layers of security wherever possible. These were good design decisions: throughout initial testing the multiple layers proved difficult to break down. For JSReg the first layer was an iframe. The iframe was created on each execution, giving fresh prototypes and a throwaway box once execution had finished. Then I whitelisted all JavaScript objects/properties, which was done by forcing all methods to use a prefix/suffix of “$”. Each variable assignment was then localized using var to force local variables. Each object was also checked to ensure it didn’t contain a window reference.

JavaScript arrays proved tricky, as mentioned earlier, because of the amount of code that can be included within them. Initially I decided to try and match them and their contents, but there were several performance problems with matching all that code, as well as JavaScript regex limitations. For example, I use one regex with a replace function to globally match each sequence using groups; the idea is to match all the valid objects first. In the instance of an array you’d first match all regex objects, strings etc because they can contain a “[” and “]”, then once all valid objects have been enumerated by the regex engine it will encounter the first “[” of our array.

This works well in practice for every object apart from arrays. In JavaScript the array literal shares the same syntax as the object accessor. Therefore you have to identify the difference between an array and an object accessor. Sounds easy?

[][0[0,0[0]]];
+[][0[0,0[0]]];
{}['I am an array']
~{a:0}['I am a object accessor']

As you can see with the samples above, you’d have to match the entire js syntax before the opening “[”. Then if you don’t match the entire sequence inside the array you won’t know if the ending “]” is part of an array sequence or an object accessor. This problem was unsolved for a long time. The main reason was that, in order to protect against window references, I rewrite object accessors like obj['abc'] to obj[JSREG_FUNC.gp('abc')], so the function returns a safe string which uses the prefix/suffix of $, e.g. abc becomes $abc$. Because the function returns a string from the expression, an undetected array literal would be broken by the rewrite.
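
The prefix/suffix trick can be shown with a toy version of that lookup function. This is only a sketch of the idea: the real JSREG_FUNC.gp presumably does more, and sandboxedObject is a made-up example.

```javascript
// Toy version of the property-name rewrite: abc becomes $abc$
function gp(name) {
  return '$' + String(name) + '$';
}

// Only properties stored under the $...$ form are reachable from rewritten code
var sandboxedObject = { $abc$: 'safe value' };

var allowedLookup = sandboxedObject[gp('abc')];         // finds $abc$
var blockedLookup = sandboxedObject[gp('constructor')]; // $constructor$ doesn't exist
```

Native properties such as constructor are never stored under the $ form, so untrusted code asking for them just gets undefined.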

Detecting an array or object was difficult because of the design too. You see, if a regex object like /abc/ is matched and is followed by an object accessor, as in /abc/['source'], the previous expression is eaten by the parser, so the next match is effectively ['source'], which JSReg understandably thinks is an array. A simple way round this would be to lookbehind to see if a whitelist of characters makes the opening “[” an array or not. But JavaScript doesn’t support lookbehinds! :(

The simple workaround was to use Array(1,2,3) instead for arrays and assume all “[” and “]” were not arrays. This worked but it breaks existing code. Finally, after many attempts, I think I’ve come up with a solution. I store a list of previous matches and rewrite all array literals and object accessors into a function or method. This means I no longer need to detect the ending of the array, as they both have a “)” instead of a “]”. Easily demonstrated with a code example:-

[1,2,3] //becomes:-
A(Number(1),Number(2),Number(3))

window['x']//becomes:-
$window$.JSREG_PROP('x')
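
Remembering the previous match in place of a lookbehind can be sketched like this. It is a heavily simplified illustration, not JSReg's matcher: classifyBrackets is a made-up name, and it ignores keywords, object literals and regex literals.

```javascript
// Hypothetical helper: for each "[" decide whether it opens an array literal
// or an object accessor, based only on the token matched just before it.
function classifyBrackets(code) {
  var tokenRe = /('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*")|([A-Za-z_$][\w$]*|\d+)|\S/g;
  var previous = 'operator'; // start of input behaves like "after an operator"
  var kinds = [];
  code.replace(tokenRe, function (match, str, word) {
    if (match === '[') {
      kinds.push(previous === 'value' ? 'accessor' : 'array');
    }
    // Strings, names, numbers and closing brackets leave a value on the left
    previous = (str || word || match === ']' || match === ')') ? 'value' : 'operator';
    return match;
  });
  return kinds;
}

var literalKinds = classifyBrackets('[1,2,3]');
var accessorKinds = classifyBrackets("window['x']");
var mixedKinds = classifyBrackets('x = [a[0], [1]]');
```

Feeding it /abc/['source'] gives the wrong answer ("array"), which is the regex-literal case described above and the reason a real implementation has to track more than one previous token.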

Finally, as part of the design I check the JavaScript syntax before and after conversion. This provides another layer of security if the rewrite fails at any part of matching the code.
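
One cheap way to do such a syntax check without running anything is to hand the code to the Function constructor, which parses but does not execute. This is a sketch of the idea rather than JSReg's own check, and isValidSyntax is a hypothetical name; note that the code is parsed as a function body, so a top-level return would be accepted.

```javascript
// Hypothetical helper: true if the code parses, false on a SyntaxError.
function isValidSyntax(code) {
  try {
    new Function(code); // parsed only, never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) { return false; }
    throw e;
  }
}

var beforeOk = isValidSyntax('var a = [1,2,3];');
var brokenRewrite = isValidSyntax('var a = A(Number(1),Number(2)');
```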

The code

JavaScript is difficult to match but I found HTML/CSS easier. At first I started the code for HTMLReg [2] and CSSReg [3] in a similar way to JSReg. Then, when hacking my own code, I realized how I could make it better at defending against attack. First off I employed a strict whitelist to remove any partial open HTML attacks and evil attributes that were obvious attacks. This means I didn’t stick to the HTML specification; I don’t allow any junk in attributes. For example, if you want to include “<” or “>” inside a title attribute then you have to encode it. I may allow them in future if it can be proven safe, but I’d rather not fight something I can’t win. You may disagree with what I’ve just said but your filter is probably being pwnd right now.

Once I had my whitelist of tags and attributes I constructed RegExes for the individual parts I wanted to match, for example text nodes, invalid tags and valid attributes. These are then chained together in one big regex, and each part is grouped so that you can match each expression and validate it.

Here is how it works:-

html.replace(mainRegExp, function($0, $styleTag, $tag, $text, $invalidTags) {});

Notice how I use the replace function: I don’t do html = html.replace because I only want to match the text with my regexes, and I build the output myself inside the callback. I prefer to use replace because I automatically get a nice reference to each group as a local variable. This was a lesson I learned from developing JSReg, as if the replace fails it will return your plain code rather than rewrite it.

Inside the function I include a couple of things in each block I’ll use the text node as an example:-

if($text !== undefined && $text.length) {
  output += $text;
  parseTree+='text('+$text+')\n';
}

Here if the text node is matched it adds it to the output. Parse tree is a nice way of keeping track of what you’ve matched. It’s a useful debugging reference. The if statement is required because of browser inconsistencies when matching groups.

In the case of HTMLReg for performance reasons I have a whitelist to match a general tag, then inspect it further so I’m only matching a smaller amount of text. You can see that with the following code:-

if($tag !== undefined && $tag.length) {
  if(!new RegExp('^<\\\/?'+allowedTags.source+'[\\s>]','i').test($tag)) {
    return '';
  }
  parseTree+='tag('+$tag+')\n';
  if(!/^<\/?[a-z0-9]+>$/i.test($tag)) {
    $tag = parseAttrValues($tag);
  }
  output += $tag;
}
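
Putting the pieces from this section together, a stripped-down version of the chained regex and replace callback might look like this. It is a toy under stated assumptions, not HTMLReg: the part regexes, the three-tag whitelist and the sanitize name are invented, and attributes are not supported at all.

```javascript
// Hypothetical parts; HTMLReg's real patterns are far more involved
var allowedTags = /(?:b|i|p)/;
var tagPart = /<\/?[a-z0-9]+>/;  // bare tags only, no attributes in this toy
var textPart = /[^<>]+/;         // text may not contain < or >
var invalidPart = /[\s\S]/;      // anything else, one character at a time

// Chain the parts into one big regex, one group per part
var mainRegExp = new RegExp(
  '(' + tagPart.source + ')|(' + textPart.source + ')|(' + invalidPart.source + ')',
  'gi'
);

function sanitize(html) {
  var output = '';
  var parseTree = '';
  // The return value of replace is ignored; output is built in the callback
  html.replace(mainRegExp, function ($0, $tag, $text, $invalid) {
    if ($tag !== undefined && $tag.length) {
      if (new RegExp('^<\\/?' + allowedTags.source + '[\\s>]', 'i').test($tag)) {
        parseTree += 'tag(' + $tag + ')\n';
        output += $tag;
      }
    } else if ($text !== undefined && $text.length) {
      parseTree += 'text(' + $text + ')\n';
      output += $text;
    }
    return ''; // invalid matches are simply dropped
  });
  return { output: output, parseTree: parseTree };
}

var cleaned = sanitize('<b>hi</b><script>x</script><');
```

The script tags and the trailing partial tag are dropped, while the text inside the script element survives as plain text.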

Once my tag has been matched I then start to parse attributes. I do this by creating a hidden div and reading its contents. This is cool for a number of reasons: we can read what the browser reads and our code automatically gets formatted. Because we then use the DOM, our entities will be decoded for us. While testing I found that JavaScript won’t be executed using innerHTML without certain tags or attributes, so if I whitelist the tags and attributes then I can use innerHTML safely without having to worry about execution. I have a backup plan if this fails: I could be more strict with certain attributes if it turns out to be possible to execute code.

Onto CSSReg! It didn’t exist, nor did I think it was needed, as I thought I could rely on the browser to ensure multiple CSS rules didn’t cross over from single CSS DOM rules. I was wrong. It was proven by many talented researchers (mentioned in the thanks section) that it wasn’t possible to get the browsers to rewrite CSS safely. I had to write another regex sandbox. This time it wasn’t as tricky as it first appeared. As long as I didn’t try to follow the madness of the specification again, I should be able to produce some CSS that was safe from malicious code yet useful enough to use.

First off I gathered a list of properties and identifiers. I removed crappy browser specific extensions, yeah they are bad. ALL OF THEM. Then I used the same method as HTMLReg to match each part. The trickiest part this time was urls. There are so many ways to escape a css url in every browser: you have to handle backslash escapes, entities, new lines and backslash hex escapes. The best way I came up with was to whitelist the url first, match everything in-between (), and then decode and escape every character that didn’t match the whitelist.
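
The escaping half of that approach can be sketched as below. This is a minimal illustration, not CSSReg's code: escapeCssUrl and the exact character whitelist are assumptions, and the decoding step that should come first is left out.

```javascript
// Hypothetical helper: backslash hex escape everything outside a small whitelist.
// The trailing space ends the escape so following hex-looking characters
// aren't swallowed into it.
function escapeCssUrl(url) {
  return url.replace(/[^A-Za-z0-9\/._-]/gu, function (ch) {
    return '\\' + ch.codePointAt(0).toString(16) + ' ';
  });
}

var escapedUrl = escapeCssUrl('http://red/x?y=1');
```

For this input the colon, question mark and equals sign become `\3a `, `\3f ` and `\3d `, while letters, digits and slashes pass through untouched.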

This made it pretty safe across multiple browsers. But there was a problem: some browsers decoded the CSS even when it was sandboxed correctly. For example, one attack I found was to triple encode the character, and the browser would decode the entities and escapes until it produced its mangled version of CSS, which broke the sandbox. To get round this I created a custom attribute which didn’t match my whitelist, “sandbox-style”. This allowed CSSReg to store its correctly sandboxed style; I used a custom attribute outside of the whitelist to prevent injections of sandbox-style. Once my CSS was stored correctly I could then match it again and rename it back to style, which was then returned correctly.

All this trouble was because I wanted the browser to handle invalid HTML for me, any unclosed HTML tags would be automatically closed by the browser engine for me :)

Finally, in order to handle selectors I stuck to very simple syntax, either #someid or .someclass, and allowed multiple selectors like .someclass1, .someclass2 {}. This prevents CSS injection based attacks as well as making it easy to parse. Once each selector is matched I restrict which tags are allowed and prefix an application ID to prevent HTML/CSS crossing across sandboxes. I then check if a selector is matched before opening or closing one.
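
The selector restriction and application ID prefix could look roughly like this. It is a guess at the general shape: prefixSelectors, the underscore separator and the exact name pattern are assumptions, not CSSReg's real format.

```javascript
// Hypothetical helper: accept only "#id" or ".class" selectors (comma separated)
// and prefix each name with the application ID. Returns null if anything else appears.
function prefixSelectors(selectorList, appId) {
  var parts = selectorList.split(',');
  var out = [];
  for (var i = 0; i < parts.length; i++) {
    var selector = parts[i].replace(/^\s+|\s+$/g, ''); // trim
    if (!/^[#.][a-z][a-z0-9_-]*$/i.test(selector)) {
      return null; // not a simple id or class selector
    }
    out.push(selector.charAt(0) + appId + '_' + selector.slice(1));
  }
  return out.join(', ');
}

var prefixed = prefixSelectors('.someclass1, #someid', 'app1');
var rejected = prefixSelectors('.a > script', 'app1');
```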

I hope you’ve enjoyed this post. It’s a break from what I normally do, but I thought it was worth the effort to put together, as I’ve found some of these concepts to be the best way to code a solution, and hopefully you’ll find them useful.

Thanks

I would like to thank Dave Ross as I was heavily inspired by him, especially with the multiple regex references chained together. Eduardo Vela aka “sirdarckcat” for his awesome (?:HTML|JS|CSS)Reg hacks. Juriy Zaytsev aka “kangax” for his excellent input in detecting parsing flaws with JSReg. Kyo for breaking things without even trying. Theharmonyguy for breaking HTMLReg classes and spotting comical spelling mistakes by me. LeverOne for breaking HTMLReg and CSSReg with some quite simply awesome and evil vectors. Mario Heiderich aka “.mario” for making regex objects look insane, providing great input for JSReg and breaking HTMLReg. David Lindsay aka “Thornmaker” for finding JSReg parsing errors with ternary operators. Stefano Di Paola for smashing the JSReg stack and proving that non-mortals exist. Achim Hoffmann for providing valuable JSReg input, and everyone else who has helped me test and develop JSReg & others.

[1] JSReg
[2] HTMLReg
[3] CSSReg

Month of PHP security

Stefan Esser has launched another Month of PHP security. It includes popular applications which use PHP as well as general bugs. He also includes a general PHP security article that you really should read to help secure your code. I’d also keep an eye out for the hardening of PHP configuration which will be released shortly.

DOM CSS fight at the O.K. Corral

I’ve been having a bit of a fight with DOM CSS. Single css rules in various browsers are carried over to two or more rules in some instances, depending on which characters you use. This was playing havoc with my HTMLReg sandbox: I whitelist allowed rules, so I can’t allow rules to be injected.

The IE gunfighter was strong and stubborn. No matter which method I used, it seemed he was always quicker on the draw than me and replaced my cssText faster than I could draw my encoder. I decided to create another sandbox to parse CSS styles called CSSReg, which allowed me to restrict the in-line styles to my new whitelist.

Unfortunately the gunfighter LeverOne “the kid” was remaining elusive from capture. Then Sirdarckcat “Smokey” was causing all sorts of problems in town CSSReg.

Not fazed by this crazy town, sheriff Gareth “Wyatt Reg” decided to put an end to this chaos. He drove IE out of town with an HTML hack that allowed him to retain correctly sandboxed styles:-

<div sandbox-style="background-image: url('http\3a //red/x\3f y\3d 1');">xxx</div>

He then replaced the sandbox-style by parsing the HTML and renaming sandbox-style to style. This stops IE rewriting the CSS and decoding everything, while still allowing it to handle invalid nested tags.

Next I had to drive the deadly Mozilla gang out of town. I rewrote the CSS url parser to use a strict whitelist and backslash hex escape any invalid characters. “The kid” and “Smokey” have left town for now. HTMLReg town is for now peaceful and quiet. I have seen the sla.ckers gang lurking but they seem too scared to enter HTMLReg at the moment.

Gareth “Wyatt Reg” would also like to thank the following outlaws:-
Mario “Doc holiday” Heiderich
Kyo “Wild bill”
the “Texas” harmonyguy

HTMLReg

Yeah you knew it was coming. This was easier than JavaScript parsing because I can use both the HTML and CSS renderers of the browser to make sure I can parse the code safely. So really this is CSS/HTML reg, I don’t support the style tag yet but that shouldn’t be difficult as I can just write a RegExp to match the style and contents then parse each rule.

How did I do it? With very little code of course. I use a restrictive RegEx to get the actual tags and attributes, then using the DOM I make the browser render the attributes, read each one and delete the actual attributes and styles. Then I put each rule and attribute back using a whitelist.

I remove any nodes that aren’t legal or are malicious. The text portion of the node uses a whitelist of allowed characters and does not allow “<” or “>”, which stops partial HTML attacks. Finally, to clean up, I let the browser render the HTML code for me and rewrite it; some browsers make it prettier than others.

HTMLReg demo

Remember real men use JavaScript.

Astalanumerator update

I wanted a sexy object enumerator. There wasn’t one. So I developed the terminator of enumerators, “astalanumerator”. I have since integrated it into Hackvertor because that’s where I seem to put everything nowadays. Anyway, you can use it by visiting:-

1. http://hackvertor.co.uk/public
2. Type window into the output
3. Click Inspect.

Yeah damn sexy eh? It creates a tree menu of all available properties of an object by checking a big list of JavaScript properties I’ve collected. I use the MS Enumerator object too, thanks to Manuel Caballero, as I completely forgot about it. You rock!

The recent changes include a fix for an escaping bug. I do a crazy hack to pass the objects without modifying the existing js because I’m lazy =) so I have to double encode stuff. I also added colour coding for objects, functions etc, changed to a fixed width font and created a nice preview of the code. Thanks to Adam Bliss for the cool suggestions.