HTMLReg
Thursday, 15 April 2010
Yeah, you knew it was coming. This was easier than JavaScript parsing because I can use both the HTML and CSS renderers of the browser to make sure I parse the code safely. So really this is CSS/HTML reg. I don’t support the style tag yet, but that shouldn’t be difficult: I can just write a RegExp to match the style tag and its contents, then parse each rule.
How did I do it? With very little code, of course. I use a restrictive RegExp to get the actual tags and attributes. Then, using the DOM, I make the browser render the attributes, read each one, and delete the actual attributes and styles. After that I put each rule and attribute back using a whitelist.
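To give a rough idea of the whitelist step, here is a minimal sketch. It is not the HTMLReg source: the names `ALLOWED_ATTRIBUTES`, `ALLOWED_STYLE_PROPERTIES`, `filterAttributes` and `filterStyle` are made up for illustration, and so are the whitelist entries. It also takes a plain object of attributes and splits the style text by hand, whereas the real thing lets the browser do the parsing through the DOM.

```javascript
// Hypothetical whitelists; the real HTMLReg lists are not shown in this post.
const ALLOWED_ATTRIBUTES = new Set(["href", "src", "alt", "title"]);
const ALLOWED_STYLE_PROPERTIES = new Set(["color", "font-weight", "text-align"]);

// Keep only whitelisted style rules from a "prop: value; prop: value" string.
// Assumption: values are limited to word characters, "#", spaces and "-",
// so anything like url(...) or expression(...) is dropped.
function filterStyle(styleText) {
  return String(styleText)
    .split(";")
    .map((rule) => rule.split(":"))
    .filter((parts) => parts.length === 2)
    .map(([prop, value]) => [prop.trim().toLowerCase(), value.trim()])
    .filter(([prop, value]) =>
      ALLOWED_STYLE_PROPERTIES.has(prop) && /^[\w#\s-]+$/.test(value))
    .map(([prop, value]) => `${prop}: ${value}`)
    .join("; ");
}

// Rebuild an attribute set from scratch, copying over only what is allowed.
// Everything else (onerror, onclick, unknown names) is simply never put back.
function filterAttributes(attributes) {
  const safe = {};
  for (const [name, value] of Object.entries(attributes)) {
    const lower = name.toLowerCase();
    if (lower === "style") {
      const style = filterStyle(value);
      if (style) safe.style = style;
    } else if (ALLOWED_ATTRIBUTES.has(lower)) {
      safe[lower] = String(value);
    }
  }
  return safe;
}

const cleaned = filterAttributes({
  href: "/about",
  onerror: "alert(1)",
  style: "color: red; behavior: url(x)",
});
console.log(cleaned);
```

The point of the design is that nothing is "removed" from the input; the safe set starts empty and only known-good names get copied in.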
I remove any nodes that are illegal or malicious. The text portion of a node uses a whitelist of allowed characters and does not allow “<” or “>”, which stops partial HTML attacks. Finally, to clean up, I let the browser render the HTML code for me and rewrite it; some browsers make it prettier than others.
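The text whitelist can be sketched roughly like this. Again, this is an illustration and not the actual HTMLReg code: the function name `sanitizeText` and the exact set of allowed characters are assumptions. The only thing taken from the description above is that “<” and “>” are not on the list.

```javascript
// Hypothetical character whitelist: word characters, whitespace and some
// basic punctuation. "<" and ">" are deliberately not included.
const DISALLOWED_TEXT_CHARS = /[^\w\s.,:;!?'"()-]/g;

// Strip every character that is not on the whitelist.
function sanitizeText(text) {
  return String(text).replace(DISALLOWED_TEXT_CHARS, "");
}

console.log(sanitizeText("Hello <b>world</b>!"));
```

Because the angle brackets never survive, a text node cannot be used to smuggle in half of a tag that completes itself somewhere else.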
Remember real men use JavaScript.
No. 1 — April 15th, 2010 at 5:27 pm
<img src=`lo<img src=. onerror=alert(/hacked/)//`>
No. 2 — April 15th, 2010 at 8:34 pm
@test
Nice! 🙂
No. 3 — April 15th, 2010 at 8:43 pm
and fixed 😀