HTMLReg

Yeah, you knew it was coming. This was easier than JavaScript parsing because I can use both the HTML and CSS renderers of the browser to make sure I parse the code safely. So really this is CSS/HTML reg. I don’t support the style tag yet, but that shouldn’t be difficult: I can just write a RegExp to match the style tag and its contents, then parse each rule.

How did I do it? With very little code, of course. I use a restrictive RegExp to get the actual tags and attributes. Then, using the DOM, I make the browser render the attributes, read each one, and delete the actual attributes and styles. Finally I put each rule and attribute back using a whitelist.
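To give a rough idea of the first and last steps (restrictive matching, then a whitelist), here is a minimal sketch that runs without a DOM. It is not the HTMLReg source: the names ALLOWED_TAGS, ALLOWED_ATTRS, TAG_RE, ATTR_RE and extractTags are made up for illustration, the whitelists are placeholders, and the browser rendering step is left out entirely.

```javascript
// Hypothetical whitelists; the real HTMLReg lists are not given in this post.
const ALLOWED_TAGS = new Set(['a', 'b', 'i', 'p']);
const ALLOWED_ATTRS = new Set(['href', 'title']);

// Restrictive on purpose: an opening tag is a name followed by zero or more
// name="value" pairs. Values must be double quoted and may not contain
// quotes or angle brackets, so unquoted or backtick-quoted attributes
// never match at all.
const TAG_RE = /<([a-z][a-z0-9]*)((?:\s+[a-z-]+="[^"<>]*")*)\s*>/gi;
const ATTR_RE = /([a-z-]+)="([^"<>]*)"/gi;

// Returns the whitelisted tags found in `html`, each with only its
// whitelisted attributes. Everything else is silently dropped.
function extractTags(html) {
  const found = [];
  for (const tag of html.matchAll(TAG_RE)) {
    const name = tag[1].toLowerCase();
    if (!ALLOWED_TAGS.has(name)) continue;
    const attrs = {};
    for (const attr of tag[2].matchAll(ATTR_RE)) {
      const attrName = attr[1].toLowerCase();
      if (ALLOWED_ATTRS.has(attrName)) attrs[attrName] = attr[2];
    }
    found.push({ name, attrs });
  }
  return found;
}
```

With these placeholder lists, calling extractTags on an anchor that carries both href and onclick keeps the href and drops the onclick, and a script tag is skipped because its name is not whitelisted. A pattern this strict will reject plenty of legitimate markup too, which is the trade-off being made.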

I remove any nodes that are illegal or malicious. The text portion of a node uses a whitelist of allowed characters and does not allow “<” or “>”, which stops partial HTML attacks. Finally, to clean up, I let the browser render the HTML code for me and rewrite some of it to make it prettier.
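The text whitelist can be sketched in a couple of lines. This is only an illustration under assumptions: the character set below and the names DISALLOWED_TEXT and sanitizeText are hypothetical, and the actual set HTMLReg allows is not listed in this post. The one thing taken from the description above is that “<” and “>” are never allowed through.

```javascript
// Hypothetical whitelist: letters, digits, whitespace and a little
// punctuation. The pattern matches every character NOT on that list.
const DISALLOWED_TEXT = /[^a-z0-9\s.,!?'"()-]/gi;

// Strips every character outside the whitelist, so "<" and ">" can never
// survive in the text portion of a node.
function sanitizeText(text) {
  return text.replace(DISALLOWED_TEXT, '');
}
```

Because the check is a whitelist rather than a blacklist, any character nobody thought about is removed by default instead of being let through.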

HTMLReg demo

Remember, real men use JavaScript.

3 Responses to “HTMLReg”

  1. test writes:

    <img src=`lo<img src=. onerror=alert(/hacked/)//`>

  2. Gareth Heyes writes:

    @test

    Nice! 🙂

  3. Gareth Heyes writes:

    and fixed 😀