HTMLReg

By Gareth Heyes (@hackvertor)

Published 16 years 1 month ago • Last updated March 22, 2025

Yeah, you knew it was coming. This was easier than JavaScript parsing because I can use both the HTML and CSS renderers of the browser to make sure I parse the code safely. So really this is CSS/HTML reg. I don't support the style tag yet, but that shouldn't be difficult: I can just write a RegExp to match the style tag and its contents, then parse each rule.
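As a rough illustration of the style tag idea, and not code from HTMLReg itself, the sketch below matches each style block with one RegExp and splits its contents into rules with another. The names `STYLE_RE`, `RULE_RE` and `extractStyleRules` are made up for this example, and it assumes flat rules with no nested braces (so no `@media` blocks).

```javascript
// Hypothetical pattern: a <style> element and everything up to its closing tag.
const STYLE_RE = /<style[^>]*>([\s\S]*?)<\/style>/gi;
// Hypothetical pattern: "selector { declarations }" with no nested braces.
const RULE_RE = /([^{}]+)\{([^{}]*)\}/g;

// Collect every flat CSS rule found inside <style> blocks of an HTML string.
function extractStyleRules(html) {
  const rules = [];
  for (const block of html.matchAll(STYLE_RE)) {
    for (const rule of block[1].matchAll(RULE_RE)) {
      rules.push({
        selector: rule[1].trim(),
        declarations: rule[2].trim(),
      });
    }
  }
  return rules;
}
```

Each extracted rule would still have to go through the same whitelist treatment as inline styles before being written back.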

How did I do it? With very little code, of course. I use a restrictive RegExp to extract the actual tags and attributes. Then, using the DOM, I make the browser render the attributes, read each one, and delete the actual attributes and styles. After that I put each rule and attribute back using a whitelist.
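The first and last steps can be sketched without a browser. This is a minimal illustration, not the HTMLReg source: the whitelists, the patterns and the `extractTags` name are all assumptions, and the DOM rendering step in the middle is left out entirely.

```javascript
// Hypothetical whitelists; the real HTMLReg lists are not given in the post.
const ALLOWED_TAGS = new Set(["b", "i", "p", "a", "div", "span"]);
const ALLOWED_ATTRS = new Set(["href", "title", "class"]);

// Restrictive pattern: an opening tag with a simple name and zero or more
// double-quoted attributes. Input that does not match is never treated as a tag.
const TAG_RE = /<([a-z][a-z0-9]*)((?:\s+[a-z-]+="[^"<>]*")*)\s*>/gi;
const ATTR_RE = /([a-z-]+)="([^"<>]*)"/gi;

// Return the whitelisted opening tags, each with only its whitelisted attributes.
function extractTags(html) {
  const tags = [];
  for (const m of html.matchAll(TAG_RE)) {
    const name = m[1].toLowerCase();
    if (!ALLOWED_TAGS.has(name)) continue; // drop tags not on the whitelist
    const attrs = {};
    for (const a of m[2].matchAll(ATTR_RE)) {
      const attrName = a[1].toLowerCase();
      if (ALLOWED_ATTRS.has(attrName)) attrs[attrName] = a[2];
    }
    tags.push({ name, attrs });
  }
  return tags;
}
```

For example, `extractTags('<a href="x" onclick="evil()">hi</a><script>alert(1)</script>')` returns a single entry for the `a` tag with only its `href` attribute. The sketch does not validate attribute values, so something like a `javascript:` URL in `href` would still need its own check.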

I remove any nodes that are illegal or malicious. The text portion of a node uses a whitelist of allowed characters and does not allow "<" or ">", which stops partial HTML attacks. Finally, to clean up, I let the browser render and rewrite the HTML code for me; some browsers make it prettier than others.
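The character whitelist for text can be sketched as a single replace. The exact set of allowed characters here is an assumption for illustration, as is the `cleanText` name; the point is only that "<" and ">" are never on the list.

```javascript
// Hypothetical whitelist: letters, digits, whitespace and a little punctuation.
// The pattern matches every character that is NOT on the whitelist.
const TEXT_DISALLOWED = /[^a-z0-9\s.,:;!?'"()\-]/gi;

// Strip every character that is not explicitly allowed in a text node.
function cleanText(text) {
  return text.replace(TEXT_DISALLOWED, "");
}
```

Because the angle brackets are stripped, a fragment such as `Hello <b onmouseover=` comes out as plain text and cannot open a tag that later markup would complete.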

HTMLReg demo

Remember real men use JavaScript.
