Breaking HTML parsers for fun

By Gareth Heyes (@hackvertor)

Published 15 years 6 months ago • Last updated March 24, 2025 • ⏱️ 3 min read

I was experimenting with some HTML vectors to break the HTML parsers in the various browsers. I wanted to continue until I found a cool one for Firefox, because I like to bully the memory-hogging browser as I use it a lot. I found some weird rendering in Firefox, Chrome and Opera. It started off with cdata nodes and different behaviour in IE and Firefox: Firefox didn't execute my vector but IE did. This was interesting because FF was rendering the cdata inside the attribute.

Even as I write this blog post I keep finding more stuff :) Obviously I can't waste a lot of time on this, because the parser is soooo bad that there are literally hundreds of possibilities. Anyway, I'd better describe my vectors. So, back to the cdata stuff. The original vector looked like this:


<script <![CDATA[>///]]>
alert(1)</script>

I thought (wrongly, as you'll see later) that Firefox was parsing the cdata section and removing its contents like an HTML comment, as the vector didn't execute. Then I started experimenting and it got interesting :)


<img <iframe ="1" onerror="alert(1)">
<img <iframe/="1" onerror="alert(1)">

So as we see here, the iframe is rendered inside the img tag. You might think, so what? But adding the pieces of the puzzle together gives you a different perspective. Combining this with the knowledge I gained from the cdata rendering, the next vector looked like this:


<![CDATA[<img src="1]]><iframe>">]]>

The iframe is rendered inside a fake attribute as the cdata is rendering it as text. Cool, so that's pretty nice but still a bit obvious. Remember above how I said I thought FF was rendering the cdata? Well, now onto the good stuff. It turns out FF is a crazy cat when it comes to cdata. We can render cdata without the traditional markup required for it, which is bad because we can create vectors that break out of attributes.


<![
>
<img src="]><script>alert(1)</script>">

Argh, Firefox renders the cdata section even though it only begins with <![. The closing ">" is now ignored and the section only closes when the ]> is encountered.
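
To make the attribute breakout easier to see, here is a rough sketch in plain JavaScript. It is a toy model of the behaviour described above, not Firefox's actual parser: it assumes that "<![" opens a section that swallows everything up to the first "]>". The function name swallowFakeSection is made up for this illustration.

```javascript
// Toy model only (an assumption based on the observed behaviour, not real
// Firefox code): "<![" opens a section that ends at the first "]>".
function swallowFakeSection(html) {
  const start = html.indexOf("<![");
  if (start === -1) {
    // No fake section at all, nothing is swallowed.
    return { swallowed: "", rest: html };
  }
  const end = html.indexOf("]>", start + 3);
  if (end === -1) {
    // No terminator: in this model everything after "<![" is swallowed.
    return { swallowed: html.slice(start), rest: html.slice(0, start) };
  }
  return {
    swallowed: html.slice(start, end + 2),
    rest: html.slice(0, start) + html.slice(end + 2),
  };
}

const vector = '<![\n>\n<img src="]><script>alert(1)</script>">';
const result = swallowFakeSection(vector);

// The opening of the img tag and its quote end up inside the swallowed part,
// so the script tag is left over as live markup.
console.log(JSON.stringify(result.swallowed));
console.log(JSON.stringify(result.rest));
```

In this model the leftover markup starts with the script tag, which is exactly why the src attribute no longer protects anything.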

Weird stuff, but there's still more. I checked my original vector again in Firefox as I was puzzled by the crazy results, and then I noticed it had JavaScript syntax errors! I was like WTF. At first I thought there wouldn't be a way to execute anything, because the syntax required was <! which wasn't valid js to begin a line. Then I modified my vector to include E4X, bingo!


<script<{alert(1)}/></script>

Update: This vector was also discovered by Mario (http://heideri.ch/jso/?%3Cscript%3C)

Yes, that executes :o It appears that FF doesn't require a closing ">" to execute script. The e4x node is created as "<undefined />" because the curlies execute js code and return the result (alert returns undefined). I wasn't happy yet, I wanted more ways to execute code. What about e4x processing instructions? That way we can execute JavaScript that appears to start with / but is in fact part of an expression which divides the processing instruction by our function call.


<script<?wtf?>/alert(1)</script>
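
E4X has since been removed from JavaScript engines, so the two tricks can't be replayed directly today. The sketch below only mimics the evaluation order in plain JavaScript, under the assumption that it matches what the E4X vectors relied on. fakeAlert and processingInstruction are hypothetical stand-ins, not real E4X values.

```javascript
// Plain-JavaScript imitation of the two E4X tricks. Nothing here is E4X;
// fakeAlert is a hypothetical stand-in for alert that records its calls.
const calls = [];
function fakeAlert(message) {
  calls.push(message);
  return undefined; // alert() returns undefined as well
}

// 1. <{alert(1)}/> : the expression in the curlies runs first, and its
//    result is turned into the tag name.
const tagName = String(fakeAlert(1));
const node = "<" + tagName + " />";

// 2. <?wtf?>/alert(1) : a value followed by "/" makes the slash a division
//    operator, so the right-hand operand still gets called.
const processingInstruction = { toString: () => "<?wtf?>" }; // stand-in value
const quotient = processingInstruction / fakeAlert(1);

console.log(node);
console.log(calls.length);
console.log(Number.isNaN(quotient));
```

The result of the division is meaningless. The point is only that the call on the right-hand side runs as a side effect of evaluating the expression.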

Finally I thought to myself, what about without e4x? Hmmm, put an html comment inside the script attributes, which becomes a one-line js comment (crappy legacy stuff), and then I can execute code with a non-closing script tag and an open html comment! :)


<script<!--
alert(1)</script>
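
The legacy comment behaviour can still be checked in a current engine. A minimal sketch, assuming the code is parsed as classic (non-module) script, which is how the Function constructor parses its body. The string returned is just a marker chosen for this example.

```javascript
// "<!--" starts a single-line comment in classic (non-module) script code.
// The first line of the body is therefore ignored and the second line runs.
const body = "<!-- everything on this line is ignored\nreturn 'ran';";
const run = new Function(body);

console.log(run());
```

The same source would be a syntax error inside an ES module, where html-like comments are not allowed.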

Bonus Firefox vector


<!--<img alt=">1" title="<img src=1 onerror=&apos;alert(/love you ff/)&apos;>">
