Breaking HTML parsers for fun

I was experimenting with some HTML vectors to break the various HTML parsers in the browsers, and I wanted to keep going until I found a cool one for Firefox, because I like to bully the memory-hogging browser since I use it a lot. I found some weird rendering in Firefox, Chrome and Opera. It started off with CDATA nodes and different behaviour in IE and Firefox: Firefox didn’t execute my vector but IE did. This was interesting because FF was rendering the CDATA inside the attribute.

Even as I write this blog post I keep finding more stuff 🙂 Obviously I can’t spend a lot of time on this, because the parser is soooo bad there are literally hundreds of possibilities. Anyway, I’d better describe my vectors. So back to the CDATA stuff; the original vector looked like this:-


<script <![CDATA[>///]]>
alert(1)</script>

I thought (wrongly, as you’ll see later) that Firefox was parsing the CDATA section and removing its contents like an HTML comment, as the vector didn’t execute. Then I started experimenting and it got interesting 🙂


<img <iframe ="1" onerror="alert(1)">
<img <iframe/="1" onerror="alert(1)">

So as we see here, the iframe is rendered inside the img tag. You might think, so what? But putting the pieces of the puzzle together gives you a different perspective. Combining this with the knowledge I gained from the CDATA rendering, the next vector looked like this:


<![CDATA[<img src="1]]><iframe>">]]>

The iframe is rendered inside a fake attribute, as the CDATA is rendering it as text. Cool, so that’s pretty nice but still a bit obvious. Remember above how I said I thought FF was rendering the CDATA? Well, now onto the good stuff. It turns out FF is a crazy cat when it comes to CDATA. We can render CDATA without the traditional markup required for it, which is bad because we can create vectors that break out of attributes.


<![
>
<img src="]><script>alert(1)</script>">

Argh, Firefox renders the CDATA tag even though it only begins with <![ and the closing “>” is now ignored; the section only closes when the ]> is encountered.
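
To make the attribute breakout easier to see, here is a rough sketch of the behaviour described above. It is a toy model and not Firefox’s actual parser: the helper name splitAtSection and its return shape are made up for illustration, and it only assumes that everything between the opener and the first closer is swallowed while the rest is treated as live markup.

```javascript
// Toy model only: this is NOT how any real browser parser is implemented.
// It mimics the behaviour described in the post, where a section opened by
// `open` swallows everything up to the first `close`, and whatever is left
// over gets parsed as live markup.
function splitAtSection(input, open, close) {
  const start = input.indexOf(open);
  if (start === -1) {
    return { swallowed: "", live: input };
  }
  const end = input.indexOf(close, start + open.length);
  if (end === -1) {
    // Never closed: everything after the opener is swallowed.
    return { swallowed: input.slice(start + open.length), live: input.slice(0, start) };
  }
  return {
    swallowed: input.slice(start + open.length, end),
    live: input.slice(0, start) + input.slice(end + close.length),
  };
}

// The two vectors from the post.
const cdataVector = '<![CDATA[<img src="1]]><iframe>">]]>';
const bogusVector = '<![\n>\n<img src="]><script>alert(1)</script>">';

const cdata = splitAtSection(cdataVector, "<![CDATA[", "]]>");
const bogus = splitAtSection(bogusVector, "<![", "]>");

console.log(JSON.stringify(cdata.live));
console.log(JSON.stringify(bogus.live));
```

In both cases the opening of the img tag ends up in the swallowed part, so the iframe or script that looked like it was safely inside a quoted attribute is the first thing in the live markup.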

Weird stuff, but there’s still more. I checked my original vector again in Firefox as I was puzzled by the crazy results, and then I noticed it had JavaScript syntax errors! I was like WTF. At first I thought there wouldn’t be a way to execute anything, because the syntax required was <! which wasn’t valid JS to begin a line. Then I modified my vector to include E4X, bingo!


<script<{alert(1)}/></script>

*Update*
This vector was also discovered by Mario (http://heideri.ch/jso/?%3Cscript%3C)

Yes, that executes 😮 It appears that FF doesn’t require a closing “>” on the opening tag to execute script. The E4X node is created as “<undefined />” because the curlies execute JS code and return the result (alert returns undefined). I wasn’t happy; I wanted more ways to execute code. What about E4X processing instructions? That way we can execute JavaScript that appears to start with / but is in fact part of an expression which divides the processing instruction by our function call.


<script<?wtf?>/alert(1)</script>
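
E4X has since been removed from Firefox, so the vector above can’t be replayed as-is, but the division trick itself is plain JavaScript. A minimal sketch, assuming a plain object as a stand-in for the processing instruction and a made-up probe function in place of alert:

```javascript
// Assumption: a plain object stands in for the E4X processing-instruction
// value, since E4X is not available in modern engines.
const fakeProcessingInstruction = {};

// Hypothetical replacement for alert(1) so the sketch runs outside a browser.
let calls = 0;
function probe() {
  calls += 1;
}

// Same shape as `<?wtf?>/alert(1)`: a value, the division operator, then a
// function call. The call has to run before the division can be evaluated.
const result = fakeProcessingInstruction / probe();

console.log(calls, Number.isNaN(result));
```

The result of the division is useless (the object and undefined both convert to NaN), but by then the function call has already happened, which is all the vector needs.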

Finally I thought to myself, what about without E4X? Hmmm, combine an HTML comment inside the script attributes, which will become a one-line JS comment (crappy legacy stuff), and then I can execute code with a non-closing script tag and an open HTML comment! 🙂


<script<!--
alert(1)</script>
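
The legacy comment behaviour can be checked outside the browser too. The sketch below assumes the script source the engine ends up with starts with the <!-- opener, as in the vector above. The payload function and the return statement are additions for illustration, and it relies on the engine supporting HTML-like comments in non-module code (which V8 and SpiderMonkey do).

```javascript
// Source text shaped like the vector: the first line starts with the HTML
// comment opener, the second line holds the code we want to run.
const source = "<!-- everything on this line is ignored\nreturn payload();";

// Hypothetical stand-in for alert(1).
let executed = false;
function payload() {
  executed = true;
  return 42;
}

// The Function constructor parses the body as ordinary (non-module) script
// code, where `<!--` is treated as a single-line comment.
const run = new Function("payload", source);
const value = run(payload);

console.log(executed, value);
```

Note that this only works in classic scripts; module code rejects HTML-like comments.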

*Bonus Firefox vector*

<!--<img alt=">1" title="<img src=1 onerror='alert(/love you ff/)'>">

5 Responses to “Breaking HTML parsers for fun”

  1. LeverOne writes:

    http://heideri.ch/jso/#39

    http://heideri.ch/jso/#91 [ C ]

  2. Lachlan Hunt writes:

    Firstly, it seems clear from your results that your tests in Firefox were being done in Firefox 3.x with the old parser. You get very different results if you use the newer HTML5 parser in Firefox 4 betas or nightly builds.

    Secondly, it’s not really clear at all what exactly your criteria are for “breaking” the parser, nor why you’re referring to any of your examples as any kind of vector. I assume you mean a security related attack vector, but nothing you’ve posted even comes close to being such a thing. (Likewise, that “HTML5 Security Cheatsheet” by Mario that you referred to also fails to explain how anything he lists is a security exploit, as he’s just listed features that allow script execution by design.)

    A real attack vector requires either a way to execute script in a browser under conditions that would not normally allow scripts to execute, or one which would give escalated privileges to a script. Sure some of your examples allow execution, but it’s not really a vector unless you’re able to deploy such markup onto some site where you would not normally have the right to deploy scripts.

  3. Gareth Heyes writes:

    @Lachlan

    I point out how the parsers incorrectly parse HTML. They are obviously vectors, because in no world should HTML allow script to execute without a closing “>” in the opening tag. I fail to see how relevant it is to talk about a beta browser, especially when it isn’t due for release any time soon.

    Anyway this is a boring conversation, I post what I like to do and if you don’t like it don’t read it. It’s fun to find things that the browsers didn’t intend to do and that’s it, it’s not fun to talk about criteria and write a 10 page report on valid/invalid vectors.

  4. uninformed writes:

    your post rocks, thanks for sharing. Lachlan is a dud.

  5. infinity writes:

    Gareth Heyes, you rock! This is highly interesting experimental work.