Blog fight round two

Thanks Pádraic

So I hope you’ve enjoyed our blog fight between me and Pádraic Brady. I sense a lack of a sense if humour in his last post 🙁 his blanket claims that regex html validation sucks were obviously unjustified. Anyway I was waiting for a cool XSS hole in HTMLReg from him, it never came 🙁 he did raise a valid point about a “clickjacking” threat so I decided to update HTMLReg/CSSReg to enable a site to disable all CSS positioning. Thanks very much Pádraic for reporting this issue!

Disable Positioning

HTMLReg and CSSReg now have the option “disablePositioning” this will stop you from be able to define elements which overlap each other, which is useful in a validation scenario not a iframe sandbox scenario (which HTMLReg was originally intended). It’s quite easy to use:-

HTMLReg.disablePositioning = true;
alert(HTMLReg.parse("<div style=position:absolute;></div>")); // <div></div>

Validate HTML

I didn’t stop there while I had my IDE open, I decided to add a validate HTML option, using the new “DOMParser” feature. As I looked deeper into the feature I wish I hadn’t bothered 🙁 when a invalid XML markup is encountered IE throws exception. FF, Op, Webkit doesn’t. Webkit transforms XML and adds a parsing error node/FF entitiy encodes. Ugh. So anyway hopefully I made a cross browser solution which will prevent any invalid HTML markup. Anyway this is what I came up with that should work on newer browsers and older versions of IE:

StringtoXML = function (text){		
	try {
		if(window.DOMParser) {
		  var parser=new DOMParser();
	      var doc=parser.parseFromString(text,'text/xml');		      
	      var xml = (new XMLSerializer()).serializeToString(doc);
	      xml = xml.replace(/^<\?[^?]+\?>\s*/,'');		      
	      if(/<parsererror[^>]+>/.test(xml)) {
	    	  return 'Invalid HTML markup';
	      } else {
	    	  return xml;
		} else if(window.ActiveXObject){
          var doc=new ActiveXObject('Microsoft.XMLDOM');
          if(!doc.xml) {
        	  throw {};
          return doc.xml;
		} else {
			return text;
	} catch(e) {
		return 'Invalid HTML markup';

Thanks Paul Stone

An excellent bug was found by Paul Stone as a result of this blog fight 🙂 In Firefox RC4 & latest Chrome he noticed that HTMLReg was allowing invalid CSS markup, as he quite rightly pointed out HTMLReg was checking if cssText contained something in RC4 but it wasn’t passing the check as a result the invalid CSS was being allowed. The fix for this was quite simple and involved removing the style attribute if the cssText check wasn’t passed. This didn’t create a security breach as the invalid CSS was just that invalid and other browsers such as IE didn’t allow this markup to be executed but it was a cool bug since HTMLReg should not allow this invalid string.

The vector looked like this:-

<div style="xxx">test</div>

The fix was simply to do this:-

if(element.getAttribute("style") !== '' && element.getAttribute("style") !== null && !== '') {
} else {
//drop style attribute if it exists = null;

HTMLReg rocky is waiting

HTMLReg is in the ring waiting for you, please get in your corner and try your luck. I doubt you can win 🙂

10 Responses to “Blog fight round two”

  1. Pádraic Brady writes:

    Ahem, my article was PHP related…again ;). I figured the missed CSS problem was more important than digging around for XSS vulns. Also, it took far less time ;).

  2. Gareth Heyes writes:


    The language does not matter, client side is better as you can use the DOM but I still think I could recreate HTMLReg in PHP.

  3. Makarroni writes:


    Putting some much pride and energy defending your little fancy regex-based solution is so much pointless… Pádraic is right, no matter whether you implement it in JavaScript, PHP, or Swahili.
    You’ll discover it sooner or later…

  4. Gareth Heyes writes:


    So the basis for your argument is that I’ll find out. Ok I’m convinced.

  5. Padraic Brady writes:

    You do realise this “blog war” is about using regular expressions and not the DOM? I have zero arguments against using the DOM. Zero. I’m not sure how many times I need to reiterate that number. Zero. There, said it again ;). Zero. And again. Is it getting through yet?

    This continued lack of precision is certainly shaping up a “blog war” over what I have and have not stated and what I intended saying versus what you seem to have imagined I intended saying.

    My original article called out regular expression based HTML Sanitisers in PHP which…gasp…do NOT use a DOM. HTMLReg DOES use a DOM. See the difference? In PHP, regular expression based sanitisers are thoroughly broken, low quality, delusional products which do not sanitise HTML effectively. This is based on current reality. I examined all of the prominent libraries and then some before arriving at my conclusions.

    If you want to win this “blog war”, by all means try doing so. Simply refute my claims within the context of the original article. Otherwise, you are simply attacking the article using the exact thing I am not arguing against. It’s contradictory, confusing and uncalled for.

  6. Gareth Heyes writes:


    I do not agree with your statements in your article
    “Regex based HTML Sanitisers do have one teeny tiny little wrinkle. They don’t work.”

    Maybe the ones you tested don’t work but I think it is possible to write one in PHP that would.It won’t be as good as one that relies on the DOM but I still think it’s possible.

  7. Padraic Brady writes:

    And you’re more than welcome to prove your point by writing one or pointing out such an example. The reason I’m hammering these points so hard is that none currently exists that I am aware of, and I do not want readers to get the impression that this has been accomplished in PHP when it obviously hasn’t. My advice to avoid regex santisers in PHP was the core message of my original article and I sincerely believe it will hold true for a very long time. Having that obscured by a sideline argument on theoretical possibilities is something I want to avoid.

  8. charlie writes:

    I don’t get it, it’s a javascript browser based cleaner? What if I just disable javascript, or remove the javscript or events through firebug?

  9. Gareth Heyes writes:


    Yeah it’s javascript based, if you disable javascript then your html won’t render. If you disable events through firebug then those events won’t work for you

  10. Rob writes:

    Gotta be honest, sounds like you completely missed the point of the original article and now you’ve dug yourself a hole that you can’t/won’t wriggle out of. Nobody’s saying the point you’re arguing is wrong, it’s just not the point that’s being argued about! You just have to admit you got the wrong end of the stick and you can get on with your life…it’s either that or you’re looking for the fight in which case you’re just an idiot.