htmlentities is badly designed
Monday, 26 November 2007
Time and time again I’ve seen people use htmlentities and expect it to protect their variables from all XSS. This is wrong, of course: by default the function leaves single quotes untouched, and it needs the second parameter, ENT_QUOTES, to encode both kinds of quote. Some developers aren’t even aware that quotes can lead to XSS injection.
This leads me to my point: htmlentities should encode quotes by default, and developers who want to turn that off should do so with the second parameter.
Here’s the code example for anyone using htmlentities:
<?php
// ENT_QUOTES encodes both double and single quotes.
echo htmlentities($variable, ENT_QUOTES);
?>
I’ve made this mistake in the past too, assuming that the function takes quotes into account. Not any more though, I’ve learned :)
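To make the difference concrete, here is a minimal sketch. The input string and variable names are made up for illustration, and ENT_COMPAT is passed explicitly to stand in for the "double quotes only" behaviour, so the result does not depend on which default a given PHP version uses.

```php
<?php
// Hypothetical attacker input aimed at a single-quoted attribute.
$variable = "' onmouseover='alert(1)";

// ENT_COMPAT encodes double quotes only, so the single quotes survive.
$compat = htmlentities($variable, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES encodes single quotes as well.
$quotes = htmlentities($variable, ENT_QUOTES, 'UTF-8');

echo $compat, "\n"; // ' onmouseover='alert(1)
echo $quotes, "\n"; // &#039; onmouseover=&#039;alert(1)
?>
```

Dropped into something like value='…', the first result closes the attribute early; the second stays inside it.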
No. 1 — November 26th, 2007 at 10:53 am
You can also provide a third parameter to specify the character encoding, making sure that it matches the settings in your environment. Although this is a minor issue, I mention it here as best practice.
Cheers,
MvD
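A small sketch of why the third parameter matters, assuming the data is UTF-8. The sample string is just "é" written out as raw bytes so the example does not depend on how the file itself is saved.

```php
<?php
// "é" as the two UTF-8 bytes 0xC3 0xA9.
$text = "\xC3\xA9";

// Charset matches the data: one character, one entity.
$matched = htmlentities($text, ENT_QUOTES, 'UTF-8');

// Charset does not match: each byte is treated as its own character.
$mismatched = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

echo $matched, "\n";    // &eacute;
echo $mismatched, "\n"; // &Atilde;&copy;
?>
```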
No. 2 — November 26th, 2007 at 10:54 am
Yep, very good point, thanks :)
No. 3 — November 26th, 2007 at 11:12 am
Hrm, I don’t know whether or not it should encode quotes by default. It’s a function for converting encoding-specific characters to html equivalents so they can be stored anywhere, rather than a tool to avoid malicious input (we have functions for those).
But then, I could argue it both ways. Pros and cons… which is, I suppose, why it’s an option…:p
PS Google seems to be doing a great job in providing ‘related ads’ in your sidebar… :)
No. 4 — November 26th, 2007 at 11:27 am
Well, judging by the amount of code I’ve seen that’s vulnerable because it isn’t enabled by default, I’d suggest it should be. Just my opinion on the matter though :)
htmlspecialchars suffers from the same problem. I think it’s a problem whenever a developer assumes function behavior.
Hehe, I don’t care much about the Google ads :) As you might tell, nobody clicks ’em anyway. They’re just there to potentially earn beer money… it hasn’t happened yet :)
No. 5 — November 26th, 2007 at 11:34 am
Don’t use htmlentities. Use htmlspecialchars instead. htmlentities converts every character that has a named entity, including many outside the ASCII range, which may not be the desired effect.
htmlspecialchars concentrates on the basic special characters (<, >, &, ", '). By default, it escapes double quotes but not single quotes.
Now, about the XSS threat: not escaping single quotes doesn’t really matter, provided that:
– you output content in an html element (as PCDATA). Single or double-quotes don’t need to be escaped then
– you output content in an attribute value, delimited by double-quotes. Then you just need to escape double-quotes, which it does by default.
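The two cases above can be sketched as follows. The input string is invented for illustration, and ENT_COMPAT is spelled out to represent the "double quotes only" behaviour being described.

```php
<?php
// Hypothetical input containing markup and both kinds of quote.
$input = 'He said "hi" and it\'s <b>bold</b>';

// ENT_COMPAT: <, >, & and double quotes are encoded; single quotes are not.
$escaped = htmlspecialchars($input, ENT_COMPAT, 'UTF-8');

// As element content: the markup characters are encoded.
echo "<p>$escaped</p>\n";

// In a double-quoted attribute: " has become &quot;, so it cannot close the value.
echo "<input value=\"$escaped\">\n";
?>
```

The same $escaped value inside a single-quoted attribute would still contain a raw ', which is the case the original post warns about.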
No. 6 — November 26th, 2007 at 12:00 pm
<?php
// The payload contains no <, >, &, " or ', so ENT_QUOTES has nothing to encode.
$input = '\-\mo\z\-b\i\nd\in\g:\url(//business\i\nfo.co.uk\/labs\/xbl\/xbl\.xml\#xss)';
$input = htmlspecialchars($input, ENT_QUOTES);
?>
<div style="<?php echo $input?>"></div>
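A quick check, reusing the payload from this comment, suggests why ENT_QUOTES does not help here: the string comes back byte for byte identical. What a given browser then does with the CSS escapes is outside what this sketch tests.

```php
<?php
// The payload from the comment, in a single-quoted string so the backslashes stay literal.
$input = '\-\mo\z\-b\i\nd\in\g:\url(//business\i\nfo.co.uk\/labs\/xbl\/xbl\.xml\#xss)';
$escaped = htmlspecialchars($input, ENT_QUOTES);

// Nothing in the payload is an HTML special character, so nothing changes.
var_dump($escaped === $input); // bool(true)
?>
```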
No. 7 — November 26th, 2007 at 1:30 pm
The function cannot do everything for you :)
What about style, JS, event handlers…
An example:
<?php
// e.g. a javascript: URL contains nothing for htmlspecialchars to encode
$input = htmlspecialchars($_GET['url'], ENT_QUOTES);
?>
<img src="<?php echo $input?>">
– –
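One possible way to handle the URL case is to check the scheme before escaping. This is only a sketch: is_safe_url is a hypothetical helper, not part of PHP, and the http/https allowlist is an assumption that would reject relative URLs too.

```php
<?php
// Hypothetical helper, not part of PHP: accept only absolute http(s) URLs.
function is_safe_url($url)
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    return is_string($scheme)
        && in_array(strtolower($scheme), array('http', 'https'), true);
}

$url = 'javascript:alert(1)'; // stands in for $_GET['url']

// Escaping alone changes nothing: there is nothing to encode.
$escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

// Check the scheme first, then use the escaped value in the attribute.
$src = is_safe_url($url) ? $escaped : '';
?>
<img src="<?php echo $src ?>">
```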
No. 8 — November 26th, 2007 at 1:33 pm
Yep, that’s the point I was trying to make in my last comment :)
No. 9 — November 26th, 2007 at 10:25 pm
I looked into the htmlentities implementation a while back and it’s implemented pretty inefficiently, so you’ve got to watch out if you will be calling it thousands of times in a loop or encoding the same input multiple times. htmlspecialchars is probably much faster.
No. 10 — November 26th, 2007 at 10:33 pm
Good information to know, Felix, thanks.
No. 11 — November 26th, 2007 at 11:56 pm
Smarter people than me have suggested that one should also pass the $charset param to htmlentities or htmlspecialchars.
http://shiflett.org/blog/2007/may/character-encoding-and-xss
I agree with the supposition that the PHP escaping functions require too much work to be “safe,” though.
No. 12 — November 27th, 2007 at 12:01 am
And that was already in the first comment. Sorry about that… long day. 8)
No. 13 — November 27th, 2007 at 3:38 am
If I use htmlentities just to encode special characters like áéíóúñÑ, I wonder if that’s wrong.
What would you use instead ?
No. 14 — November 27th, 2007 at 11:58 am
I don’t think we want to be using any of these functions on a day-to-day basis. We should be rolling them into platform-level, easier-to-use functions that all programmers on our teams must use. Drupal security is poor at best overall, but I like the direction of their PHP input validation functions: http://api.drupal.org/?q=api/group/validation/5 and the like.
No. 15 — November 27th, 2007 at 12:12 pm
Sure, whitelist filters and the like would be a better approach. Still, my main point was the misunderstanding of htmlentities and htmlspecialchars, or any other function which requires the second parameter to escape quotes.
Many developers think that this is being done and it clearly isn’t.
No. 16 — November 27th, 2007 at 2:41 pm
As Michelangelo van Dam and Ed Finkler pointed out, you should also specify the charset. But specifying it does not help, as you first need to enforce the input charset to get rid of UTF-7 attacks and stuff. Something like this would work:
$input = iconv('UTF-8', 'UTF-8', $_GET['input']); // argument order is (from, to, string)
$input = htmlentities($input, ENT_QUOTES, 'UTF-8'); // third argument is the charset
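A variation on the same idea, swapping in mb_check_encoding for the iconv round trip: validate the bytes first and reject anything that is not well-formed UTF-8. This assumes the mbstring extension is loaded, and the sample input is made up.

```php
<?php
// Stand-in for $_GET['input']; "\xE9" is a lone Latin-1 byte, invalid in UTF-8.
$raw = "caf\xE9 <b>";

// Assumes the mbstring extension is available.
if (mb_check_encoding($raw, 'UTF-8')) {
    $safe = htmlentities($raw, ENT_QUOTES, 'UTF-8');
} else {
    $safe = ''; // reject rather than guess at the intended encoding
}
?>
```

Rejecting outright is a design choice for the sketch; converting from a known source encoding would be another option if you actually know what the client sent.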
No. 17 — November 29th, 2007 at 8:49 pm
Manual… :)
No. 18 — March 19th, 2009 at 3:26 pm
Great catch. Just did a find and replace in my code on 72 instances.
No. 19 — October 14th, 2010 at 2:42 pm
By specifying UTF-8 you are exposing your website to an overflow vulnerability. :D :D Funny. You have to upgrade to the newest version of PHP to prevent this.