htmlentities is badly designed

Time and time again I’ve seen developers use htmlentities and expect it to protect their variables from all XSS. This is wrong, of course, because the function needs ENT_QUOTES as its second parameter before it replaces both single and double quote characters. Some developers aren’t even aware that quotes can lead to XSS injection.

This leads me to my point: by default htmlentities should filter quotes, and a developer who wants to turn this off can do so with the second parameter.

Here’s the code example for anyone using htmlentities:

<?php
// ENT_QUOTES escapes both single and double quotes.
echo htmlentities($variable, ENT_QUOTES);
?>

In the past I’ve made this mistake too, assuming that the function takes quotes into account. Not any more, though; I’ve learned :)
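
As a rough illustration of why the quotes matter, the sketch below drops a hostile value into a single-quoted attribute. The payload and the variable names are made up for the example, and ENT_COMPAT is passed explicitly to show what happens when only double quotes are escaped.

```php
<?php
// Hypothetical attacker-supplied value: it closes a single-quoted
// attribute and adds an event handler.
$variable = "x' onmouseover='alert(1)";

// ENT_COMPAT escapes double quotes only, so the single quotes survive.
$compat = htmlentities($variable, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES escapes both kinds of quote.
$quotes = htmlentities($variable, ENT_QUOTES, 'UTF-8');

echo "<input value='" . $compat . "'>\n"; // the value can break out of the attribute
echo "<input value='" . $quotes . "'>\n"; // single quotes arrive as &#039;
```

With ENT_COMPAT the first line of output carries a live onmouseover attribute; with ENT_QUOTES the whole payload stays inside the value.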


Comments 17

  1. Michelangelo van Dam wrote:

    You can also provide a third parameter to enforce a specific character encoding, making sure that the data matches the settings in your environment. Although this is a minor issue, I mention it here as a best practice.

    Cheers,

    MvD

    Posted 26 Nov 2007 at 10:53 am
  2. Gareth Heyes wrote:

    Yep very good point thanks :)

    Posted 26 Nov 2007 at 10:54 am
  3. David wrote:

    Hrm, I don’t know whether or not it should encode quotes by default. It’s a function for converting encoding-specific characters to html equivalents so they can be stored anywhere, rather than a tool to avoid malicious input (we have functions for those).

    But then, I could argue it both ways. Pros and cons… which is, I suppose, why it’s an option…:p

    PS Google seems to be doing a great job in providing ‘related ads’ in your sidebar… :)

    Posted 26 Nov 2007 at 11:12 am
  4. Gareth Heyes wrote:

    Well, judging by the amount of vulnerable code I’ve seen because quote escaping isn’t enabled by default, I’d suggest it should be. Just my opinion on the matter though :)

    htmlspecialchars suffers from the same problem. I think it’s a problem whenever a developer assumes function behavior.

    Hehe I don’t care much about the google ads :) as you might tell, nobody clicks em anyway. They’re just there to potentially earn beer money…it hasn’t happened yet :)

    Posted 26 Nov 2007 at 11:27 am
  5. FlorentG wrote:

    Don’t use htmlentities. Use htmlspecialchars instead. htmlentities converts everything that does not fall into the ASCII range into an entity, which may not be the desired effect.

    htmlspecialchars concentrates on the basic special characters (<, >, &, ", '). By default, it escapes double quotes, but not single quotes.

    Now, about the XSS threat, not escaping single quotes doesn’t really matter, provided that:
    - you output content in an html element (as PCDATA). Single or double-quotes don’t need to be escaped then
    - you output content in an attribute value, delimited by double-quotes. Then you just need to escape double-quotes, which it does by default.
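
A small sketch of the difference described above, using a made-up sample string. ENT_COMPAT and the UTF-8 charset are passed explicitly so the behaviour does not depend on the defaults of the PHP version in use; the file is assumed to be saved as UTF-8.

```php
<?php
// Sample text with a single quote, double quotes, a non-ASCII letter,
// an ampersand and some markup.
$text = "It's a \"café\" & <b>bar</b>";

// htmlentities also converts characters that have a named entity, such as é.
$entities = htmlentities($text, ENT_COMPAT, 'UTF-8');

// htmlspecialchars touches only &, <, > and, with ENT_COMPAT, the double quote.
$special = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');

echo $entities, "\n";
echo $special, "\n";
```

In both results the single quote is left alone, which is harmless in element content or in a double-quoted attribute but not in a single-quoted one.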

    Posted 26 Nov 2007 at 11:34 am
  6. Gareth Heyes wrote:

    <?php
    $input = '\-\mo\z\-b\i\nd\in\g:\url(//business\i\nfo.co.uk\/labs\/xbl\/xbl\.xml\#xss)';
    $input = htmlspecialchars($input, ENT_QUOTES);
    ?>
    <div style="<?php echo $input; ?>"></div>
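
To make the point of the snippet above explicit, here is a quick check, written as a sketch rather than a full test: the payload contains none of the characters that htmlspecialchars rewrites, so it should come out exactly as it went in. The variable name $escaped is introduced only for this example.

```php
<?php
// The same payload as above: backslash-escaped CSS with no &, <, >, " or '.
$input = '\-\mo\z\-b\i\nd\in\g:\url(//business\i\nfo.co.uk\/labs\/xbl\/xbl\.xml\#xss)';

// Escape it the "safe" way, quotes included.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Nothing was changed, so the style attribute receives the CSS as written.
echo $escaped === $input ? "unchanged\n" : "changed\n";
```

HTML escaping only guards the HTML context; what the value means as CSS is a separate question.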

    Posted 26 Nov 2007 at 12:00 pm
  7. Loveshell wrote:

    The function cannot do everything for you :)
    What about style, JS, event handlers...

    An example:

    <?php
    $input = htmlspecialchars($_GET['url'], ENT_QUOTES);
    ?>
    <img src="<?php echo $input; ?>">
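
One possible way to narrow the gap in the example above is to check the URL scheme against an allow-list before escaping. The helper name safe_img_src and the choice of http and https only are assumptions made for this sketch; it is deliberately strict and rejects relative URLs as well.

```php
<?php
// Hypothetical helper: returns an escaped URL for use in src="...",
// or null when the scheme is not on the allow-list.
function safe_img_src(string $url): ?string
{
    // parse_url() gives null when there is no scheme and false on badly malformed input.
    $scheme = parse_url($url, PHP_URL_SCHEME);

    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return null;
    }

    // Escaping is still needed so the value cannot break out of the attribute.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

$src = safe_img_src('javascript:alert(1)');
echo $src === null ? "rejected\n" : $src . "\n";
```

Validation decides whether the value is acceptable as a URL at all; escaping then makes it safe to place inside the quoted attribute.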

    - -

    Posted 26 Nov 2007 at 1:30 pm
  8. Gareth Heyes wrote:

    Yep that’s the point I was trying to make in my last comment ;)

    Posted 26 Nov 2007 at 1:33 pm
  9. Felix Zaslavskiy wrote:

    I looked into the htmlentities implementation a while back, and it’s implemented pretty inefficiently, so you’ve got to watch out if you will be calling it thousands of times in a loop or encoding the same input multiple times. htmlspecialchars is probably much faster.

    Posted 26 Nov 2007 at 10:25 pm
  10. Gareth Heyes wrote:

    Good information to know, Felix, thanks.

    Posted 26 Nov 2007 at 10:33 pm
  11. Ed Finkler wrote:

    Smarter people than me have suggested that one should also pass the $charset param to htmlentities or htmlspecialchars.

    http://shiflett.org/blog/2007/may/character-encoding-and-xss

    I agree with the supposition that the PHP escaping functions require too much work to be “safe,” though.

    Posted 26 Nov 2007 at 11:56 pm
  12. Ed Finkler wrote:

    And that was already in the first comment. Sorry about that… long day. 8)

    Posted 27 Nov 2007 at 12:01 am
  13. phpnewuser wrote:

    If I use htmlentities just to convert special characters like áéíóúñÑ, I wonder if it’s wrong.

    What would you use instead?

    Posted 27 Nov 2007 at 3:38 am
  14. Jim Manico wrote:

    I don’t think we want to be using any of these functions directly day to day. We should be rolling them into easier-to-use, platform-level functions that all programmers on our teams must use. Drupal security is poor at best overall, but I like the direction of their PHP input validation functions: http://api.drupal.org/?q=api/group/validation/5 and the like.

    Posted 27 Nov 2007 at 11:58 am
  15. Gareth Heyes wrote:

    Sure, whitelist filters and the like would be a better approach. Still, my main point was the misunderstanding of htmlentities and htmlspecialchars, or any other function which requires the second parameter to escape quotes.

    Many developers think that this is being done and it clearly isn’t.

    Posted 27 Nov 2007 at 12:12 pm
  16. Lars Strojny wrote:

    As Michelangelo van Dam and Ed Finkler pointed out, you should also specify the charset. But specifying it alone does not help, as you first need to enforce the input charset to get rid of UTF-7 attacks and the like. Something like this would work:
    $input = iconv('UTF-8', 'UTF-8', $_GET['input']);
    $input = htmlentities($input, ENT_QUOTES, 'UTF-8');
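
A slightly more defensive variant of the same idea, as a sketch only: reject input that is not well-formed UTF-8 instead of converting it, then escape with the charset stated explicitly. The helper name escape_utf8 is hypothetical, and the empty pattern with the u modifier is used here as a validity check because preg_match fails on subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper: returns the escaped string, or null when the
// input is not well-formed UTF-8.
function escape_utf8(string $raw): ?string
{
    // With the /u modifier preg_match() returns false for invalid UTF-8,
    // and 1 for any valid string because the empty pattern always matches.
    if (preg_match('//u', $raw) !== 1) {
        return null;
    }

    return htmlentities($raw, ENT_QUOTES, 'UTF-8');
}

$clean = escape_utf8("it's fine");
echo $clean === null ? "rejected\n" : $clean . "\n";
```

The page itself still has to declare the same charset, otherwise the browser may interpret the bytes differently from the way PHP did.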

    Posted 27 Nov 2007 at 2:41 pm
  17. open source wrote:

    Manual… ;)

    Posted 29 Nov 2007 at 8:49 pm
