htmlentities is badly designed

When someone uses htmlentities, I’ve seen time and time again that they expect it to filter variables against all XSS. This is wrong, of course, because the function needs ENT_QUOTES as a second parameter to replace both double and single quote characters; without it, single quotes are left untouched. Some developers aren’t even aware that quotes can lead to XSS injection.

This leads me to my point: by default htmlentities should filter quotes, and if the developer wishes to turn this functionality off they can do so using the second parameter.

Here’s the code example for anyone using htmlentities:

<?php
// ENT_QUOTES escapes both double and single quotes
echo htmlentities($variable, ENT_QUOTES);
?>
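
To make the difference concrete, here’s a minimal sketch that escapes the same string with and without ENT_QUOTES. The payload and the <input> markup are made up for illustration, and ENT_COMPAT and 'UTF-8' are passed explicitly so the result doesn’t depend on the defaults of the PHP version in use.

```php
<?php
// Hypothetical attacker-controlled value, for illustration only.
$variable = "' onmouseover='alert(1)";

// ENT_COMPAT escapes double quotes only, so the single quotes survive.
$compat = htmlentities($variable, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES escapes both double and single quotes.
$quotes = htmlentities($variable, ENT_QUOTES, 'UTF-8');

// In a single-quoted attribute the first value breaks out of the attribute.
echo "<input value='$compat'>\n";
echo "<input value='$quotes'>\n";
```

With ENT_COMPAT the single quotes come through unchanged, so the value can close the attribute and add its own event handler. With ENT_QUOTES each single quote becomes &#039; and stays inside the attribute.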

In the past I’ve also made this mistake by assuming that the function takes quotes into account. Not now though, I’ve learned 🙂

19 Responses to “htmlentities is badly designed”

  1. Michelangelo van Dam writes:

    You can also provide a third parameter to enforce a specific character encoding, making sure that the data matches the settings in your environment. Although this is a minor issue, I provide it here as a best practice.

    Cheers,

    MvD

  2. Gareth Heyes writes:

    Yep, very good point, thanks 🙂

  3. David writes:

    Hrm, I don’t know whether or not it should encode quotes by default. It’s a function for converting encoding-specific characters to html equivalents so they can be stored anywhere, rather than a tool to avoid malicious input (we have functions for those).

    But then, I could argue it both ways. Pros and cons… which is, I suppose, why it’s an option…:p

    PS Google seems to be doing a great job in providing ‘related ads’ in your sidebar… 🙂

  4. Gareth Heyes writes:

    Well, judging by the amount of code I’ve seen that’s vulnerable because it isn’t enabled by default, I’d suggest it is. Just my opinion on the matter though 🙂

    htmlspecialchars also suffers from the same problem. I think it’s a problem when a developer assumes function behavior.

    Hehe, I don’t care much about the Google ads 🙂 As you might tell, nobody clicks ’em anyway. They’re just there to potentially earn beer money… it hasn’t happened yet 🙂

  5. FlorentG writes:

    Don’t use htmlentities. Use htmlspecialchars instead. htmlentities converts everything that does not fall into the ASCII range into an entity, which may not be the desired effect.

    htmlspecialchars concentrates on the basic special characters (<, >, &, ", '). By default, it escapes double quotes, but not single quotes.

    Now, about the XSS threat: not escaping single quotes doesn’t really matter, provided that:
    – you output content in an html element (as PCDATA). Single or double quotes don’t need to be escaped then
    – you output content in an attribute value delimited by double quotes. Then you just need to escape double quotes, which it does by default.
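
    A small sketch of the difference between the two functions described above, assuming UTF-8 input. The sample string is made up, and ENT_COMPAT (double quotes only) is passed explicitly so the output doesn’t depend on the PHP version’s defaults. Note that htmlentities only has something to convert to when a character has a named entity.

    ```php
    <?php
    // Made-up sample text with accented letters and both kinds of quotes.
    $text = "Caf\u{e9} \"No\u{eb}l\" & 'friends'";

    // htmlentities also converts the accented letters to named entities.
    $entities = htmlentities($text, ENT_COMPAT, 'UTF-8');

    // htmlspecialchars only touches &, <, > and (with ENT_COMPAT) double quotes.
    $special = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');

    echo $entities, "\n";
    echo $special, "\n";
    ```

    Both calls leave the single quotes alone, which is fine for element content and double-quoted attributes but not for single-quoted ones.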

  6. Gareth Heyes writes:

    <?php
    $input = '\-\mo\z\-b\i\nd\in\g:\url(//business\i\nfo.co.uk\/labs\/xbl\/xbl\.xml\#xss)';
    $input = htmlspecialchars($input, ENT_QUOTES);
    ?>
    <div style="<?php echo $input?>"></div>

  7. Loveshell writes:

    The function cannot do everything for you 🙂
    What about style, JS, event handlers…?

    An example:

    <?php
    $input = htmlspecialchars($_GET['url'], ENT_QUOTES);
    ?>
    <img src="<?php echo $input?>">

    – –
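
    A rough sketch of what a src attribute needs on top of escaping: the URL scheme has to be checked as well, since a value such as javascript:alert(1) contains nothing for htmlspecialchars to escape. The helper name allowed_image_url and the http/https allowlist are assumptions for illustration, not something taken from the example above.

    ```php
    <?php
    // Hypothetical helper: returns an escaped URL that is safe to place in
    // src="...", or null when the scheme is not on the allowlist.
    function allowed_image_url(string $url): ?string
    {
        $scheme = parse_url($url, PHP_URL_SCHEME);

        // parse_url gives null for a missing scheme and false for a malformed URL.
        if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
            return null;
        }

        return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    }

    var_dump(allowed_image_url('javascript:alert(1)'));
    var_dump(allowed_image_url('https://example.com/a.png?x=1&y=2'));
    ```

    This sketch rejects relative URLs too, because they carry no scheme. Whether that is acceptable depends on the application.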

  8. Gareth Heyes writes:

    Yep, that’s the point I was trying to make in my last comment 😉

  9. Felix Zaslavskiy writes:

    I looked into the htmlentities implementation a while back and it’s implemented pretty inefficiently, so you’ve got to watch out if you will be calling it thousands of times in a loop or encoding the same input multiple times. htmlspecialchars is probably much faster.

  10. Gareth Heyes writes:

    Good information to know Felix thanks

  11. Ed Finkler writes:

    Smarter people than me have suggested that one should also pass the $charset param to htmlentities or htmlspecialchars.

    http://shiflett.org/blog/2007/may/character-encoding-and-xss

    I agree with the supposition that the PHP escaping functions require too much work to be “safe,” though.

  12. Ed Finkler writes:

    And that was already in the first comment. Sorry about that… long day. 8)

  13. phpnewuser writes:

    If I use htmlentities just to convert special characters like áéíóúñÑ, I wonder if it’s wrong.

    What would you use instead ?

  14. Jim Manico writes:

    I don’t think we want to be using any of these functions in a day-to-day fashion. We should be rolling these functions into platform-level, easier-to-use functions that all programmers on our teams must use. Drupal security is poor at best overall, but I like the direction of their PHP input validation functions: http://api.drupal.org/?q=api/group/validation/5 and the like.

  15. Gareth Heyes writes:

    Sure, whitelist filters and the like would be a better approach. Still, my main point was the misunderstanding of htmlentities and htmlspecialchars, or any other function which requires the second parameter to escape quotes.

    Many developers think that this is being done and it clearly isn’t.

  16. Lars Strojny writes:

    As Michelangelo van Dam and Ed Finkler pointed out, you should also specify the charset. But specifying it alone does not help, as you first need to enforce the input charset to get rid of UTF-7 attacks and the like. Something like this would work:
    $input = iconv('UTF-8', 'UTF-8', $_GET['input']);
    $input = htmlentities($input, ENT_QUOTES, 'UTF-8');
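
    A minimal sketch of the same idea that rejects input which isn’t valid UTF-8 instead of converting it. The helper name escape_utf8 is hypothetical, and using preg_match with the u modifier as the validity check is an assumption, not part of the snippet above. The page would still need to declare the same charset so the browser doesn’t guess a different one.

    ```php
    <?php
    // Hypothetical helper: returns the escaped text, or null when the bytes
    // are not valid UTF-8.
    function escape_utf8(string $raw): ?string
    {
        // With the u modifier, preg_match fails on malformed UTF-8 subjects.
        if (preg_match('//u', $raw) !== 1) {
            return null;
        }

        return htmlentities($raw, ENT_QUOTES, 'UTF-8');
    }

    var_dump(escape_utf8("caf\xC3\xA9 'x'")); // valid UTF-8
    var_dump(escape_utf8("\xC3\x28"));        // malformed byte sequence
    ```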

  17. open source writes:

    Manual… 😉

  18. Jeremy Glover writes:

    Great catch. Just did a find and replace in my code on 72 instances.

  19. Mtutnid writes:

    By specifying UTF-8 you are exposing your website to an overflow vulnerability. :D:D Funny. You have to upgrade to the newest version of PHP to prevent this.