HTML Escaping for Secure Web Pages

Cross Site Scripting (XSS) is a well-known challenge for web sites to protect against.  The Open Web Application Security Project (OWASP) has great resources, such as the XSS Prevention Cheat Sheet, which is a worthwhile read for anyone wanting to understand the issues in more depth.  I came across the issue again in a discussion of whether Magento should continue to use PHTML or consider a templating engine such as TWIG.  Magento, being web store software dealing with real money, needs to be particularly aware of security.

In a nutshell, when using PHTML, writing

    <?php echo $x; ?>

could be a security risk.  If $x holds an integer, there is no risk.  The problem arises when $x holds a string that contains HTML-sensitive characters such as “<”, “>”, and “&”.  (In HTML attributes, quote characters are also sensitive.)  To be safe, a character such as “<” should be escaped as “&lt;”, or else the web browser will interpret it as markup.
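As a minimal sketch of what escaping does, the snippet below runs two made-up sample strings through PHP's htmlspecialchars().  The variable names and sample values are only for illustration.

```php
<?php
// Sample text containing HTML-sensitive characters (illustrative only).
$text = 'a < b & c > d';
$escaped = htmlspecialchars($text, ENT_QUOTES | ENT_XHTML, 'UTF-8');
echo $escaped, "\n"; // a &lt; b &amp; c &gt; d

// Inside an attribute value, double quotes are escaped as well,
// so the value cannot break out of the title="..." attribute.
$title = 'Say "hi" & wave';
$attr = '<span title="' . htmlspecialchars($title, ENT_QUOTES | ENT_XHTML, 'UTF-8') . '">note</span>';
echo $attr, "\n"; // <span title="Say &quot;hi&quot; &amp; wave">note</span>
```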

For example, consider user product reviews.  If a Magento module displayed the user review on a web page without escaping sensitive characters, a user A could type raw HTML into a user review (including HTML elements containing JavaScript code).  Another user B who then read that review would execute that JavaScript code.  The JavaScript would be running logged on as user B and so may be able to spend user B’s money without them realizing.
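The review scenario can be sketched roughly as follows.  The review text and the surrounding markup are hypothetical; a real Magento module would load the review from the database rather than from a literal string.

```php
<?php
// Hypothetical review text typed in by user A (illustrative payload).
$review = 'Great product! <script>alert("xss")</script>';

// Unsafe: user B's browser would treat the <script> element as markup and run it.
$unsafe = '<p class="review">' . $review . '</p>';

// Safer: the sensitive characters are escaped, so the review is shown as plain text.
$safe = '<p class="review">'
    . htmlspecialchars($review, ENT_QUOTES | ENT_XHTML, 'UTF-8')
    . '</p>';

echo $unsafe, "\n";
echo $safe, "\n";
```

In the second version the browser displays the angle brackets literally instead of starting a script element.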

For some reason users typically don’t like their money being spent without their approval.

So how to protect against this in PHP?  There are several strategies.  I will mention only two here.

  1. Use the PHP htmlspecialchars() function to escape sensitive characters.  Echoing $x above would become
    <?php echo htmlspecialchars($x, ENT_XHTML | ENT_QUOTES); ?>
  2. Use a templating language such as TWIG which makes it quick and easy to escape characters.
    {{ x|e }}

The first scheme requires programming discipline.  It’s just so much extra text to type in!  Any string to be injected into an HTML page that is not already marked up as HTML should be escaped using a function such as htmlspecialchars().
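One way to reduce the typing, sketched here under the assumption that you are free to define your own helper, is a short wrapper function.  The name e() is hypothetical; it is not part of PHP or Magento.

```php
<?php
// Hypothetical helper (not a PHP or Magento function): a short alias so that
// escaping costs only a few extra characters in a template.
function e($value)
{
    // The charset is passed explicitly rather than relying on the PHP default.
    return htmlspecialchars((string) $value, ENT_QUOTES | ENT_XHTML, 'UTF-8');
}

$x = 'Tom & Jerry <3';
echo e($x), "\n"; // Tom &amp; Jerry &lt;3
```

In a PHTML template this would read `<?php echo e($x); ?>`.  It is still up to the developer to remember to call it.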

The second scheme using TWIG requires less discipline, as only “|e” has to be added (or “|escape” if you like the verbose form).  This is safer from a security point of view.  (It would be better still if escaping were the default and you had to disable it explicitly; TWIG’s autoescape option can provide that behavior.)

In practice, most injection of variables into a web page is safe, as the content comes from developers or from merchant-developed content held in databases.  The risk comes when input from external users can be injected into a page, such as from product reviews.

Disclaimer: It is not my purpose in this post to comment on which approach I think Magento should use.  I am just writing to expand upon the security aspects which I got asked about in the GitHub thread.

3 comments

  1. tobiaszander

    Nice post, I came to here from the mage2/twig thread. But you should also add the charset to your htmlspecialchars call.

    People should also be aware that json_encode is not XSS-safe and has to be escaped, too.

  2. When I read the htmlspecialchars() documentation, it said the default encoding was UTF-8 (from PHP 5.4 onwards, which is the base version for Magento 2), which is what XHTML requires, so I believe that argument can now be omitted. So you are right that I left it out, but I think that is safe to do in Magento 2. (And it made the example shorter!)

    Thanks for the note on json_encode.

  3. One more from the Mage2/twig thread. Completely agree with your point. On the other hand, we can also use “HTML Purifier – Standards-Compliant HTML Filtering” in the template files to remove any malicious code. In my personal opinion, the filters (filter_var) and htmlspecialchars() belong in our request handlers rather than in the template files.

    I personally believe that the existing .phtml template files can get rid of bad HTML & JS inputs if we use http://htmlpurifier.org/. BTW, thanks a lot Alan for the nice article!
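The json_encode point raised in the comments can be sketched as follows.  This is only an illustration with made-up data: it assumes the JSON is written into an inline script block, and uses the JSON_HEX_* flags as one possible mitigation.  JSON written into ordinary HTML text or an attribute would still need htmlspecialchars().

```php
<?php
// Hypothetical review data that is about to be embedded in a page as JSON.
$review = array('text' => '<img src=x onerror=alert(1)>');

// Default json_encode() output: "<" and ">" pass through unchanged.
$plain = json_encode($review);

// With the JSON_HEX_* flags, the HTML-sensitive characters inside strings
// are written as \uXXXX escapes instead.
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$hardened = json_encode($review, $flags);

echo $plain, "\n";
echo $hardened, "\n";

// The hardened form still decodes to the original data.
var_dump(json_decode($hardened, true) === $review);
```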
