PHP Parser - Filtering Cross Site Scripting (XSS)

published on September 18th, 2008

So the last few days I've been seriously stressing about the implications of XSS (cross-site scripting) in a project that I've been working on. If you're a web developer and you don't know what XSS is all about, you're in trouble: google it.

There's also a great website over at http://ha.ckers.org/xss.html that gives you a huge list of many of the known XSS methods.

There is a plethora of PHP classes out there that work on forums and such with a limited subset of XHTML, but I need to cover as much as possible. Before people start shouting at me: an approach using BBCode or Textile just isn't possible here (and it's ugly, don't get me started).

Whilst trying to find a decent PHP function to parse out these threats in the simplest manner possible, I ended up combining a few to come up with what's below.

Download file (strip_xss.txt)

function strip_xss($str, $allowed=null){
	if (!$allowed){
		// <applet> is deliberately not on this list: it can load and run arbitrary code
		$allowed = array('<h1>','<h2>','<h3>','<h4>','<h5>','<h6>','<b>','<i>','<u>','<a>','<ul>','<ol>','<li>','<pre>','<hr>','<blockquote>','<img>','<font>','<span>','<p>','<br>','<table>','<thead>','<th>','<tr>','<td>','<em>','<strong>','<div>','<center>','<ins>','<del>','<kbd>','<dd>','<tbody>','<tfoot>','<big>','<button>','<input>','<option>','<textarea>','<fieldset>','<form>','<legend>','<code>');
	}
	$disabled = array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragdrop', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterupdate', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload');


	// remove all non-printable characters. TAB(09), LF(0a) and CR(0d) are allowed
	// this prevents some character re-spacing such as <java\0script>
	// note that you have to handle splits with \n, \r, and \t later since they *are* allowed in some inputs
	// (no commas inside the character class, otherwise literal commas would be stripped too)
	$str = preg_replace('/[\x00-\x08\x0b\x0c\x0e-\x1f]/', '', $str);


	// straight replacements, the user should never need these since they're normal characters
	// this prevents things like <IMG SRC=&#X6A&#X61&#X76&#X61&#X73&#X63&#X72&#X69&#X70&#X74&#X3A&#X61&#X6C&#X65&#X72&#X74&#X28&#X27&#X58&#X53&#X53&#X27&#X29>
	$search = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890!@#$%^&*()~`";:?+/={}[]-_|\'\\';
	for ($i = 0; $i < strlen($search); $i++) {
		// &#x0040 @ search for the hex values
		// 0{0,8} matches up to eight padding zeros, ;? matches the optional trailing semicolon
		// the lookahead stops a longer entity such as &#x4000; from being half-replaced
		$str = preg_replace('/(&#x0{0,8}'.dechex(ord($search[$i])).'(?![0-9a-f]);?)/i', $search[$i], $str);
		// &#00064 @ the same again for the decimal values
		$str = preg_replace('/(&#0{0,8}'.ord($search[$i]).'(?![0-9]);?)/', $search[$i], $str);
	}


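	// strip every tag that isn't on the whitelist
	$str = strip_tags($str, implode('', $allowed));

	// inside each remaining tag, remove javascript: URLs and quoted event handlers, then collapse whitespace
	// (preg_replace_callback stands in for the /e modifier, which was removed in PHP 7)
	$events = implode('|', $disabled);
	$str = preg_replace_callback('/<(.*?)>/s', function ($m) use ($events) {
		$patterns = array('/javascript:[^"\']*/i', '/(' . $events . ')[ \t\n]*=[ \t\n]*["\'][^"\']*["\']/i', '/\s+/');
		return '<' . preg_replace($patterns, array('', '', ' '), $m[1]) . '>';
	}, $str);

	// finally remove unquoted event handlers such as <b onclick=alert(1)>
	return preg_replace('/\s(' . $events . ').*?([\s>])/i', '\\2', $str);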
}
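
The numeric-entity loop in strip_xss() runs two regex replacements for every character in $search. As a rough sketch only, the same idea can be done in a single pass with preg_replace_callback. The name decode_ascii_entities is made up for this example and isn't part of the function above, and leaving <, > and & encoded is my own assumption, so treat it as a starting point rather than a drop-in replacement.

```php
<?php
// Hypothetical helper (not part of strip_xss above): decode numeric entities
// for printable ASCII in one pass instead of one regex per character.
function decode_ascii_entities($str) {
	// hex form (&#x6A, &#X006a;) or decimal form (&#106, &#0000106;), semicolon optional
	$pattern = '/&#(?:x0*([0-9a-f]{1,2})(?![0-9a-f])|0*([0-9]{1,3})(?![0-9]));?/i';
	return preg_replace_callback($pattern, function ($m) {
		// group 1 is non-empty for the hex form, otherwise group 2 holds the decimal digits
		$code = ($m[1] !== '') ? hexdec($m[1]) : (int) $m[2];
		// assumption: only printable ASCII is decoded, and & < > stay encoded
		if ($code < 0x20 || $code > 0x7e || in_array($code, array(0x26, 0x3c, 0x3e), true)) {
			return $m[0];
		}
		return chr($code);
	}, $str);
}

echo decode_ascii_entities('<IMG SRC=&#X6A&#X61&#X76&#X61&#X73&#X63&#X72&#X69&#X70&#X74&#X3A>'), "\n";
// prints <IMG SRC=javascript:>
echo decode_ascii_entities('&#0000106;ava &#60;b&#62;'), "\n";
// prints java &#60;b&#62;
```

Once the entities are decoded, the javascript: check later in the function has something it can actually match.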

What I've yet to come up with is a way of stopping people putting in things such as:

<img src="http://yoursite.com/admin/users/deleteall" />
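Then whenever an admin or someone else went to this page, already logged in to the app, their browser would request that URL with their session cookie, and the action would be executed as them, perfectly legitimately as far as the server can tell. This is usually called cross-site request forgery (CSRF) rather than XSS. Obviously there isn't a page that deletes all users, but you can see the problem, right?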
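
I don't have a finished answer for this, but a commonly used defence is to only allow destructive actions over POST and to require a secret per-session token with each one, which an <img> tag can't supply. Below is a minimal sketch of that idea, not something taken from the project: csrf_token and csrf_check are hypothetical names, a plain array stands in for $_SESSION, and random_bytes assumes PHP 7 or later.

```php
<?php
// Hypothetical helpers; the names are illustrative only.

// Create (or reuse) a random token stored in the session.
function csrf_token(array &$session) {
	if (empty($session['csrf_token'])) {
		$session['csrf_token'] = bin2hex(random_bytes(32)); // assumes PHP 7+
	}
	return $session['csrf_token'];
}

// Compare the submitted token with the stored one in constant time.
function csrf_check(array $session, $submitted) {
	return isset($session['csrf_token'])
		&& is_string($submitted)
		&& hash_equals($session['csrf_token'], $submitted);
}

// Usage, with a plain array standing in for $_SESSION:
$session = array();
$token = csrf_token($session);          // put this in a hidden form field
var_dump(csrf_check($session, $token)); // bool(true)
var_dump(csrf_check($session, ''));     // bool(false): a bare <img> request carries no token
```

The admin page would then refuse to delete anything unless the request is a POST and csrf_check() passes.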

Anybody who finds an improvement or a bug, please please please add it back here so everyone can benefit. I'll update the code as we go!