
htmlspecialchars

Convert special characters to HTML entities (PHP 4, PHP 5)
string htmlspecialchars ( string string [, int quote_style [, string charset [, bool double_encode]]] )

Example 2415. htmlspecialchars() example

<?php
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
?>
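
The quote_style argument in the signature above decides which quote characters get converted. The following is a minimal sketch comparing the three constants; the sample string is made up for illustration.

```php
<?php
// Hypothetical sample input containing both kinds of quotes.
$input = '"double" & \'single\' <b>';

// ENT_COMPAT: converts double quotes, leaves single quotes alone.
$compat = htmlspecialchars($input, ENT_COMPAT);
// &quot;double&quot; &amp; 'single' &lt;b&gt;

// ENT_QUOTES: converts both double and single quotes.
$quotes = htmlspecialchars($input, ENT_QUOTES);
// &quot;double&quot; &amp; &#039;single&#039; &lt;b&gt;

// ENT_NOQUOTES: leaves both kinds of quotes unconverted.
$noquotes = htmlspecialchars($input, ENT_NOQUOTES);
// "double" &amp; 'single' &lt;b&gt;

echo $compat, "\n", $quotes, "\n", $noquotes, "\n";
```

ENT_QUOTES is usually the safer choice when the result ends up inside an attribute value, because the attribute may be delimited by either kind of quote.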




Code Examples / Notes » phpspecialchars

25-jun-2005 04:44

You can't use htmlspecialchars to create RSS feeds, since it expands ampersands. You need to use something like this:
$content = preg_replace(array('/</', '/>/', '/"/'), array('&lt;', '&gt;', '&quot;'), $content);


alex-0

You can also use variables.
This is handy when working with forms to clear out any malicious HTML:
<?php
$new = htmlspecialchars($_POST[message], ENT_QUOTES);
echo $new;
?>


alexander nofftz

Why &#39;? The HTML and XML DTDs proposed &apos; for this.
See http://www.w3.org/TR/html/dtds.html#a_dtd_Special_characters
So better use this:
$text = htmlspecialchars($text, ENT_QUOTES);
$text = preg_replace('/&#0*39;/', '&apos;', $text);


donwilson

To reverse the action of htmlspecialchars(), use this code:
<?php
function unhtmlspecialchars( $string )
{
$string = str_replace ( '&#039;', '\'', $string );
$string = str_replace ( '&quot;', '"', $string );
$string = str_replace ( '&lt;', '<', $string );
$string = str_replace ( '&gt;', '>', $string );
// Decode &amp; last, so that text such as &amp;lt; becomes &lt; and not <
$string = str_replace ( '&amp;', '&', $string );

return $string;
}
?>


david

To replace the Swedish characters...
$s=ereg_replace(197, "&Aring;",$s);
$s=ereg_replace(196, "&Auml;",$s);
$s=ereg_replace(214, "&Ouml;",$s);
$s=ereg_replace(229, "&aring;",$s);
$s=ereg_replace(228, "&auml;",$s);
$s=ereg_replace(246, "&ouml;",$s);


terminatorul

To HTML-encode Unicode characters that may not be part of your document character set (given in the META tag of your page), and so cannot be output directly into your document source, you need to use mb_encode_numericentity(). Pay attention to its conversion map argument.
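
As a rough sketch of what that conversion map looks like: it is a flat array of four integers per range (start code point, end code point, offset, mask). The example assumes the mbstring extension is loaded and that the input is UTF-8; the sample string is invented.

```php
<?php
// Assumes the mbstring extension is loaded and the input is UTF-8.
// "\xD0\xB1" is the UTF-8 byte sequence for the Cyrillic small letter be (U+0431).
$input = "<p>\xD0\xB1 & b</p>";

// Conversion map, four integers per range:
// start code point, end code point, offset, mask.
// This single range covers everything above ASCII and leaves ASCII untouched.
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

// Escape the markup characters first, then turn non-ASCII characters into
// numeric entities, so the ampersands of those entities are not escaped again.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$output  = mb_encode_numericentity($escaped, $convmap, 'UTF-8');

echo $output; // &lt;p&gt;&#1073; &amp; b&lt;/p&gt;
```

Running htmlspecialchars() before mb_encode_numericentity() matters here: in the opposite order the & of every numeric entity would be turned into &amp;.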

thorax

To convert a document back from this,
do string replacements in this order:
&gt;   >
&lt;   <
&quot; "
&amp;  &
Doing the last step first will
produce erroneous results. For example:
'&lt;' => htmlspecialchars() => '&amp;lt;' => convert ampersands => '&lt;' => convert everything else => '<'
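
The ordering problem can be reproduced with two small helpers. This is only an illustrative sketch; the function names are made up and are not part of PHP.

```php
<?php
// Hypothetical helper: decodes &amp; FIRST (the problematic order).
function decode_amp_first($s)
{
    $s = str_replace('&amp;', '&', $s);
    $s = str_replace('&lt;', '<', $s);
    $s = str_replace('&gt;', '>', $s);
    $s = str_replace('&quot;', '"', $s);
    return $s;
}

// Hypothetical helper: decodes &amp; LAST (the order recommended above).
function decode_amp_last($s)
{
    $s = str_replace('&gt;', '>', $s);
    $s = str_replace('&lt;', '<', $s);
    $s = str_replace('&quot;', '"', $s);
    $s = str_replace('&amp;', '&', $s);
    return $s;
}

$original = '&lt;';                                   // literal text, not a tag
$encoded  = htmlspecialchars($original, ENT_QUOTES);  // &amp;lt;

var_dump(decode_amp_first($encoded) === $original);   // bool(false), result is <
var_dump(decode_amp_last($encoded) === $original);    // bool(true)
```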


palrich

To Alexander Nofftz and urbanheroes:
It's not an IE problem.  There is no &apos; in HTML.  So it's only a problem if someone else does render this as an apostrophe on an HTML page.


galvao

There's a tiny error in the example by alex-0 at hotmail dot co dot uk:
The line:
$new = htmlspecialchars($_POST[message], ENT_QUOTES);
Should be written as:
$new = htmlspecialchars($_POST['message'], ENT_QUOTES);
Regards,


mlvanbie

The code in the previous note has a bug.  If the original text was `&gt;' then htmlspecialchars will turn it into `&amp;gt;' and the suggested code will turn that into `>'.  The &amp; translation must be last.

took

The algorithm from donwilson at gmail dot com to reverse the action of htmlspecialchars(), adapted for German:
function unhtmlspecialchars( $string )
{
 $string = str_replace ( '&#039;', '\'', $string );
 $string = str_replace ( '&quot;', '"', $string );
 $string = str_replace ( '&lt;', '<', $string );
 $string = str_replace ( '&gt;', '>', $string );
 $string = str_replace ( '&uuml;', 'ü', $string );
 $string = str_replace ( '&Uuml;', 'Ü', $string );
 $string = str_replace ( '&auml;', 'ä', $string );
 $string = str_replace ( '&Auml;', 'Ä', $string );
 $string = str_replace ( '&ouml;', 'ö', $string );
 $string = str_replace ( '&Ouml;', 'Ö', $string );
 // Decode &amp; last, so that text such as &amp;lt; becomes &lt; and not <
 $string = str_replace ( '&amp;', '&', $string );
 return $string;
}


dystopia589

Sorry, part of that code was unnecessary. Here's a more readable version:
function SpecialChars($Security)
{
    if (is_array($Security))
    {
        foreach ($Security as $key => $val)
        {
            $Security[$key] = SpecialChars($val);
        }
    }
    else
    {
        $Security = htmlspecialchars(stripslashes($Security), ENT_QUOTES);
    }
    return $Security;
}


beer undrscr nomaed

Quite often, on HTML pages that are not encoded as UTF-8, when people write in an encoding other than the page's native one, some browsers (Internet Explorer for sure) will send the characters from the other charset as HTML entities, such as &#1073; for the small Russian 'b'.
htmlspecialchars() will break such an entity, since it changes every & to &amp;
What I usually do is either turn &amp; back into & so the correct characters appear in the output, or use a regex to turn the escaped entities back into their original form:
<?php
   // treat this as pseudo-code, it hasn't been tested...
   $result = preg_replace('/&amp;#(x[a-f0-9]+|[0-9]+);/i', '&#$1;', $source);
?>


_____

People, don't use ereg_replace for the most simple string replacing operations (replacing constant string with another).
Use str_replace.


nospam

most simple function for decoding html-encoded strings:
function htmldecode($encoded) {
return strtr($encoded,array_flip(get_html_translation_table(HTML_ENTITIES)));
}


urbanheroes {at} gmail {dot} com

In response to the note made by Alexander Nofftz in October 2004: &#39; is used instead of &apos; because IE unfortunately seems to have trouble with the latter.

15-jul-2001 07:18

If you're sending data from one form to another, the data in the textareas and text inputs may need to have htmlspecialchars("form data", ENT_QUOTES) applied, assuming it may ever contain quotes, less-than signs, or any of the other special characters.  Using htmlspecialchars will make the text show up properly in the second form.  The changes are automatically undone whenever the form data is submitted. It does seem a little strange, but it works and my headache is now starting to go away.
AZ
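
A minimal sketch of the situation described in this note, with an invented field name and value. The browser decodes the entities when it displays the field and submits the plain characters again, which is why the changes appear to be undone automatically.

```php
<?php
// Hypothetical value that came back from a previous form submission.
$message = 'She said "hi" & typed <b>';

// Escape the value before placing it inside the double-quoted attribute.
$field = '<input type="text" name="message" value="'
       . htmlspecialchars($message, ENT_QUOTES) . '">';

echo $field;
// <input type="text" name="message" value="She said &quot;hi&quot; &amp; typed &lt;b&gt;">
```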


frank

If you seem to have a problem with rendering dynamic RSS files from a database, try using htmlspecialchars() or htmlentities() on the text you are rendering.
Since XML and RSS are very strict about what is allowed inside nodes, you need to make sure everything is "A-OK" according to XML standards ...
This applies especially if the database you're pulling data from uses e.g. Latin-Swedish encoding, which seems to be the standard setting for MySQL databases.


akira dot yoshi

If you need to htmlspecialchars a JIS string, here's a function that does:
function htmlspecialchars_jis($text) {
    $ret = "";
    if ($text == "") return "";
    $esc = chr(27);
    // '$B' is single-quoted on purpose: in double quotes PHP would
    // try to interpolate a variable named $B.
    $text = $esc.'$B'.$esc.'$B'.$text;
    $text = str_replace($esc."(B", $esc.'$B', $text);
    $trans = explode($esc.'$B', $text);
    $enc = 0;
    foreach ($trans as $val) {
        if ($enc == 0) {
            if ($val != "") $ret .= htmlspecialchars($val);
            $enc = 1;
        } else {
            if ($val != "") $ret .= $esc.'$B'.$val.$esc."(B";
            $enc = 0;
        }
    }
    return $ret;
}
BTW: I'm very(!) sure that JIS is iso-2022-jp, not iso-2002-jp


thisiswherejunkgoes

If there are any n00bs out there looking for a way to ensure that no html/special chars are getting sent to their databases/put through forms/etc., this has been doing the trick for me (though being at least slightly n00bish, if this won't always work perhaps someone will amend :-))
function checkforchars ($foo) {
 if ($foo === htmlspecialchars($foo)) {
       return "Valid entry.";
 } else {
       return "Invalid entry.";
 }
}


juadielon_nospam

I was trying to retrieve information from a database to display it into the browser. However it did not work as I was expecting.  For instance double quotes (“”) and single quotes (‘’) were conflicting in HTML in an INPUT selector.
The first approach to solve this was to use htmlspecialchars to convert special characters to HTML entities to display the input box with its value.
$encode=htmlspecialchars($str, ENT_QUOTES);
However, the result was having HTML entities with a \ (backslash) preceding it (escape characters).  For instance ampersand (&) becomes \&amp; displaying \& and double quotes becomes \&quot; displaying \”
So the final solution was to replace first any \ (backslash) and then ask htmlspecialchars to make the conversion.
[Editor's Note: This is the wrong way to do this. The proper way is to use
$encoded = htmlspecialchars(stripslashes($str), ENT_QUOTES);
]
$encoded=htmlspecialchars(str_replace('\\', '', $str), ENT_QUOTES);
Try this example to see it yourself.
<form action="<?php echo $PHP_SELF; ?>">
<input type="text" name="str" size="20" value="">
<input type="submit" value="Submit">

<?php
 if (!empty($str)) {
$encoded=htmlspecialchars(str_replace('\\', '', $str), ENT_QUOTES);
echo "


Result: <b>".$encoded."</b>. It should be the same you just typed";
echo "

But source code is transformed to:<b><xmp>".$encoded."</xmp></b>";
// I know, I know <xmp> is deprecated in HTML 4 but was easy to use this time to display result.
 }
?>
</form>
Hope this helps someone.


jspalletta

I have found that this regular expression is sufficient for making sure that existing character entities show after htmlspecialchars() replaces _all_ occurrences of & with the &amp; entity.
<?php
// Note: hsc is an abbreviation of htmlspecialchars
function hscFixed($str)
{
return preg_replace("/&amp;(#[0-9]+|[a-z]+);/i", "&$1;", htmlspecialchars($str));
}
?>
The only flaw I can think of is if you have text of the vein; "&[word];", that is not meant to be a character but rather uses the ampersand and semicolon in their traditional grammatical denotations.  However I think this is highly unlikely to occur (among other reasons, the fact that anyone with enough grammatical inclination to use them as such probably won't leave out the space between the ampersand and the word).
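
The double_encode parameter in the signature at the top of this page addresses the same problem without a regular expression. The sketch below assumes PHP 5.2.3 or later, where that parameter is available; the input string is invented.

```php
<?php
// Hypothetical input: two existing entities plus raw special characters.
$input = 'R&amp;D &#1073; <b> & more';

// Default behaviour (double_encode = true): every & is encoded again.
$twice = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
// R&amp;amp;D &amp;#1073; &lt;b&gt; &amp; more

// double_encode = false: existing entities are left alone,
// while the raw <, > and the lone & are still converted.
$once = htmlspecialchars($input, ENT_QUOTES, 'UTF-8', false);
// R&amp;D &#1073; &lt;b&gt; &amp; more

echo $twice, "\n", $once, "\n";
```

The same caveat as above applies: text that merely looks like an entity, such as "&word;", may also be left unescaped.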


richard

I had a script which detected swearing and wanted to make sure that words such as 'f &uuml; c k' didn't slip through the system.
After using htmlentities(), the following line converts most extended alphabet characters back to the standard alphabet so you can spot such problems..
$text=eregi_replace("&([a-z])[a-z0-9]{3,};", "\\1", $text);
This changes, for example, '&uuml;' into 'u' and '&szlig;' into 's'.  Sadly it also converts '&pound;' and '&para;' into 'p', so it's not perfect, but it does solve a lot of the problems.


mikiwoz

I am not sure, maybe I'm missing something, but I have found something interesting:
I've been working on a project where I had to use htmlspecialchars (for obvious reasons). I also needed to decode the encoded string. What I did was almost a copy and paste from php.net:
$trans=get_html_translation_table(HTML_SPECIALCHARS, ENT_QUOTES);
$trans=array_flip($trans);
$string=strtr($encoded, $trans);
(it looked a bit different in my code, but the idea is clear)
I couldn't get the apostrophe sign decoded, and I needed it for the <A> tags. After an hour or so of debugging, I decided to do print_r($trans). What I got was:
...
[&#39;] => '
...
BUT the apostrophe was encoded to &#039; -> note the zero.
I don't suppose it's a bug, but it definitely IS a potential pitfall; watch out for this one.


thelatesundayshow.com @ nathan flip it

Here's a version of the recursive escape function that takes the array by reference rather than by value, which saves some resources in the case of big arrays:
function recurse_array_HTML_safe(&$arr) {
    foreach ($arr as $key => $val) {
        if (is_array($val)) {
            recurse_array_HTML_safe($arr[$key]);
        } else {
            $arr[$key] = htmlspecialchars($val, ENT_QUOTES);
        }
    }
}


mike-php

Here's a handy function that guards against 'double' encoding:
# Given a string, this function first strips out all html special characters, then
# encodes the string, safely returning an encoded string without double-encoding.
function get_htmlspecialchars( $given, $quote_style = ENT_QUOTES ){
  return htmlspecialchars( html_entity_decode( $given, $quote_style ), $quote_style );
}
# Needed for older versions of PHP that do not have this function built-in.
function html_entity_decode( $given_html, $quote_style = ENT_QUOTES ) {
  $trans_table = get_html_translation_table( HTML_SPECIALCHARS, $quote_style );
  if( $trans_table["'"] != '&#039;' ) { # some versions of PHP match single quotes to &#39;
     $trans_table["'"] = '&#039;';
  }
  return ( strtr( $given_html, array_flip( $trans_table ) ) );
}
Note: I set the default to ENT_QUOTES, as this makes more sense to me than the PHP function's default of ENT_COMPAT.


gt

Here is the recursive version that works for both arrays and strings. Doesn't look as elegant as the other recursive versions, because of the input checks.
function HTML_ESC($_input = null, $_esc_keys = false)
{
   if ((null != $_input) && (is_array($_input)))
   {
       foreach($_input as $key => $value)
       {
           if($_esc_keys)
           {
               $_return[htmlspecialchars($key)] = HTML_ESC($value,$_esc_keys);
           }
           else
           {
               $_return[$key] = HTML_ESC($value);
           }
       }
       return $_return;
   }
   elseif(null != $_input)
   {
       return htmlspecialchars($_input);
   }
   else
   {
       return null;
   }
}


joseph

Here is a handy function to htmlalize an array (or scalar) before you hand it off to xml.
function htmlspecialchars_array($arr = array()) {
$rs =  array();
while(list($key,$val) = each($arr)) {
if(is_array($val)) {
$rs[$key] = htmlspecialchars_array($val);
}
else {
$rs[$key] = htmlspecialchars($val, ENT_QUOTES);
}
}
return $rs;
}


dave duchene

Here is a handy function that will escape the contents of a variable, recursing into arrays.
<?php
function escaporize($thing) {
 if (is_array($thing)) {
   $escaped = array();
 
   foreach ($thing as $key => $value) {
     $escaped[$key] = escaporize($value);
   }
   
   return $escaped;
 }
 
 // else
 return htmlspecialchars($thing);
}
?>
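
The recursive helpers in the notes above can also be written with the built-in array_walk_recursive(). This is a sketch rather than a drop-in replacement: it assumes PHP 5.3 or later (for the anonymous function), and the function name and sample data are made up.

```php
<?php
// Hypothetical helper: escapes every string in a (possibly nested) array in place.
// Non-string leaves such as integers are left untouched.
function escape_array_in_place(array &$data, $quote_style = ENT_QUOTES)
{
    array_walk_recursive($data, function (&$value) use ($quote_style) {
        if (is_string($value)) {
            $value = htmlspecialchars($value, $quote_style);
        }
    });
}

// Invented sample data.
$form = array(
    'name' => '<b>Bob</b>',
    'tags' => array('a & b', "it's"),
    'age'  => 42,
);

escape_array_in_place($form);
print_r($form);
```

Unlike the versions that build a new array, this one modifies its argument, in the same spirit as the by-reference variant above.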


moc.xnoitadnuof@310symerej

Here are some useful functions.
They will apply || decode, htmlspecialchars || htmlentities recursively to arrays() || to regular $variables. They also protect against "double encoding".
<?PHP
function htmlspecialchars_or( $mixed, $quote_style = ENT_QUOTES ){
return is_array($mixed) ? array_map('htmlspecialchars_or',$mixed, array_fill(0,count($mixed),$quote_style)) : htmlspecialchars(htmlspecialchars_decode($mixed, $quote_style ),$quote_style);
}
function htmlspecialchars_decode( $mixed, $quote_style = ENT_QUOTES ) {
if(is_array($mixed)){
  return array_map('htmlspecialchars_decode',$mixed, array_fill(0,count($mixed),$quote_style));
 }
 $trans_table = get_html_translation_table( HTML_SPECIALCHARS, $quote_style );
if( $trans_table["'"] != '&#039;' ) { # some versions of PHP match single quotes to &#39;
$trans_table["'"] = '&#039;';
}
return (strtr($mixed, array_flip($trans_table)));
}
function htmlentities_or($mixed, $quote_style = ENT_QUOTES){
return is_array($mixed) ? array_map('htmlentities_or',$mixed, array_fill(0,count($mixed),$quote_style)) : htmlentities(htmlentities_decode($mixed, $quote_style ),$quote_style);
}
function htmlentities_decode( $mixed, $quote_style = ENT_QUOTES ) {
 if(is_array($mixed)){
  return array_map('htmlentities_decode',$mixed, array_fill(0,count($mixed),$quote_style));
 }
$trans_table = get_html_translation_table(HTML_ENTITIES, $quote_style );
if( $trans_table["'"] != '&#039;' ) { # some versions of PHP match single quotes to &#39;
$trans_table["'"] = '&#039;';
}
return (strtr($mixed, array_flip($trans_table)));
}
?>
These functions are an addition to an earlier post. I would like to give the person some credit but I do not know who it was.
<?  ;llnu=u!eJq dHd?>


luiz miguel axcar lmaxcar

Hello,
If you are having trouble writing HTML data to, or reading it from, a DBMS (SGDB), try this:
<?php
//from html_entity_decode() manual page
function unhtmlentities ($string) {
  $trans_tbl =get_html_translation_table (HTML_ENTITIES );
  $trans_tbl =array_flip ($trans_tbl );
  return strtr ($string ,$trans_tbl );
}
//read from db
$content = stripslashes (htmlspecialchars ($field['content']));
//write to db
$content = unhtmlentities (addslashes (trim ($_POST['content'])));
//make sure result of function get_magic_quotes_gpc () == 0, you can get strange slashes in your content adding slashes twice
//better to do this using addslashes
$content = (! get_magic_quotes_gpc ()) ? addslashes ($content) : $content;
?>


paul dot l

function reverse_htmlentities($mixed)
{
$htmltable = get_html_translation_table(HTML_ENTITIES);
foreach($htmltable as $key => $value)
{
$mixed = ereg_replace(addslashes($value),$key,$mixed);
}
return $mixed;
}
this is my version of a reversed htmlentities function


11-mar-2005 12:22

function htmlspecialchars_array($arr = array()) {
  $rs =  array();
  while(list($key,$val) = each($arr)) {
      if(is_array($val)) {
          $rs[$key] = htmlspecialchars_array($val);
      }
      else {
          $rs[$key] = htmlspecialchars($val, ENT_QUOTES);
      }    
  }
  return $rs;
}


webmaster

For those of you using version 4.3.0 or later: you can use html_entity_decode() to decode a string encoded with htmlspecialchars(). This should be faster and easier than using str_replace or ereg.
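
As a small sketch of the built-in route: htmlspecialchars_decode() (PHP 5.1.0 and later) reverses exactly the conversions htmlspecialchars() makes, provided the same quote style is passed to both calls.

```php
<?php
$original = "<a href='test'>Test</a>";

// Encode and decode with the same quote style.
$encoded = htmlspecialchars($original, ENT_QUOTES);
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);

echo $encoded, "\n"; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
var_dump($decoded === $original); // bool(true)
```

html_entity_decode() goes further and also decodes entities such as &eacute;, which may or may not be what is wanted.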

marcel

For HTML to text conversion see example 3 on http://www.php.net/manual/en/function.preg-replace.php

akira

Beware of parsing JIS (aka 'iso-2002-jp') text through this function, as this function does not appear to have a sense for multibyte characters and may corrupt some characters. Eg. the japanese comma (the two ascii characters !" as viewed by an ascii client) gets transferred into !&quot; , which transforms the comma into a 'maru' mark and the following characters into garbage.
Conceivably this could affect other multibyte charsets.


solar-energy

Also see the function urlencode(), which is useful for passing text with ampersands and other special characters through a URL
(i.e. the text is encoded as if sent from form using GET method)
e.g.
<?php
echo "<a href='foo.php?text=".urlencode("foo?&bar!")."'>link</a>";
?>
produces
<a href='foo.php?text=foo%3F%26bar%21'>link</a>
and if the link is followed, the $_GET["text"] in foo.php will contain "foo?&bar!"
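
When a link carries more than one parameter, the two functions are typically combined: urlencode() for each value, then htmlspecialchars() for the finished URL, so the & separator is written as &amp; in the markup. A minimal sketch, with a made-up second parameter:

```php
<?php
// Hypothetical target script and parameters.
$query = 'text=' . urlencode('foo?&bar!') . '&lang=' . urlencode('en');
$url   = 'foo.php?' . $query;

// Escape the whole URL once, at the point where it enters the HTML.
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">link</a>';

echo $link;
// <a href="foo.php?text=foo%3F%26bar%21&amp;lang=en">link</a>
```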


beer undrscr nomaed

After inspecting the non-native encoding problem, I noticed that for example, if the encoding is cyrillic, and I write Latin characters that are not part of the encoding (æ for example - ae-ligature), the browser will send the real entity, such as &aelig; for this case.
Therefore, the only way I see to display multilingual text that is encoded with entities is by:
<?php
   echo str_replace('&amp;', '&', htmlspecialchars($txt));
?>
The regex for numeric entities will skip the Latin-1 textual entities.


ryan

Actually, if you're using >= 4.0.5, this should theoretically be quicker (less overhead anyway):
$text = str_replace(array("&gt;", "&lt;", "&quot;", "&amp;"), array(">", "<", "\"", "&"), $text);


zolinak

A sample function, if anybody wants to turn HTML entities (and special characters) back to plain characters (e.g. "&egrave;", "<" etc.):
function html2specialchars($str){
$trans_table = array_flip(get_html_translation_table(HTML_ENTITIES));
return strtr($str, $trans_table);
}


drew

:// Escapes strings to be included in javascript
:function jsspecialchars($s) {
:    return preg_replace('/([^ :!#$%@()*+,-.\x30-\x5b\x5d-\x7e])/e',
:        "'\\x'.(ord('\\1')<16? '0': '').dechex(ord('\\1'))",$s);
:}
This function DOES NOT produce correct output in PHP5. Any string containing a " will be improperly escaped to \x5c, when it should be \x22.
I am not very good with regular expressions, so this is my solution to the problem.
//this is a workaround for jsspecialchars!
function ord2($s) {
if (strlen($s) == 2) {
return ord(substr($s,1,1));
} else {
return ord($s);
}
}
function JS_SpecialChars($s) {
return preg_replace('/([^ !#$%@()*+,-.\x30-\x5b\x5d-\x7e])/e',
"'\\x'.(ord2('\\1')<16? '0': '').dechex(ord2('\\1'))",$s);
}
I am sure that there is a better solution, but I can't figure one out. This approach will probably also fix any other characters that end up being improperly escaped.
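
One possible alternative avoids the /e modifier altogether, which was deprecated in PHP 5.5 and removed in PHP 7. The sketch below uses preg_replace_callback() with an anonymous function (PHP 5.3 or later). The function name is hypothetical, the character class is deliberately stricter than the one above, and the input is assumed to be single-byte (ASCII or Latin-1).

```php
<?php
// Hypothetical helper: escapes every byte that is not an ASCII letter,
// digit, or space as \xHH. Assumes single-byte (ASCII or Latin-1) input.
function js_escape_bytes($s)
{
    return preg_replace_callback(
        '/[^A-Za-z0-9 ]/',
        function ($m) {
            // $m[0] is the single matched byte; no eval-style quoting involved.
            return sprintf('\\x%02x', ord($m[0]));
        },
        $s
    );
}

echo js_escape_bytes('say "hi"</script>');
// say \x22hi\x22\x3c\x2fscript\x3e
```

Because the callback receives the matched byte directly, the double-quote problem described above does not arise, and < and > are escaped too, so a closing script tag in the input cannot end the surrounding script block.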


macisaac

<?php
// Escapes strings to be included in javascript
function jsspecialchars($s) {
return preg_replace('/([^ !#$%@()*+,-.\x30-\x5b\x5d-\x7e])/e',
"'\\x'.(ord('\\1')<16? '0': '').dechex(ord('\\1'))",$s);
}
?>
<script>
var some_variable = '<?= jsspecialchars($_GET['some_variable']) ?>';
</script>

