Delicious Bookmark this on Delicious Share on Facebook SlashdotSlashdot It! Digg! Digg



PHP : Function Reference : String Functions : get_html_translation_table

get_html_translation_table

Returns the translation table used by htmlspecialchars and htmlentities (PHP 4, PHP 5)
array get_html_translation_table ( [int table [, int quote_style]] )

get_html_translation_table() will return the translation table that is used internally for htmlspecialchars() and htmlentities().

Note:

Special characters can be encoded in several ways. E.g. " can be encoded as ", " or &#x22. get_html_translation_table() returns only the most common form for them.

Parameters

table

There are two new constants (HTML_ENTITIES, HTML_SPECIALCHARS) that allow you to specify the table you want. Default value for table is HTML_SPECIALCHARS.

quote_style

And as in the htmlspecialchars() and htmlentities() functions you can optionally specify the quote_style you are working with. The default is ENT_COMPAT mode. See the description of these modes in htmlspecialchars().

Return Values

Returns the translation table as an array.

Examples

Example2411.Translation Table Example

<?php
$trans
= get_html_translation_table(HTML_ENTITIES);
$str = "Hallo & <Frau> & Krmer";
$encoded = strtr($str, $trans);
?>


The $encoded variable will now contain: "Hallo &amp; &lt;Frau&gt; &amp; Kr&auml;mer".

Code Examples / Notes » get_html_translation_table

trukin

There have been issues when hispanic websites or other websites dont use the corrent collision in mysql.
Some problems result that the accents (éä ... ) result in weird characters when a backup is done and restored later on. Or when database is changed to another one.
To fix this try something like this
function accents($text){
foreach(get_html_translation_table(HTML_ENTITIES) as $a=>$b){
$text = str_replace($a,$b,$text);
}
return $text;
}
and use as accents("Hello ....... WITH ACCENTS") and it will return the escaped string.


yes

Searching for a fast replacement of the MS WORD special characters which are not covered by get_html_translation_table() , I think the following function might help someone
<?php
function clean_up($str){
$str = stripslashes($str);
$str = strtr($str, get_html_translation_table(HTML_ENTITIES));
$str = str_replace( array("\x82", "\x84", "\x85", "\x91", "\x92", "\x93", "\x94", "\x95", "\x96",  "\x97"), array("&#8218;", "&#8222;", "&#8230;", "&#8216;", "&#8217;", "&#8220;", "&#8221;", "&#8226;", "&#8211;", "&#8212;"),$str);
return $str;
}
?>
It replaces all types of quotes (single and double), horizontal ellipsis (...), bullet, en dash and em dash.


edwardzyang

Quite disappointingly, get_html_translation_table() only gives the characters for ISO-8859-1, making it quite useless for UTF-8 or anything else like that (as a previous commenter noticed).

patrick nospam

Not sure what's going on here but I've run into a problem that others might face as well...
<?php
$translations = array_flip(get_html_translation_table(HTML_ENTITIES,ENT_QUOTES));
?>
returns the single quote ' as being equal to &#39; while
<?php
$translatedString = htmlentities($string,ENT_QUOTES);
?>
returns it as being equal to &#039;
I've had to do a specific string replacement for the time being... Not sure if it's an issue with the function or the array manipulation.
-Pat


ryan

In XML, you can't assume that the doctype will include the same character entity definitions as HTML. XML authors may require character references instead. The following two functions use get_html_translation_table() to encode data in numeric references. The second, optional argument can be used to substitute a different translation table.
function xmlcharacters($string, $trans='') {
$trans=(is_array($trans))? $trans:get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
foreach ($trans as $k=>$v)
$trans[$k]= "&#".ord($k).";";
return strtr($string, $trans);
}
function xml_character_decode($string, $trans='') {
$trans=(is_array($trans))? $trans:get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
foreach ($trans as $k=>$v)
$trans[$k]= "&#".ord($k).";";
$trans=array_flip($trans);
return strtr($string, $trans);
}


alex minkoff

If you want to display special HTML entities in a web browser, you can use the following code:
<?
$entities = get_html_translation_table(HTML_ENTITIES);
foreach ($entities as $entity) {
$new_entities[$entity] = htmlspecialchars($entity);
}
echo "<pre>";
print_r($new_entities);
echo "</pre>";
?>
If you don't, the key name of each element will appear to be the same as the element content itself, making it look mighty stupid. ;)


alan

If you want to decode all those &#123; symbols as well....
function unhtmlentities ($string)  {
   $trans_tbl = get_html_translation_table (HTML_ENTITIES);
   $trans_tbl = array_flip ($trans_tbl);
   $ret = strtr ($string, $trans_tbl);
   return  preg_replace('/\&\#([0-9]+)\;/me',
       "chr('\\1')",$ret);
}


maurizio siliani

If you have troubles (like me) getting data from ISO-8859-1 encoded forms where user copy and paste from word, this routine could be useful.
It adds to the standard get_html_translation_table the codes of the characters usually M$ Word replacs into typed text.
Otherwise those characters would never be displayed correctly in html output.
function get_html_translation_table_CP1252() {
$trans = get_html_translation_table(HTML_ENTITIES);
$trans[chr(130)] = '&sbquo;'; // Single Low-9 Quotation Mark
$trans[chr(131)] = '&fnof;'; // Latin Small Letter F With Hook
$trans[chr(132)] = '&bdquo;'; // Double Low-9 Quotation Mark
$trans[chr(133)] = '&hellip;'; // Horizontal Ellipsis
$trans[chr(134)] = '&dagger;'; // Dagger
$trans[chr(135)] = '&Dagger;'; // Double Dagger
$trans[chr(136)] = '&circ;'; // Modifier Letter Circumflex Accent
$trans[chr(137)] = '&permil;'; // Per Mille Sign
$trans[chr(138)] = '&Scaron;'; // Latin Capital Letter S With Caron
$trans[chr(139)] = '&lsaquo;'; // Single Left-Pointing Angle Quotation Mark
$trans[chr(140)] = '&OElig; '; // Latin Capital Ligature OE
$trans[chr(145)] = '&lsquo;'; // Left Single Quotation Mark
$trans[chr(146)] = '&rsquo;'; // Right Single Quotation Mark
$trans[chr(147)] = '&ldquo;'; // Left Double Quotation Mark
$trans[chr(148)] = '&rdquo;'; // Right Double Quotation Mark
$trans[chr(149)] = '&bull;'; // Bullet
$trans[chr(150)] = '&ndash;'; // En Dash
$trans[chr(151)] = '&mdash;'; // Em Dash
$trans[chr(152)] = '&tilde;'; // Small Tilde
$trans[chr(153)] = '&trade;'; // Trade Mark Sign
$trans[chr(154)] = '&scaron;'; // Latin Small Letter S With Caron
$trans[chr(155)] = '&rsaquo;'; // Single Right-Pointing Angle Quotation Mark
$trans[chr(156)] = '&oelig;'; // Latin Small Ligature OE
$trans[chr(159)] = '&Yuml;'; // Latin Capital Letter Y With Diaeresis
ksort($trans);
return $trans;
}


iain duh workingsoftware.com.au

I wrote a quick little function for converting something like '&middot;' into '&#183;':
$to_convert = '&middot;';
$table = get_html_translation_table(HTML_ENTITIES);
$equiv = '&#'.ord(array_search($to_convert,$table)).';';


jérôme jaglale

htmlentities includes htmlspecialchars, so here's how to convert an UTF-8 string :
htmlentities($string, ENT_QUOTES, 'UTF-8');


dirk

get_html_translation_table
It works only with the first 256 Codepositions.
For Higher Positions, for Example &#1092;
(a kyrillic Letter) it shows the same.


zohar

Another way of converting HTML entities into numeric entities to please XML parsers is using two arrays as conversion tables in a preg_replace function. The conversion table mechanism is based on Ryan's examples above.
<?php
function xmlEntities($s){
//build first an assoc. array with the entities we want to match
$table1 = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
//now build another assoc. array with the entities we want to replace (numeric entities)
foreach ($table1 as $k=>$v){
 $table1[$k] = "/$v/";
 $c = htmlentities($k,ENT_QUOTES,"UTF-8");
 $table2[$c] = "&#".ord($k).";";
}
//now perform a replacement using preg_replace
//each matched value in array 1 will be replaced with the corresponding value in array 2
$s = preg_replace($table1,$table2,$s);
return $s;
}
?>


chris

and a few more :
'&image;'=>'&#8465;', '&weierp;'=>'&#8472;', '&real;'=>'&#8476;', '&trade;'=>'&#8482;', '&alefsym;'=>'&#8501;', '&larr;'=>'&#8592;', '&uarr;'=>'&#8593;', '&rarr;'=>'&#8594;', '&darr;'=>'&#8595;', '&harr;'=>'&#8596;', '&crarr;'=>'&#8629;', '&lArr;'=>'&#8656;', '&uArr;'=>'&#8657;', '&rArr;'=>'&#8658;', '&dArr;'=>'&#8659;', '&hArr;'=>'&#8660;', '&forall;'=>'&#8704;', '&part;'=>'&#8706;', '&exist;'=>'&#8707;', '&empty;'=>'&#8709;', '&nabla;'=>'&#8711;', '&isin;'=>'&#8712;', '&notin;'=>'&#8713;', '&ni;'=>'&#8715;', '&prod;'=>'&#8719;', '&sum;'=>'&#8721;', '&minus;'=>'&#8722;', '&lowast;'=>'&#8727;', '&radic;'=>'&#8730;', '&prop;'=>'&#8733;', '&infin;'=>'&#8734;', '&ang;'=>'&#8736;', '&and;'=>'&#8743;', '&or;'=>'&#8744;', '&cap;'=>'&#8745;', '&cup;'=>'&#8746;', '&int;'=>'&#8747;', '&there4;'=>'&#8756;', '&sim;'=>'&#8764;', '&cong;'=>'&#8773;', '&asymp;'=>'&#8776;', '&ne;'=>'&#8800;', '&equiv;'=>'&#8801;', '&le;'=>'&#8804;', '&ge;'=>'&#8805;', '&sub;'=>'&#8834;', '&sup;'=>'&#8835;', '&nsub;'=>'&#8836;', '&sube;'=>'&#8838;', '&supe;'=>'&#8839;', '&oplus;'=>'&#8853;', '&otimes;'=>'&#8855;', '&perp;'=>'&#8869;', '&sdot;'=>'&#8901;', '&lceil;'=>'&#8968;', '&rceil;'=>'&#8969;', '&lfloor;'=>'&#8970;', '&rfloor;'=>'&#8971;', '&lang;'=>'&#9001;', '&rang;'=>'&#9002;', '&loz;'=>'&#9674;', '&spades;'=>'&#9824;', '&clubs;'=>'&#9827;', '&hearts;'=>'&#9829;', '&diams;'=>'&#9830;'


kevin_bro

Alans version didn't seem to work right. If you're having the same problem consider using this slightly modified version instead:
function unhtmlentities ($string)  {
  $trans_tbl = get_html_translation_table (HTML_ENTITIES);
  $trans_tbl = array_flip ($trans_tbl);
  $ret = strtr ($string, $trans_tbl);
  return preg_replace('/&#(\d+);/me',
     "chr('\\1')",$ret);
}


chris

A lot of quite common characters (or at least not rare, like oelig, euro or minus) are missing from the table unfortunately.
Here are some, if you want to make your translation table more complete and your xml data less error-prone. Not sure why some characters have 2 codes, just use one. Here goes: '&apos;'=>'&#39;', '&minus;'=>'&#45;', '&circ;'=>'&#94;', '&tilde;'=>'&#126;', '&Scaron;'=>'&#138;', '&lsaquo;'=>'&#139;', '&OElig;'=>'&#140;', '&lsquo;'=>'&#145;', '&rsquo;'=>'&#146;', '&ldquo;'=>'&#147;', '&rdquo;'=>'&#148;', '&bull;'=>'&#149;', '&ndash;'=>'&#150;', '&mdash;'=>'&#151;', '&tilde;'=>'&#152;', '&trade;'=>'&#153;', '&scaron;'=>'&#154;', '&rsaquo;'=>'&#155;', '&oelig;'=>'&#156;', '&Yuml;'=>'&#159;', '&yuml;'=>'&#255;', '&OElig;'=>'&#338;', '&oelig;'=>'&#339;', '&Scaron;'=>'&#352;', '&scaron;'=>'&#353;', '&Yuml;'=>'&#376;', '&fnof;'=>'&#402;', '&circ;'=>'&#710;', '&tilde;'=>'&#732;', '&Alpha;'=>'&#913;', '&Beta;'=>'&#914;', '&Gamma;'=>'&#915;', '&Delta;'=>'&#916;', '&Epsilon;'=>'&#917;', '&Zeta;'=>'&#918;', '&Eta;'=>'&#919;', '&Theta;'=>'&#920;', '&Iota;'=>'&#921;', '&Kappa;'=>'&#922;', '&Lambda;'=>'&#923;', '&Mu;'=>'&#924;', '&Nu;'=>'&#925;', '&Xi;'=>'&#926;', '&Omicron;'=>'&#927;', '&Pi;'=>'&#928;', '&Rho;'=>'&#929;', '&Sigma;'=>'&#931;', '&Tau;'=>'&#932;', '&Upsilon;'=>'&#933;', '&Phi;'=>'&#934;', '&Chi;'=>'&#935;', '&Psi;'=>'&#936;', '&Omega;'=>'&#937;', '&alpha;'=>'&#945;', '&beta;'=>'&#946;', '&gamma;'=>'&#947;', '&delta;'=>'&#948;', '&epsilon;'=>'&#949;', '&zeta;'=>'&#950;', '&eta;'=>'&#951;', '&theta;'=>'&#952;', '&iota;'=>'&#953;', '&kappa;'=>'&#954;', '&lambda;'=>'&#955;', '&mu;'=>'&#956;', '&nu;'=>'&#957;', '&xi;'=>'&#958;', '&omicron;'=>'&#959;', '&pi;'=>'&#960;', '&rho;'=>'&#961;', '&sigmaf;'=>'&#962;', '&sigma;'=>'&#963;', '&tau;'=>'&#964;', '&upsilon;'=>'&#965;', '&phi;'=>'&#966;', '&chi;'=>'&#967;', '&psi;'=>'&#968;', '&omega;'=>'&#969;', '&thetasym;'=>'&#977;', '&upsih;'=>'&#978;', '&piv;'=>'&#982;', '&ensp;'=>'&#8194;', '&emsp;'=>'&#8195;', '&thinsp;'=>'&#8201;', '&zwnj;'=>'&#8204;', '&zwj;'=>'&#8205;', '&lrm;'=>'&#8206;', '&rlm;'=>'&#8207;', '&ndash;'=>'&#8211;', '&mdash;'=>'&#8212;', '&lsquo;'=>'&#8216;', '&rsquo;'=>'&#8217;', '&sbquo;'=>'&#8218;', '&ldquo;'=>'&#8220;', '&rdquo;'=>'&#8221;', '&bdquo;'=>'&#8222;', '&dagger;'=>'&#8224;', '&Dagger;'=>'&#8225;', '&bull;'=>'&#8226;', '&hellip;'=>'&#8230;', '&permil;'=>'&#8240;', '&prime;'=>'&#8242;', '&Prime;'=>'&#8243;', '&lsaquo;'=>'&#8249;', '&rsaquo;'=>'&#8250;', '&oline;'=>'&#8254;', '&frasl;'=>'&#8260;', '&euro;'=>'&#8364;'


Change Language


Follow Navioo On Twitter
addcslashes
addslashes
bin2hex
chop
chr
chunk_split
convert_cyr_string
convert_uudecode
convert_uuencode
count_chars
crc32
crypt
echo
explode
fprintf
get_html_translation_table
hebrev
hebrevc
html_entity_decode
htmlentities
htmlspecialchars_decode
htmlspecialchars
implode
join
levenshtein
localeconv
ltrim
md5_file
md5
metaphone
money_format
nl_langinfo
nl2br
number_format
ord
parse_str
print
printf
quoted_printable_decode
quotemeta
rtrim
setlocale
sha1_file
sha1
similar_text
soundex
sprintf
sscanf
str_getcsv
str_ireplace
str_pad
str_repeat
str_replace
str_rot13
str_shuffle
str_split
str_word_count
strcasecmp
strchr
strcmp
strcoll
strcspn
strip_tags
stripcslashes
stripos
stripslashes
stristr
strlen
strnatcasecmp
strnatcmp
strncasecmp
strncmp
strpbrk
strpos
strrchr
strrev
strripos
strrpos
strspn
strstr
strtok
strtolower
strtoupper
strtr
substr_compare
substr_count
substr_replace
substr
trim
ucfirst
ucwords
vfprintf
vprintf
vsprintf
wordwrap
eXTReMe Tracker