mb_decode_numericentity

Decode HTML numeric string reference to character (PHP 4 >= 4.0.6, PHP 5)
string mb_decode_numericentity ( string str, array convmap [, string encoding] )

Example 1395. convmap example

<?php
$convmap = array (
   int start_code1, int end_code1, int offset1, int mask1,
   int start_code2, int end_code2, int offset2, int mask2,
   ........
   int start_codeN, int end_codeN, int offsetN, int maskN );
// Specify Unicode value for start_codeN and end_codeN
// Add offsetN to value and take bit-wise 'AND' with maskN,
// then convert value to numeric string reference.
?>
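
The convmap layout above can be tried with a single entry. This is only a minimal sketch: the sample string, the chosen range (U+0080 to U+10FFFF) and the variable names are assumptions made for illustration, not part of the manual text.

```php
<?php
// One convmap entry: start_code, end_code, offset, mask.
// Range U+0080..U+10FFFF, no offset, and a mask wide enough to keep
// every code point in that range intact.
$convmap = array(0x80, 0x10FFFF, 0, 0x1FFFFF);

// "caf\xC3\xA9" is "café" in UTF-8; the last character is U+00E9 (233).
$original = "caf\xC3\xA9";

// U+00E9 falls inside the range, so it becomes a numeric reference.
$encoded = mb_encode_numericentity($original, $convmap, 'UTF-8');
// $encoded is now 'caf&#233;'

// Decoding with the same map restores the original string.
$decoded = mb_decode_numericentity($encoded, $convmap, 'UTF-8');
?>
```

The ASCII letters are left alone because they fall below start_code; only the character inside the range is converted.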

Code Examples / Notes » mb_decode_numericentity

donovan

Note that, at this time, mb_decode_numericentity() seems to work only with decimal entities, not hexadecimal ones. Knowing this would have saved me a good hour of debugging.
If you need to convert hex entities, first convert them all to decimal entities with a combination of the preg_replace() and hexdec() functions.
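
The workaround above might look roughly like the sketch below. It swaps in preg_replace_callback() for plain preg_replace() and assumes PHP 5.3 or later for the anonymous function. The helper name hex_entities_to_decimal() and the sample input are hypothetical.

```php
<?php
// Hypothetical helper: rewrite hexadecimal entities such as &#x2019;
// as decimal entities such as &#8217;.
function hex_entities_to_decimal($text)
{
    return preg_replace_callback(
        '/&#x([0-9a-f]+);/i',
        function ($m) {
            // $m[1] holds the hex digits without the "&#x" prefix.
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );
}

// 0x2019 is 8217 and 0xE2 is 226 in decimal.
$decimal = hex_entities_to_decimal('&#x2019; &#xE2;');
// $decimal is now '&#8217; &#226;'

// The decimal entities can then be decoded as usual.
$convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
$decoded = mb_decode_numericentity($decimal, $convmap, 'UTF-8');
?>
```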


andrew simpson

Many web browsers tend to upload high-order characters as HTML numeric entities.
Here is some simple code to convert those entities within a block of UTF-8 text into proper characters:
<?php
// Decode one numeric entity (e.g. "&#8217;") into a UTF-8 character.
function utf8_entity_decode($entity) {
  $convmap = array(0x0, 0x10000, 0, 0xfffff);
  return mb_decode_numericentity($entity, $convmap, 'UTF-8');
}
// Callbacks for the regexes. preg_replace_callback() is used because
// the /e modifier was removed in PHP 7.
function utf8_decimal_entity_callback($m) {
  return utf8_entity_decode($m[0]);
}
function utf8_hex_entity_callback($m) {
  return utf8_entity_decode('&#' . hexdec($m[1]) . ';');
}
// decode decimal HTML entities added by web browser
$body = preg_replace_callback('/&#\d{2,5};/u', 'utf8_decimal_entity_callback', $body);
// decode hex HTML entities added by web browser
$body = preg_replace_callback('/&#x([a-fA-F0-9]{2,8});/u', 'utf8_hex_entity_callback', $body);
?>


dev

Just two great functions for daily use:
<?php
/* Converts numeric HTML entities into characters */
function my_numeric2character($t)
{
  // The mask has to cover the whole range; a narrower mask such as 0xFFFF
  // would truncate code points above U+FFFF.
  $convmap = array(0x0, 0x2FFFF, 0, 0x3FFFF);
  return mb_decode_numericentity($t, $convmap, 'UTF-8');
}
/* Converts any characters into numeric HTML entities */
function my_character2numeric($t)
{
  $convmap = array(0x0, 0x2FFFF, 0, 0x3FFFF);
  return mb_encode_numericentity($t, $convmap, 'UTF-8');
}
print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' ’ â ');
?>


php

Here are functions to convert hankaku (half-width) characters to zenkaku (full-width) characters, and vice versa, in Japanese text.
<?php
// Supported characters:
//    (space)
//     !#$%&()*+,./0123456789:;<=>?@
//    ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
//    abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)
function f_han2zen ($string,$encoding = null) {
 if (is_null($encoding)) $encoding = mb_internal_encoding();
 $convmap = array(
    0x20,0x20,0x3000-0x20,0xffff,   // Space
    0x21,0x7e,0xff01-0x21,0xffff);
 $temp = mb_encode_numericentity($string,$convmap,$encoding);
 $convmap = array(0,0xffff,0,0xffff);
 return mb_decode_numericentity($temp,$convmap,$encoding);
}
function f_zen2han ($string,$encoding = null) {
 if (is_null($encoding)) $encoding = mb_internal_encoding();
 $convmap = array(
    0x3000,0x3000,-(0x3000-0x20),0xffff,   // Space
    0xff01,0xff5e,-(0xff01-0x21),0xffff);
 $temp = mb_encode_numericentity($string,$convmap,$encoding);
 $convmap = array(0,0xffff,0,0xffff);
 return mb_decode_numericentity($temp,$convmap,$encoding);
}
// Sample usage:
f_han2zen("test","shift_jis");
f_han2zen("test","utf-8");
?>
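
To make the offset arithmetic in the convmaps above concrete, here is a small sketch for a single character. The input 'A', the UTF-8 encoding and the variable names are assumptions made for illustration.

```php
<?php
// Offset used for the printable ASCII range in the functions above.
$offset = 0xff01 - 0x21;   // 0xFEE0
$convmap = array(0x21, 0x7e, $offset, 0xffff);

// 'A' is U+0041; 0x41 + 0xFEE0 = 0xFF21, the full-width letter A.
$entity = mb_encode_numericentity('A', $convmap, 'UTF-8');
// $entity is now '&#65313;' (0xFF21 in decimal)

// Decoding with an identity map turns the entity into the shifted character.
$fullwidth = mb_decode_numericentity($entity, array(0, 0xffff, 0, 0xffff), 'UTF-8');
// $fullwidth is now U+FF21, i.e. the bytes EF BC A1 in UTF-8
?>
```

The encode step does the actual shifting; the decode step only turns the resulting entity back into a character.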


dirk

utf8_decode() cannot represent characters outside the ISO-8859-1 charset and replaces them with "?". You can avoid this problem by calling mb_encode_numericentity() first:
 // convert $text from UTF-8 to ISO-8859-1
 $convmap = array(0x100, 0x2FFFF, 0, 0x3FFFF);
 $text = mb_encode_numericentity($text, $convmap, "UTF-8");
 $text = utf8_decode($text);
mb_encode_numericentity() turns every character above 0xFF into a numeric entity; utf8_decode() then converts the rest (0x80 - 0xFF).

