mb_decode_numericentity
Decode HTML numeric string reference to character
(PHP 4 >= 4.0.6, PHP 5)
Converts the numeric string references in the string str that fall within the specified code block to characters, and returns the converted string. convmap is an array that specifies the code area to convert. encoding is the character encoding; if it is omitted, the internal character encoding is used.
Example 1395. convmap example
$convmap = array (
See also mb_encode_numericentity().
Code Examples / Notes » mb_decode_numericentity
donovan
Note that at this time mb_decode_numericentity() appears to work only with decimal entities, not hexadecimal ones. Knowing this would have saved me a good hour of debugging. If you need to convert hex entities, first convert them all to decimal entities with a combination of the preg_replace() and hexdec() functions.
andrew simpson
Many web browsers tend to upload high-order characters as UTF-8 encoded entities. Here is some simple code to convert UTF-8 HTML entities within a block of text into proper characters:
<?php
// callback for decimal entities such as &#8217;
function utf8_entity_decode($matches) {
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    return mb_decode_numericentity($matches[0], $convmap, 'UTF-8');
}
// callback for hex entities: rewrite as a decimal entity, then decode
function utf8_hex_entity_decode($matches) {
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    return mb_decode_numericentity('&#' . hexdec($matches[1]) . ';', $convmap, 'UTF-8');
}
// decode decimal HTML entities added by web browser
$body = preg_replace_callback('/&#\d{2,5};/u', 'utf8_entity_decode', $body);
// decode hex HTML entities added by web browser
$body = preg_replace_callback('/&#x([a-fA-F0-9]{2,8});/u', 'utf8_hex_entity_decode', $body);
?>
dev
Just two great functions for daily use:
/* Converts any HTML-entities into characters */
function my_numeric2character($t)
{
    $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
    return mb_decode_numericentity($t, $convmap, 'UTF-8');
}
/* Converts any characters into HTML-entities */
function my_character2numeric($t)
{
    $convmap = array(0x0, 0x2FFFF, 0, 0xFFFF);
    return mb_encode_numericentity($t, $convmap, 'UTF-8');
}
print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' â ');
php
Here are functions to convert hankaku (half-width) characters to zenkaku (full-width) characters, and vice versa, in Japanese text.
<?php
// Supported characters:
// (space)
// !#$%&()*+,./0123456789:;<=>?@
// ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
// abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)
function f_han2zen ($string,$encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x20,0x20,0x3000-0x20,0xffff, // Space
        0x21,0x7e,0xff01-0x21,0xffff);
    $temp = mb_encode_numericentity($string,$convmap,$encoding);
    $convmap = array(0,0xffff,0,0xffff);
    return mb_decode_numericentity($temp,$convmap,$encoding);
}
function f_zen2han ($string,$encoding = null) {
    if (is_null($encoding)) $encoding = mb_internal_encoding();
    $convmap = array(
        0x3000,0x3000,-(0x3000-0x20),0xffff, // Space
        0xff01,0xff5e,-(0xff01-0x21),0xffff);
    $temp = mb_encode_numericentity($string,$convmap,$encoding);
    $convmap = array(0,0xffff,0,0xffff);
    return mb_decode_numericentity($temp,$convmap,$encoding);
}
// Sample usage:
f_han2zen("test","shift_jis");
f_han2zen("test","utf-8");
?>
dirk
Using utf8_decode() causes problems with all extended characters outside the ISO-8859-1 charset. You can solve this by calling mb_encode_numericentity() first:
// convert $text from UTF-8 to ISO-8859-1
$convmap = array(0xFF, 0x2FFFF, 0, 0xFFFF);
$text = mb_encode_numericentity($text, $convmap, "UTF-8");
$text = utf8_decode($text);
The mb_encode_numericentity() call encodes all extended characters from 0xFF upward as numeric entities; utf8_decode() then converts the rest (0x80 and above) to ISO-8859-1.
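The approach in the note above can be sketched as a round trip, which also shows mb_decode_numericentity() reversing the encoding step. This is only an illustration under a few assumptions: the helper names utf8_to_latin1_with_entities() and latin1_with_entities_to_utf8() are hypothetical, mb_convert_encoding() is swapped in for utf8_decode(), and the convmap starts at 0x100 (the first code point outside ISO-8859-1) with a wider mask than the note uses.

```php
<?php
// Hypothetical helpers for illustration; requires the mbstring extension.

// Assumed convmap: start, end, offset, mask. Covers every code point
// from U+0100 upward, with no offset and a mask wide enough for all of Unicode.
function entity_convmap() {
    return array(0x100, 0x10FFFF, 0, 0x1FFFFF);
}

// Encode characters that ISO-8859-1 cannot represent as decimal entities,
// then convert the remaining characters to ISO-8859-1.
function utf8_to_latin1_with_entities($text) {
    $text = mb_encode_numericentity($text, entity_convmap(), 'UTF-8');
    return mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
}

// Reverse direction: convert back to UTF-8, then decode the entities.
function latin1_with_entities_to_utf8($text) {
    $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
    return mb_decode_numericentity($text, entity_convmap(), 'UTF-8');
}

$original  = "caf\xC3\xA9 \xE2\x82\xAC5";   // "café €5" as UTF-8 bytes
$latin1    = utf8_to_latin1_with_entities($original);
$roundTrip = latin1_with_entities_to_utf8($latin1);
```

Here the é (U+00E9) fits in ISO-8859-1 and becomes the single byte 0xE9, while the euro sign (U+20AC) does not and is carried through as the entity &#8364;. Decoding with the same convmap restores the original UTF-8 string.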