mb_decode_numericentity

Decode HTML numeric string reference to character (PHP 4 >= 4.0.6, PHP 5)
string mb_decode_numericentity ( string str, array convmap [, string encoding] )

Converts the numeric string references in string str that fall within the specified block into characters, and returns the converted string.

convmap is an array that specifies the code area to convert.

encoding is the character encoding. If it is omitted, the internal character encoding is used.
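
A minimal sketch of a call, assuming the mbstring extension is loaded and the input is UTF-8. The helper name decode_all_entities() is hypothetical, and the convmap is an assumption chosen for illustration: a single group of four integers that covers every Unicode code point with an offset of 0.

```php
<?php
// Hypothetical helper, not part of mbstring: decode every decimal
// numeric reference in a UTF-8 string.
function decode_all_entities($str)
{
    // One group of four integers: start code, end code, offset, mask.
    $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
    // The encoding is passed explicitly so the result does not depend
    // on the configured internal character encoding.
    return mb_decode_numericentity($str, $convmap, 'UTF-8');
}

// Possible usage: "&#233;" refers to U+00E9.
// echo decode_all_entities('caf&#233;');
```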

Example 1395. convmap example

<?php
$convmap = array (
  int start_code1, int end_code1, int offset1, int mask1,
  int start_code2, int end_code2, int offset2, int mask2,
  ........
  int start_codeN, int end_codeN, int offsetN, int maskN );
// Specify Unicode value for start_codeN and end_codeN
// Add offsetN to value and take bit-wise 'AND' with maskN,
// then convert value to numeric string reference.
?>
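
To make the layout concrete, here is a sketch with real numbers, assuming UTF-8 input and a loaded mbstring extension. The function name decode_non_ascii_entities() and the chosen range are hypothetical. A reference whose value falls outside every range in convmap appears to be left in the string unchanged, which helps when markup-significant references such as &#60; should stay escaped.

```php
<?php
// Hypothetical helper: decode only references to non-ASCII characters.
function decode_non_ascii_entities($str)
{
    $convmap = array(
        // start, end,      offset, mask
        0x80,     0x10FFFF, 0,      0x1FFFFF,
    );
    // References below 0x80 (for example &#60; and &#62;) match no range
    // and are expected to pass through untouched.
    return mb_decode_numericentity($str, $convmap, 'UTF-8');
}
```

More ranges can be listed by appending further groups of four integers to the same array.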


See also mb_encode_numericentity().

Code Examples / Notes » mb_decode_numericentity

donovan

Note that, at this time, mb_decode_numericentity() seems to work only with decimal entities, not hexadecimal ones. Knowing this would have saved me a good hour of debugging.
If you need to convert hex entities, first convert them all to decimal entities with a combination of the preg_replace() and hexdec() functions.
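
A sketch of that workaround, under a few assumptions: it uses preg_replace_callback() instead of plain preg_replace(), because the replacement has to call hexdec(); it needs PHP 5.3 or later for the anonymous function; and both function names are hypothetical.

```php
<?php
// Hypothetical helper: rewrite hexadecimal references such as &#x20AC;
// as decimal ones such as &#8364;. Decimal references are left alone.
function hex_entities_to_decimal($text)
{
    return preg_replace_callback(
        '/&#x([0-9a-f]+);/i',
        function ($m) {
            // $m[1] holds the hex digits without the "&#x" and ";".
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );
}

// Hypothetical wrapper: normalise hex references, then decode everything
// as UTF-8 (requires the mbstring extension).
function decode_hex_and_decimal_entities($text)
{
    $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
    return mb_decode_numericentity(hex_entities_to_decimal($text), $convmap, 'UTF-8');
}
```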


andrew simpson

Many web browsers tend to upload high-order characters as numeric HTML entities.
Here is some simple code to convert those entities within a block of text into proper UTF-8 characters:
<?php
// Decode decimal HTML entities added by the web browser.
// preg_replace_callback() replaces the /e modifier, which PHP 7 removed.
$body = preg_replace_callback('/&#\d{2,5};/u', function ($m) {
    return utf8_entity_decode($m[0]);
}, $body);
// Decode hex HTML entities added by the web browser
$body = preg_replace_callback('/&#x([a-fA-F0-9]{2,8});/u', function ($m) {
    return utf8_entity_decode('&#' . hexdec($m[1]) . ';');
}, $body);
// Callback function for the regex
function utf8_entity_decode($entity) {
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    return mb_decode_numericentity($entity, $convmap, 'UTF-8');
}
?>


dev

Just two great functions for daily use:
<?php
/* Converts numeric HTML entities into characters */
function my_numeric2character($t)
{
    // start, end, offset, mask: the mask must cover the whole range
    $convmap = array(0x0, 0x2FFFF, 0, 0x3FFFF);
    return mb_decode_numericentity($t, $convmap, 'UTF-8');
}
/* Converts every character, ASCII included, into a numeric HTML entity */
function my_character2numeric($t)
{
    // a 0xFFFF mask would truncate code points above U+FFFF
    $convmap = array(0x0, 0x2FFFF, 0, 0x3FFFF);
    return mb_encode_numericentity($t, $convmap, 'UTF-8');
}
print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' ’ â ');
?>


php

Here are functions to convert hankaku (half-width) to zenkaku (full-width) characters, and vice versa, in Japanese text.
<?php
// Supported characters:
//    (space)
//     !#$%&()*+,./0123456789:;<=>?@
//    ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
//    abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)
function f_han2zen ($string,$encoding = null) {
 if (is_null($encoding)) $encoding = mb_internal_encoding();
 $convmap = array(
    0x20,0x20,0x3000-0x20,0xffff,   // Space
    0x21,0x7e,0xff01-0x21,0xffff);
 $temp = mb_encode_numericentity($string,$convmap,$encoding);
 $convmap = array(0,0xffff,0,0xffff);
 return mb_decode_numericentity($temp,$convmap,$encoding);
}
function f_zen2han ($string,$encoding = null) {
 if (is_null($encoding)) $encoding = mb_internal_encoding();
 $convmap = array(
    0x3000,0x3000,-(0x3000-0x20),0xffff,   // Space
    0xff01,0xff5e,-(0xff01-0x21),0xffff);
 $temp = mb_encode_numericentity($string,$convmap,$encoding);
 $convmap = array(0,0xffff,0,0xffff);
 return mb_decode_numericentity($temp,$convmap,$encoding);
}
// Sample usage:
f_han2zen("test","shift_jis");
f_han2zen("test","utf-8");
?>
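
The offset arithmetic in these convmaps is easier to follow on a single character. A small sketch, assuming UTF-8 input and a loaded mbstring extension; the function name han2zen_entity() is hypothetical and only exposes the intermediate string that f_han2zen() holds in $temp.

```php
<?php
// Hypothetical helper: return the intermediate numeric references
// instead of the final full-width text.
function han2zen_entity($string)
{
    // Same map as f_han2zen(): each code point inside a range has the
    // offset added, is ANDed with the mask, and is then written out as
    // a decimal numeric reference.
    $convmap = array(
        0x20, 0x20, 0x3000 - 0x20, 0xffff,   // space      -> U+3000
        0x21, 0x7e, 0xff01 - 0x21, 0xffff);  // "!" .. "~" -> U+FF01 .. U+FF5E
    return mb_encode_numericentity($string, $convmap, 'UTF-8');
}

// "A" is 0x41, and 0x41 + (0xff01 - 0x21) = 0xff21, which is 65313 in
// decimal, so han2zen_entity('A') should give '&#65313;'.
```

Decoding that reference with the identity convmap array(0,0xffff,0,0xffff) then yields the full-width letter U+FF21.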


dirk

If you use utf8_decode(), you will have a problem with every character outside the ISO-8859-1 charset. You can solve this by calling mb_encode_numericentity() first:
 // convert $text from UTF-8 to ISO-8859-1
 // the mask must cover the whole range, or code points above 0xFFFF are truncated
 $convmap = array(0x100, 0x2FFFF, 0, 0x3FFFF);
 $text = mb_encode_numericentity($text, $convmap, "UTF-8");
 $text = utf8_decode($text);
The mb_encode_numericentity() call encodes all characters above 0xFF as numeric entities; utf8_decode() then converts the rest (0x80 - 0xFF) to ISO-8859-1.

