



mb_convert_encoding

Convert character encoding (PHP 4 >= 4.0.6, PHP 5)
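string mb_convert_encoding ( string str, string to_encoding [, mixed from_encoding] )

Converts the character encoding of str to to_encoding and returns the converted string. from_encoding names the encoding of str before conversion. It may be an array or a comma-separated list, in which case the encoding is detected from among the listed candidates. If from_encoding is omitted, the internal character encoding is used.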

Example 1392. mb_convert_encoding() example

<?php
/* Convert internal character encoding to SJIS */
$str = mb_convert_encoding($str, "SJIS");

/* Convert EUC-JP to UTF-7 */
$str = mb_convert_encoding($str, "UTF-7", "EUC-JP");

/* Auto detect encoding from JIS, eucjp-win, sjis-win, then convert str to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");

/* "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
?>
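The official example leaves the input bytes implicit. As an illustrative sketch (not part of the manual), the following converts a hard-coded ISO-8859-1 string to UTF-8 and back, printing the bytes with bin2hex() so the effect is visible. The variable names are arbitrary.

```php
<?php
// "café" encoded in ISO-8859-1: "é" is the single byte 0xE9.
$latin1 = "caf\xE9";

// ISO-8859-1 -> UTF-8: "é" becomes the two bytes 0xC3 0xA9.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8), "\n"; // 636166c3a9

// UTF-8 -> ISO-8859-1 restores the original bytes.
$back = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
var_dump($back === $latin1); // bool(true)
```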

Code Examples / Notes » mb_convert_encoding

tom class

Why use PHP's HTML encoding functions? mbstring has its own encoding for this, HTML-ENTITIES, which (as far as I have tested it) is much more useful.
Example:
$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");
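Note that as of PHP 8.2, using "HTML-ENTITIES" with mbstring functions is deprecated. One possible alternative, sketched here, is mb_encode_numericentity(), which turns characters into numeric entities (`&#233;` rather than a named entity such as `&eacute;`). The conversion map is an assumption chosen to cover every code point from U+0080 upward, leaving ASCII untouched.

```php
<?php
// Conversion map [start, end, offset, mask]: encode U+0080..U+10FFFF as &#NNN;
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

$text    = "caf\xC3\xA9";   // "café" as UTF-8 bytes
$encoded = mb_encode_numericentity($text, $map, 'UTF-8');
echo $encoded, "\n";        // caf&#233;

// The same map reverses the conversion.
$decoded = mb_decode_numericentity($encoded, $map, 'UTF-8');
var_dump($decoded === $text); // bool(true)
```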


aofg

When converting Japanese strings to ISO-2022-JP or JIS on PHP >= 5.2.1, you can use "ISO-2022-JP-MS" instead.
With this encoding, Kishu-Izon (platform-dependent) characters are converted correctly, just as with eucJP-win or SJIS-win.
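A minimal sketch of the suggested substitution, assuming an mbstring build that includes the Japanese encodings (standard builds do). It round-trips ordinary kanji through ISO-2022-JP-MS; the Kishu-Izon characters themselves are not exercised here.

```php
<?php
// "日本語" as UTF-8 bytes, independent of this file's own encoding.
$utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

// Encode as ISO-2022-JP-MS; kanji are introduced by an escape sequence (ESC $ B).
$jis = mb_convert_encoding($utf8, 'ISO-2022-JP-MS', 'UTF-8');
echo bin2hex($jis), "\n";

// Decoding with the same encoding name restores the original text.
$back = mb_convert_encoding($jis, 'UTF-8', 'ISO-2022-JP-MS');
var_dump($back === $utf8); // bool(true)
```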


stephan van der feest

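To add to the Flash conversion comment below, here's how I convert text that I stored in a database (after converting it from Flash HTML text field output) back, so that it can be loaded into a Flash HTML text field again:
function htmltoflash($htmlstr)
{
 // Decode the stored entities as ISO-8859-1, then convert the result to UTF-8 for Flash.
 $utf8 = mb_convert_encoding(html_entity_decode($htmlstr, ENT_QUOTES, "ISO-8859-1"),
   "UTF-8", "ISO-8859-1");
 // Escape ">" and "<" (in that order), then turn the escaped <br /> tags into newlines.
 return str_replace(array(">", "<", "&lt;br /&gt;"),
                    array("&gt;", "&lt;", "\n"),
                    $utf8);
}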


petruzanauticoyahoo?com!ar

Maybe I'm missing something, but this code:
<?php
print mb_detect_encoding("ñ");
print "<br/>";
print mb_convert_encoding("ñ", "UTF-8");
?>
Will yield this output:
UTF-8
ñ
So, was the string encoded in UTF-8 or wasn't it?
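A likely explanation, offered as a sketch rather than a diagnosis: if the script is saved as UTF-8, the literal "ñ" is already the two bytes C3 B1, so mb_detect_encoding() reports UTF-8. mb_convert_encoding() without a from_encoding assumes the internal encoding (commonly UTF-8) and leaves those bytes unchanged. What the browser then shows depends on the charset the page is served with. The code below uses an explicit byte escape so it does not depend on the file's encoding.

```php
<?php
$s = "\xC3\xB1"; // "ñ" in UTF-8
echo bin2hex($s), "\n";                     // c3b1
var_dump(mb_check_encoding($s, 'UTF-8'));   // bool(true)

// UTF-8 -> UTF-8 leaves valid input unchanged.
$same = mb_convert_encoding($s, 'UTF-8', 'UTF-8');
echo bin2hex($same), "\n";                  // c3b1

// Declaring the wrong source encoding double-encodes: each byte is read as a Latin-1 character.
$double = mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
echo bin2hex($double), "\n";                // c383c2b1 (displays as "Ã±")
```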


phpdoc

I'd like to share some code that converts Latin diacritics to their
traditional 7-bit representations, for example:
- à,ç,é,î,... to a,c,e,i,...
- ß to ss
- ä,Ä,... to ae,Ae,...
- ë,... to e,...
(Converting to "7bit" with mb_convert_encoding() would simply delete any offending characters.)
I may have missed your country's typographic conventions; if so, please correct me.
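<?php
/**
 * Convert text with Latin diacritics to a 7-bit representation.
 *
 * @param string $text     line of encoded text
 * @param string $from_enc encoding of $text, e.g. UTF-8, ISO-8859-1
 *
 * @return string 7-bit representation
 */
function to7bit($text, $from_enc) {
   // Turn non-ASCII characters into HTML entities, e.g. "ä" -> "&auml;".
   $text = mb_convert_encoding($text, 'HTML-ENTITIES', $from_enc);
   $text = preg_replace(
       array('/&szlig;/',           // ß -> ss
             '/&(..)lig;/',         // æ, œ -> ae, oe
             '/&([aouAOU])uml;/',   // ä, ö, ü -> ae, oe, ue
             '/&(.)[^;]*;/'),       // any other entity -> its first character
       array('ss', '$1', '$1e', '$1'),
       $text);
   return $text;
}
?>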
Enjoy :-)
Johannes
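A sketch of the same idea that avoids the HTML-ENTITIES round trip (deprecated in mbstring as of PHP 8.2): an explicit replacement table applied with strtr(), followed by removal of any remaining non-ASCII characters. It assumes UTF-8 input and a UTF-8 source file; to7bit_utf8() and its table are hypothetical and cover only an illustrative subset of diacritics.

```php
<?php
// Hypothetical helper: transliterate a few Latin diacritics in UTF-8 text to 7-bit ASCII.
function to7bit_utf8(string $text): string
{
    // Illustrative subset only; extend it for your language's conventions.
    $map = [
        'ß' => 'ss', 'æ' => 'ae', 'Æ' => 'AE', 'œ' => 'oe', 'Œ' => 'OE',
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue',
        'à' => 'a', 'á' => 'a', 'â' => 'a', 'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'î' => 'i', 'ï' => 'i', 'ô' => 'o', 'ù' => 'u', 'û' => 'u',
    ];
    $text = strtr($text, $map);
    // Drop any non-ASCII characters the table did not cover, as "7bit" would.
    return preg_replace('/[^\x00-\x7F]/u', '', $text);
}

echo to7bit_utf8('Straße, Ärger, déjà vu'), "\n"; // Strasse, Aerger, deja vu
```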


volker

Hey guys. For everybody who's looking for a function that converts an ISO-8859-1 (or -15) string to UTF-8 or a UTF-8 string to ISO-8859-1, here's a solution:
// Plain functions; inside a class they can be declared as public methods.
function encodeToUtf8($string) {
    // Detect the source encoding (strict mode) among the listed candidates, then convert to UTF-8.
    return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
function encodeToIso($string) {
    // Same detection, converting to ISO-8859-1 instead.
    return mb_convert_encoding($string, "ISO-8859-1", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
These functions work fine for me. Give them a try.
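A slightly more explicit variant, sketched under the assumption that any input which is not valid UTF-8 is ISO-8859-1 (every byte string is valid ISO-8859-1, which makes it a natural fallback). The name toUtf8() is hypothetical.

```php
<?php
// Hypothetical helper: return valid UTF-8 unchanged, otherwise treat the input as ISO-8859-1.
function toUtf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

var_dump(toUtf8("caf\xE9") === "caf\xC3\xA9");     // bool(true): Latin-1 input converted
var_dump(toUtf8("caf\xC3\xA9") === "caf\xC3\xA9"); // bool(true): UTF-8 input left alone
```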


stephan van der feest

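Here's a tip for anyone using Flash and PHP to store HTML output submitted from a Flash text field in a database or elsewhere.
Flash submits its HTML special characters in UTF-8, so you can use the following function to convert them into HTML entities:
function utf8html($utf8str)
{
 // Convert UTF-8 to ISO-8859-1, then encode the Latin-1 characters as HTML entities.
 // The charset is passed explicitly because htmlentities()'s default has changed across PHP versions.
 return htmlentities(mb_convert_encoding($utf8str, "ISO-8859-1", "UTF-8"), ENT_COMPAT, "ISO-8859-1");
}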
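A sketch comparing the detour through ISO-8859-1 with passing UTF-8 straight to htmlentities(). Characters outside ISO-8859-1, such as the euro sign, are replaced by mbstring's substitute character (by default "?") before htmlentities() ever sees them, whereas the direct call keeps them as entities.

```php
<?php
$utf8 = "caf\xC3\xA9 \xE2\x82\xAC5"; // "café €5" as UTF-8 bytes

// Via ISO-8859-1: "€" has no Latin-1 code, so it is lost during conversion.
$viaLatin1 = htmlentities(mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'), ENT_COMPAT, 'ISO-8859-1');

// Directly from UTF-8: every character with a named entity is kept.
$direct = htmlentities($utf8, ENT_COMPAT, 'UTF-8');

echo $viaLatin1, "\n"; // caf&eacute; ?5
echo $direct, "\n";    // caf&eacute; &euro;5
```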


mac.com@nemo

For those wanting to convert from another character set to MacRoman, use iconv(); this example converts from UTF-8:
<?php
$string = iconv('UTF-8', 'macintosh', $string);
?>
('macintosh' is the IANA name for the MacRoman character set.)


jamespilcher1 - hotmail

Be careful when converting from ISO-8859-1 to UTF-8.
Even if you explicitly specify the character encoding of a page as ISO-8859-1 (via headers and strict XML declarations), Windows 2000 will ignore it and interpret the page in whatever character set is natively installed.
For example, I wrote character #128 into a page with the character encoding ISO-8859-1, and it displayed in Internet Explorer (and Mozilla) as a euro symbol.
It should have displayed a box, denoting that character #128 is undefined in ISO-8859-1. The problem was that the page was being displayed as "Windows: Western Europe" (my native character set).
This led to confusion when I tried to convert this euro to UTF-8 via mb_convert_encoding().
IE displays UTF-8 correctly, and because PHP correctly converted #128 into a box in UTF-8, IE showed a box.
So all I saw was mb_convert_encoding() converting a euro symbol into a box. It took me a long time to figure out what was going on.
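The confusion comes down to which single-byte table is used to read byte 128. Browsers generally decode pages labeled ISO-8859-1 as Windows-1252, where 0x80 is the euro sign, while mb_convert_encoding() takes the declared encoding literally. A small sketch showing both readings of the same byte:

```php
<?php
$byte = "\x80"; // character #128, as written into the page

// Read as ISO-8859-1, byte 128 is the C1 control character U+0080 (shown as a box).
$asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
echo bin2hex($asLatin1), "\n"; // c280

// Read as Windows-1252, it is the euro sign U+20AC.
$asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');
echo bin2hex($asCp1252), "\n"; // e282ac
```

If your "ISO-8859-1" input really comes from such pages, declaring it as Windows-1252 when converting is usually the safer choice.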


david hull

As an alternative to Johannes's suggestion for converting strings from other character sets to a 7-bit representation without simply deleting characters with Latin diacritics, you might try this:
<?php
$text = iconv($from_enc, 'US-ASCII//TRANSLIT', $text);
?>
The only disadvantage is that it does not convert "ä" to "ae", but it handles punctuation and other special characters better. Note that //TRANSLIT results depend on the iconv implementation and, with glibc, on the current LC_CTYPE locale, so the output can vary between systems.
--
David


lanka

Another example of recoding without the mbstring extension enabled
(Russian KOI8-R to Windows-1251; if the input is already in Windows-1251, recode() returns the string unchanged):
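<?php
 // detect_encoding() returns 0 for Windows-1251 ("win") and 1 for KOI8-R ("koi").
 // Bytes 224-255 are lowercase Cyrillic in Windows-1251 and bytes 192-223 are
 // lowercase Cyrillic in KOI8-R; since running text is mostly lowercase, the
 // encoding with the larger count wins (a tie counts as Windows-1251).
 function detect_encoding($str) {
   $win = 0;
   $koi = 0;
   for ($i = 0; $i < strlen($str); $i++) {
     $c = ord($str[$i]);
     if ($c >= 224 && $c <= 255) $win++;
     if ($c >= 192 && $c <= 223) $koi++;
   }
   return ($win < $koi) ? 1 : 0;
 }
 // Recodes KOI8-R to Windows-1251 with a byte lookup table:
 // $kw maps KOI8-R bytes 128-255 to Windows-1251; $wk is the reverse table (unused here).
 function koi_to_win($string) {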
   $kw = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183,  184, 185, 186, 187, 188, 189, 190, 191, 254, 224, 225, 246, 228, 229, 244, 227, 245, 232, 233, 234, 235, 236, 237, 238, 239, 255, 240, 241, 242, 243, 230, 226, 252, 251, 231, 248, 253, 249, 247, 250, 222, 192, 193, 214, 196, 197, 212, 195, 213, 200, 201, 202, 203, 204, 205, 206, 207, 223, 208, 209, 210, 211, 198, 194, 220, 219, 199, 216, 221, 217, 215, 218);
   $wk = array(128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183,  184, 185, 186, 187, 188, 189, 190, 191, 225, 226, 247, 231, 228, 229, 246, 250, 233, 234, 235, 236, 237, 238, 239, 240, 242,  243, 244, 245, 230, 232, 227, 254, 251, 253, 255, 249, 248, 252, 224, 241, 193, 194, 215, 199, 196, 197, 214, 218, 201, 202, 203, 204, 205, 206, 207, 208, 210, 211, 212, 213, 198, 200, 195, 222, 219, 221, 223, 217, 216, 220, 192, 209);
   $end = strlen($string);
   // A for loop skips the body entirely for an empty string.
   for ($pos = 0; $pos < $end; $pos++) {
     $c = ord($string[$pos]);
     if ($c >= 128) {
       $string[$pos] = chr($kw[$c - 128]);
     }
   }
   return $string;
 }
 // Recodes only if the input looks like KOI8-R.
 function recode($str) {
   if (detect_encoding($str) == 1) {
     $str = koi_to_win($str);
   }
   return $str;
 }
?>
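Where mbstring is available, the same KOI8-R to Windows-1251 recoding of Cyrillic letters is a single call. A minimal sketch, assuming the KOI8-R and Windows-1251 encodings of a standard mbstring build:

```php
<?php
// "Привет" encoded in KOI8-R.
$koi = "\xF0\xD2\xC9\xD7\xC5\xD4";

// Let mbstring do the table lookup.
$win = mb_convert_encoding($koi, 'Windows-1251', 'KOI8-R');
echo bin2hex($win), "\n"; // cff0e8e2e5f2
```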

