Mb str split.t
mb_str_split.t is a PHP file with the function mb_str_split, which splits the input string into UTF-8 characters. The output is an array of strings; each of them is one, two, or three bytes long (four bytes for characters outside the Basic Multilingual Plane).
Code
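<?php
// Define the function only if PHP does not already provide it
// (PHP 7.4+ with the mbstring extension has a built-in mb_str_split).
if (!function_exists('mb_str_split')) {
    function mb_str_split($str) {
        // Split a multibyte (UTF-8) string into characters.
        // Split at all positions, except after the start: ^
        // and except before the end: \z
        // (\z is used instead of $, because $ also matches before a final newline)
        $pattern = '/(?<!^)(?!\z)/u';
        return preg_split($pattern, $str);
    }
}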
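To see what the two lookaround assertions in the pattern accomplish, it may help to compare with the empty pattern. The following is only a sketch; the variable names are illustrative and do not appear in mb_str_split.t. With the empty pattern, preg_split also splits at the very start and at the very end, which yields empty strings at both ends of the result.

```php
<?php
$s = "名前";

// Empty pattern with the u modifier: splits at every character boundary,
// including the boundary before the first and after the last character.
$raw = preg_split('//u', $s);
// $raw is ["", "名", "前", ""]

// The flag PREG_SPLIT_NO_EMPTY drops the empty pieces instead of
// excluding the two outer boundaries with lookarounds.
$clean = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
// $clean is ["名", "前"]

var_dump($raw, $clean);
```

The lookarounds in the function above and the flag PREG_SPLIT_NO_EMPTY are two ways to get rid of the same empty pieces.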
Example of calling
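<?php
include "mb_str_split.t";
$a = "私の名前はマー\nTori です。";
$N = strlen($a);           // length of the input in bytes
echo $N, "\n";
// Print every byte of the input in hexadecimal
for ($n = 0; $n < $N; $n++) { printf("%02x ", ord($a[$n])); }
$b = mb_str_split($a);     // array of UTF-8 characters
var_dump($b);
$M = count($b);
// For each character: the character, its length in bytes,
// its bytes in hexadecimal, and the decimal value of its first byte
for ($m = 0; $m < $M; $m++) {
    $c = $b[$m];
    $d = strlen($c);
    echo "$c $d ";
    for ($n = 0; $n < $d; $n++) printf("%2x ", ord($c[$n]));
    echo ord($c[0]), "\n";
}
?>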
Output
36
e7 a7 81 e3 81 ae e5 90 8d e5 89 8d e3 81 af e3 83 9e e3 83 bc 0a 54 6f 72 69 20 e3 81 a7 e3 81 99 e3 80 82 array(16) {
  [0]=>
  string(3) "私"
  [1]=>
  string(3) "の"
  [2]=>
  string(3) "名"
  [3]=>
  string(3) "前"
  [4]=>
  string(3) "は"
  [5]=>
  string(3) "マ"
  [6]=>
  string(3) "ー"
  [7]=>
  string(1) "
"
  [8]=>
  string(1) "T"
  [9]=>
  string(1) "o"
  [10]=>
  string(1) "r"
  [11]=>
  string(1) "i"
  [12]=>
  string(1) " "
  [13]=>
  string(3) "で"
  [14]=>
  string(3) "す"
  [15]=>
  string(3) "。"
}
私 3 e7 a7 81 231
の 3 e3 81 ae 227
名 3 e5 90 8d 229
前 3 e5 89 8d 229
は 3 e3 81 af 227
マ 3 e3 83 9e 227
ー 3 e3 83 bc 227

 1  a 10
T 1 54 84
o 1 6f 111
r 1 72 114
i 1 69 105
  1 20 32
で 3 e3 81 a7 227
す 3 e3 81 99 227
。 3 e3 80 82 227
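The last number in each row of the output appears to be the decimal value of the first byte of the character. In UTF-8, this first (lead) byte alone determines how many bytes the character occupies. The sketch below illustrates this rule; the helper name utf8_len_from_lead_byte is hypothetical and is not part of mb_str_split.t.

```php
<?php
// Hypothetical helper: number of bytes of a UTF-8 character,
// deduced from the value of its first byte.
function utf8_len_from_lead_byte(int $byte): int
{
    if ($byte < 0x80) return 1;             // 0xxxxxxx: ASCII
    if (($byte & 0xE0) === 0xC0) return 2;  // 110xxxxx
    if (($byte & 0xF0) === 0xE0) return 3;  // 1110xxxx
    if (($byte & 0xF8) === 0xF0) return 4;  // 11110xxx
    return 0;  // continuation byte (10xxxxxx) or invalid lead byte
}

// Lead bytes taken from the output above: 私, の, 名, T, newline
foreach ([231, 227, 229, 84, 10] as $lead) {
    echo $lead, " -> ", utf8_len_from_lead_byte($lead), "\n";
}
```

For the lead bytes 231, 227 and 229 the helper gives 3, and for 84 and 10 it gives 1, in agreement with the byte counts in the second column of the output.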
Analogies
PHP 7.4 and later include a built-in function mb_str_split() as part of the mbstring extension, which is not enabled in every installation.
Setting up PHP appropriately requires certain skills.
Many inexperienced users cannot reconfigure PHP without breaking their servers.
For them, it is easier to write their own version of the function
than to find the configuration file that should be modified and edit it appropriately.
So, several self-made versions of mb_str_split have appeared on the internet.
Only one, very short, is presented above.
The required function preg_split belongs to the PCRE extension, which is available in a default PHP installation.
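The two approaches can also be combined: use the built-in function where it exists and fall back to preg_split elsewhere. The following is a minimal sketch under that assumption; the wrapper name split_utf8_chars is hypothetical, and returning an empty array for invalid UTF-8 is just one possible choice.

```php
<?php
// Hypothetical wrapper; the name split_utf8_chars is not from the original file.
function split_utf8_chars(string $str): array
{
    // Prefer the built-in (PHP 7.4+ with mbstring) when it is available.
    if (function_exists('mb_str_split')) {
        return mb_str_split($str, 1, 'UTF-8');
    }
    // Fallback that needs only PCRE.
    // preg_split returns false if $str is not valid UTF-8.
    $parts = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

$chars = split_utf8_chars("Tori です。");
echo count($chars), "\n"; // prints 8
```

A wrapper with its own name avoids a clash with the built-in mb_str_split on installations where mbstring is enabled.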
References
https://en.wikipedia.org/wiki/List_of_jōyō_kanji
https://www.php.net/manual/de/function.mb-str-split.php mb_str_split, User Contributed Notes: polyfill for PHP < 7.4 based on the package "symfony/polyfill-mbstring" (2020). The note suggests much longer (and perhaps more universal) code that seems to do the same.
Keywords
Japanese, Kanji, PHP, SomeH, SomeUtf8, Utf8, Utf8table, UtfH