Uni.t
uni.t is a set of three routines:
unichr.t
uniord.t
mb_str_split.t
As of 2021, much software still confuses certain Unicode characters. Recognizing and testing such characters requires all three routines above, so to simplify handling they are combined in the code below.
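As one hedged illustration of the kind of confusion meant here (the choice of characters is an assumption, not taken from the text above): Latin capital A and Cyrillic capital A render almost identically, yet they are different code points with different UTF-8 bytes. The sketch uses only PHP built-ins and the `\u{...}` string escape available since PHP 7.0.

```php
<?php
$latin    = "A";          // U+0041 LATIN CAPITAL LETTER A
$cyrillic = "\u{0410}";   // U+0410 CYRILLIC CAPITAL LETTER A

var_dump($latin === $cyrillic);   // bool(false)
echo bin2hex($latin), "\n";       // 41
echo bin2hex($cyrillic), "\n";    // d090
echo strlen($cyrillic), "\n";     // 2 (strlen counts bytes, not characters)
```

Telling such characters apart means looking at code points rather than rendered glyphs or raw byte counts.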
Code
<?php
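// Return the UTF-8 encoding of the Unicode code point $dec.
function unichr($dec) {
    if ($dec < 128) {
        // 1 byte: ASCII
        $utf = chr($dec);
    } else if ($dec < 2048) {
        // 2 bytes: 110xxxxx 10xxxxxx
        $utf = chr(192 + intdiv($dec, 64));
        $utf .= chr(128 + ($dec % 64));
    } else if ($dec < 65536) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        $utf = chr(224 + intdiv($dec, 4096));
        $utf .= chr(128 + intdiv($dec % 4096, 64));
        $utf .= chr(128 + ($dec % 64));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        $utf = chr(240 + intdiv($dec, 262144));
        $utf .= chr(128 + intdiv($dec % 262144, 4096));
        $utf .= chr(128 + intdiv($dec % 4096, 64));
        $utf .= chr(128 + ($dec % 64));
    }
    return $utf;
}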
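// PHP 7.4+ with the mbstring extension already provides mb_str_split();
// define this fallback only when it is missing, to avoid a redeclaration error.
if (!function_exists('mb_str_split')) {
    function mb_str_split($str) {
        // Split a UTF-8 string into characters: with the /u modifier the
        // empty pattern matches at every character boundary, and
        // PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
        return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    }
}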
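// Return the code point of the single UTF-8 character $a (1 to 4 bytes).
// Each constant folds in the byte offsets: ($b0-192)*64 + ($b1-128) == ($b0-194)*64 + $b1.
function uniord($a)
{
    $M = strlen($a);
    $p = ord($a[0]);
    if ($M == 1) return $p;              // 1 byte (ASCII)
    $p -= 194; $p *= 64; $p += ord($a[1]);
    if ($M == 2) return $p;              // 2 bytes
    $p -= 2050; $p *= 64; $p += ord($a[2]);
    if ($M == 3) return $p;              // 3 bytes
    $p -= 65538; $p *= 64; $p += ord($a[3]);
    return $p;                           // 4 bytes
}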
?>
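The arithmetic in unichr and uniord follows the standard UTF-8 bit layout. As a hedged illustration, the sketch below spells that layout out with shifts and masks for sequences of one to three bytes. The names `sketch_encode` and `sketch_decode` are hypothetical and not part of the routines above, and no input validation is attempted.

```php
<?php
// Hypothetical helper: encode a code point below 0x10000 as UTF-8.
function sketch_encode(int $cp): string {
    if ($cp < 0x80) {
        return chr($cp);                              // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))                 // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));              // 10xxxxxx
    }
    return chr(0xE0 | ($cp >> 12))                    // 1110xxxx
         . chr(0x80 | (($cp >> 6) & 0x3F))            // 10xxxxxx
         . chr(0x80 | ($cp & 0x3F));                  // 10xxxxxx
}

// Hypothetical helper: decode one UTF-8 sequence of 1 to 3 bytes.
function sketch_decode(string $ch): int {
    // unpack() returns a 1-based array; re-index it from 0.
    $b = array_values(unpack('C*', $ch));
    switch (count($b)) {
        case 1:
            return $b[0];
        case 2:
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        default:
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
    }
}

$euro = sketch_encode(0x20AC);             // U+20AC EURO SIGN
echo bin2hex($euro), "\n";                 // e282ac
echo dechex(sketch_decode($euro)), "\n";   // 20ac
```

Encoding a code point and decoding the result should return the original value; the same round trip is a quick sanity check for unichr and uniord.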