Uni.t

From TORI

uni.t is a set of three routines:
unichr.t
uniord.t
mb_str_split.t

As of 2021, most software confuses some Unicode characters. Recognizing and testing such characters requires all three routines above. To simplify handling, they are combined in the code below.
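
One kind of confusion, assumed here from the keywords of this page, is between a Kanji and the look-alike Kangxi radical. The sketch below shows that the two characters differ at the byte level even though many fonts draw them alike. The helper cp3 is a hypothetical name used only for this illustration; it decodes three-byte sequences only and is not part of uni.t.

```php
<?php
// Hypothetical helper for this illustration only: decode one
// three-byte UTF-8 sequence (1110xxxx 10xxxxxx 10xxxxxx).
function cp3(string $s): int {
    return ((ord($s[0]) & 0x0F) << 12)
         | ((ord($s[1]) & 0x3F) << 6)
         |  (ord($s[2]) & 0x3F);
}

// Written as escapes so that an editor cannot silently swap them.
$kanji   = "\u{4E00}"; // CJK unified ideograph "one"
$radical = "\u{2F00}"; // Kangxi radical "one"

foreach ([$kanji, $radical] as $ch) {
    // bin2hex shows the raw UTF-8 bytes of the character.
    printf("%s  bytes=%s  code point=%d\n", $ch, bin2hex($ch), cp3($ch));
}
```

The two lines printed should show different bytes and different code points, which is the information needed to tell the characters apart.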

Code

<?php
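// Encode a code point (decimal integer) as a UTF-8 string.
// Only 1-, 2- and 3-byte sequences are produced, so the result is
// correct only for code points below 65536.
function unichr($dec) {
  if ($dec < 128) {
    // 1 byte: plain ASCII
    $utf = chr($dec);
  } else if ($dec < 2048) {
    // 2 bytes: 110xxxxx 10xxxxxx
    $utf = chr(192 + (($dec - ($dec % 64)) / 64));
    $utf .= chr(128 + ($dec % 64));
  } else {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    $utf = chr(224 + (($dec - ($dec % 4096)) / 4096));
    $utf .= chr(128 + ((($dec % 4096) - ($dec % 64)) / 64));
    $utf .= chr(128 + ($dec % 64));
  }
  return $utf;
}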

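// PHP 7.4 and later provide mb_str_split() in the mbstring extension.
// Define this fallback only if it is missing, to avoid a
// "Cannot redeclare" fatal error.
if (!function_exists('mb_str_split')) {
  function mb_str_split($str) {
    // Split a UTF-8 string into characters: with the /u modifier the
    // empty pattern matches between characters, and
    // PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
    return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
  }
}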

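// Decode a single UTF-8 character of 1 to 3 bytes to its code point.
// $a must hold exactly one character; strlen($a) selects the branch.
// 194 = 192 + 2: removes the 110xxxxx marker, and 2*64 = 128 cancels
// the 10xxxxxx marker of the byte added next.
// 2050 = 32*64 + 2: moves the marker from 192 to 224 (1110xxxx) and
// again cancels the 128 of the byte added next.
function uniord($a)
{
  $M=strlen($a);
  $p=ord($a[0]); if($M==1) return $p;
  $p-=194; $p*=64; $p+=ord($a[1]); if($M==2) return $p;
  $p-=2050; $p*=64; $p+=ord($a[2]); return $p;
}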
?>
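
The routines above stop at three-byte sequences, so code points from U+10000 upward are out of their reach. The following is a minimal sketch of how the same idea could be extended to four bytes. The names unichr4 and uniord4 are hypothetical and not part of uni.t, bit operations replace the modulo arithmetic, and no validation is done (surrogates, values above U+10FFFF and malformed input are not rejected).

```php
<?php
// Hypothetical variant of unichr: encode a code point as UTF-8,
// including four-byte sequences.
function unichr4(int $cp): string {
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))
             . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical variant of uniord: decode the first UTF-8 character
// of $ch; the lead byte decides how many continuation bytes follow.
function uniord4(string $ch): int {
    $b0 = ord($ch[0]);
    if ($b0 < 0x80) {
        return $b0;
    }
    if ($b0 < 0xE0) {
        $n = 1; $cp = $b0 & 0x1F;
    } elseif ($b0 < 0xF0) {
        $n = 2; $cp = $b0 & 0x0F;
    } else {
        $n = 3; $cp = $b0 & 0x07;
    }
    for ($i = 1; $i <= $n; $i++) {
        // Each continuation byte contributes its low six bits.
        $cp = ($cp << 6) | (ord($ch[$i]) & 0x3F);
    }
    return $cp;
}

// Round trip over one sample per sequence length.
foreach ([0x41, 0xE9, 0x4E00, 0x1F600] as $cp) {
    $ch = unichr4($cp);
    printf("U+%04X -> %d byte(s) -> U+%04X\n", $cp, strlen($ch), uniord4($ch));
}
```

Each printed line should show the same code point on both sides, with the byte count growing from 1 to 4.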

Keywords

PHP, du.t, mb_str_split.t, unichr.t, uniord.t

Japanese, Kanji, KanjiLiberal, KanjiRadical, Unicode, Utf8