Uniord.t
uniord.t is a routine that returns the number (Unicode code point) of a UTF-8 character in the Utf8table.
Code
<?php
function uniord($a)
{
    $M = strlen($a);                         // number of bytes: 1, 2 or 3
    $p = ord($a[0]);
    if ($M == 1) return $p;                  // 0xxxxxxx (ASCII)
    $p -= 194; $p *= 64; $p += ord($a[1]);
    if ($M == 2) return $p;                  // 110xxxxx 10xxxxxx
    $p -= 2050; $p *= 64; $p += ord($a[2]);
    return $p;                               // 1110xxxx 10xxxxxx 10xxxxxx
    # Equivalent closed forms:
    # if($M==1) return ord($a[0]);
    # if($M==2) return 64*(ord($a[0])-194)+ord($a[1]);
    # if($M==3) return 64*( 64*(ord($a[0])-194)+ord($a[1]))-131200+ord($a[2]);
}
/*
Recovers the number of a UTF-8 character encoded with 1, 2 or 3 bytes.
Input: a string that consists of a single UTF-8 character.
Output: the number of this character in the UTF-8 encoding table,
see [[Utf8table]].
*/
?>
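uniord() stops at three bytes, so it covers only U+0000 to U+FFFF; a 4-byte character such as an emoji falls through to the 3-byte branch and comes out wrong. A minimal sketch of the same idea extended to four bytes, written with bit masks instead of subtracted constants, is shown below. The name uniord4() is hypothetical and not part of uniord.t; like uniord(), it assumes the input is exactly one well-formed UTF-8 character and only checks its length.

```php
<?php
// Hypothetical uniord4(): the uniord() idea extended to 4-byte characters
// (code points up to U+10FFFF). Only the length is checked; the input is
// assumed to be exactly one well-formed UTF-8 character.
function uniord4(string $a): int
{
    $n = strlen($a);
    if ($n < 1 || $n > 4) {
        throw new InvalidArgumentException('expected one UTF-8 character of 1 to 4 bytes');
    }
    $b = [];
    for ($i = 0; $i < $n; $i++) {
        $b[] = ord($a[$i]); // byte values 0..255
    }
    switch ($n) {
        case 1: // 0xxxxxxx
            return $b[0];
        case 2: // 110xxxxx 10xxxxxx: keep 5 + 6 payload bits
            return (($b[0] & 0x1F) << 6) | ($b[1] & 0x3F);
        case 3: // 1110xxxx 10xxxxxx 10xxxxxx: keep 4 + 6 + 6 payload bits
            return (($b[0] & 0x0F) << 12) | (($b[1] & 0x3F) << 6) | ($b[2] & 0x3F);
        default: // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx: keep 3 + 6 + 6 + 6 payload bits
            return (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                 | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
    }
}

echo uniord4('く'), " ", uniord4("\u{1F600}"), "\n"; // prints "12367 128512"
```

For 1 to 3 bytes the masks give the same numbers as the subtracted constants in uniord(); masking simply makes each stripped marker bit visible.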
Example
<?php
function uniord($a)
{
    $M = strlen($a);
    $p = ord($a[0]);
    if ($M == 1) return $p;
    $p -= 194; $p *= 64; $p += ord($a[1]);
    if ($M == 2) return $p;
    $p -= 2050; $p *= 64; $p += ord($a[2]);
    return $p;
    # if($M==1) return ord($a[0]);
    # if($M==2) return 64*(ord($a[0])-194)+ord($a[1]);
    # if($M==3) return 64*( 64*(ord($a[0])-194)+ord($a[1]))-131200+ord($a[2]);
}
/*
Recovers the number of a UTF-8 character encoded with 1, 2 or 3 bytes.
Input: a string that consists of a single UTF-8 character.
Output: the number of this character in the UTF-8 encoding table.
*/
echo uniord('<')," ", uniord('く'), " ", uniord('〈'),"\n";
echo uniord('〈')," ", uniord('ㄍ'), " ", uniord('巛'),"\n";
echo uniord('⽊')," ", uniord('林'), " ", uniord('森'),"\n";
echo uniord('女')," ", uniord('奻')," ", uniord('姦'),"\n";
echo uniord('ロ')," ", uniord('日')," ", uniord('目'),"\n";
?>
Output:
60 12367 12296
12296 12557 24027
12106 26519 26862
22899 22907 23014
12525 26085 30446
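The constants 194 and 2050 in uniord() fold the removal of the UTF-8 marker bits (110xxxxx, 1110xxxx and 10xxxxxx) into single subtractions. Writing $b_0, b_1, b_2$ for the byte values returned by ord(), the standard decoding and the form used in the code agree, and the value 12367 for く can be recomputed from its UTF-8 bytes E3 81 8F (227, 129, 143):

```latex
\begin{aligned}
\text{2 bytes:}\quad c &= 64\,(b_0 - 192) + (b_1 - 128) = 64\,(b_0 - 194) + b_1 \\
\text{3 bytes:}\quad c &= 4096\,(b_0 - 224) + 64\,(b_1 - 128) + (b_2 - 128) \\
  &= 64\,\bigl[\,64\,(b_0 - 194) + b_1 - 2050\,\bigr] + b_2 \\
\text{く:}\quad c &= 64\,\bigl[\,64\,(227 - 194) + 129 - 2050\,\bigr] + 143 \\
  &= 64 \cdot 191 + 143 = 12367
\end{aligned}
```

The same factor gives 64 · 2050 = 131200, the constant in the commented-out closed form.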
The values can be checked against the table Utf8table, or converted back to characters with the inverse routine unichr.t:
<?php include "unichr.t"; echo unichr(98), unichr(12450); ?>
Output:
bア
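The listing of unichr.t is not shown on this page, so the round trip below uses a hypothetical encoder, unichr_sketch(), written only for this illustration and not taken from unichr.t. It encodes every code point from U+0000 to U+FFFF into 1 to 3 UTF-8 bytes (skipping the surrogates U+D800 to U+DFFF, which are not characters on their own) and checks that uniord() recovers the same number.

```php
<?php
// uniord() as defined in uniord.t, repeated so this check runs on its own.
function uniord($a)
{
    $M = strlen($a);
    $p = ord($a[0]); if ($M == 1) return $p;
    $p -= 194; $p *= 64; $p += ord($a[1]); if ($M == 2) return $p;
    $p -= 2050; $p *= 64; $p += ord($a[2]); return $p;
}

// Hypothetical encoder for U+0000..U+FFFF (1 to 3 bytes); illustration only.
function unichr_sketch(int $c): string
{
    if ($c < 0x80) {
        return chr($c);                                          // 0xxxxxxx
    }
    if ($c < 0x800) {
        return chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));  // 110xxxxx 10xxxxxx
    }
    return chr(0xE0 | ($c >> 12))                                // 1110xxxx
         . chr(0x80 | (($c >> 6) & 0x3F))                        // 10xxxxxx
         . chr(0x80 | ($c & 0x3F));                              // 10xxxxxx
}

// Round trip: every BMP code point except the surrogates must come back unchanged.
$bad = 0;
for ($c = 0; $c <= 0xFFFF; $c++) {
    if ($c >= 0xD800 && $c <= 0xDFFF) {
        continue; // surrogates are not characters on their own
    }
    if (uniord(unichr_sketch($c)) !== $c) {
        $bad++;
    }
}
echo "mismatches: $bad\n"; // prints "mismatches: 0"
```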
Analogies
From the User Contributed Notes to ord() in the PHP manual, https://www.php.net/manual/en/function.ord.php#42778 (2013): As ord() doesn't work with UTF-8, and if you do not have access to the mb_* functions, the following function will work well:
<?php
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset, 1));
    if ($code >= 128) {                              //otherwise 0xxxxxxx
        if ($code < 224) $bytesnumber = 2;           //110xxxxx
        else if ($code < 240) $bytesnumber = 3;      //1110xxxx
        else if ($code < 248) $bytesnumber = 4;      //11110xxx
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset ++;
            $code2 = ord(substr($string, $offset, 1)) - 128;  //10xxxxxx
            $codetemp = $codetemp*64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) $offset = -1;
    return $code;
}
?>
$offset is passed by reference, as it is not easy to split a UTF-8 string character by character. This is useful for iterating over a string:
<?php
$text = "abcàêß€abc";
$offset = 0;
while ($offset >= 0) {
    echo $offset.": ".ordutf8($text, $offset)."\n";
}
/* returns:
0: 97
1: 98
2: 99
3: 224
5: 234
7: 223
9: 8364
12: 97
13: 98
14: 99
*/
?>
Feel free to adapt my code to fit your needs.
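For comparison, uniord() can walk the same string if the string is first split into characters. A minimal sketch, assuming the PCRE functions bundled with PHP: preg_split() with an empty pattern and the u modifier splits between UTF-8 characters, so no by-reference offset is needed; the byte offset is tracked only to reproduce the listing in the note. Like uniord() itself, this covers characters of up to 3 bytes.

```php
<?php
// uniord() from uniord.t (1- to 3-byte characters only), repeated so the sketch runs alone.
function uniord($a)
{
    $M = strlen($a);
    $p = ord($a[0]); if ($M == 1) return $p;
    $p -= 194; $p *= 64; $p += ord($a[1]); if ($M == 2) return $p;
    $p -= 2050; $p *= 64; $p += ord($a[2]); return $p;
}

$text = "abc\u{E0}\u{EA}\u{DF}\u{20AC}abc"; // "abcàêß€abc"
// The 'u' modifier makes the empty pattern split between UTF-8 characters, not bytes.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
$offset = 0; // byte offset of the current character, as printed by ordutf8()
foreach ($chars as $ch) {
    echo $offset, ": ", uniord($ch), "\n";
    $offset += strlen($ch);
}
```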
References
https://www.php.net/manual/en/intlchar.ord.php
IntlChar::ord: Return Unicode code point value of character
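Where the mbstring or intl extension is installed, a built-in function can replace a hand-written decoder. A minimal sketch, assuming either extension may or may not be loaded: mb_ord() (mbstring, PHP 7.2 and later) and IntlChar::ord() (intl, PHP 7.0 and later) both return the code point of a single character, including 4-byte ones. The wrapper name codepoint_of() is hypothetical and returns null when neither extension is available.

```php
<?php
// Hypothetical wrapper: use a built-in code point function if one is available.
function codepoint_of(string $ch): ?int
{
    if (function_exists('mb_ord')) {      // mbstring, PHP >= 7.2
        $cp = mb_ord($ch, 'UTF-8');
        return $cp === false ? null : $cp;
    }
    if (class_exists('IntlChar')) {       // intl, PHP >= 7.0
        return IntlChar::ord($ch);
    }
    return null; // neither extension loaded: fall back to uniord() or similar
}

var_dump(codepoint_of('く')); // int(12367) when mbstring or intl is loaded, else NULL
```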
Keywords
Japanese, Kanji, mb_str_split.t, PHP, SomeUtf8, SomeUtfH, uniord.t, Unicode, unichr.t, Utf8, UtfH, Utf8table