Dump.t

From TORI

Dump.t is a PHP program that analyses the content of the input string.

The goal is to identify the Utf8 characters that either have no standard picture (that is, no picture supported by the software), or have pictures so similar to those of other characters that they cannot be distinguished by the naked eye.
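As an illustration of why such a tool is useful, the sketch below compares two characters that look alike in most fonts. It is not part of Dump.t: the helper name code_points is hypothetical, and the sketch assumes PHP 7.4 or later with the mbstring extension, so that the built-in functions mb_str_split and mb_ord can be used instead of the helper files listed in the code below.

```php
<?php
// Hypothetical helper (not part of Dump.t): list the code points of a Utf8 string.
// Assumes PHP 7.4+ with mbstring (built-in mb_str_split and mb_ord).
function code_points(string $s): array
{
    $out = [];
    foreach (mb_str_split($s, 1, 'UTF-8') as $ch) {
        $out[] = sprintf('U+%04X', mb_ord($ch, 'UTF-8'));
    }
    return $out;
}

// Latin capital A and Cyrillic capital A usually have identical pictures.
$latin    = "\u{0041}";
$cyrillic = "\u{0410}";
echo implode(' ', code_points($latin)), "\n";    // U+0041
echo implode(' ', code_points($cyrillic)), "\n"; // U+0410
```

The two characters print the same picture on most systems, but their code points differ, which is the kind of difference Dump.t is meant to reveal.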

Code

<?php
include "unichr.t";
include "uniord.t";
include "mb_str_split.t";

// dump.t analyses the content of a string.
// The string is interpreted as a sequence of Utf8 characters.
// The files unichr.t, uniord.t, mb_str_split.t
// should be loaded in the working directory.
// Usage:
// php dump.t "any абракадабра and だからも in any language(s)"

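// Stop with a usage message if no string is given on the command line.
if($argc<2){echo "Usage: php dump.t \"string\"\n"; exit(1);}
$a=$argv[1];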
echo "$a\n";
$N=strlen($a);
echo "The array has $N bytes; here is its splitting:\n";

for($n=0;$n<$N;$n++){printf("%02x ",ord($a[$n]) );}
echo "\n";
$b = mb_str_split($a);
var_dump($b);
$M=count($b);
for($m=0;$m<$M;$m++)
{
printf("\n");
$c=$b[$m];
$u=uniord($c);
printf("Unicode character number %05d id est, [[X%04X]]\n",$u,$u);
$d=strlen($c);
echo "Picture: $c ; uses $d bytes. These bytes are:\n";
// %02X pads with a zero, so that bytes below 0x10 print as x0A rather than "x A".
for($n=0;$n<$d;$n++) printf("x%02X ",ord($c[$n]));
printf("in the hexadecimal representation and\n");
for($n=0;$n<$d;$n++) printf("%3d ",ord($c[$n]));
printf("in the decimal representation\n");
}
?>
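For comparison, here is a minimal sketch of the same per-character analysis that does not need the three included files. It is an assumption-based variant, not the author's code: it relies on the built-in mb_str_split (PHP 7.4+) and mb_ord (PHP 7.2+) from the mbstring extension, and the function name utf8_bytes is hypothetical. The sample input is the Kangxi radical U+2F26, whose bytes E2 BC A6 appear in the URL in the References, followed by the similar-looking character U+5B50.

```php
<?php
// Hypothetical helper (not part of Dump.t): for each Utf8 character of $s,
// return the character, its code point and its bytes in hexadecimal.
// Assumes PHP 7.4+ with mbstring (built-in mb_str_split and mb_ord).
function utf8_bytes(string $s): array
{
    $rows = [];
    foreach (mb_str_split($s, 1, 'UTF-8') as $ch) {
        $rows[] = [
            'char'  => $ch,
            'code'  => mb_ord($ch, 'UTF-8'),
            // bin2hex gives e.g. "e2bca6"; split it into two-digit bytes.
            'bytes' => implode(' ', str_split(bin2hex($ch), 2)),
        ];
    }
    return $rows;
}

foreach (utf8_bytes("\u{2F26}\u{5B50}") as $r) {
    printf("%s U+%04X bytes: %s\n", $r['char'], $r['code'], $r['bytes']);
}
// Prints:
// ⼦ U+2F26 bytes: e2 bc a6
// 子 U+5B50 bytes: e5 ad 90
```

The two pictures are hard to tell apart by eye, yet both the code points and the byte sequences differ.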

References

https://kanjialive.com/214-traditional-kanji-radicals/#%E2%BC%A6 The 214 traditional kanji radicals and their variants

Keywords

Du.t, Dump.t, Japanese, Kanji, PHP, SomeH, Unicode, Utf8, Utf8table, UtfH