Nichiyoubi

Nichiyoubi is the set of the following 60 words:
01.000 にちようび
02.001 にちようび
03.002 にちよう⽇
04.003 にちよう⽈
05.004 にちよう日
06.005 にちよう曰
07.010 にち曜び
08.011 にち曜び
09.012 にち曜⽇
10.013 にち曜⽈
11.014 にち曜日
12.015 にち曜曰
13.100 ⽇ようび
14.101 ⽇ようび
15.102 ⽇よう⽇
16.103 ⽇よう⽈
17.104 ⽇よう日
18.105 ⽇よう曰
19.110 ⽇曜び
20.111 ⽇曜び
21.112 ⽇曜⽇
22.113 ⽇曜⽈
23.114 ⽇曜日
24.115 ⽇曜曰
25.200 ⽈ようび
26.201 ⽈ようび
27.202 ⽈よう⽇
28.203 ⽈よう⽈
29.204 ⽈よう日
30.205 ⽈よう曰
31.210 ⽈曜び
32.211 ⽈曜び
33.212 ⽈曜⽇
34.213 ⽈曜⽈
35.214 ⽈曜日
36.215 ⽈曜曰
37.300 日ようび
38.301 日ようび
39.302 日よう⽇
40.303 日よう⽈
41.304 日よう日
42.305 日よう曰
43.310 日曜び
44.311 日曜び
45.312 日曜⽇
46.313 日曜⽈
47.314 日曜日
48.315 日曜曰
49.400 曰ようび
50.401 曰ようび
51.402 曰よう⽇
52.403 曰よう⽈
53.404 曰よう日
54.405 曰よう曰
55.410 曰曜び
56.411 曰曜び
57.412 曰曜⽇
58.413 曰曜⽈
59.414 曰曜日
60.415 曰曜曰


The numbers at the beginning of each line are not parts of these words.
They are added for testing and for reference.
These words are interpreted as elements of the Japanese language.

In addition, the term Nichiyoubi may refer to each of these words, as well as to the English word Sunday and/or to its equivalents in other languages (Domingo, Dimanche, Воскресенье, etc.).

Characters involved and the confusion

Some realizations of the word Nichiyoubi may look similar (and with some software, even identical). Viewing the source of this article, one can see that all the words in the Table are different.

The Unicode characters for Nichi, that is,
X2F47 [1]
X2F48 [2]
X65E5 [3]
X66F0 [4]
are difficult to distinguish from each other. The goal of the dictionary is the analysis of practical Japanese, so all these characters are interpreted as possible components of the word Nichiyoubi.
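
Because the four glyphs may render identically, one way to tell them apart is to print the code points of a string instead of the string itself. The following is a minimal sketch and not part of the original program: the function name codePoints is hypothetical, the input is assumed to be valid UTF-8, and the characters are written as "\u{...}" escapes (PHP 7 or later) so that they survive copying.

```php
<?php
// Hypothetical helper (not from the article): list the Unicode code points
// of a UTF-8 string, so that look-alike characters can be told apart.
function codePoints(string $utf8): array
{
    $points = [];
    // The /u modifier makes preg_split cut between whole UTF-8 characters.
    foreach (preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $bytes = array_values(unpack('C*', $char));
        $n = count($bytes);
        // Drop the length-marker bits of the leading byte ...
        $cp = ($n === 1) ? $bytes[0] : ($bytes[0] & (0xFF >> ($n + 1)));
        // ... and append six payload bits from each continuation byte.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        $points[] = sprintf('X%04X', $cp);
    }
    return $points;
}

// Each of the four Nichi candidates, followed by X66DC and the same candidate.
foreach (["\u{2F47}", "\u{2F48}", "\u{65E5}", "\u{66F0}"] as $nichi) {
    echo implode(' ', codePoints($nichi . "\u{66DC}" . $nichi)), "\n";
}
```

The four printed lines differ in their numbers even where the rendered words look the same.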

In addition, the Table uses the following characters:
X3046 [5]
X3061 [6]
X306B [7]
X3072 [8]
X3073 [9]
X3088 [10]
X3099 [11]
X66DC [12]
These eight characters, by themselves, seem to cause no confusion,
except for the similarity of X3073
to the combination of X3072 and X3099.
The combination of the two characters X3072 and X3099 can be considered equivalent to the single character X3073.
This resemblance increases the number of Japanese words qualified as Nichiyoubi from 5×2×5=50 to 5×2×6=60.
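
The equivalence of the pair X3072 X3099 and the single X3073 corresponds to canonical composition in Unicode. The sketch below folds the pair into the single character with a plain string replacement and checks the two counts. The function name foldBi and the order of the entries in the last array are assumptions of this sketch; where the intl extension is available, Normalizer::normalize with Normalizer::FORM_C should give the same result for this pair.

```php
<?php
// Hypothetical helper: replace the two-character sequence X3072 X3099
// by the single character X3073.
function foldBi(string $s): string
{
    return str_replace("\u{3072}\u{3099}", "\u{3073}", $s);
}

$nichi = ["\u{306B}\u{3061}", "\u{2F47}", "\u{2F48}", "\u{65E5}", "\u{66F0}"];
$you   = ["\u{3088}\u{3046}", "\u{66DC}"];
// Assumed order: the single X3073 first, then the pair X3072 X3099.
$bi    = ["\u{3073}", "\u{3072}\u{3099}", "\u{2F47}", "\u{2F48}", "\u{65E5}", "\u{66F0}"];

$raw = [];
$folded = [];
foreach ($nichi as $a) {
    foreach ($you as $b) {
        foreach ($bi as $c) {
            $raw[$a . $b . $c] = true;            // distinct as code-point sequences
            $folded[foldBi($a . $b . $c)] = true; // distinct after folding the pair
        }
    }
}
printf("%d words before folding, %d after\n", count($raw), count($folded));
// prints "60 words before folding, 50 after"
```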

The confusion applies not only to humans. Some software considers the combination of the two characters X3072 X3099 to be equivalent to the single character X3073. MediaWiki is no exception: an attempt to access an article whose name consists of the two characters X3072 and X3099 leads to the article whose name is the single character X3073.

Generation

The table of elements of the set Nichiyoubi in the preamble is generated with the following PHP program:

<?php
$a=array("にち","⽇","⽈","日","曰");
$b=array("よう","曜");
//$c=array("び","⽇","⽈","日","曰");
$c=array("び","び","⽇","⽈","日","曰");
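// Preface, printed for testing: each component with its index.
for($i=0;$i<count($a);$i++) { printf("%1d %s <br>\n",$i,$a[$i]); }
echo "<br>\n";
for($j=0;$j<count($b);$j++) { printf("%1d %s <br>\n",$j,$b[$j]); }
echo "<br>\n";
for($k=0;$k<count($c);$k++) { printf("%1d %s <br>\n",$k,$c[$k]); }
// Table: all 5*2*6=60 combinations. $m is the running number;
// $i,$j,$k are the indices of the components in $a, $b, $c.
$m=0;
for($i=0;$i<count($a);$i++)
 for($j=0;$j<count($b);$j++)
  for($k=0;$k<count($c);$k++)
  { $m++;
    printf("%02d.%1d%1d%1d [[%s%s%s]]<br>\n",$m, $i,$j,$k, $a[$i],$b[$j],$c[$k]);
  }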
?>

The program also prints, for testing, a preface that lists the components involved; this preface is omitted from the table in the definition. It is reproduced when the program is executed.

For correct execution, the program may need to be copied from the source of this article rather than from the rendered view above.

Need

Nichiyoubi, as defined in this article, is an equivalence class that seems to be useful for the analysis of Japanese texts.
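
One way to use such an equivalence class in text analysis is a single pattern that matches any of the 60 words. The following is only a sketch under stated assumptions: the variable names and the sample text are illustrative and are not taken from the original program, and the pattern relies on the /u modifier of preg_match_all.

```php
<?php
// Character class of the four Nichi look-alikes, written as code points.
$sun = '[\x{2F47}\x{2F48}\x{65E5}\x{66F0}]';
// Pattern for any of the 60 words: (nichi)(you)(bi).
$pattern = '/(?:\x{306B}\x{3061}|' . $sun . ')'
         . '(?:\x{3088}\x{3046}|\x{66DC})'
         . '(?:\x{3073}|\x{3072}\x{3099}|' . $sun . ')/u';

// Illustrative sample text: three different spellings separated by commas.
$text = "\u{65E5}\u{66DC}\u{65E5}, "                  // word 47.314 of the table
      . "\u{2F47}\u{3088}\u{3046}\u{3072}\u{3099}, "  // X2F47, hiragana you, bi as a pair
      . "\u{306B}\u{3061}\u{3088}\u{3046}\u{3073}";   // all hiragana, bi as one character
$count = preg_match_all($pattern, $text, $matches);
printf("%d occurrences found\n", $count);
// prints "3 occurrences found"
```

A search for any single spelling would report only one of these three occurrences.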

All the words listed in the preamble are necessary in order to avoid encoding errors. For example, a search (for replacing, counting, warning, etc.) for word number 03.002 in the list will not find the words in lines 04.003, 05.004, and 06.005, and similarly for the others. Even worse, some software silently, without any warning, replaces some Unicode characters (mainly KanjiConfudal) with others. This does not seem to be the case for the Nichi characters, but the goal is the elaboration of a general method. For this reason, in the original of the program, the Unicode characters are specified by their hexadecimal numbers. These numbers appear as ASCII words, and they do not seem to be altered by software (touch wood!). When processed as HTML, they are converted for visual reception by humans, but this conversion can be suppressed by removing the leading "&#" or the trailing ";" from the number. One needs to open the source of this article to see how the numbers of the characters are written.
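
The idea of writing the characters by their hexadecimal numbers can be sketched as follows. This is an illustrative reconstruction, not the original source of the program: the helper name fromHex is hypothetical, and html_entity_decode is used here to turn a numeric character reference such as "&#x2F47;" into the character only at the moment of output.

```php
<?php
// Hypothetical helper: build a UTF-8 string from space-separated
// hexadecimal code points, e.g. "306B 3061".
function fromHex(string $hexList): string
{
    $out = '';
    foreach (explode(' ', $hexList) as $hex) {
        // "&#x....;" is a numeric character reference; decoding it yields the character.
        $out .= html_entity_decode('&#x' . $hex . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    return $out;
}

// The source stays pure ASCII, so look-alike characters cannot be swapped unnoticed.
$a = array_map('fromHex', ['306B 3061', '2F47', '2F48', '65E5', '66F0']);
$b = array_map('fromHex', ['3088 3046', '66DC']);

echo bin2hex($a[1]), "\n";        // UTF-8 bytes of X2F47: e2bd87
echo $a[3], $b[1], $a[3], "\n";   // word 47.314 of the table
```

With this approach the arrays of the generating program can be filled without any non-ASCII character in the source file.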

References

  1. https://util.unicode.org/UnicodeJsps/character.jsp?a=2F47 2F47 KANGXI RADICAL SUN Han Script id: allowed confuse: ..
  2. https://util.unicode.org/UnicodeJsps/character.jsp?a=2F48 2F48 KANGXI RADICAL SAY Han Script id: allowed confuse: ..
  3. https://util.unicode.org/UnicodeJsps/character.jsp?a=65E5 65E5 CJK UNIFIED IDEOGRAPH-65E5 Han Script id: restricted confuse: ..
  4. https://util.unicode.org/UnicodeJsps/character.jsp?a=66F0 66F0 CJK UNIFIED IDEOGRAPH-66F0 Han Script id: restricted confuse: ..
  5. https://util.unicode.org/UnicodeJsps/character.jsp?a=3046 3046 HIRAGANA LETTER U Hiragana Script id: restricted confuse: none ..
  6. https://util.unicode.org/UnicodeJsps/character.jsp?a=3061 3061 HIRAGANA LETTER TI Hiragana Script id: restricted confuse: none ..
  7. https://util.unicode.org/UnicodeJsps/character.jsp?a=306B 306B HIRAGANA LETTER NI Hiragana Script id: restricted confuse: none ..
  8. https://util.unicode.org/UnicodeJsps/character.jsp?a=3072 3072 HIRAGANA LETTER HI Hiragana Script id: restricted confuse: none ..
  9. https://util.unicode.org/UnicodeJsps/character.jsp?a=3073 3073 HIRAGANA LETTER BI Hiragana Script id: restricted confuse: ひ + ゙ ..
  10. https://util.unicode.org/UnicodeJsps/character.jsp?a=3088 3088 HIRAGANA LETTER YO Hiragana Script id: restricted confuse: none ..
  11. https://util.unicode.org/UnicodeJsps/character.jsp?a=3099 3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK Nonspacing Mark id: allowed confuse: none ..
  12. https://util.unicode.org/UnicodeJsps/character.jsp?a=66DC 66DC CJK UNIFIED IDEOGRAPH-66DC Han Script id: restricted confuse: none ..

Keywords

Confusion, Japanese, Hiragana, Nichi, Nichiyoubi, Unicode, X2F47 , X2F48 , X306B , X3061 , X3088 , X3046 , X3072 , X3073 , X3099 , X65E5 , X66F0