Interlingua MT: Translation of Numbers
Topics:
Number systems
Grammar for numbers
Parsing
Semantic processing
Generation
MT pyramid (revisited)
Source language
Target language
Levels, from the top of the pyramid down:
Interlingua (no transfer process needed for interlingua)
Transfer: deeper rep.
Transfer: semantic rep.
Transfer: functional structure
Transfer: phrase structure
Direct translation: word-for-word translation
Interlingua MT
Interlingua
Language1
Language2
Language3
Others ...
Advantage of interlingua: adding a new language needs only one more language pair: new language <-> Interlingua
What is interlingua?
An interlingua is supposed to be a universal representation for what?
Meaning, of course,
but what is meaning?
In the absence of a clear definition of meaning, we may describe an interlingua as
a universal representation for what can be conveyed
through human language communication.
Question:
What can be conveyed by our languages?
How to design an interlingua?
Any clear idea about it? No.
What we do know for sure is its
universality and
versatility.
Think about the following:
ontology of human knowledge
conceptions of what we know and can express via speech
ontology of objects in the world and in our languages
ontology of events
ontology of words, etc.
Any example to help us understand it any better?
Interlingua MT for numbers
Interlingua: values
English numbers
Chinese numbers
Others ...
Arabic numbers: used as the universal representation for values
Number systems
Decimal numbers
Arabic numbers: yes
Chinese numbers? <= 10,000, yes; > 10,000, still?
English numbers? <= 1,000, yes; > 1,000, still?
Basically yes, but with quite some variation!
What is the difference?
The distinction between the two can be exemplified by the difficulties in converting or translating between them.
What defines a number system?
Base
the set of digits (or base symbols) used
the cardinality of the digit set (i.e., the number of digits)
decimal numbers
base 10
digits: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
each digit has its own digit value.
Position
the place where a digit shows up, e.g., in 23388
each position has its position value: $|\mathrm{Base}|^{\mathrm{Pos}}$
What value does a digit represent?
Digit    Position    Digit value x position value
2        4           $2 \times 10^4$
3        3           $3 \times 10^3$
3        2           $3 \times 10^2$
8        1           $8 \times 10^1$
8        0           $8 \times 10^0$
A digit represents a different value when it shows up in a different position:
value = $\mathrm{Digit} \times |\mathrm{Base}|^{\mathrm{Pos}}$
What is the value of a number?
A number's value = the sum of the values of all its digits. E.g.,
23,388 = $2 \times 10^4 + 3 \times 10^3 + 3 \times 10^2 + 8 \times 10^1 + 8 \times 10^0$ = 23,388
Hey! So trivial!
What kind of game are you playing?
Let us play with binary numbers
Base 2
Digits = {0, 1} (i.e., only 0 and 1 appear in a number)
$11111_2 = 1 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 31_{10}$
Still trivial?
All computers play such a game.
How about numbers in other bases?
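The rule "value of a number = sum of digit value x position value" can be checked mechanically for any base. The following is a minimal Python sketch; the function name `positional_value` is made up for this illustration and does not come from the slides.

```python
def positional_value(digits, base):
    """Return the value of a digit sequence (most significant digit first)."""
    total = 0
    n = len(digits)
    for i, d in enumerate(digits):
        if not 0 <= d < base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        pos = n - 1 - i           # position, counted from the right, starting at 0
        total += d * base ** pos  # digit value x position value
    return total


print(positional_value([2, 3, 3, 8, 8], 10))  # 23388
print(positional_value([1, 1, 1, 1, 1], 2))   # 31
```

The same function works unchanged for base 8 or base 16, since only the base and the set of legal digits differ.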
Octal numbers
Base 8
Digits = {0, 1, 2, 3, 4, 5, 6, 7}
Numbers:
0, 1, 2, 3, 4, 5, 6, 7,
10, 11, 12, 13, 14, 15, 16, 17,
20, 21, 22, 23, 24, 25, 26, 27,
30, 31, 32, 33, 34, 35, 36, 37, ...
$357_8 = ?$
$357_8 = 3 \times 8^2 + 5 \times 8^1 + 7 \times 8^0 = 239_{10}$
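Hexadecimal numbers
Base 16
Digits = {0, 1, 2, ..., 9, A, B, C, D, E, F}
Numbers:
0, 1, 2, ..., 9, A, B, C, D, E, F,
10, 11, 12, ..., 19, 1A, 1B, 1C, 1D, 1E, 1F,
20, 21, 22, ..., 29, 2A, 2B, 2C, 2D, 2E, 2F, ...
$357_{16} = ?$
$357_{16} = 3 \times 16^2 + 5 \times 16^1 + 7 \times 16^0 = 855_{10}$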
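Going the other way, from a value to its digits in a given base, uses repeated integer division and remainder. This is a small Python sketch under the assumption that digits beyond 9 are written A to F as on the hexadecimal slide; the name `to_base` is hypothetical.

```python
DIGITS = "0123456789ABCDEF"  # digit symbols for bases 2..16


def to_base(value, base):
    """Write a non-negative integer in the given base (2..16)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 16")
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    out = []
    while value > 0:
        value, remainder = divmod(value, base)  # peel off the lowest position
        out.append(DIGITS[remainder])
    return "".join(reversed(out))               # most significant digit first


print(to_base(239, 8))    # 357
print(to_base(855, 16))   # 357
```

Python's built-in `int("357", 8)` performs the reverse conversion and can be used to cross-check the results.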
Chinese numbers
Base 10, basically
Digits = {零, 一, 二, 三, 四, 五, 六, 七, 八, 九}
Another set of digits: {, , , }
Position
Positions in Chinese numbers are explicitly expressed
Positions: {}, 十, 百, 千, 万, 亿, 兆
Position values: $1, 10, 10^2, 10^3, 10^4, 10^8, 10^{12}$
E.g.,
五千六百七十八 = $5 \times 10^3 + 6 \times 10^2 + 7 \times 10^1 + 8 \times 10^0$ = $5{,}678_{10}$
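Because each position is written out explicitly, a Chinese number below 10^4 can be evaluated by pairing each digit with the position mark that follows it. The sketch below is only an illustration: it assumes the digit characters 零 to 九 and the position marks 十, 百, 千 (simplified characters), and the name `small_chinese_value` is hypothetical.

```python
# Assumed character inventory (not given explicitly in the slides).
DIGIT = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
         "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
POSITION = {"十": 10, "百": 10**2, "千": 10**3}


def small_chinese_value(text):
    """Value of a Chinese number below 10^4 written as digit + position pairs."""
    total = 0
    pending = None                      # digit waiting for its position mark
    for ch in text:
        if ch in DIGIT:
            pending = DIGIT[ch]
        elif ch in POSITION:
            # a bare 十 (as in 十二) is read as 一十
            total += (1 if pending is None else pending) * POSITION[ch]
            pending = None
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    if pending is not None:
        total += pending                # units position has an empty mark
    return total


print(small_chinese_value("五千六百七十八"))  # 5678
```

This flat scan does not check well-formedness; that is the job of the grammar.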
A grammar for Chinese numbers
G > Digits
S > {G} 十 {G}
B > G 百
B > G 百 S
B > G 百 零 G
Q > G 千
Q > G 千 B
Q > G 千 零 S
Q > G 千 零 G
W > Q/B/S/G 万
W > Q/B/S/G 万 Q
W > Q/B/S/G 万 零 B
W > Q/B/S/G 万 零 S
W > Q/B/S/G 万 零 G
Here 零 is a conjunction, not zero!
Large numbers in Chinese
W > Q/B/S/G 万
W > Q/B/S/G 万 Q
W > Q/B/S/G 万 零 B
W > Q/B/S/G 万 零 S
W > Q/B/S/G 万 零 G
Y > Q/B/S/G 亿
Y > Q/B/S/G 亿 W
Y > Q/B/S/G 亿 零 W
Y > Q/B/S/G 亿 零 Q
Y > Q/B/S/G 亿 零 B
Y > Q/B/S/G 亿 零 G
Z > Q/B/S/G 兆
Z > Q/B/S/G 兆 Y
Z > Q/B/S/G 兆 零 Y
Z > Q/B/S/G 兆 零 W
Z > Q/B/S/G 兆 零 Q
Z > Q/B/S/G 兆 零 B
Z > Q/B/S/G 兆 零 G
Problem:
Ambiguity in analysis
Solution
W > B/S/G 万
W > B/S/G 万 Q
W > B/S/G 万 零 B
W > B/S/G 万 零 S
W > B/S/G 万 零 G
WQ > Q 万
WQ > Q 万 Q
WQ > Q 万 零 B
WQ > Q 万 零 S
WQ > Q 万 零 G
Y > B/S/G 亿
Y > B/S/G 亿 WQ
Y > B/S/G 亿 零 W
Y > B/S/G 亿 零 Q
Y > B/S/G 亿 零 B
Y > B/S/G 亿 零 G
YQ > Q 亿
YQ > Q 亿 WQ
YQ > Q 亿 零 W
YQ > Q 亿 零 Q
YQ > Q 亿 零 B
YQ > Q 亿 零 G
Z > Q/B/S/G 兆
Z > Q/B/S/G 兆 YQ
Z > Q/B/S/G 兆 零 Y
Z > Q/B/S/G 兆 零 W
Z > Q/B/S/G 兆 零 Q
Z > Q/B/S/G 兆 零 B
Z > Q/B/S/G 兆 零 G
Chinese numbers => values
Two steps:
Syntactic analysis
Parsing: to derive a syntactic tree (called a parse tree) for an input sentence / number.
Result: a phrase structure tree.
Semantic interpretation
To convert the parse tree into a semantic / meaning representation, namely, a value.
Semantic rules for interpretation
We need to define a semantic rule for each grammar rule to specify
how a phrase structure under the grammar rule is interpreted into a meaning representation, i.e.,
how to convert a syntactic structure into meaning.
E.g., for the rule Z > Q/B/S/G 兆 Y:
sem(Z) = sem(Q/B/S/G 兆 Y) = sem(Q/B/S/G) x sem(兆) + sem(Y)
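Example: parsing
Input: 三千兆零六百亿
Parse tree: Z > Q 兆 零 Y, with Q > G 千 (G = 三), Y > B 亿 and B > G 百 (G = 六)
Example: semantic interpretation
G = 3
sem(千) = $10^3$, so Q = $3 \times 10^3$
sem(兆) = $10^{12}$, so Q 兆 = $3 \times 10^{15}$
G = 6
sem(百) = $10^2$, so B = $6 \times 10^2$
sem(亿) = $10^8$, so Y = $6 \times 10^{10}$
Z = $3 \times 10^{15} + 6 \times 10^{10}$ = 3,000,060,000,000,000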
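The semantic rule pattern sem(head) x sem(position) + sem(tail) can be applied recursively to a whole number. The sketch below is a simplification, not the grammar-driven parser of the slides: it splits the string at its largest position mark instead of building a parse tree, so it also accepts some strings the grammar would reject. The characters 零, 十, 百, 千, 万, 亿, 兆 and the function name `sem` as a Python function are assumptions made for illustration.

```python
# Assumed character inventory; values follow the position values 10 .. 10^12.
DIGIT = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
         "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
POSITIONS = [("兆", 10**12), ("亿", 10**8), ("万", 10**4),
             ("千", 10**3), ("百", 10**2), ("十", 10)]  # largest first
CONJUNCTION = "零"


def sem(text):
    """Interpret a Chinese number string as head x position + tail."""
    if text == "":
        return 0                                 # empty tail contributes nothing
    if len(text) > 1 and text[0] == CONJUNCTION:
        text = text[1:]                          # 零 here is a conjunction, not a digit
    for mark, value in POSITIONS:
        if mark in text:
            head, _, tail = text.partition(mark)
            head_value = sem(head) if head else 1  # bare 十 is read as 一十
            return head_value * value + sem(tail)
    if len(text) == 1 and text in DIGIT:
        return DIGIT[text]
    raise ValueError(f"cannot interpret: {text!r}")


print(sem("三千兆零六百亿"))  # 3000060000000000
```

Splitting at the largest mark mirrors the top-level rule (Z, Y, W, Q, B or S) that a parser would choose for the string.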
Generation (i): Head
Given an Arabic number, generate its Chinese counterpart
Format: N = head * pos + tail
Denoted as: head(N, pos) and tail(N, pos), respectively
Given an input number X, how do we generate it? Heads and then tails
head(X, 兆|亿|万) = X / $10^{12|8|4}$ (integer division!)
tail(X, 兆|亿|万) = X % $10^{12|8|4}$ (remainder!)
Generate its head
gen(head(X, 兆|亿|万)): a Q-number < $10^4$
Generate its tail
gen(tail(X, 兆|亿|万)): a number < $10^{12|8|4}$
Generation (ii): Tail < $10^4$
Generating a Q-number X < $10^4$
head(X, 千/百/十) = X / $10^{3|2|1}$
tail(X, 千/百/十) = X % $10^{3|2|1}$
Generate its head
gen(head(X, 千/百/十)): a G-number (a digit) < 10
Generate its tail
gen(tail(X, 千/百/十)): a number < $10^{3|2|1}$
Generation: example
1. X = 123,456,789,123,456,789
gen(X) = gen(head(X, 兆)) 兆 gen(tail(X, 兆)) = gen(123,456) 兆 gen(789,123,456,789)
2. X = 123,456
gen(X) = gen(head(X, 万)) 万 gen(tail(X, 万)) = gen(12) 万 gen(3,456)
3. X = 12
gen(X) = gen(head(X, 十)) 十 gen(tail(X, 十)) = gen(1) 十 gen(2)
4. gen(1) = 一, gen(2) = 二
Example: generation of conjunction
gen(3,000,060,000,000,000): head = 3000, tail = 60,000,000,000
gen(3000): head = 3, tail = 0
gen(60,000,000,000): head = 600, tail = 0
gen(600): head = 6, tail = 0
For Chinese, whenever a non-zero tail is less than 1/10 of pos, insert a conjunction into the output.
English part?
Interlingua: values
English numbers
Chinese numbers
Others ...
Arabic numbers
???
Grammar for English numbers
Exercise
Design a grammar for English numbers, covering the range [0, 1,000,000,000,000,000 - 1], and
use it to analyse the English number for 123,456,789,123 (or 123,456,789,123,456).
Design the generation procedure for English numbers and illustrate how it works for a real English number, e.g., 123,456.
Hints
In the lecture on interlingua, the following was given as the starting point for your design of the grammar for English numbers for HW2:
D0 > {zero}
D > {one, two, ..., nine}
D > {ten, eleven, ..., nineteen}
T > {twenty, ..., ninety}
and then five rules: H- > D0 | D | D | T | T D
subsuming the following:
H- > D0
H- > D
H- > D
H- > T
H- > T D
to cover numbers under 100. (Do not add any extra symbol in a rule such as T > D + D, which is wrong!)
Do not forget N > H-, for N is our axiom (just like S for sentence). The same goes for the rules for Th-, M-, B-, etc.
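As a rough illustration of how the five H- rules pair with semantic rules, here is a minimal Python sketch that interprets a word sequence under 100. The lexicon names `D_UNITS` and `D_TEENS` are hypothetical labels for the two D word lists, and `sem_h` is a made-up function name; none of them come from the lecture.

```python
D0 = {"zero": 0}
D_UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
           "six": 6, "seven": 7, "eight": 8, "nine": 9}
D_TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
           "fourteen": 14, "fifteen": 15, "sixteen": 16,
           "seventeen": 17, "eighteen": 18, "nineteen": 19}
T = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
     "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def sem_h(words):
    """Value of a word list under H- > D0 | D | D | T | T D, or None if no rule applies."""
    if len(words) == 1:
        # single-word rules: H- > D0, H- > D (units), H- > D (teens), H- > T
        for lexicon in (D0, D_UNITS, D_TEENS, T):
            if words[0] in lexicon:
                return lexicon[words[0]]
        return None
    if len(words) == 2 and words[0] in T and words[1] in D_UNITS:
        # H- > T D: the value is the sum of the two parts
        return T[words[0]] + D_UNITS[words[1]]
    return None


print(sem_h(["twenty", "three"]))  # 23
print(sem_h(["twenty", "ten"]))    # None
```

Rejecting "twenty ten" shows why the D in T D must be restricted to the units list.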
Following the above fashion, we can have Th- > D hundred {H-} for numbers in the range [100, 999]. As mentioned in class, people actually say twenty hundred and even ninety nine hundred, so we can extend this rule into the following by replacing D with H-:
Th- > H- hundred
Th- > H- hundred H-
Th- > H- hundred and H- (For British English)
Originally, Th- is defined to cover [100, 999]. Given the larger coverage of H- than that of D, the Th- rules overgenerate somewhat and produce numbers beyond 999. But conceptually, simply thinking of Th- as covering numbers under 1000 is fine for the other rules.
You may merge them into one line (NOT one rule!) as:
Th- > H- hundred {and} {H-}
where {} means optional. Please check whether any number in the range [100, 999] is missing before moving on to the rules for M-, B-, etc.
M- > [] thousand
M- > Th- thousand Th-
M- > Th- thousand and H-
M- > Th- thousand H-
M- > Th- billion
..
[]billion []million []thousand []
gen(23)
= gen(2, ) + gen(3)
= gen(2) + gen(3)
gen(23)
= gen(2, tens) gen(3)
= twenty gen(3)
gen(19)19
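The head/tail scheme N = head * pos + tail, together with the conjunction rule for Chinese, can be sketched in Python as follows. This is an illustration under assumptions, not the lecture's implementation: the characters 零 to 九 and 十, 百, 千, 万, 亿, 兆 are assumed, and the conjunction is only inserted when the tail is non-zero.

```python
DIGITS = "零一二三四五六七八九"           # assumed digit characters, index = value
POSITIONS = [(10**12, "兆"), (10**8, "亿"), (10**4, "万"),
             (10**3, "千"), (10**2, "百"), (10, "十")]  # largest first
CONJUNCTION = "零"


def gen(x):
    """Generate the Chinese form of a non-negative integer, heads and then tails."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if x < 10:
        return DIGITS[x]                        # a single digit (G)
    for pos, mark in POSITIONS:
        if x >= pos:                            # largest position that fits
            head, tail = x // pos, x % pos      # integer division and remainder
            out = gen(head) + mark
            if tail == 0:
                return out
            if tail * 10 < pos:                 # tail is less than 1/10 of pos
                out += CONJUNCTION
            return out + gen(tail)


print(gen(3_000_060_000_000_000))  # 三千兆零六百亿
print(gen(123_456))                # 一十二万三千四百五十六
```

Following the gen(12) = gen(1) 十 gen(2) step of the worked example, this sketch writes 12 as 一十二 rather than the colloquial 十二.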