
Interlingua MT: Translation of Numbers

Topics:
Number systems
Grammar for numbers
Parsing
Semantic processing
Generation

MT pyramid (revisited)
Source language => Target language

Interlingua
Transfer: deeper rep.
Transfer: semantic rep.
Transfer: functional structure
Transfer: phrase structure
Direct translation: word-for-word translation

No transfer process is needed for interlingua.
*

Interlingua MT
Interlingua <-> Language1, Language2, Language3, others ...
Advantage of interlingua: adding a new language needs only one more language pair: new language <-> Interlingua
*

*
What is interlingua?
An interlingua is supposed to be a universal representation for what?
Meaning, of course, but what is meaning?
Since there is no clear meaning for "meaning", we may describe an interlingua as a universal representation for what can be conveyed through human language communication.
Question: What can be conveyed by our languages?

How to design an interlingua?
Any clear idea about it? No.
What we do know for sure is its
universality and
versatility

Think about the following:
ontology of human knowledge
conceptions of what we know and can express via speech
ontology of objects in the world and in our languages
ontology of events
ontology of words, etc.

Any example to help us understand it any better?
*

Interlingua MT for numbers

Interlingua: values
English numbers
Chinese numbers
Others ...
Arabic numbers: used as a universal representation for values

Number systems
Decimal numbers?
Arabic numbers: yes
Chinese numbers? <= 10,000, yes; > 10,000, still?
English numbers? <= 1,000, yes; > 1,000, still?
Basically yes, but with quite some variation!

What is the difference?
The distinction between the two can be exemplified by the difficulties in converting or translating between them.
*

*
What defines a number system?
Base
the set of digits (or base symbols) used
the cardinality of the digit set (i.e., the number of digits)
Decimal numbers
base 10
digits: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
each digit has its own digit value.
Position
the place where a digit shows up, e.g., in 23388
each position has its own position value: $|Base|^{Pos}$

What value does a digit represent?
Digit value x position value, i.e., Digit x $|Base|^{Pos}$
A digit represents a different value when showing up in a different position.

Digit 2, position 4: $2 \times 10^4$
Digit 3, position 3: $3 \times 10^3$
Digit 3, position 2: $3 \times 10^2$
Digit 8, position 1: $8 \times 10^1$
Digit 8, position 0: $8 \times 10^0$

What is the value of a number?
The value of a number = the sum of the values of all its digits. E.g.,

$23{,}388 = 2 \times 10^4 + 3 \times 10^3 + 3 \times 10^2 + 8 \times 10^1 + 8 \times 10^0$

Hey! So trivial! What kind of game are you playing?

Let us play with binary numbers
Base 2
Digits = {0, 1} (i.e., only 0 and 1 appear in a number)
Still trivial?
All computers play such a game.

How about numbers in other bases?

$11{,}111_2 = 1 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0 = 31_{10}$

*
Octal numbers
Base 8
Digits = {0, 1, 2, 3, 4, 5, 6, 7}

Numbers:
0, 1, 2, 3, 4, 5, 6, 7,
10, 11, 12, 13, 14, 15, 16, 17,
20, 21, 22, 23, 24, 25, 26, 27,
30, 31, 32, 33, 34, 35, 36, 37,

$357_8 = ?$
$357_8 = 3 \times 8^2 + 5 \times 8^1 + 7 \times 8^0 = 239_{10}$

*
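Hexadecimal numbers
Base 16
Digits = {0, 1, 2, ..., 9, A, B, C, D, E, F}

Numbers:
0, 1, 2, ..., 9, A, B, C, D, E, F,
10, 11, 12, ..., 19, 1A, 1B, 1C, 1D, 1E, 1F,
20, 21, 22, ..., 29, 2A, 2B, 2C, 2D, 2E, 2F,

$357_{16} = ?$
$357_{16} = 3 \times 16^2 + 5 \times 16^1 + 7 \times 16^0 = 855_{10}$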
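The binary, octal and hexadecimal examples all follow the same rule, digit value x base^position, summed over all digits. A minimal Python sketch of that rule is given here; the names `DIGITS`, `expand` and `value_of` are illustrative and not part of the lecture material.

```python
DIGITS = "0123456789ABCDEF"  # a symbol's index in this string is its digit value


def expand(numeral: str, base: int) -> list[tuple[int, int]]:
    """Return (digit value, position) pairs, leftmost digit first."""
    symbols = numeral.replace(",", "").upper()
    terms = []
    for i, symbol in enumerate(symbols):
        digit_value = DIGITS.index(symbol)  # raises ValueError for unknown symbols
        if digit_value >= base:
            raise ValueError(f"{symbol!r} is not a digit in base {base}")
        position = len(symbols) - 1 - i     # rightmost digit has position 0
        terms.append((digit_value, position))
    return terms


def value_of(numeral: str, base: int) -> int:
    """Sum of digit value x base**position over all digits."""
    return sum(d * base ** pos for d, pos in expand(numeral, base))


print(value_of("23,388", 10))  # 23388
print(value_of("11111", 2))    # 31
print(value_of("357", 8))      # 239
print(value_of("357", 16))     # 855
```

Here `expand("357", 8)` yields the three terms of the octal example, `(3, 2)`, `(5, 1)` and `(7, 0)`, before they are summed.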

*
Chinese numbers
Base 10, basically

Digits = {零, 一, 二, 三, 四, 五, 六, 七, 八, 九}
Another set of digits: {, , , }
Position
Positions in Chinese numbers are explicitly expressed
Positions: {} (units), 十, 百, 千, 万, 亿, 兆
Position values: $1, 10, 10^2, 10^3, 10^4, 10^8, 10^{12}$
E.g.,
五千六百七十八
$= 5 \times 10^3 + 6 \times 10^2 + 7 \times 10^1 + 8 \times 10^0$
$= 5{,}678_{10}$

A grammar for Chinese numbers
G > Digits

S > {G} 十 {G}

B > G 百
B > G 百 S
B > G 百 零 G

Q > G 千
Q > G 千 B
Q > G 千 零 S
Q > G 千 零 G

W > Q/B/S/G 万
W > Q/B/S/G 万 Q
W > Q/B/S/G 万 零 B
W > Q/B/S/G 万 零 S
W > Q/B/S/G 万 零 G

零 here is a conjunction, not zero!
*

Large numbers in Chinese
W > Q/B/S/G 万
W > Q/B/S/G 万 Q
W > Q/B/S/G 万 零 B
W > Q/B/S/G 万 零 S
W > Q/B/S/G 万 零 G

Y > Q/B/S/G 亿
Y > Q/B/S/G 亿 W
Y > Q/B/S/G 亿 零 W
Y > Q/B/S/G 亿 零 Q
Y > Q/B/S/G 亿 零 B
Y > Q/B/S/G 亿 零 G

Z > Q/B/S/G 兆
Z > Q/B/S/G 兆 Y
Z > Q/B/S/G 兆 零 Y
Z > Q/B/S/G 兆 零 W
Z > Q/B/S/G 兆 零 Q
Z > Q/B/S/G 兆 零 B
Z > Q/B/S/G 兆 零 G

Problem:
Ambiguity in analysis

Solution
W > B/S/G 万
W > B/S/G 万 Q
W > B/S/G 万 零 B
W > B/S/G 万 零 S
W > B/S/G 万 零 G

WQ > Q 万
WQ > Q 万 Q
WQ > Q 万 零 B
WQ > Q 万 零 S
WQ > Q 万 零 G

Y > B/S/G 亿
Y > B/S/G 亿 WQ
Y > B/S/G 亿 零 W
Y > B/S/G 亿 零 Q
Y > B/S/G 亿 零 B
Y > B/S/G 亿 零 G

YQ > Q 亿
YQ > Q 亿 WQ
YQ > Q 亿 零 W
YQ > Q 亿 零 Q
YQ > Q 亿 零 B
YQ > Q 亿 零 G

Z > Q/B/S/G 兆
Z > Q/B/S/G 兆 YQ
Z > Q/B/S/G 兆 零 Y
Z > Q/B/S/G 兆 零 W
Z > Q/B/S/G 兆 零 Q
Z > Q/B/S/G 兆 零 B
Z > Q/B/S/G 兆 零 G

*
Chinese numbers => values
Two steps:
Syntactic analysis
Parsing: to derive a syntactic tree (called a parse tree) for an input sentence / number.
Result: a phrase structure tree.
Semantic interpretation
To convert the parse tree into a semantic / meaning representation, namely, a value.

Semantic rules for interpretation
We need to define a semantic rule for each grammar rule to specify
how a phrase structure under the grammar rule is interpreted into a meaning representation, i.e.,
how to convert a syntactic structure into meaning.

Z > Q/B/S/G 兆 Y
sem(Z)
= sem(Q/B/S/G 兆 Y)
= sem(Q/B/S/G) x sem(兆) + sem(Y)
*

Example: parsing
[Figure: parse tree of 三千兆零六百亿, with Z at the root: Z > Q 兆 零 Y, Q > G 千, Y > B 亿, B > G 百]

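Example: semantic interpretation
G = 3
千 = $10^3$
Q = $3 \times 10^3$
兆 = $10^{12}$
Q 兆 = $3 \times 10^{15}$
G = 6
百 = $10^2$
B = $6 \times 10^2$
亿 = $10^8$
Y = $6 \times 10^{10}$
Z = $3 \times 10^{15} + 6 \times 10^{10} = 3{,}000{,}060{,}000{,}000{,}000$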
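The interpretation step can be sketched in Python as a recursive application of the rule sem(head) x sem(position) + sem(tail). This is only an illustration under stated assumptions: the names `DIGIT` and `UNITS` are hypothetical, the characters are assumed to be the standard numerals 零 to 九 and the position words 十, 百, 千, 万, 亿, 兆, and the function splits on the largest position word rather than parsing with the grammar, so it does not check where the conjunction 零 is allowed.

```python
DIGIT = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
         "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

# Position words with their position values, largest first.
UNITS = [("兆", 10**12), ("亿", 10**8), ("万", 10**4),
         ("千", 10**3), ("百", 10**2), ("十", 10)]


def sem(numeral: str) -> int:
    """Value of a Chinese numeral: sem(head) x sem(position) + sem(tail)."""
    numeral = numeral.lstrip("零")  # a leading 零 acts only as a conjunction
    if numeral == "":
        return 0
    for word, position_value in UNITS:
        split = numeral.find(word)
        if split != -1:
            head, tail = numeral[:split], numeral[split + 1:]
            head_value = sem(head) if head else 1  # bare 十 stands for 1 x 10
            return head_value * position_value + sem(tail)
    if numeral in DIGIT:
        return DIGIT[numeral]
    raise ValueError(f"cannot interpret {numeral!r}")


print(sem("三千兆零六百亿"))  # 3000060000000000
print(sem("五千六百七十八"))  # 5678
```

Because it skips syntactic analysis, the sketch also accepts some strings the grammar would reject; a grammar-driven parser would attach one semantic rule to each grammar rule instead.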

*
Generation (i): Head
Given an Arabic number, generate its Chinese counterpart.

Format: N = head * pos + tail
Denoted as: head(N, pos) and tail(N, pos), respectively

Given an input number X, how to generate it? Heads and then tails.

head(X, 兆|亿|万) = X / $10^{12|8|4}$ (integer division!)
tail(X, 兆|亿|万) = X % $10^{12|8|4}$ (remainder!)
Generate its head: gen(head(X, 兆|亿|万)), a Q-number < $10^4$
Generate its tail: gen(tail(X, 兆|亿|万)), a number < $10^{12|8|4}$
*
Generation (ii): Tail < $10^4$
Generating a Q-number X < $10^4$
head(X, 千/百/十) = X / $10^{3|2|1}$
tail(X, 千/百/十) = X % $10^{3|2|1}$
Generate its head: gen(head(X, 千/百/十)), a G-number < 10
Generate its tail: gen(tail(X, 千/百/十)), a number < $10^{3|2|1}$

Generation: example
1. X = 123,456,789,123,456,789
gen(X) = gen(head(X, 兆)) 兆 gen(tail(X, 兆)) = gen(123,456) 兆 gen(789,123,456,789)
2. X = 123,456
gen(X) = gen(head(X, 万)) 万 gen(tail(X, 万)) = gen(12) 万 gen(3,456)
3. X = 12
gen(X) = gen(head(X, 十)) 十 gen(tail(X, 十)) = gen(1) 十 gen(2)
4. gen(1) = 一, gen(2) = 二
*
Example: generation of conjunction
gen(3,000,060,000,000,000): head = 3000, tail = 60,000,000,000
gen(3000): head = 3, tail = 0
gen(60,000,000,000): head = 600, tail = 0
gen(600): head = 6, tail = 0
*
For Chinese, any time a nonzero tail is less than 1/10 of pos, insert a conjunction into the output.

English part?
Interlingua: values
English numbers ???
Chinese numbers
Others ...
Arabic numbers
*
Grammar for English numbers
Exercise
Design a grammar for English numbers, covering the range [0, 1,000,000,000,000,000 - 1], and use it to analyse the English number for 123,456,789,123 (or 123,456,789,123,456).
Design the generation procedure for English numbers and illustrate how it works for a real English number, e.g., 123,456.
*
Hints
In the lecture on interlingua, the following was given as the starting point for your design of the grammar for English numbers for HW2:
D0 > {zero}
D > {one, two, ..., nine}
D > {ten, eleven, ..., nineteen}
T > {twenty, ..., ninety}
and then five rules: H- > D0 | D | D | T | T D
subsuming the following:
H- > D0
H- > D
H- > D
H- > T
H- > T D
to cover numbers under 100. (Do not add any extra symbol in a rule such as T > D + D, which is wrong!)
Do not forget N > H-, for N is our axiom (just like S for sentence). The same applies to the rules for Th-, M-, B-, etc.
Following the above fashion, we can have Th- > D hundred {H-} for numbers in the range [100, 999]. As mentioned in class, people actually say "twenty hundred" and even "ninety nine hundred", so we can extend this rule into the following by replacing D with H-:
Th- > H- hundred
Th- > H- hundred H-
Th- > H- hundred and H- (For British English)
Originally, Th- is defined to cover [100, 999]. Given that H- has larger coverage than D, the Th- rules overgenerate somewhat and produce numbers beyond 999. But conceptually, simply thinking of Th- as covering numbers under 1000 is fine for the other rules.

You may merge them into one line (NOT one rule!) as:
Th- > H- hundred {and} {H-}
where {} means optional. Please check whether any number in the range [100, 999] is missing before moving on to the rules for M-, B-, etc.

M- > [] thousand
M- > Th- thousand Th-
M- > Th- thousand and H-
M- > Th- thousand H-
M- > Th- billion

..

[]billion []million []thousand []
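To check an analysis by hand, it can help to compute the value that a grammar-based interpretation of an English number should arrive at. The Python sketch below is a flat evaluator, not the grammar asked for in the exercise. It follows the "[] billion [] million [] thousand []" layout, and the names `SMALL`, `TENS`, `SCALES` and `english_value` are illustrative. "trillion" is included as an assumption so that the range of the exercise is covered.

```python
SMALL = {word: i for i, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: 10 * i for i, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 10**3, "million": 10**6,
          "billion": 10**9, "trillion": 10**12}


def english_value(text: str) -> int:
    """Value of an English number given in words."""
    total = 0  # groups already closed by a scale word
    group = 0  # current group, i.e. the part before the next scale word
    for word in text.lower().replace("-", " ").split():
        if word == "and":            # British English conjunction, no value
            continue
        if word in SMALL:
            group += SMALL[word]
        elif word in TENS:
            group += TENS[word]
        elif word == "hundred":
            group *= 100
        elif word in SCALES:
            total += group * SCALES[word]
            group = 0
        else:
            raise ValueError(f"unknown number word: {word!r}")
    return total + group


print(english_value("one hundred twenty three thousand four hundred fifty six"))
# 123456
print(english_value("ninety nine hundred"))  # 9900
```

Like the Th- rules with H- in place of D, this evaluator overgenerates: it accepts word sequences that no sensible grammar would, so it checks values only, not well-formedness.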

gen(23)
=gen(2,)+gen(3)
=gen(2)+gen(3)
gen(23)
=gen(2,tens)gen(3)
=twenty gen(3)

gen(19)19
