INF5820 Language technological applications H2010 Jan Tore Lønning jtl@ifi.uio.no
Machine Translation INF 5820 H2008 Lecture 2
Machine Translation
1. Some examples
2. Why is machine translation a problem?
3. Traditional approaches:
   1. Direct
   2. Interlingua
   3. Transfer
4. Empirical approaches:
   1. SMT
   2. Example-based MT (EBMT)
5. The LOGON approach
1. Realskolealgoritmen (the "secondary-school algorithm")
Grammatical tags on the slide: S N B E V Pr V PP H D 3p E O A U F
Norwegian:  Jenta fra byen har gitt ham noen røde epler
Lexemes:    Mädchen von Stadt haben geben er einige rot Apfel
Inflected:  Das Mädchen von der Stadt hat gegeben ihm einige rote Äpfel
Reordered:  Das Mädchen von der Stadt hat ihm einige rote Äpfel gegeben
Steps:
1. Identify verb, syntactic function, case
2. Identify morphosyntactic features: definiteness, number, person, form, tense
3. Translate the lexemes
4. Determine properties of the target lexemes: gender, arguments, agreement
5. Inflect: case, number, person, gender, definiteness, tense, agreement
6. Fix the word order
1. Direct translation
Main idea: translate words!
  Bilingual dictionary
  Some morphological analysis
Two steps:
  1. Determine the words
  2. Determine the word order
(Similar to statistical MT)
J&M: decision list algorithm
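The two steps can be sketched on the Norwegian-German sentence from the Realskolealgoritmen slide. This is only a toy illustration, not the J&M decision list algorithm: the dictionary `LEXICON`, the tag set `PARTICIPLES`, and the single reordering rule are hypothetical and cover just this one sentence. Inflection is left out, so the output is a string of uninflected lexemes.

```python
# Toy word-form -> target-lexeme dictionary (hypothetical, covers one sentence).
LEXICON = {
    "jenta": "Mädchen", "fra": "von", "byen": "Stadt", "har": "haben",
    "gitt": "geben", "ham": "er", "noen": "einige", "røde": "rot",
    "epler": "Apfel",
}
# Source word forms treated as past participles (hypothetical tag set).
PARTICIPLES = {"gitt"}


def translate_words(tokens):
    """Step 1: look up each source word; unknown words are kept unchanged."""
    return [(LEXICON.get(t.lower(), t), t.lower() in PARTICIPLES) for t in tokens]


def reorder(tagged):
    """Step 2: move participles to the end of the clause (German word order)."""
    front = [word for word, is_participle in tagged if not is_participle]
    back = [word for word, is_participle in tagged if is_participle]
    return front + back


def direct_translate(sentence):
    return " ".join(reorder(translate_words(sentence.split())))


print(direct_translate("Jenta fra byen har gitt ham noen røde epler"))
# prints: Mädchen von Stadt haben er einige rot Apfel geben
```

Even this small case shows why some morphological analysis is needed: without it, the dictionary has to list every inflected word form separately.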
2. Interlingua
A universal meaning representation language (a lingua franca)
Analyse the source-language sentence, resulting in an interlingua representation
From this, generate a sentence in the target language
2. Interlingua: strength
Translation between many languages:
  one analysis module and one generation module per language
Example, 17 languages:
  Direct: 17*16 modules (= 272)
  Interlingua: 2*17 modules (= 34)
Adding language 18:
  Direct: + 2*17 modules (= 34)
  Interlingua: + 2 modules
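The module counts can be checked with a few lines of arithmetic. The function names are illustrative only; the sketch assumes one direct module per ordered language pair, which is what gives the 17*16 on the slide.

```python
def direct_modules(n):
    """One translation module per ordered pair of distinct languages."""
    return n * (n - 1)


def interlingua_modules(n):
    """One analysis module and one generation module per language."""
    return 2 * n


for n in (17, 18):
    print(n, direct_modules(n), interlingua_modules(n))
# prints: 17 272 34
# prints: 18 306 36

# Extra modules needed when language 18 is added.
print(direct_modules(18) - direct_modules(17))            # prints: 34
print(interlingua_modules(18) - interlingua_modules(17))  # prints: 2
```

The direct approach grows quadratically with the number of languages, while the interlingua approach grows linearly.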
3. Transfer
Problem for interlingua: it requires a language-independent meaning representation
Transfer approach:
  Language-specific representations
  Contrasts between a pair of languages expressed as transfer rules
Syntactic transfer: extends the direct approach with a syntactic analysis
Semantic transfer: semantic representations, but language-specific
Alternative strategies: the Vauquois triangle
[Figure: Vauquois triangle. Levels from top to bottom: interlingua, semantic transfer, syntactic transfer, word-for-word. Source: Norwegian sentence; target: English sentence.]
Example-based MT
No: Jenta har lest lekser i en time.
Eng: ?
Examples:
  Jenta har spist et eple hver dag
  The girl has eaten an apple a day
  Per hadde lest lekser
  Per had studied
  Kari sang i en time.
  Kari sang for an hour.
The matched fragments are not necessarily constituents
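One way to picture the recombination is as a lookup of fragments taken from the three example pairs. In the sketch below the fragment table `FRAGMENTS` has been read off the examples by hand, which is an assumption: a real EBMT system has to find such alignments itself. The greedy longest-match strategy is likewise only one simple choice, and the result is one plausible answer to the "Eng: ?" question, not the only one.

```python
# Fragment pairs read off the three example pairs by hand (an assumption).
FRAGMENTS = {
    ("jenta", "har"): "the girl has",     # from "Jenta har spist et eple hver dag"
    ("lest", "lekser"): "studied",        # from "Per hadde lest lekser"
    ("i", "en", "time"): "for an hour",   # from "Kari sang i en time."
}


def ebmt_translate(sentence):
    """Cover the input left to right with the longest known fragments."""
    tokens = sentence.lower().rstrip(".").split()
    out, i = [], 0
    while i < len(tokens):
        for j in range(len(tokens), i, -1):  # try the longest span first
            target = FRAGMENTS.get(tuple(tokens[i:j]))
            if target is not None:
                out.append(target)
                i = j
                break
        else:
            out.append(tokens[i])  # no example covers this word; copy it
            i += 1
    return " ".join(out)


print(ebmt_translate("Jenta har lest lekser i en time."))
# prints: the girl has studied for an hour
```

Note that "jenta har" (subject plus auxiliary) is not a syntactic constituent, which is the point of the last line on the slide.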
SMT Figure 25.8
The LOGON project
Machine translation, Norwegian to English
Many areas of language technology are needed: they work together in one demonstrator
Similarities and differences between Norwegian and other languages
Tourist texts / hiking-trip descriptions
High quality (limited coverage)
2003-2007
Alternative strategies
[Figure: levels from top to bottom: interlingua, semantics, syntax, word-for-word. Source: Norwegian sentence; target: English sentence.]
MT strategies (symbolic)
[Figure: levels: semantic, (syntactic). Source: Norwegian sentence; target: English sentence.]
Basis: transfer-based translation
Norwegian sentence
  1. Analysis (LFG-based)
Underspecified semantic representation of Norwegian
  2. Transfer
Underspecified semantic representation of English
  3. Generation (HPSG-based)
English sentence
2.2 Ambiguity
1. Analysis
2. Transfer
3. Generation
How do we choose the right, or the best, alternative at each step?
< Toppen er luftig, og har en utrolig utsikt! (83) --- 2 x 24 x 12 = 12
> the top is airy and has an incredible view [85.9] <0.70> (1:0:0).
> the summit is airy and has an incredible view [87.4] <1.00> (1:4:0).
> the top is breezy and has an incredible view [87.7] <0.46> (1:6:0).
> the top is airy and has an unbelievable view [88.9] <0.70> (1:1:0).
> the peak is airy and has an incredible view [89.1] <0.96> (1:2:0).
> the summit is breezy and has an incredible view [89.1] <0.66> (1:10:0).
> the summit is airy and has an unbelievable view [90.3] <1.00> (1:5:0).
> the top is breezy and has an unbelievable view [90.7] <0.46> (1:7:0).
> the peak is breezy and has an incredible view [90.8] <0.66> (1:8:0).
> the peak is airy and has an unbelievable view [92.0] <0.96> (1:3:0).
> the summit is breezy and has an unbelievable view [92.1] <0.66> (1:11:0).
> the peak is breezy and has an unbelievable view [93.8] <0.66> (1:9:0).
= 64:19 of 83 {77.1+22.9}; 58:9 of 64:19 {90.6 47.4}; 55:9 of 58:9 {94.8 100.0} @ 64 of 83 {77.1} <0.51 0.67>.
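The twelve outputs for "Toppen er luftig, og har en utrolig utsikt!" differ only in three word choices, so they can be reproduced by multiplying out the alternatives. The sketch below is not a description of how LOGON enumerates or scores its candidates; the list names are made up, and the point is only how a few local ambiguities multiply into many candidate translations.

```python
from itertools import product

SUBJECTS = ["top", "summit", "peak"]         # renderings of "Toppen"
ADJECTIVES = ["airy", "breezy"]              # renderings of "luftig"
MODIFIERS = ["incredible", "unbelievable"]   # renderings of "utrolig"

# Every combination of the three choices gives one candidate translation.
candidates = [
    f"the {subject} is {adjective} and has an {modifier} view"
    for subject, adjective, modifier in product(SUBJECTS, ADJECTIVES, MODIFIERS)
]

print(len(candidates))   # prints: 12
print(candidates[0])     # prints: the top is airy and has an incredible view
```

With 3 x 2 x 2 alternatives there are already 12 candidates for a short sentence, which is why some way of ranking them is needed.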
< De slipper å bære. (70) --- 3 x 4 x 9 = 6 [9]
> they do not have to carry something [40.6] <0.84> (0:0:1).
> you do not have to carry something [41.8] <0.53> (1:0:1).
> those do not have to carry something [51.6] <0.53> (2:1:1).
> they don't have to carry something [55.2] <0.80> (0:0:0).
> you don't have to carry something [65.8] <0.43> (1:0:0).
> those don't have to carry something [66.3] <0.43> (2:1:0).
= 57:13 of 70 {81.4+18.6}; 51:6 of 57:13 {89.5 46.2}; 48:6 of 51:6 {94.1 100.0} @ 54 of 70 {77.1} <0.53 0.69>.
Machine translation
1. What is machine translation?
2. Why is it difficult?
3. Traditional approaches:
   1. Direct
   2. Interlingua
   3. Transfer
4. Empirical approaches:
   1. Example-based MT (EBMT)
   2. Statistical MT (SMT)
5. The LOGON project
6. Evaluation
7. Machine translation in practice
8. A little history
History
1950s: great optimism (FAHQT: fully automatic high-quality translation)
1960s: too difficult
  Bar-Hillel
  The ALPAC report
1980s: renewed interest
  Japan
  EU, Eurotra
Our time (1992- )
Applications:
  Off-the-shelf products for PCs
  WWW
  Interactive translation tools
  New markets: China
Theory
Speech translation, e.g. VerbMobil (a German project)
Afghanistan, cf. SMT, EMT
The age of SMT
From 1990
Included as an alternative at the end of VerbMobil
On the market from about 2003
Google: SMT from about 2005
Convincing quality
Many language pairs
But predictable errors