Easier computer-assisted translation, even for Maltese
The European Commission announced it is going a step further in its efforts to foster multilingualism as a key part of European unity in diversity. The Commission's collection of about one million sentences and their high-quality translations in 22 of the 23 official EU languages, including Maltese, is the biggest ever collection in so many languages and is now freely available.
This kind of data is highly sought after by developers of machine translation systems in which automatic translation software "learns" from manually translated texts how words and phrases are correctly and contextually translated. The data can also help the development of other linguistic software tools such as grammar and spell checkers, online dictionaries and multilingual text classification systems.
Such tools are also being researched and developed in Malta thanks to a concerted effort between the University of Malta's Department of Artificial Intelligence and the Institute of Linguistics, the Commission said.
"Through this initiative the European Commission intends to boost human language technologies, support multilingualism and make computer-assisted translation easier, cheaper and more accessible. Citizens belonging to the smaller linguistic communities will have an easier access to documents and web pages only available in the most used languages," commented Leonard Orban, Commissioner for Multilingualism.
Janez Potočnik, European Commissioner for Science and Research, said: "This unique collection of language data contributes to the creation of a new generation of software tools for human language processing and helps foster the competitiveness of the language industry, which is already one of the fastest growing industries in the EU."
The EU institutions have more multilingual texts than any other organisation because of the requirement that EU law exist in each of the EU's 23 official languages. Their translation services work with 253 possible language pair combinations and produce around 1.5 million translated pages a year.
Whereas large amounts of translations of English or French texts can be found on the internet, such resources are scarce for languages such as Latvian or Romanian, and they are practically nonexistent for the combination of two languages for which few resources exist.
Therefore, the Commission, through cooperation between its translators and its in-house scientists, is releasing large collections of sentences from legal documents covering technical, political and social issues which are available in 22 languages. In this translation repository, it is possible to find sentences with their equivalent in all other official languages.
Only Irish translations are not yet available. This release of language data is a good example of the Commission's open policy of re-use of its information resources and follows the opening of the EU's documentary and terminological databases Eur-Lex and IATE.
The 7th Framework Programme for research and development, in its Information and Communication Technologies strand, supports research on machine translation and other language-related technologies.
More information at http://langtech.jrc.it/DGT-TM.html .