Studying how the human brain processes language is a prerequisite for the design of Human Language Technology, Ray Fabri and Albert Gatt tell Josianne Facchetti.

Talking machines are not a futuristic concept. ATMs talk to customers, giving instructions as they withdraw cash or carry out other transactions, while Google can translate text from one language to another. People with a speech impediment use speech synthesis – the conversion of text into speech – to communicate.

Other software currently being developed includes speech-to-text (STT) systems, which transform speech into text, as well as automatic summarisation, voice recognition, information retrieval, automatic transcription and access to information through spoken human-computer dialogue. These systems are made available through internet search engines, mobile phones and other digital devices.

This technology exists thanks to Human Language Technology (HLT), an area of study that involves the design of intelligent computer systems that process human language, such as the ones mentioned.

A prerequisite for this is the study of how the human brain develops and processes language, such as in the use of vocabulary and word and sentence structure.

Although work on HLT has been ongoing for many years, some of these systems have not yet reached their full potential. I recall how, when I was teaching at a university in China, some students would first write their assignments in Chinese, then use automatic translators to translate them into English. Unfortunately for them, the automatic translators could not cope with large amounts of text – the result would often be a senseless mass of words and an instant fail.

In Malta, research on HLT is being conducted by the Institute of Linguistics and the Department of ICT at the University. Members of these two institutes have embarked on collaborative and interdisciplinary projects, including the creation of the first electronic lexicon of Maltese and a corpus of Maltese. This project to create a large repository of text and speech in digital format started in 1997.

Ray Fabri and Albert Gatt from the Institute of Linguistics explain that before any language-sensitive software based on HLT can be created, tools such as the lexicon and the 100-million-word corpus are needed. These act as building blocks for the creation of HLT software.

“The corpus is used both for linguistic research and to develop machine learning methods to extract patterns from the linguistic data to learn grammar rules and word categories. Studying language gives insights into the way the brain works and the way we think.

“My main interest is the way language is processed by humans. I take a psycholinguistic perspective and I take those insights and implement them in computational models. The big challenge is creating a machine that uses language the way humans use language,” says Gatt.

The two scholars explain and demonstrate how this corpus, which includes literature, academic text, press text, speeches and text collected from the web, works.

Another ongoing project by Fabri, Gatt and Claudia Borg is the study of word structure – morphology. Complex words go through grammatical processes, changing, for instance, when used in the plural form.

“Maltese is found to be fascinating because these studies show that it not only makes use of Semitic rules, but also borrows from other languages such as Italian – for example, verb + r = the noun monitarjar. Originally, Maltese morphology was Semitic, but what we are seeing here is now Romance,” says Fabri.

Another project conducted by the Institute of Linguistics involves putting people in a language laboratory, where sophisticated software is used to see how fast they register and use words.

“These studies give us a better understanding of Maltese from a psychological and linguistic point of view. To what extent are certain aspects of Maltese still Semitic? How do words interact? Thus we have a mixed system that combines rules from different places into one system,” explains Fabri.

Gatt says that once the experimental and corpus data is recorded and analysed, the outcomes will inform the design of software systems that can perform an automatic analysis of word structure. This experimental work is also being coupled with research into the automatic acquisition of Maltese word formation rules using machine learning techniques.
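
As a rough illustration of what learning word formation rules from data can mean, the following Python sketch takes pairs of base and inflected forms and counts the suffix changes that relate them. It is a toy under stated assumptions, not the method used by the Maltese researchers: the function names `suffix_rule` and `learn_rules` are hypothetical, and the English word pairs are placeholders standing in for real corpus data.

```python
from collections import Counter


def suffix_rule(base, derived):
    """Return (old_suffix, new_suffix) after stripping the longest shared prefix."""
    i = 0
    while i < min(len(base), len(derived)) and base[i] == derived[i]:
        i += 1
    return base[i:], derived[i:]


def learn_rules(pairs):
    """Count how often each suffix change occurs across all word pairs."""
    return Counter(suffix_rule(base, derived) for base, derived in pairs)


# Placeholder data (an assumption for illustration, not taken from the MLRS corpus).
pairs = [
    ("cat", "cats"),
    ("dog", "dogs"),
    ("box", "boxes"),
    ("fox", "foxes"),
    ("city", "cities"),
    ("party", "parties"),
]

rules = learn_rules(pairs)

# Each entry is ((old_suffix, new_suffix), count).
for rule, count in rules.most_common():
    print(rule, count)
```

Suffix changes that recur across many pairs are candidates for general rules. A suffix-only approach of this kind would likely miss changes inside the word, such as those typical of Semitic-style morphology, which is one reason a mixed system like Maltese is hard to analyse automatically.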

Fabri and Gatt explain that HLT is a rapidly growing area with many commercial applications. First, the research sector of HLT has to be developed in order to contribute creatively to this area. This, however, cannot be done without qualified individuals who can work in this field.

In fact, demand for qualified people to work in the research and development of HLT systems is increasing, both from IT companies and from academia.

In Malta, IT companies are already in the process of creating HLT software. A case in point is Crimsonwing – a Malta-based IT company that is creating speech synthesis software that converts Maltese language text into speech. This software is aimed mainly at people who are visually impaired.

To cater for this demand, the Institute of Linguistics and the Department of Intelligent Computer Systems are jointly offering a B.Sc. (Hons) in Human Language Technology, starting in October.

Students will receive background knowledge in computing and linguistics. Once they graduate, they will have a solid grounding in how these two subjects come together, putting them in a better position to find employment in a wide range of areas.

The MLRS corpus is available for free at http://mlrs.research.um.edu.mt .
