The meaning inside of words

The segmentation of the word naqsamhielek, ‘I will break it for you’, into smaller parts so as to extract their individual grammatical meaning, which could then be used in applications such as spelling and grammar correction.

We often hear the idiom ‘read between the lines’, meaning to understand something that is not explicitly written. The nature of language does not stop there. Words also carry hidden information of their own that is not explicitly stated, and yet grasping it seems second nature to us. Words are laden with grammatical meaning. Take, for instance, particular word endings like ‘–s’ in tables; this indicates the plural form of the word table. As humans, we learn this in our childhood as we acquire language. There are several theories as to what exactly the brain is learning and how children are able to produce the plural form of words they’ve never heard before.

In the field of natural language processing, our focus shifts onto computers and machines, and how computer software can be programmed to understand and process human language. This is particularly challenging since there are many layers of knowledge in language. One such layer is morphology, and the area that handles it is called computational morphology. This deals specifically with how words are formed and tries to identify the grammatical or semantic meaning of each part of a word.

Research carried out at the Institute of Linguistics and Language Technology at the University of Malta specifically looked at computational techniques that could automatically identify these word segments and associate a grammatical meaning with certain parts of words. The added challenge was that this was done for Maltese – a language with two systems for word formation. This is highly evident in verb formations, which can follow what is called a root-and-pattern system (ksirt, ‘I broke’, from the root K-S-R, kiser, ‘to break’) or a stem-and-affix system (spjegajt, ‘I explained’, from the stem spjega, ‘to explain’).
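To make the two systems concrete, here is a minimal Python sketch. It is not the method used in the research, only an illustration built on the examples above. It assumes a toy list of three suffixes (-jt, -na, -t) and treats every letter other than a, e, i, o and u as a consonant; both are simplifications, and the function names are hypothetical.

```python
# Toy illustration only: the suffix list and vowel set are simplifying
# assumptions, not a full description of Maltese morphology.
SUFFIXES = ("jt", "na", "t")   # two-letter suffixes are checked before "t"
VOWELS = set("aeiou")


def strip_suffix(word):
    """Remove the first matching toy suffix, if any."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def consonant_root(word):
    """Root-and-pattern view: keep only the consonants of the stripped form."""
    return "-".join(ch.upper() for ch in strip_suffix(word) if ch not in VOWELS)


def stem(word):
    """Stem-and-affix view: the stripped form itself is treated as the stem."""
    return strip_suffix(word)


print(consonant_root("ksirt"))   # K-S-R
print(consonant_root("kiser"))   # K-S-R
print(stem("spjegajt"))          # spjega
```

In the first system the vowels change between related forms, so only the consonants are compared; in the second the stem stays intact and only the ending is removed. A real analyser would need far more than this, for example a way to decide which system a given verb follows.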

One of the main aims of this research was to look at the different techniques employed on other languages and analyse their performance on Maltese.

A number of techniques were used to automatically group related words together (ksirt with kiser, spjegajt with spjega), and to provide labels to the words indicating the grammatical information that the word formations provide (e.g. ksirt – first person singular perfective vs. ksirna – first person plural). The initial results are promising, and continued work in this area is necessary. Eventually, this type of information will aid tools such as spelling and grammar correction for Maltese – something which is in high demand and beneficial for the development of the Maltese language in the digital age.
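As a rough idea of what grouping and labelling can look like in code, the following Python sketch handles only the five example words mentioned above. The suffix-to-label table, the grouping key (the consonants left after removing a suffix) and the function names are all assumptions made for illustration; the research itself evaluated techniques that learn such groupings automatically, which this hand-written toy does not attempt.

```python
from collections import defaultdict

# Hypothetical toy table: just enough to cover the article's examples.
SUFFIX_LABELS = {
    "jt": "first person singular perfective",
    "na": "first person plural",
    "t": "first person singular perfective",
}
VOWELS = set("aeiou")


def analyse(word):
    """Return (grouping key, label) for a word using the toy suffix table."""
    for suffix, label in SUFFIX_LABELS.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[: -len(suffix)]
            break
    else:
        # No suffix matched: leave the word as it is and do not guess a label.
        base, label = word, "unlabelled"
    # Words that share the same consonants after stripping are grouped together.
    key = "".join(ch for ch in base if ch not in VOWELS)
    return key, label


def group_words(words):
    """Collect (word, label) pairs under their shared grouping key."""
    groups = defaultdict(list)
    for word in words:
        key, label = analyse(word)
        groups[key].append((word, label))
    return dict(groups)


groups = group_words(["ksirt", "kiser", "ksirna", "spjegajt", "spjega"])
for key, members in groups.items():
    print(key, members)
```

With this input, ksirt, kiser and ksirna fall into one group and spjegajt and spjega into another. A practical system would also have to cope with many more affixes and with endings that have more than one possible reading.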

The research work disclosed in this article was partially funded by the Malta Government Scholarship Scheme grant.

Did you know?

• There are more than 7,000 languages spoken around the world today.

• Although English is the second most spoken language in the world (after Mandarin Chinese), it is estimated that 75 per cent of the world’s population does not speak English.

• Languages constantly borrow from each other. Around 30 per cent of the words in English were adopted from French – for example, almost all words related to ballet are from French.

• Language is thought to have originated around 100,000 BC, around the time when the modern human evolved with a modern skull, brain size and vocal cords.

• ‘Dord’ (meaning ‘density’) was discovered in the Merriam-Webster Dictionary in 1939. It is not a word, but had been there for five years due to an error.

• While the Pope tweets in nine languages, the ATMs in the Vatican City use Latin as the first language. You would have to figure out how to change the language setting to use the ATM, unless you know Latin of course.

Sound bites

• Language development: A new study of Spanish-English bilingual children by researchers at Florida Atlantic University published in the journal Developmental Science finds that when children learn any two languages from birth, each language proceeds on its own independent course at a rate that reflects the quality of the children’s exposure to each language. Spanish skills become vulnerable as children’s English skills develop, but English is not vulnerable to being taken over by Spanish. Turns out it’s not the quantity of what the children hear, it’s the quality of their language exposure that matters.

• Speech difficulties: A recent study published in the Journal of Child Psychology and Psychiatry found that speech difficulties are linked with difficulties in learning to read when children first start school, but these effects are no longer apparent at eight years of age. Researchers confirmed that early language impairment that co-occurs with speech difficulties predicts poor literacy skills at both five-and-a-half and eight years of age. Having a family history of dyslexia had a small but significant effect on literacy at both ages, above and beyond the effects of speech and language.

• For more sound bites, listen to Radio Mocha on Radju Malta 2 every Monday at 1pm and Friday at 6pm.

