Disputes about spelling in Maltese are nothing new, in particular when it comes to the assimilation of loanwords from other languages into our own. As a recent children’s publication made us question, is it ‘ħendbeg’ or simply ‘handbag’?

As the issue continues to be debated everywhere from academic conferences to Facebook groups, the one thing we cannot dispute is that, in one form or another, Maltese is still very much alive.

Certain factors have, however, obstructed the use of Maltese in the digital world. Setting up one’s computer or smartphone to allow the input of Maltese characters often requires more effort than users care to invest. But the problem runs deeper than simply replacing Maltese characters with their diacritic-less equivalents.

Maltese has a very rich morphology, and problems with orthography are very common, even for people who use the language every day. How do you spell the plural of ‘żagħżugħ’ (youth), for instance? Unless you’ve committed it to memory, you either guess or use a similar word like ‘karkur’ as a model (it’s ‘żgħażagħ’, by the way). Isn’t there somewhere you can check this automatically?

Incredibly, even today there is no comprehensive online dictionary for Maltese, much less an automatic spell checker. In fact, digital linguistic resources for Maltese are very much lacking when compared to those of other languages, as highlighted by a recent white paper from the META-NET project entitled The Maltese Language in the Digital Age.

While the Maltese language enjoys some very high-quality printed dictionaries and thesauri, no online versions of them exist. The digitisation of these paper resources has been mired in intellectual property issues, and the work required to compile a new online dictionary from scratch is substantial.

But not all hope is lost. Various recent works of research in the field of Maltese linguistics have produced a number of smaller lexical resources, each compiled in order to study a particular feature or phenomenon of the language. To date there have already been exhaustive compilations of root-and-pattern verbs, broken plurals and deverbal nouns in Maltese. These resources are already available online via the Maltese Language Resource Server (MLRS). Additionally, a basic English-Maltese online dictionary originally compiled in 1997 has recently been updated and re-released in a machine-readable format.

While all these resources are openly available online, having this information stored in different locations and file formats is of limited use when one wants to look something up. A recent master’s thesis in the area of computational linguistics has begun to combine all these disparate lexical resources into a single, searchable online database. The result of this work is a collection of automatically extracted and manually curated Maltese word lists, affectionately nicknamed Ġabra (collection).

This resource is also hosted by the MLRS, and is already available at http://mlrs.research.um.edu.mt/resources/gabra . Searching the collection’s 11,000 entries is fast and straightforward, as you would expect from a digital database. Apart from standard feature information such as gender and root, the collection also includes English glosses for many of the entries.

But Ġabra is more than just the online equivalent of a traditional print dictionary. Being able to search for the verb ‘kiteb’ (write) is nice, but a single verb in Maltese has close to 1,000 inflectional forms which vary according to aspect, person and the inclusion of suffixed pronouns for direct and indirect objects. And if you’re hoping to find any of these forms in a dictionary, you’re out of luck.

Conjugating verbs is repetitive and boring, and a paper dictionary simply does not have the space to list every inflectional form. But these are not concerns in a digital medium.

As verb inflection is a highly regular process, it makes perfect sense to encode the morphological rules of a language as a computer program and have a machine generate and store all these forms for us automatically.

Using the Grammatical Framework (GF) software, the linguistic rules of Maltese have been represented as a computational grammar, which allows us to programmatically produce thousands of verb forms from a single base form. So by using just the input form ‘kiteb’, we can automatically generate ‘ktibnihom’, ‘jiktbuhilniex’, and the 950 other forms of this verb.
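To give a flavour of what encoding morphological rules as a program means, here is a minimal sketch in Python. It is a toy illustration, not the actual computational grammar behind Ġabra: the template table and the function name `conjugate_perfective` are assumptions made for this example, and it only covers the seven perfective forms of a regular three-consonant verb that patterns like ‘kiteb’, leaving out the imperfective, negation and suffixed pronouns entirely.

```python
# Toy sketch: slot the three root consonants of a verb into fixed templates.
# The templates below are hard-coded for verbs that behave like 'kiteb' (k-t-b);
# a real grammar needs many more verb classes and far more rules.

PERFECTIVE_TEMPLATES = {
    "1sg":   "{c1}{c2}i{c3}t",    # ktibt   (I wrote)
    "2sg":   "{c1}{c2}i{c3}t",    # ktibt   (you wrote)
    "3sg_m": "{c1}i{c2}e{c3}",    # kiteb   (he wrote)
    "3sg_f": "{c1}i{c2}{c3}et",   # kitbet  (she wrote)
    "1pl":   "{c1}{c2}i{c3}na",   # ktibna  (we wrote)
    "2pl":   "{c1}{c2}i{c3}tu",   # ktibtu  (you all wrote)
    "3pl":   "{c1}i{c2}{c3}u",    # kitbu   (they wrote)
}


def conjugate_perfective(root):
    """Return a dict mapping person/number tags to perfective forms."""
    c1, c2, c3 = root
    return {
        tag: template.format(c1=c1, c2=c2, c3=c3)
        for tag, template in PERFECTIVE_TEMPLATES.items()
    }


forms = conjugate_perfective(("k", "t", "b"))
for tag, form in forms.items():
    print(tag, form)
```

The point of the sketch is that the rules are written once and then applied mechanically: adding another verb of the same class costs nothing more than supplying its three root consonants.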

By combining this generative grammar with the word lists from the resources previously collected together, we can generate hundreds of thousands of Maltese word forms and begin to approach a comprehensive full-form lexicon for the language.
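The idea of pairing a word list with a form generator can be sketched in a few lines of Python. Everything named here is hypothetical and chosen for illustration: the two-entry `WORD_LIST`, the `inflect` helper and the perfective-only templates are assumptions, standing in for the real word lists and the real grammar. The output is a small full-form lexicon, in which every surface form points back to its lemma and grammatical tag.

```python
from collections import defaultdict

# Hypothetical perfective templates for regular verbs patterning like 'kiteb'.
TEMPLATES = {
    "1sg":   "{c1}{c2}i{c3}t",
    "2sg":   "{c1}{c2}i{c3}t",
    "3sg_m": "{c1}i{c2}e{c3}",
    "3sg_f": "{c1}i{c2}{c3}et",
    "1pl":   "{c1}{c2}i{c3}na",
    "2pl":   "{c1}{c2}i{c3}tu",
    "3pl":   "{c1}i{c2}{c3}u",
}

# Hypothetical word list: lemma -> root consonants.
WORD_LIST = {
    "kiteb": ("k", "t", "b"),  # write
    "kiser": ("k", "s", "r"),  # break
}


def inflect(root):
    """Yield (tag, form) pairs for one root."""
    c1, c2, c3 = root
    for tag, template in TEMPLATES.items():
        yield tag, template.format(c1=c1, c2=c2, c3=c3)


def build_full_form_lexicon(word_list):
    """Map every generated surface form to its list of (lemma, tag) analyses."""
    lexicon = defaultdict(list)
    for lemma, root in word_list.items():
        for tag, form in inflect(root):
            lexicon[form].append((lemma, tag))
    return dict(lexicon)


lexicon = build_full_form_lexicon(WORD_LIST)
print(lexicon["ktibt"])
print(lexicon["kisru"])
```

Looking up ‘ktibt’ returns two analyses, because the first and second person singular share the same form. Keeping a list of analyses per form, rather than a single one, is what lets a lexicon like this record such ambiguity instead of silently dropping it.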

Users of Ġabra may soon find that some entries are missing information. In addition, since many of the word forms in the lexicon are automatically generated, they may contain errors which have not yet been caught manually. But errors such as these are usually systematic, so fixing a single bug in the underlying grammar will mean that hundreds of incorrect inflectional forms can be fixed at once.

Ġabra is not presented as a complete lexicon, but rather as a solid foundation for collecting Maltese lexical resources in an open and easily searchable format. With the support of the Department of Linguistics at the University of Malta, the database is continuously being improved upon so that it may continue to develop into a mature, comprehensive full-form Maltese dictionary for the digital age.

John Camilleri is a PhD student in language technology and formal methods at Chalmers University of Technology in Gothenburg, Sweden. He is also active in the field of computational linguistics and is particularly interested in digital lexical resources for Maltese.

This work was carried out as an MSc thesis in computer science at Chalmers University of Technology in Gothenburg, Sweden.

The degree was supported by a scholarship from the Strategic Educational Pathways Scholarships (STEPS), which is part-financed by the European Union, European Social Fund under Operational Programme II – Cohesion Policy 2007–2013, Empowering People for More Jobs and a Better Quality of Life.
