Maltese in a digital age
A recent front-page article in this newspaper suggested that Maltese, together with a number of other “small” European languages, risks being left out in the cold in the digital age (Maltese At Risk Of Digital Extinction, October 1). The immediate motivation for the article was a report published under the auspices of Metanet, a Europe-wide network of research centres involved in the development of language technology and resources, of which the University of Malta’s Department of Intelligent Computer Systems and Institute of Linguistics form part.
The report adopted the term digital extinction to describe the risk faced by languages which do not have adequate support in various areas of language technology.
The term has a satisfyingly ominous ring to it, one that was no doubt designed for the pages of the popular press.
Nevertheless, the point made by the report is well taken. Broadly speaking, it is this: while some languages – notably English – appear to have a comfortable existence in the digital/computational world, as indicated both by their frequency of use in the electronic media and by the development of intelligent, language-sensitive technology for these languages, others like Maltese are far less well represented and are therefore a cause for concern.
There are two important principles that implicitly underlie this report.
The first is that multilinguality should be safeguarded as an outward manifestation of cultural and social diversity, with technology functioning as a bridge to effective communication.
The second is that all languages should be equal, that is, all speakers should be able to avail themselves of technology to facilitate communication, no matter how small the linguistic community they hail from.
The publication of the report is timely not only because it highlights a lacuna, but because it comes at a time when language is being recognised as one of the next frontiers in the development of intelligent systems.
Despite its dire verdict on Maltese, the report also comes at a time when, as I shall argue below, ongoing developments in local academia and industry are seeking to redress the balance.
Perhaps it is useful at the outset to distinguish the concept of “digital extinction” from the more familiar one of “language extinction” or “language death”.
The latter refers to the complete disappearance of a language, often because its community of users has dwindled and died out. This report is emphatically not claiming any such risk for Maltese. Its focus is on how well Maltese is being supported in the digital sphere.
A moment’s reflection should satisfy the reader that Maltese is alive and well.
While its community of speakers is comparatively small, a host of political and social forces are at work in allowing it to flourish, not least its status as an official language, the existence of a National Council for the Maltese Language whose brief extends well beyond that of a spelling watchdog, its use in the media and its written tradition, such as it is. What of the status of Maltese in the digital sphere?
Let us begin at the bottom, that is, at the most basic level at which the use of Maltese in electronic media would require support. Here, the technologies required are relatively well understood and, where they are still lacking, adequate support would speed up their development.
For example, the universal adoption of the Unicode standard has put paid to the notion that writing in Maltese should be avoided because it requires the use of special fonts that are not reliably transferable across devices. One remaining setback for Maltese is the absence of a widely adopted spell-checker.
However, several initiatives have been undertaken in the recent past, starting with an open-source initiative in the local Linux community some years ago, and continuing with research currently being carried out within the University under the auspices of Metanet and the Maltese Language Resource Server project (MLRS; http://mlrs.research.um.edu.mt).
While the latter is currently at a prototype stage, the fact remains that the resources exist and are actively being developed, at a pace which is as usual determined by the human resources available.
Bootstrapping from the “bottom level” to truly intelligent systems that perform language-related tasks automatically is more challenging.
Consider, for example, what is involved in the development of a software program like Apple’s Siri or its counterpart on the new generation of Android mobile platforms.
These apps enable users to use speech to give instructions to their mobile device, whether they need to text somebody or google a recipe.
While far from perfect, such apps represent the state of the art in user-oriented language technology.
Furthermore, they are multilingual: the same underlying techniques are used whether one wishes to communicate with a device in Mexican Spanish, Italian or British English.
This kind of multilingual orientation also underlies developments in Machine Translation which, despite its many flaws, is now at a stage where it can be used on a daily basis by online users.
What is required to develop such systems for Maltese is really no different from what is required in other languages.
Two things in particular are necessary. The first is data, lots of it, in the form of large corpora of language and speech.
These function as sources of “training data”, where computational methods are deployed to learn language structures and patterns automatically for use in other applications.
The aforementioned MLRS project has taken a first step in this direction, by developing a publicly available corpus of Maltese (approximately 100 million words).
The second core element involves a large and diverse family of ‘invisible’ technologies which are crucial components of many linguistically sensitive applications.
Many such systems require electronic dictionaries, for example, or need to recognise the linguistic structure in the stream of sound we call speech, or indeed concatenate simple speech sounds to produce fluent spoken discourse from written text.
In some applications, being able to label words (including words previously unknown to the system) automatically with their grammatical information is a crucial step towards properly analysing a text for its structure or content.
A look at recent developments suggests that progress is being made in many of these areas, though much remains to be done.
For example, CrimsonWing, a locally based company, has launched a text-to-speech synthesiser for Maltese as a result of a Government-led initiative (the synthesiser is also available as an app on mobile devices).
Similarly, MLRS and Metanet have been working on the automatic labelling of grammatical information in text and a version of the existing corpus with grammatical labelling will become available in the near future.
As for electronic lexical resources, current work is focusing on the automatic analysis of words into their component parts (or “morphemes”) and the automatic identification of words which are related by virtue of their meaning or structure.
These efforts are complemented by an initiative recently undertaken by the University, in collaboration with a local publisher, to produce a large-scale electronic dictionary.
In short, the risk of digital extinction for Maltese is being addressed by ongoing developments within both academia and industry. Rather than an expression of mindless optimism, this is simply a statement of fact.
Members of the local research community are painfully aware of the slow pace of progress, but are well placed to give a realistic assessment of where we are.
Two things will determine whether current efforts will come to fruition in time. One is the nurturing of a generation of researchers and engineers who have sufficient knowledge of both the mechanics of language and computer science.
The recent development of a new B.Sc. in Human Language Technology at the University of Malta is a decisive step in this direction.
A second factor is the availability of research funding and infrastructure to support this work.
One potential way forward is to consolidate existing efforts and bring them together as far as possible, in order to pool financial and human resources and build on the foundations that are already in place.
In this respect, the Metanet report, like the Rector’s recent call for adequate funding to bring the University’s dreams to fruition, acts as a timely call for action.
The author is director of the Institute of Linguistics, University of Malta.