Meetei Mayek and Unicode


E-Pao! Opinion - Meetei Mayek and Unicode

- Making the Script Immortal -

By: Tabish Qureshi *

Meetei Mayek, the indigenous script of Manipur, has a history that makes an interesting subject for linguists and historians alike. The destruction of the script, when all its documents were burned in the early 18th century, and its resurrection almost from the ashes hundreds of years later, recall the mythical phoenix, which is consumed by fire and then rises again from its own ashes.

Meetei Mayek is back and has been given its due place. However, one might wonder whether the script is a hundred percent safe today. What if some whimsical tyrant comes to power and decides to destroy the script again? I will argue that in this era of new technology and digital information, such destruction is practically impossible to carry out.

Computers store everything as numbers. Whether you write a story or type out a long proof of a mathematical theorem, the computer stores it as nothing but numbers. A research paper by Einstein and the epic of Khamba-Thoibi are both stored as a sequence of numbers in a "file" on a computer. The computer in turn writes these numbers to storage devices such as hard disks, CDs, DVDs, floppies and, now, pendrives.
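
To make this concrete, here is a minimal Python sketch that shows the numbers behind a short piece of text. It assumes the UTF-8 encoding, which is one common way of turning text into numbers; the article itself does not name a particular encoding.

```python
# Text is stored as numbers: encode a short title and list the byte values.
title = "Khamba-Thoibi"

# UTF-8 turns each character into one or more bytes (numbers from 0 to 255).
data = title.encode("utf-8")
numbers = list(data)
print(numbers)  # [75, 104, 97, 109, 98, 97, 45, 84, 104, 111, 105, 98, 105]

# Decoding the same numbers gives back exactly the same text.
assert data.decode("utf-8") == title
```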

DVDs and pendrives are portable media, each of which can store several gigabytes of data. To give you an idea of how much space a fat book occupies, the whole Bible takes up only 4 to 5 megabytes (MB). And one gigabyte is about a thousand MB! So a tiny pendrive or a DVD can store a thousand or more books the size of the Bible.
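
The arithmetic behind this estimate fits in a few lines of Python. This is a back-of-the-envelope sketch: it takes 1 GB as 1000 MB, as the text does, uses 4.7 GB for a single-layer DVD, and picks 8 GB as a hypothetical pendrive size; the helper books_that_fit is made up for illustration.

```python
# Back-of-the-envelope estimate: how many Bible-sized books fit on a device?
# Assumptions: 1 GB = 1000 MB (as in the text), a Bible takes 4 to 5 MB.
def books_that_fit(capacity_mb: int, book_mb: int) -> int:
    """Return how many whole books of size book_mb fit into capacity_mb."""
    return capacity_mb // book_mb

# A single-layer DVD holds about 4.7 GB; 8 GB is a hypothetical pendrive size.
devices = [("DVD", 4700), ("8 GB pendrive", 8000)]
for name, capacity_mb in devices:
    low = books_that_fit(capacity_mb, 5)   # if a Bible takes 5 MB
    high = books_that_fit(capacity_mb, 4)  # if a Bible takes 4 MB
    print(f"{name}: about {low} to {high} Bibles")
```

With these numbers the DVD holds roughly 940 to 1175 Bibles and the 8 GB pendrive roughly 1600 to 2000.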

Now imagine that in the time of King Pamheiba technology had been as advanced as it is today. People could have stored all the puyas ever written on a single DVD, or on a tiny pendrive the size of a fountain-pen cap! They could have compressed the file of a puya and emailed it to an account in another country.

The historical and holy texts that were mindlessly destroyed when the puyas were burned could easily have been sent to "safe" places where Pamheiba's rule did not hold. In a matter of seconds, a thousand copies of a puya could have been made. In such a situation, it would be humanly impossible for anyone to erase every trace of a manuscript.
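
The two ideas in this passage, compressing a file before sending it and making exact copies of it, can be sketched with Python's standard zlib and hashlib modules. The placeholder text below is hypothetical and merely stands in for a transcribed puya; because it is highly repetitive, it compresses far better than a real manuscript would.

```python
import hashlib
import zlib

# Hypothetical placeholder standing in for the transcribed text of a puya.
# It is deliberately repetitive, so it compresses much better than real text.
puya_text = ("Placeholder line standing in for a transcribed puya. " * 200).encode("utf-8")

# Compress the file before emailing it (level 9 = best compression).
compressed = zlib.compress(puya_text, 9)
print(len(puya_text), "bytes ->", len(compressed), "bytes")

# "Clone" the compressed file a thousand times.
copies = [bytes(compressed) for _ in range(1000)]

# Every copy has the same SHA-256 fingerprint, i.e. it is bit-for-bit identical.
fingerprint = hashlib.sha256(compressed).hexdigest()
assert all(hashlib.sha256(copy).hexdigest() == fingerprint for copy in copies)

# Any single surviving copy restores the full original text.
assert zlib.decompress(copies[-1]) == puya_text
```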

That would have been the scenario had this technology existed at the time. Sadly, it did not. This technology is here now, however, and it is up to us to make the best use of it in spreading, using and popularizing the script.

To popularize the script, we need no library-burning fanatics; we need only the internet, computers and the power of digital media. We don't even need as many teachers, since online tutor programs can easily teach Meetei Mayek to anyone. One only has to be interested.

The hotchpotch of character maps

Nowadays one can use Meetei Mayek on a computer after installing the appropriate software. So, have we done everything that needs to be done? The answer is no, and I will explain why in a moment. Regular visitors to the E-Pao site must often have seen lines like the following:

Meitei Mayek Poem :: fHgd_ra samNyada

A newcomer wonders what it is, until a computer geek tells him that he doesn't have the right font installed on his computer. Once that is done, these strange words will be displayed as Meetei Mayek. Why does this happen?

Computers store letters as numbers, so certain numbers are associated with the Roman letters A, B, C, D and so on. The Meetei Mayek font provided by E-Pao maps the Meetei Mayek characters to the same numbers that are used for the Roman letters. A newcomer who doesn't know that this text is meant to be viewed with E-Pao's font therefore sees gibberish.
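
A few lines of Python confirm that the "gibberish" line quoted above is made of nothing but ordinary ASCII codes, the same numbers used for Roman letters. This sketch only inspects the stored characters; it cannot show the Meetei Mayek shapes, which exist only in the E-Pao font.

```python
# The "gibberish" line from E-Pao, exactly as it is stored in the page.
stored = "fHgd_ra samNyada"

# Every character is an ordinary ASCII code (below 128), the same numbers
# used for Roman letters. Nothing in the data itself says "Meetei Mayek";
# only the E-Pao font turns these codes into Meetei Mayek shapes.
codes = [ord(ch) for ch in stored]
print(codes)
assert all(code < 128 for code in codes)
```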

The problem is made worse by the fact that there is not just one Meetei Mayek font but several, each with its own mapping of characters. A document written in Meetei Mayek using one font will therefore appear as gibberish when viewed with another Meetei Mayek font.
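
The following Python sketch shows why a second font turns the same stored text into nonsense. Both font tables are hypothetical, invented only to illustrate the point; real legacy Meetei Mayek fonts have their own, different layouts.

```python
# Two HYPOTHETICAL legacy fonts that draw Meetei Mayek letters on ASCII codes.
# These tables are invented for illustration; they are not real font layouts.
FONT_A = {"k": "kok", "s": "sam", "l": "lai"}
FONT_B = {"k": "sam", "s": "lai", "l": "kok"}

def render(stored: str, font: dict) -> list:
    """Return the letter names a font would draw for the stored ASCII codes."""
    return [font.get(ch, "?") for ch in stored]

stored = "ksl"                 # typed by someone using Font A
print(render(stored, FONT_A))  # ['kok', 'sam', 'lai']  (what the writer meant)
print(render(stored, FONT_B))  # ['sam', 'lai', 'kok']  (what a Font B user sees)
```

The stored numbers never change; only the font's private table decides what appears on screen.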

As things stand, if a document typed in Meetei Mayek is emailed to somebody else, the recipient may not be able to make any sense of it unless he or she has the same font. Gigabytes of stored documents may simply appear as garbage. What one person writes cannot easily be read by another, which is not a healthy situation in this era of information technology.

The Unicode standard

Is there a solution to this mess? Yes: Unicode (www.Unicode.org), which is becoming the de facto standard for storing text on computers. Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

Once the Unicode standard is adopted, each character will have its own number and cannot be mistaken for any other character. A document typed using the Unicode standard will be read the same way by everybody, without any ambiguity. Unicode aims to assign a unique number to every character of every script in the world.
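
Python's standard unicodedata module shows this one-number-per-character idea for scripts that are already encoded. The sketch uses Devanagari and Bengali letters, since Meetei Mayek was not yet in the standard when this article was written; the number identifies the character no matter which font later draws it.

```python
import unicodedata

# Each character has exactly one Unicode number (code point) and one official
# name, whatever font or program is used to display it.
letters = ["\u0915", "\u0995", "A"]  # Devanagari KA, Bengali KA, Latin A
for ch in letters:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+0915 DEVANAGARI LETTER KA
# U+0995 BENGALI LETTER KA
# U+0041 LATIN CAPITAL LETTER A

# Sending the text elsewhere (encode, then decode) returns the same characters.
text = "".join(letters)
assert text.encode("utf-8").decode("utf-8") == text
```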

Even scripts that have died out and are no longer in use will be included in the Unicode standard, which will make it possible to store old documents in their original scripts.

The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. It is supported by all modern computer operating systems, such as Linux, Windows XP and Mac OS X.

Most scripts of India, such as Devanagari, Bengali, Tamil, Malayalam and the Arabic script used for Urdu, have already been included in Unicode. Now comes the million-dollar question: has Meetei Mayek been included in the Unicode standard? The answer is: not yet.

However, the process is under way. The task of preparing a draft that assigns numbers to the Meetei Mayek characters is being coordinated by Michael Everson of Ireland, an expert on the writing systems of the world, in consultation with scholars, both Manipuri and non-Manipuri, who have researched the script.

The draft proposal for encoding Meetei Mayek has been finalized and sent to the review board of the Script Encoding Initiative (SEI). The SEI, established in the Department of Linguistics at UC Berkeley, is a project devoted to preparing formal proposals for encoding scripts not yet supported in Unicode. So everything is in the pipeline now, and it is only a matter of time before Meetei Mayek becomes part of the Unicode standard.

One might wonder what difference this will make to a layperson. Once it happens, any computer you buy preloaded with Windows or Linux will already have Unicode-compatible fonts installed that can display Meetei Mayek characters.

You will easily be able to type documents, send emails and chat in Meetei Mayek. And the documents you type and the websites you make in Meetei Mayek will not appear as gibberish to others; they will appear faithfully as Meetei Mayek.

Which form of Meetei Mayek?

We all know that the controversy over which form of Meetei Mayek should be used has not completely died out. The two camps each have their own reasons for promoting the 27-letter or the 35-letter form.

One might feel apprehensive about which form of the script Unicode is going to adopt, and whether that will be the right choice. The experts who discussed this question concluded that it is best to include in Unicode all the characters from all forms of the script.

This leaves room for using any form of Meetei Mayek in the future. For example, if the currently adopted 27-letter form remains in use, it will be fully Unicode-compatible; at the same time, old manuscripts written in the 35-letter form can also be typed and stored in Unicode by using the extra characters encoded for them. Encoding certain characters in Unicode doesn't mean that people have to use them; they can use those characters if they want to.

Figure 1: A schematic representation of the codespace. The code blocks for a few scripts have been indicated.

A wee bit of technical stuff

Now that we have come this far in the discussion, we might as well take a peek at what this coding looks like.

The character codes are usually written as four hexadecimal digits prefixed with "U+" (characters beyond the first 65,536 need five or six digits). The hexadecimal system has 16 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F). Characters are grouped logically into blocks spread across the whole range of code points, which is called the codespace.
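
The "U+" notation is easy to produce and to read back. In this short Python sketch the helper names to_u_notation and from_u_notation are made up for illustration; the formatting rule itself (hexadecimal, at least four digits, "U+" prefix) is the standard one.

```python
def to_u_notation(code_point: int) -> str:
    """Write a code point the Unicode way: "U+" and at least four hex digits."""
    return f"U+{code_point:04X}"

def from_u_notation(text: str) -> int:
    """Parse "U+1C85"-style notation back into an integer code point."""
    if not text.upper().startswith("U+"):
        raise ValueError(f"not in U+ notation: {text!r}")
    return int(text[2:], 16)

print(to_u_notation(65))           # U+0041, the letter "A"
print(to_u_notation(0x10000))      # U+10000, five digits beyond U+FFFF
print(from_u_notation("U+1C85"))   # 7301
```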

Figure 2: The proposed chart of Unicode characters for Meetei Mayek.

The codes start at U+0000 with the standard ASCII characters and other Latin characters, and continue with Greek, Cyrillic, Hebrew, Arabic, the Indic scripts and others, followed by symbols and punctuation. Figure 1 shows a schematic representation of the codespace.

In the codespace chart in Figure 1, the fourth hexadecimal digit of each code has been suppressed, so each box stands for 16 code points. For example, the single blue box at the upper left represents U+0000 to U+000F. One can see blocks of code points representing various scripts. In the draft proposal, the range U+1C80 to U+1CCF has been set aside for Meetei Mayek.
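
Suppressing the last hexadecimal digit is the same as dividing a code point by 16, so each box in such a chart covers 16 code points. The Python sketch below, built around a hypothetical helper box_of, lists the boxes spanned by the range U+1C80 to U+1CCF quoted above, which was the draft allocation at the time of writing.

```python
# Each box in the codespace chart covers 16 code points: dropping the last
# hex digit (integer division by 16) tells you which box a character is in.
def box_of(code_point: int) -> str:
    """Label a box by its first three hex digits, e.g. "1C8x" = U+1C80..U+1C8F."""
    return f"{code_point // 16:03X}x"

start, end = 0x1C80, 0x1CCF  # draft range for Meetei Mayek quoted in the text
boxes = sorted({box_of(cp) for cp in range(start, end + 1)})
print(boxes)            # ['1C8x', '1C9x', '1CAx', '1CBx', '1CCx']
print(end - start + 1)  # 80 code points in 5 boxes
```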

The chart of Meetei Mayek characters proposed for Unicode is shown in Figure 2. The first three hexadecimal digits of a code give the column and the fourth gives the row. Thus, for example, under the proposal "kok" will be assigned the code U+1C85 and "til" the code U+1C94.
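
Reading a code off the chart works the same way in reverse: the leading hex digits give the column and the last digit gives the row. This Python sketch (the helper chart_position is hypothetical) splits the two draft codes quoted above. These are the numbers from the draft proposal; when Meetei Mayek was finally encoded in Unicode 5.2, it was placed in the block U+ABC0 to U+ABFF, so the codes here are only illustrative.

```python
def chart_position(code_point: int) -> tuple:
    """Split a code point into (column, row) as in the proposed chart:
    the leading hex digits name the column, the last digit names the row."""
    column, row = divmod(code_point, 16)
    return f"{column:03X}", f"{row:X}"

# Draft code points quoted in the article (not the final standard).
print(chart_position(0x1C85))  # ('1C8', '5')  kok
print(chart_position(0x1C94))  # ('1C9', '4')  til
```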

At this point the reader might feel that this Unicode business looks too complicated and wonder how ordinary users will cope with it. The answer is that all this coding stays in the background, and the user doesn't have to bother with it.

For example, to type "kok" the user won't need some complicated method of entering the code U+1C85; the software takes care of that.

In a word processor, for instance, one may simply select the script "Meetei Mayek" from a menu and then press "k" to type a "kok". The word processor takes care of translating the "k" into the Unicode code corresponding to "kok".
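
What the word processor does behind the scenes can be sketched as a simple keymap lookup. This Python example is a toy: the KEYMAP table and the type_keys function are hypothetical, and U+1C85 is the draft code for "kok" from the proposal rather than the code eventually standardized. Keys without a Meetei Mayek entry are passed through as their own characters.

```python
# A toy keyboard layout: keystrokes are translated into code points, so the
# user never types numbers. HYPOTHETICAL table; U+1C85 is the draft code
# for "kok" from the proposal, not a value to use in real documents.
KEYMAP = {"k": 0x1C85}

def type_keys(keys: str) -> list:
    """Translate keystrokes into the code points a word processor would store.
    Keys with no Meetei Mayek entry are stored as their own code points."""
    return [KEYMAP.get(key, ord(key)) for key in keys]

stored = type_keys("k")
print([f"U+{cp:04X}" for cp in stored])  # ['U+1C85']
```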

* Tabish Qureshi contributes regularly to e-pao.net. The author can be reached by email at: tabish(AT)jamia-physics.net. This article was webcast on January 6, 2007.
