Multilingual solutions require some basic planning and forethought, but how do you go about it? More importantly, how far can you go before “it’s not worth it”?
This article looks at some of the simplest principles of multilingual coding and at some of the planning considerations.
There is no perfect, one-size-fits-all solution. There are, however, good guiding principles to follow.
- Considerations of language
- Programming multilingual
- Return on Investment
Considerations of language
We start this journey into multilingual solutions in English because it’s my strongest language.
I have studied and can use Irish, German and a bit of French.
All of these are Indo-European languages. The Romans conquered most of Europe, so Latin spread all over the continent.
The English, Spanish, Portuguese and Dutch did a lot of exploring, and so English and Spanish went around the world.
Indo-European languages share very similar basics. Whilst they may appear very different, they have some big commonalities.
You have a lexicon (the words that make up the language) and a grammar (how the words are put together).
In these languages, writing starts at the top of the page, and you read from left to right and top to bottom. You’ve done this since you were a child.
Not the only game in town
I’m going to grossly oversimplify the following, as I know there are hundreds of languages and dialects, but it keeps the discussion moving.
What is lovely to know is that Indo-European languages are not the only game in town. People were being creative for a long while before the Romans.
Egyptian hieroglyphs use pictures instead of letters. You may have heard of the Rosetta Stone. It has the same decree carved into it three times, in hieroglyphs, Demotic and Ancient Greek, which makes it a famous early example of a multilingual solution.
From that stone, scholars were able to work out how to read the other scripts. By then no one could read hieroglyphs, but because the same text was also written in Demotic and Ancient Greek, scholars could work out what the glyphs meant… the code was cracked.
East Asian languages such as Chinese and Japanese take a whole other approach. Traditionally they are written in vertical columns, read from top to bottom, with the columns running from right to left. Other scripts, such as Arabic and Hebrew, are written from right to left.
If you’re being technical about it, even our numbers work right to left: place values (units, tens, hundreds) are counted from the right, even though we read the digits from the left.
Glyphs and Symbols for multilingual solutions
The foundations of modern computing were largely laid by English speakers, and Americans in particular led the way. Hence the invention of ASCII, the American Standard Code for Information Interchange.
The challenge with ASCII was that it worked for English and nothing else. Accented letters and symbols for other European languages weren’t considered at all… so the list of codes had to grow to include other symbols.
Unicode, a single character set intended to cover all the world’s writing systems, came about to solve exactly this problem. UTF-8 (Unicode Transformation Format, 8-bit) is the most common way of storing Unicode text.
UTF-8 is backward compatible with ASCII: the first 128 characters are stored as exactly the same single bytes, so ASCII didn’t have to change. Everything else is stored using two, three or four bytes.
Consequently, UTF-8 can encode 1,112,064 characters (Unicode code points), compared to ASCII’s 128.
These schemes are called character encodings (Microsoft calls them code pages), and there are many of them. I’m using UTF-8 for these discussions.
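A quick way to see the difference is to ask how many bytes each character needs. This Python sketch (Python is just my choice for the examples; `describe` is a hypothetical helper and the sample characters are my own picks) shows that a plain English letter takes one byte and is identical in ASCII and UTF-8, while é, € and an emoji need more bytes and can’t be stored in ASCII at all.

```python
def describe(ch: str) -> tuple[str, int, bool]:
    """Return (code point, UTF-8 byte count, fits in ASCII?) for one character."""
    utf8_bytes = ch.encode("utf-8")  # UTF-8 uses 1 to 4 bytes per character
    try:
        ch.encode("ascii")           # ASCII only has 128 characters
        fits_ascii = True
    except UnicodeEncodeError:
        fits_ascii = False
    return f"U+{ord(ch):04X}", len(utf8_bytes), fits_ascii


for ch in ["A", "é", "€", "😀"]:
    print(ch, describe(ch))
# A ('U+0041', 1, True)
# é ('U+00E9', 2, False)
# € ('U+20AC', 3, False)
# 😀 ('U+1F600', 4, False)
```

This is why ASCII-only code tends to “just work” for English text and then break on the first accented name.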
Programming and standards idioms
An idiom is a group of words established by usage as having a meaning not deducible from those of the individual words (e.g. over the moon, see the light).
Because the bulk of modern computer programming was developed in the US, most standards, and the way they work, use American English.
British English spells the word colour, whereas American English spells it color. The HTML and CSS standards for web pages use color.
Let’s go even simpler and look at programming
In American English, program is the correct spelling. In Australian English, both program and programme are acceptable. In British English, programme is the preferred spelling, although program is normally used in computing contexts.
So as computer programmers writing computer programs, we start from American English, and all the programming examples here are in American English.
Microsoft are a global phenomenon. They make computer programming tools such as Microsoft Visual Studio, which programmers around the world learn on.
They hit the multilingual wall in a big way and had to overcome the challenges: not just the words, but also the interactions.
They developed LCIDs (Language Code Identifiers) and publish a full reference of which language gets which code.
Like languages, LCIDs are not the only game in town (the IETF language tags used on the web, such as en-US, are another), but to be fair, Microsoft have seen the most action in this particular area. So I’m going to use LCIDs as the reference point.
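LCID numbers look arbitrary, but they have a structure. As I understand Microsoft’s LCID reference, the low 10 bits hold the primary language, the next 6 bits hold the sublanguage (usually the region), and bits 16 to 19 hold a sort ID. This Python sketch splits the two codes used in this article; `split_lcid` is a hypothetical name of my own, not a Windows API.

```python
def split_lcid(lcid: int) -> dict:
    """Split a Windows LCID into primary language, sublanguage and sort ID."""
    lang_id = lcid & 0xFFFF  # low 16 bits: the language identifier
    return {
        "primary_language": lang_id & 0x3FF,  # bits 0-9, e.g. 0x09 = English
        "sublanguage": lang_id >> 10,         # bits 10-15, e.g. 0x01 = United States
        "sort_id": (lcid >> 16) & 0xF,        # bits 16-19
    }


for name, lcid in [("English (United States)", 1033), ("Irish (Ireland)", 2108)]:
    print(f"{name}: {lcid} = 0x{lcid:04X} -> {split_lcid(lcid)}")
# English (United States): 1033 = 0x0409 -> primary 9, sublanguage 1, sort 0
# Irish (Ireland): 2108 = 0x083C -> primary 60 (0x3C), sublanguage 2, sort 0
```

So 1033 and 2108 are not sequence numbers: each one packs “which language” and “which region” into a single integer.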
So what’s the point of these codes?
How to program in a multilingual way
Step 1 is what not to do
This is a deliberately simple pseudocode example to highlight the approach, not an implementation in any specific programming language.
This is an example.
So this is hard coded in English. It isn’t going to change.
Next, we could use if statements:
if language is English
    show "This is an example."
otherwise if language is Irish
    show "Is sampla é seo"
This highlights why your code needs to support UTF-8 or similar to handle the output of characters like é.
This also makes the code very messy… every sentence needs its own if statement for every language, so you quickly end up with thousands of them. That doesn’t work.
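For comparison, here is roughly what this approach looks like in Python (the function names are hypothetical). It runs, but it is the pattern to avoid: each new sentence needs its own chain of if statements, and each new language means editing every chain.

```python
# Anti-pattern: every sentence hard-codes every language.

def example_sentence(language: str) -> str:
    if language == "english":
        return "This is an example."
    elif language == "irish":
        return "Is sampla é seo"  # needs UTF-8 (or similar) for the é
    else:
        return "This is an example."  # fall back to English


def hello_sentence(language: str) -> str:
    # ...and the same chain again for the next sentence, and the next...
    if language == "english":
        return "Hello World"
    elif language == "irish":
        return "Dia duit an Domhain"
    else:
        return "Hello World"


print(example_sentence("irish"))  # Is sampla é seo
print(hello_sentence("english"))  # Hello World
```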
Step 2 is language files
Instead, for multilingual solutions we put the text behind a function. American English is 1033 in Microsoft’s LCIDs.
So our code becomes:
Translate(1033, 1)
The first number is a Microsoft LCID and the second is the sentence number.
Now we build files with the translations in them. Our English file contains:
- This is an example
- Hello World
- This was easy
The LCID for Irish (Gaelic) as used in Ireland is 2108. Our Irish file contains:
- Is sampla é seo
- Dia duit an Domhain
- Bhí sé seo éasca
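Here’s a minimal Python sketch of language files on disk. It assumes a layout I’ve made up for illustration: one UTF-8 text file per LCID, named like 1033.txt, with one sentence per line so that line 1 is sentence 1. The files are written to a temporary folder only so the example runs on its own; in a real project the translators would supply them.

```python
import tempfile
from pathlib import Path

ENGLISH = ["This is an example", "Hello World", "This was easy"]
IRISH = ["Is sampla é seo", "Dia duit an Domhain", "Bhí sé seo éasca"]


def load_language_file(folder: Path, lcid: int) -> list[str]:
    # Always state the encoding: the platform default is not guaranteed to be UTF-8.
    return (folder / f"{lcid}.txt").read_text(encoding="utf-8").splitlines()


def translate(folder: Path, lcid: int, sentence: int) -> str:
    """Return sentence number `sentence` (1-based) from the file for `lcid`."""
    return load_language_file(folder, lcid)[sentence - 1]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "1033.txt").write_text("\n".join(ENGLISH), encoding="utf-8")
    (folder / "2108.txt").write_text("\n".join(IRISH), encoding="utf-8")

    english_first = translate(folder, 1033, 1)
    irish_third = translate(folder, 2108, 3)
    print(english_first)  # This is an example
    print(irish_third)    # Bhí sé seo éasca
```

This reads the file on every call to keep the sketch short; a real application would load each file once and keep it in memory.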
Make as many language files as you like.
Now all your code needs to do is pass the preferred language’s LCID as the first number, and your application changes language accordingly.
Translate(PreferredLanguage, 1)
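In practice a language file is often incomplete, or missing entirely, when you first ship. A common choice is to fall back to English rather than show nothing. This sketch assumes in-memory tables keyed by LCID and sentence number; `LANGUAGES` and `translate` are my own hypothetical names, and 1036 (French, France) is used simply as an LCID with no file.

```python
ENGLISH_US = 1033  # Microsoft LCID for English (United States)
IRISH_IE = 2108    # Microsoft LCID for Irish (Ireland)

# Hypothetical tables: LCID -> {sentence number: text}.
LANGUAGES = {
    ENGLISH_US: {1: "This is an example", 2: "Hello World", 3: "This was easy"},
    IRISH_IE: {1: "Is sampla é seo", 2: "Dia duit an Domhain"},  # sentence 3 not translated yet
}


def translate(preferred_language: int, sentence: int, fallback: int = ENGLISH_US) -> str:
    """Return the sentence in the preferred language, or in the fallback if missing."""
    table = LANGUAGES.get(preferred_language, {})
    if sentence in table:
        return table[sentence]
    return LANGUAGES[fallback][sentence]  # still raises KeyError if English lacks it too


preferred_language = IRISH_IE  # in a real app: a user setting or the OS locale
print(translate(preferred_language, 1))  # Is sampla é seo
print(translate(preferred_language, 3))  # This was easy
print(translate(1036, 2))                # Hello World
```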
Build your program in English first. This gives you a language file to start with. You can then hand that file to translation services, who don’t need to know how to program; they just translate each sentence or word into the appropriate language.
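Because sentences are matched by number, a file that comes back from translation with a blank or missing line would quietly show the wrong text or fail. A small check before accepting a file helps. This is a sketch under my own assumptions: a language file is a list of sentences in number order, and `missing_sentences` is a hypothetical helper.

```python
def missing_sentences(master: list[str], translation: list[str]) -> list[int]:
    """Return the 1-based sentence numbers that have no translation yet."""
    missing = []
    for number in range(1, len(master) + 1):
        # Missing if the translated file is too short or the line is blank.
        if number > len(translation) or not translation[number - 1].strip():
            missing.append(number)
    return missing


english = ["This is an example", "Hello World", "This was easy"]
irish_draft = ["Is sampla é seo", ""]  # hypothetical partial file from a translator
print(missing_sentences(english, irish_draft))  # [2, 3]
```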
If you can’t beat it, go round it.
Instead of trying to make all the words, approaches and characters work for every language, a more intuitive approach is to use icons. Many well-known pictures are recognised no matter the language.
What do these icons mean to you without any language?
So it is possible to communicate without words; you, as the designer, need to think about how best to make this work.
For a system or website you’re building, what might the following icons mean?
Return on Investment
So technically multilingual solutions can be done. There are loads of options here to start on.
The question is should you? The answer is a fiscal one.
Return on Investment is a cost / benefit analysis of what it would take to do this work.
- Changing the code to use translation functions takes X programmer hours at a cost of Y per hour.
- With the engine changed, you now have language files to translate at a cost of Z per language.
- There are also support, documentation, sales and customer care considerations.
So your maths becomes
- X multiplied by Y to start
- plus Z multiplied by the number of languages
- plus support and documentation changes
If the “cost of doing” is less than the extra “profit from sales” in the new languages, then make your solution multilingual.
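The same sum as a few lines of Python. Every figure here is made up purely for illustration, and the variables assume X is programmer hours, Y the cost per hour and Z the translation cost per language; plug in your own quotes.

```python
# Rough cost/benefit check. All figures below are hypothetical.

def multilingual_cost(x_hours: float, y_rate: float, z_per_language: float,
                      languages: int, support_and_docs: float) -> float:
    engineering = x_hours * y_rate            # one-off change to use translation functions
    translation = z_per_language * languages  # one language file per extra language
    return engineering + translation + support_and_docs


def worth_doing(extra_profit: float, cost: float) -> bool:
    """Go multilingual only if the extra profit beats the cost."""
    return extra_profit > cost


cost = multilingual_cost(x_hours=200, y_rate=50, z_per_language=1500,
                         languages=3, support_and_docs=4000)
print(cost)                      # 18500
print(worth_doing(25000, cost))  # True
print(worth_doing(15000, cost))  # False
```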
Definitely consider iconography to reduce the amount of words in your interface.
Make sure you use something like UTF-8 from the outset to enable you to change in the future if you need to.
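Here’s why choosing the encoding up front matters. If text saved as UTF-8 is later read back with a legacy code page such as Windows-1252 (cp1252 in Python), accented characters turn into garbage, often called mojibake. A minimal Python illustration, using the Irish sentence from earlier:

```python
original = "Is sampla é seo"
stored = original.encode("utf-8")        # bytes as saved by a UTF-8 system

read_correctly = stored.decode("utf-8")  # round-trips cleanly
read_wrongly = stored.decode("cp1252")   # legacy Windows code page (Western European)

print(read_correctly)  # Is sampla é seo
print(read_wrongly)    # Is sampla Ã© seo
```

The bytes on disk are identical in both cases; only the assumption about how to read them differs, which is why it pays to decide on UTF-8 once and state it everywhere.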
What makes sense is to start with the language of the audience you appeal to most and can reach most easily. Just remember that new markets lie in languages you don’t speak.