How to enter Vietnamese in Blogger (or anywhere else)
This is an adaptation of an essay I wrote on September the 4th, 2004. That essay can be found on the original "Down and Out of Sài Gòn" blog. The pictures have been changed; I've also tried to update the content where relevant.
Much of my writing contains Vietnamese characters - homework, blog posts, and whatnot. How hard is it to type them? It's easy.
Firstly, let's get rid of one common misconception[1] about the Vietnamese language: that people need special, proprietary fonts to display it on computers (such as the ubiquitous "VNI Times" font made by the VNI Corporation[2]), rather than the fonts that come preinstalled with the operating system (such as Times New Roman on Windows). This may have been true in the mid-1990s, but it isn't true in the 21st century. Windows fonts such as Times New Roman, Arial and Courier New (among others[3]) have supported Vietnamese since Windows 98. Ubuntu releases also have fonts with Vietnamese characters - be it Edgy Eft or Hardy Heron or Xany Xenomorph. I have no direct experience of Mac OS, but Alan Wood has listed quite a few Unicode fonts for that platform. We're interested in those that support "Latin Extended Additional", as that's the Unicode block holding most of the precomposed letters used for Vietnamese.
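If you want to check which Unicode block a particular letter lives in, Python's standard library is enough. This is a minimal sketch: the block boundaries are typed in by hand from the Unicode code charts, and `block_of` is a hypothetical helper, not a library function. It also shows that "Latin Extended Additional" isn't the whole story: letters such as "đ", "ă", "ư" and "ơ" live in other blocks, so a font needs those too.

```python
# Hand-copied block ranges for the Latin letters Vietnamese uses.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]


def block_of(ch: str) -> str:
    """Return the name of the block containing ch (hypothetical helper)."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"


# Print each character of two common words with its code point and block.
for ch in "Nguyễn Đường":
    print(f"{ch!r}: U+{ord(ch):04X} {block_of(ch)}")
```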
Still, entering Vietnamese is not as easy as entering English, where pretty much every character you want is on the keyboard. It is a little harder than entering German (with umlaut characters such as ü), French (with é and è) or other European languages. But it becomes easy with practice. All you have to understand are the diacritics, which this weird and wonderful language uses in abundance.
Many languages use diacritics - additional marks above or below letters. In the last paragraph, we saw the umlaut (two dots above the letter), the acute accent (a rising slash above the letter) and the grave accent (a falling slash above the letter). Diacritics are common on vowels in Latin alphabets, and are sometimes used with consonants as well.
English stands at one extreme: it hardly uses diacritics at all, except with the odd loan word. Vietnamese is at the other extreme - many letters use two diacritics at once. For example, the most common family name in Việt Nam is "Nguyễn". This makes it hard for foreigners to read and hard for many to remember. But there is method in all this madness. Each diacritic you encounter has only one purpose:
- One set indicates the type of the vowel: beet versus bet, cart versus cut, and so on.
- Another set indicates the tone: is it rising, falling, dipping or flat? Vietnamese is a tonal language, and its writing system (Quốc Ngữ) has tone built into it.
Let's look at the vowels first. There are six vowels without diacritics; "a", "e", "i", "o" and "u" should be familiar to you. In Vietnamese, "y" is also always used as a vowel. Then there are 3 other vowels which use the circumflex "^": "â", "ê" and "ô". Another diacritic is the breve (which looks like a cup), and only one vowel uses it: "ă". Finally, there are two letters with a small hook called the horn: "ư" and "ơ". Note that the presence or absence of the horn, the breve and the circumflex says nothing about the tone of the letter. However, these letters are pronounced differently from their plain counterparts, and are considered separate letters. For example, Vietnamese dictionaries are ordered "A", "Ă", "Â", "B", "C" and so on.
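Because "Ă" and "Â" are separate letters, a plain code-point sort gets Vietnamese order wrong: Python's default `sorted` puts "Bình" before "Ăn". Here's a minimal sketch of an alphabetical sort key, assuming the standard 29-letter alphabet order (a, ă, â, b, c, d, đ, e, ê, ...). It strips the five tone marks and compares letter by letter; a real collation would also break ties by tone, which this sketch ignores. `base_letter` and `sort_key` are illustrative names.

```python
import unicodedata

# Assumed alphabet order; tones are not letters, so they are left out.
ALPHABET = "aăâbcdđeêghiklmnoôơpqrstuưvxy"
# Combining acute, grave, hook above, tilde and dot below: the five tone marks.
TONE_MARKS = {"\u0301", "\u0300", "\u0309", "\u0303", "\u0323"}


def base_letter(ch: str) -> str:
    """Lower-case ch and drop its tone mark, keeping any breve, circumflex or horn."""
    decomposed = unicodedata.normalize("NFD", ch.lower())
    kept = "".join(c for c in decomposed if c not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def sort_key(word: str) -> list[int]:
    """Position of each letter in ALPHABET; anything else (e.g. 'f', spaces) gets -1."""
    return [ALPHABET.find(base_letter(ch)) for ch in word]


words = ["Đào", "Bình", "Âu", "Dung", "Ăn", "An"]
print(sorted(words, key=sort_key))
```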
Just as important are the tone marks, or "dấu". I will give the Vietnamese names, as each name carries the very tone it describes. You have these 5 tone marks to remember:
- The acute accent, known in the tongue as "dấu Sắc". This indicates a rising tone.
- The grave accent or "dấu Huyền". This indicates a shallow drop.
- The dot below or "dấu Nặng". This is a deep, low drop - the Marianas Trench of tones.
- Then you have the "question mark" tone, or "dấu Hỏi". Think of a low dip and then a rise.
- Penultimately, you have the tilde or "dấu Ngã". This is similar to the "dấu Hỏi", except that you make it creaky and tighten your larynx - well, that is if you live in Hà Nội. In Sài Gòn, the Hỏi and the Ngã sound pretty similar. However, you should still distinguish them in your writing.
The final tone is its absence: "không dấu" or no tone at all. Here, you keep the vowel flat, and by that I mean flat: no dipping or rising to intonate your emotions! All vowels absent the 5 tonal markers are assumed to be flat in tone. Other diacritics may be present, or not. For example, "u" and "ư" are pronounced flat (but are different vowels), and "ú" and "ứ" are pronounced with a rising tone (but are also different vowels).
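Since every tone mark is a combining character in Unicode, you can read a syllable's tone by decomposing it (NFD) and looking for one of the five marks. Here's a small sketch using only Python's standard `unicodedata` module; `tone_of` is an illustrative name, and it assumes a single syllable with at most one tone mark. Conveniently, each tone's name tests itself.

```python
import unicodedata

# The five tone marks as combining characters, keyed to their Vietnamese names.
TONES = {
    "\u0301": "Sắc",    # acute
    "\u0300": "Huyền",  # grave
    "\u0309": "Hỏi",    # hook above
    "\u0303": "Ngã",    # tilde
    "\u0323": "Nặng",   # dot below
}


def tone_of(syllable: str) -> str:
    """Return the tone name of one syllable; breve, circumflex and horn are ignored."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONES:
            return TONES[ch]
    return "không dấu"


# One syllable at a time.
for word in ["Việt", "Nam", "Nguyễn", "ư", "ứ"]:
    print(word, tone_of(word))
```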
Finally, we must mention that Vietnamese has an extra consonant: "Đ"/"đ", which is different from the "D"/"d" that English speakers know. They are considered different letters in Vietnamese, and have different sounds[4].
All of this may seem daunting for the Vietnamese beginner. The total range of vowels is 2 (lower case and upper case) × 6 (for the six tones, counting the flat one) × 12 (for the 12 vowels in the language) = 144 possible vowels. Then you've got Đ and đ. How do you enter all these characters? There are two methods, as we shall see.
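Unicode agrees that "Đ" is a letter in its own right rather than "D" with a mark on it: it has no canonical decomposition. You can see this with the common trick for stripping accents, sketched here with Python's `unicodedata` (`strip_marks` is a hypothetical helper). Every combining mark disappears, but "Đ" comes through untouched, so if you really need plain ASCII you have to map it yourself.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose (NFD) and drop every combining mark: tones, breve, circumflex, horn."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(strip_marks("Nguyễn"))  # Nguyen
print(strip_marks("Đường"))   # Đuong: the horns and tone go, but Đ stays
```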
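You can check that arithmetic with a few lines of Python: combine each of the 12 vowels with each of the six tones (the five marks plus none), let Unicode normalization (NFC) compose them, and add the capitals. This is a sketch using only the standard library; it also confirms that every one of the 144 has its own precomposed code point.

```python
import unicodedata

VOWELS = ["a", "ă", "â", "e", "ê", "i", "o", "ô", "ơ", "u", "ư", "y"]
# No mark (flat), then Sắc, Huyền, Hỏi, Ngã, Nặng as combining marks.
TONES = ["", "\u0301", "\u0300", "\u0309", "\u0303", "\u0323"]

# NFC composes each base vowel plus tone mark into a single character.
lower = [unicodedata.normalize("NFC", v + t) for v in VOWELS for t in TONES]
everything = lower + [c.upper() for c in lower]

print(len(everything), len(set(everything)))  # 144 144
print(all(len(c) == 1 for c in everything))   # True: each is a single code point
print("".join(lower[:6]))                     # the "a" row: flat, then the five tones
```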
Firstly, there's the character map method. That's basically a program that shows you all the characters in a given font in a table. One example is the Character Map (charmap.exe) program in Windows. Microsoft Office provides a similar utility through the "Insert Symbol" menu command. The idea is that you click on the characters you want, copy them, and paste them into the program you're using. Here's a screenshot of Character Map in action:
You can use this if you only need the odd Vietnamese character in your file. I advise against it in the long run: it's tedious. After ten rounds of pointing and clicking, you will get tired of the whole activity.
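If you happen to write code, there's a scriptable cousin of the character map: every Unicode character has an official name, and Python's `unicodedata` module can look a character up by name or tell you a character's name. A minimal sketch (`describe` is a hypothetical helper); it is no less tedious for everyday typing, which is rather the point.

```python
import unicodedata

# Look characters up by their official Unicode names.
e_dot_circumflex = unicodedata.lookup("LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW")
d_stroke = "\N{LATIN CAPITAL LETTER D WITH STROKE}"  # same lookup, resolved when the source is compiled


def describe(text: str) -> list[str]:
    """One 'U+XXXX NAME' line per character, like hovering over a character-map cell."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


print(e_dot_circumflex, d_stroke)
for line in describe("Nguyễn"):
    print(line)
```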
I recommend that you use a Vietnamese keyboard or keyboard driver for the task. Despite their name, they are not hardware: they are small programs that sit in your OS and convert your keystrokes into nice, lovely Vietnamese. And do I have a particular program in mind? I do: Unikey. I've used it for about … and a half without complaint. I like it so much that I've shut off rival keyboard drivers running on the same PC.
Its advantages are:
- It's free.
- It's just a download away: for NT/2000/XP, for 95/98/ME[6] or for Linux.
- Installation is simple: just unzip it and it is ready to go.
- It lacks bloat. It's a small program that does what it does without unnecessary features.
- It sits on the taskbar. This makes it easy to switch between "English" mode and "Vietnamese" mode: just click on the icon on the taskbar.
- The user interface can be switched to English, which makes it easier for English speakers to understand.
(Of course, if you aren't happy with Unikey, you could look for other utilities. Look at the Vietnamese Unicode FAQs for more information. But rather than comparing all the utilities, I just wanted one that works for me.)
When you start up Unikey, you see the following dialog:
What does it all mean? Fortunately, you can find out what is happening by clicking on the "Mở rộng" button. "Mở rộng" means expand, and that's what you need to do.
See the checkbox with "Vietnamese interface"? Uncheck it. The whole interface will turn into English:
That makes it a lot easier to use, doesn't it? Okay, here's what I recommend you do:
- I recommend you always set the "Character Set" to Unicode - always. A character set is basically how characters like "ư" and "a" are represented as numbers that computers can handle. Microsoft Office utilities and most web apps like Blogger are set to handle Unicode by default. Unicode is an international standard, so you can't go much wrong with it. The only exception to this is if you have the misfortune to use one of the old VNI Fonts from years ago. But Unicode is a good thing.
- The "Input method" determines which keystrokes form a character like "ư". I prefer VNI[7], but I will give instructions for using Unikey with TELEX and VIQR as well. See the next section for instructions.
- Advanced options: uncheck them all. Especially uncheck the "Use oa', uy' (instead of o'a, u'y)". This is an irritating preset that doesn't allow you to write "hoà"; instead it always comes out as "hòa". In practice, "hoà" is closer to the vernacular Vietnamese I encountered when I lived in VN.
- There's also the "Help" button - which provides you "Help" in Vietnamese. If you understand Vietnamese, it's nice to look at. If you don't, it's not of much assistance. Anyway, that's what this document is here for, isn't it?
- I should mention the "Auto-run UniKey at boot time" checkbox. Only check it if you want the program to start every time you reboot the PC... and the PC belongs to you. It is impolite to install apps on other people's computers without their permission.
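To make "represented as numbers" in the "Character Set" advice concrete: Unicode gives each character a code point, and an encoding such as UTF-8 (the usual one on the web) turns that code point into bytes. Here's a short Python sketch; `utf8_report` is a hypothetical helper. It also shows a wrinkle: the same "ư" can be stored precomposed (NFC) or as "u" plus a combining horn (NFD), which look identical on screen but are different bytes.

```python
import unicodedata


def utf8_report(ch: str) -> str:
    """Code point and UTF-8 bytes for a single character."""
    return f"{ch!r}: U+{ord(ch):04X} -> {ch.encode('utf-8')!r}"


for ch in ["a", "ư", "ệ"]:
    print(utf8_report(ch))

# Same letter, two spellings: one precomposed code point vs. base + combining horn.
nfc = unicodedata.normalize("NFC", "ư")
nfd = unicodedata.normalize("NFD", "ư")
print(len(nfc), len(nfd), nfc == nfd)  # 1 2 False
```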
Now you can click on "Close". The program will now sit on the taskbar - unobtrusive, yet available. If its icon shows a big "V", it is set up to enter Vietnamese. If you want to enter English text without diacritics getting in the way, just click on the "V" to switch to English mode.
It's easy to toggle from one mode to the other: left-click on the letter. And if you want to close the program altogether, right-click on the icon in the taskbar and choose "Kết Thúc" ("End") from the context menu.
Now that you have it running, you can use the input methods to enter Vietnamese into your chosen application.
The idea of a keyboard driver is that it lets you enter the characters you want using the keyboard you already have. UniKey doesn't even need the "Alt" or "Ctrl" keys. Instead, you type sequences of ordinary keys, following these rules:
- If you want characters without diacritics, like "a", "b", or "c", then type them.
- If you want characters with diacritics but no tone marks, then type the combination. For example, "dd" in TELEX will create a "đ", and "ow" will create a "ơ".
- Diacritical keystrokes are typed after the character they affect. For example, typing "a6" will produce "â" in VNI mode, but typing "6a" will get you "6a".
- Some characters carry two diacritics, such as the "ệ" in "Việt Nam". In cases like this, it doesn't matter which diacritic you add first - the software will compose the character for you. In VNI, "e56" and "e65" will get you the same thing: "ệ".
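The rules above describe a tiny state machine: each keystroke either modifies the character just before it or is typed as-is. Here is a minimal Python sketch of that idea for VNI. It is not UniKey's actual implementation, which handles more cases than this; it only ever looks at the immediately preceding character, and the helper names `apply_key` and `vni` are hypothetical.

```python
import unicodedata
from typing import Optional

# VNI digits that change a letter's shape: (letters they apply to, combining mark).
SHAPE_KEYS = {
    "6": ("aeo", "\u0302"),  # circumflex: â ê ô
    "7": ("ou", "\u031b"),   # horn: ơ ư
    "8": ("a", "\u0306"),    # breve: ă
}
# VNI digits for the five tones: Sắc, Huyền, Hỏi, Ngã, Nặng.
TONE_KEYS = {"1": "\u0301", "2": "\u0300", "3": "\u0309", "4": "\u0303", "5": "\u0323"}
TONE_MARKS = set(TONE_KEYS.values())


def apply_key(prev: str, key: str) -> Optional[str]:
    """Return prev modified by key, or None if key should be typed literally."""
    if key == "9":
        return {"d": "đ", "D": "Đ"}.get(prev)
    decomposed = unicodedata.normalize("NFD", prev)
    base, marks = decomposed[0], decomposed[1:]
    if base.lower() not in "aeiouy":
        return None
    shape = "".join(m for m in marks if m not in TONE_MARKS)
    tone = "".join(m for m in marks if m in TONE_MARKS)
    if key in SHAPE_KEYS:
        letters, mark = SHAPE_KEYS[key]
        if base.lower() not in letters or shape:
            return None
        shape = mark
    elif key in TONE_KEYS:
        tone = TONE_KEYS[key]          # a new tone replaces any old one
    elif key == "0" and tone:
        tone = ""                      # 0 removes the tone
    else:
        return None
    # NFC puts the marks in canonical order and picks the precomposed letter.
    return unicodedata.normalize("NFC", base + shape + tone)


def vni(keys: str) -> str:
    """Turn a string of VNI keystrokes into Vietnamese text."""
    out: list[str] = []
    for key in keys:
        new = apply_key(out[-1], key) if out else None
        if new is None:
            out.append(key)
        else:
            out[-1] = new
    return "".join(out)


print(vni("Hai Ba2 Tru7ng"))
print(vni("e56"), vni("e65"))
```

Because NFC sorts the combining marks into canonical order before composing, "e56" and "e65" both come out as "ệ", which is the order-independence the last rule describes.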
The following table gives the combinations for all the Vietnamese characters in lower case. If you want upper case, then use upper case letters instead. For example, "DD" in TELEX will create "Đ", and so on. Here are the tables:
The main keystrokes for TELEX, VNI and VIQR:

| To get | TELEX | VNI | VIQR |
|---|---|---|---|
| â | "aa" | "a6" | "a^" |
| ă | "aw" | "a8" | "a(" |
| đ | "dd" | "d9" | "dd" |
| ê | "ee" | "e6" | "e^" |
| ô | "oo" | "o6" | "o^" |
| ơ | "ow" | "o7" | "o+" |
| ư | "w" or "uw" | "u7" | "u+" |
| Add a "dấu Sắc" | "s" | "1" | single quote "'" |
| Add a "dấu Huyền" | "f" | "2" | reverse quote "`" |
| Add a "dấu Hỏi" | "r" | "3" | "?" |
| Add a "dấu Ngã" | "x" | "4" | tilde "~" |
| Add a "dấu Nặng" | "j" | "5" | period "." |
| Remove tone | "z" | "0" | "0" |
To understand this, I will provide some examples:
Examples for TELEX, VNI and VIQR:

| Text | TELEX | VNI | VIQR |
|---|---|---|---|
| Hai Bà Trưng | "Hai Baf Trwng" | "Hai Ba2 Tru7ng" | "Hai Ba` Tru+ng" |
| Tiếng Việt | "Tieesng Vieejt" | "Tie61ng Vie65t" | "Tie^'ng Vie^.t" |
| ĐƯỜNG | "DDWOWFNG" | "D9U7O72NG" | "DDU+O+`NG" |
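TELEX is trickier for a program than VNI, because its modifier keys (s, f, r, x, j, z, w, and doubled vowels) are ordinary letters too. Here is a minimal Python sketch of TELEX, again hypothetical and much simpler than UniKey: each key may only modify the character right before it, and a "w" that modifies nothing becomes "ư". It reproduces the TELEX column of the examples above.

```python
import unicodedata
from typing import Optional

TONE_KEYS = {"s": "\u0301", "f": "\u0300", "r": "\u0309", "x": "\u0303", "j": "\u0323"}
TONE_MARKS = set(TONE_KEYS.values())
CIRCUMFLEX, BREVE, HORN = "\u0302", "\u0306", "\u031b"


def apply_key(prev: str, key: str) -> Optional[str]:
    """Return prev modified by key, or None if key should be typed as a letter."""
    k = key.lower()
    if k == "d" and prev in ("d", "D"):
        return "đ" if prev == "d" else "Đ"          # dd -> đ
    decomposed = unicodedata.normalize("NFD", prev)
    base, marks = decomposed[0], decomposed[1:]
    b = base.lower()
    if b not in "aeiouy":
        return None
    shape = "".join(m for m in marks if m not in TONE_MARKS)
    tone = "".join(m for m in marks if m in TONE_MARKS)
    if k == b and k in "aeo" and not shape:
        shape = CIRCUMFLEX                            # aa ee oo -> â ê ô
    elif k == "w" and not shape and b in "aou":
        shape = BREVE if b == "a" else HORN           # aw -> ă, ow -> ơ, uw -> ư
    elif k in TONE_KEYS:
        tone = TONE_KEYS[k]                           # s f r x j set the tone
    elif k == "z" and tone:
        tone = ""                                     # z removes the tone
    else:
        return None
    return unicodedata.normalize("NFC", base + shape + tone)


def telex(keys: str) -> str:
    """Turn a string of TELEX keystrokes into Vietnamese text."""
    out: list[str] = []
    for key in keys:
        new = apply_key(out[-1], key) if out else None
        if new is not None:
            out[-1] = new
        elif key in ("w", "W"):
            out.append("ư" if key == "w" else "Ư")   # a lone w types ư
        else:
            out.append(key)
    return "".join(out)


print(telex("Hai Baf Trwng"))
print(telex("DDWOWFNG"))
```

The catch shows up with English: typing "yes" gives "yé", because an "s" after a vowel is read as a tone. That's why you toggle UniKey into English mode before writing English.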
Yes, it all seems a little tedious to learn. So choose one of the methods, and practice. I admit you may need a good motivation to do this. My motivations were (a) learning Vietnamese, and (b) retyping the names of Vietnamese students that had been provided to me without diacritics.
What I've tried to do here is set up a tutorial for those unfamiliar with Vietnamese, and with computers too. A lot of this was learnt from the original Vietnamese documentation, and from a lot of practice. Now, if you are interested, practice as well. You may still encounter difficulties. For example:
- You are trying to enter Vietnamese in a font that does not have Vietnamese characters. For example, fonts like "Georgia" and "Garamond" do not support them. That's a shame.
- You are trying to enter Vietnamese in a pre-Unicode "Vietnamese" font like VNI-Times. The result looks like poo. One way around this is to set the "character set" to "VNI". However, I'd recommend against it, unless (a) you are printing the document, or (b) you know the people you are sending it to also have a VNI font installed.
- There's one problem that I've had with Excel. You enter a Vietnamese word in a cell. You start entering another word in another cell. Then the "Auto-complete" feature tries to guess what you are entering, and makes a mess of it. This has happened to me a few times. I suggest you turn "Auto-complete" off.
- Finally, the program you are using doesn't support Unicode at all, and cannot even understand what you are typing. For example, the main interface of the popular editor HTML-Kit cannot handle it[8].
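Why does pre-Unicode VNI text "look like poo" everywhere else? As far as I can tell from the garbled sentence quoted in the footnote on VNI fonts, VNI-encoded text builds many letters from a plain base letter followed by a separate modifier character, and that modifier only looks like a tone mark in a VNI font. In any ordinary font you see "ù" or "ï" instead. Here's a minimal Python sketch that repairs that one sentence; the mapping table is inferred from that sample alone, is certainly not the complete VNI table, and `decode_vni_sample` is a hypothetical name.

```python
import unicodedata

# Inferred from one sample sentence only; NOT the full VNI mapping.
STANDALONE = {"ô": "ơ", "ö": "ư"}               # bytes that stand for whole letters
MODIFIERS = {"ù": "\u0301", "ø": "\u0300",      # acute, grave
             "ï": "\u0323", "â": "\u0302"}      # dot below, circumflex


def decode_vni_sample(garbled: str) -> str:
    """Turn VNI bytes shown in an ordinary font back into Unicode text."""
    out: list[str] = []
    for ch in garbled:
        if ch in STANDALONE:
            out.append(STANDALONE[ch])
        elif ch in MODIFIERS and out:
            out[-1] += MODIFIERS[ch]            # attach the mark to the previous letter
        else:
            out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


print(decode_vni_sample("giuùp baïn hoïc English vôùi phöông phaùp tröïc quang vaø tröïc aâm"))
```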
But if you have a reason to learn Vietnamese, and if you are determined: go for it. I wish you the joy of discovery!
All mistakes in this document are mine.
[1] This was a common perception; I think things have changed in the last couple of years. One reason is that the Vietnamese part of the web has exploded. As elsewhere, Vietnamese sites are confronted with the problem that they have to support "common denominator" fonts such as Verdana and Arial.
[2] Proprietary but pirated fonts. These days, VNI Software are pushing Unicode support, but they don't walk the talk. See their Vietnamese language page, which tells you "CD-ROM Learn English by Pictures giuùp baïn hoïc English vôùi phöông phaùp tröïc quang vaø tröïc aâm." (That is VNI-encoded text viewed in an ordinary font; it is meant to say, roughly, "helps you learn English with a visual and aural method".)
[3] I should add Verdana to this list, as it is the primary font used on this website.
[4] If you are interested in how they sound, see the Wikipedia Vietnamese phonology article. To summarize, Vietnamese "đ" sounds like an English "d". Vietnamese "d" sounds like "z" in North Việt Nam, and like the "y" in "yes" in the South.
[5] Alternatives like VietKey and VPS can be found at EVietnam Group. I cannot guarantee whether they work, or whether they have been pirated. Unikey makes no bones about being open source.
[6] Since this piece was written, Microsoft has withdrawn all support for the 95/98/ME versions of Windows. If you are using XP, I suggest you use the "NT" version of the software. I have no idea if the package works on Vista.
[7] When I originally wrote this essay, I preferred TELEX. My wife prefers VNI, and she shares the computer with me, so I've come around to her choice.
[8] In 2004, the latest freely available version of HTML-Kit was 292 - and that's the same situation five years later. This version of the product could not and cannot parse and recognize Unicode text. For me, that is a real deal-breaker. The latest version is 300, but I need to shell out $65 US to obtain it. There is no "try-before-I-buy" period, and documentation is scanty. I don't trust the product enough to spend money.