UTF-8 character set

Not everyone who works in web design, website development, or programming in general is sure why they should prefer UTF-8 over any other character encoding. The question becomes especially important when they have to deal with national alphabets. Although the technical reasons have been known for a long time, it still makes sense to repeat the merits of this universal character encoding.

Well, the name itself says a lot about this character set: UTF stands for Unicode Transformation Format. Unicode lets you offer your website content or software to other countries, where people use very different letters and symbols. If you want them to be part of your clientele, make sure they can read (and write) using your product. Apart from that, Unicode lets you use characters from different languages on the same page (if we talk about websites) or in the same instance (if it is about software). In addition, UTF-8 makes it easy to accept user-generated content (and not just text comments) in different languages without tweaking the website or program back end. And last but not least, making users or visitors switch the character encoding in their browser or program settings is no favor to them from you as the creator.
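To make the point about mixing languages concrete, here is a minimal Python sketch. The sample string and the helper name fits_in are made up for illustration, and cp1251 (a Cyrillic code page) stands in for any single-script legacy encoding.

```python
# A single string mixing three scripts (sample text chosen for illustration).
text = "Hello, Привет, こんにちは"

# UTF-8 can encode every character and decodes back to the same string.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")


def fits_in(encoding: str, s: str) -> bool:
    """Hypothetical helper: True if every character of s exists in `encoding`."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(round_trip == text)        # the UTF-8 round trip is lossless
print(fits_in("utf-8", text))    # UTF-8 covers all three scripts
print(fits_in("cp1251", text))   # the Cyrillic code page has no Japanese letters
```

With a legacy code page the page author would have to pick one script per page; with UTF-8 the same bytes carry all of them.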

However, there is more than one UTF encoding. UTF-7 has security issues, and UTF-32 is too heavy (it spends four bytes on every character) and in most cases redundant. The better candidates are UTF-8 and UTF-16.

Here are some basic reasons why UTF-8 is preferable

URLs are encoded in UTF-8. In other words, when a user sends a request via a URL string, non-ASCII characters in the web path are percent-encoded as UTF-8 bytes. The path component is always encoded in UTF-8 when the request is made, and the same applies when a query is located inside an HTML block or script, provided the page itself is served as UTF-8. If you have to use different protocols and send queries in different encodings, it adds problems at the processing stage. That is one more reason why UTF-8 is the universal approach to encoding URLs.
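As a rough illustration of how a non-ASCII path ends up on the wire, the following Python sketch uses the standard urllib.parse module, whose quote() and unquote() default to UTF-8. The path /wiki/Привет is a made-up example.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing Cyrillic letters.
path = "/wiki/Привет"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character
# and leaves "/" untouched by default.
encoded = quote(path)
print(encoded)  # /wiki/%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82

# Decoding with the same encoding restores the original path.
print(unquote(encoded) == path)  # True

# Decoding the same bytes with a different encoding yields garbage,
# which is the processing problem mixed encodings cause.
garbled = unquote(encoded, encoding="latin-1")
print(garbled == path)  # False
```

Each Cyrillic letter becomes two percent-encoded bytes, so a server that assumes anything other than UTF-8 will not recover the original text.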

Also, XMLHttpRequest sends request strings as text, which in most cases is handled as UTF-8 on the server side. That is why developers are better off using UTF-8 for input as well. If everyone uses UTF-8, life becomes easier for both users and programmers.

Now let's have a brief look at why UTF-16 is not as recommended as UTF-8

The main reason why UTF-16 is not recommended is its vulnerability. Some search engines (including Google) have stopped serving this encoding. A little investigation, however, reveals that the main reason the search engine giant does not recommend UTF-16 is security bugs in MS Internet Explorer.

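UTF-16 also requires more space to store typical web text, because every character takes at least two bytes, including the ASCII characters that make up markup and scripts. At the same time, UTF-8 tends to be a more Western-oriented character set: ASCII characters take one byte each, while many Asian scripts take three bytes per character against two in UTF-16. And Mozilla Firefox, one of the most popular web browsers, announced that UTF-16 has no significant advantages over UTF-8 on the web.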
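The space trade-off can be checked with a short Python sketch. The sample strings and the helper name encoded_sizes are illustrative assumptions, and the "-le" codec variants are used so that no byte order mark is added to the counts.

```python
# Sample strings chosen for illustration only.
samples = {
    "english": "Hello, world",
    "russian": "Привет, мир",
    "japanese": "こんにちは世界",
}


def encoded_sizes(s: str) -> dict:
    """Hypothetical helper: byte length of s in each encoding (no BOM)."""
    return {enc: len(s.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


for name, s in samples.items():
    print(name, encoded_sizes(s))

# english:  utf-8 = 12, utf-16 = 24, utf-32 = 48
# russian:  utf-8 = 20, utf-16 = 22, utf-32 = 44
# japanese: utf-8 = 21, utf-16 = 14, utf-32 = 28
```

UTF-8 wins clearly for English, slightly for Russian, and loses to UTF-16 for pure Japanese text. A real web page also contains ASCII markup, which shifts the balance further towards UTF-8.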

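For reference, this is how UTF-8 maps Unicode code points to byte sequences (each x stands for one bit of the code point):

Unicode characters:        UTF-8 encoding:
U-00000000 - U-0000007F:   0xxxxxxx
U-00000080 - U-000007FF:   110xxxxx 10xxxxxx
U-00000800 - U-0000FFFF:   1110xxxx 10xxxxxx 10xxxxxx
U-00010000 - U-001FFFFF:   11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
U-00200000 - U-03FFFFFF:   111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
U-04000000 - U-7FFFFFFF:   1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

Note that the current UTF-8 standard (RFC 3629) stops at U-0010FFFF, so the five- and six-byte forms in the last two rows are no longer valid.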
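
The bit layout in the table can be turned into code almost line by line. The Python sketch below is a hand-written illustration, not how any particular library does it: the function name utf8_encode is made up, it covers only the first four rows of the table (up to U+10FFFF, the limit of current Unicode), and it does not reject surrogate code points the way a strict encoder would.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by following the table rows (illustrative sketch)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Compare against Python's built-in encoder for one character per row.
for ch in ("A", "é", "€", "😀"):
    print(ch, utf8_encode(ord(ch)).hex(), utf8_encode(ord(ch)) == ch.encode("utf-8"))
```

The leading byte tells a decoder how many bytes follow, and every continuation byte starts with the bits 10, which is why a reader can resynchronise in the middle of a UTF-8 stream.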