Topic: UTF-8 for Unicode

Robert

  • Hero Member
  • Posts: 601
UTF-8 for Unicode
« on: October 13, 2020, 04:37:21 AM »
I find that using single bytes encoded as UTF-8 as the internal representation for strings is a much more powerful and elegant approach. The reason for this is that it is easier to use char-based functions in standard C and C++. Developers are usually much more familiar with functions like strcpy in C or the C++ std::string class than with the wide-character equivalents wcscpy and std::wstring, and the support for wide characters is not completely consistent in either standard.

The above is quoted from Ángel José Riesgo's blog article "Using UTF-8 as the internal representation for strings in C and C++ with Visual Studio" at http://www.nubaria.com/en/blog/?p=289

The article is a long read, but it is well worth it if you have any interest in coding beyond ASCII.

MrBcx

  • Administrator
  • Hero Member
  • Posts: 786
Re: UTF-8 for Unicode
« Reply #1 on: October 13, 2020, 06:45:03 AM »
Hi Robert,

As a practical matter, it's no secret that I don't have a personal need for anything outside the realm of cp437.  That said, I visited your link, read a good bit of his thesis and was initially impressed with his ideas.  But it was the replies to his blog post that amused me most, especially this one:

The good news is that GB18030 is very similar to UTF-8 (including the first 128 characters that your source code is typed in) and therefore there is no sound logical reason to not use GB18030 unless you are a racist (although I believe racism is the exact reason why you can’t run a Chinese program — menu items and all — in a typical English version of Windows).