C++ std vs Windows “MultiByte”

[update]

The key question here is one of context. It is not about the "how", and not about "you have this API".

Thus: why would anybody use Windows multi-byte strings in 2018, on Win10 RS4, namely in a UWP application? cppWINRT does it, for example. What is the use case where, inside a UWP app, one would transform a wide string to a narrow multi-byte string using WideCharToMultiByte? Where is the result used, and how?

[original question]

Results of my research & development

"10 Years After" we have C++ 17, we have <codecvt> deprecated and we also have std way of converting between four (4) standard C++ string types:

    // std::string               std::basic_string<char>
    // std::wstring              std::basic_string<wchar_t>
    // std::u16string (C++11)    std::basic_string<char16_t>
    // std::u32string (C++11)    std::basic_string<char32_t>

This excludes the additional four (4) std::pmr string types, which we can "abstract away" in this context.

Three point recap of the situation: C++ std vs Windows "MultiByte".

Considering the above, and knowing the standard C++ std:: library, one can code very functional and ridiculously simple, but comprehensive, conversions. Here is one that converts from a wide to a narrow string:

    // the "standard" version
    // dbj.org created 2018-07-01
    //
    inline std::string to_string(std::wstring_view value)
    {
        if (value.empty()) return {};
        return { value.begin(), value.end() };
    }
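To see what the iterator-range construction does to non-ASCII input, here is a minimal portable sketch. The helper name narrow_by_truncation is hypothetical and not part of any library; it spells out with an explicit cast what building a std::string from a wide range does implicitly: each wchar_t code unit becomes exactly one char, with no encoding step. It assumes the usual modular narrowing (guaranteed since C++20, and what common compilers did before that).

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: same effect as constructing a std::string
// from a wide iterator range, written with an explicit cast.
inline std::string narrow_by_truncation(std::wstring_view value)
{
    std::string result;
    result.reserve(value.size());
    for (wchar_t wc : value)
    {
        // No transcoding happens here: one wide code unit in, one char out.
        // Any bits that do not fit into a char are dropped.
        result.push_back(static_cast<char>(wc));
    }
    return result;
}
```

For ASCII-only input such as L"Hello" the result is the expected "Hello". For L"\x043a" (Cyrillic к) the result is the single byte 0x3a, not the two-byte UTF-8 sequence for that character.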

I am sure the honorable audience is more than capable of implementing the rest, which is needed to convert all the standard types to std::string.

    inline std::string to_string(std::u16string_view value);
    inline std::string to_string(std::u32string_view value);

And also the rest necessary to cover all the other standard conversions.
I repeat: this uses the MSVC std namespace, as shipped with CL version 19.14.26431.0 (as of today, 2018-07-01).

The point is: I fail to see why I would use "Multibyte" strings on Windows in my C++ code, given that std:: does not use them.

Please help me understand where and when one might need WideCharToMultiByte(), and its twin counterpart, today.
For the sake of completeness, here is one official (cppWINRT) version that does not rely on the MSVC std:: lib.

    namespace winrt
    {
        // the cppWINRT base.h
        inline std::string to_string(std::wstring_view value)
        {
            int const size = WideCharToMultiByte(CP_UTF8, 0, value.data(), static_cast<int32_t>(value.size()), nullptr, 0, nullptr, nullptr);

            if (size == 0)
            {
                return{};
            }

            std::string result(size, '?');
            WINRT_VERIFY_(size, WideCharToMultiByte(CP_UTF8, 0, value.data(), static_cast<int32_t>(value.size()), result.data(), size, nullptr, nullptr));
            return result;
        }
    }

The above (of course) produces a different std::string than the standard version if the Unicode input contains characters outside the ASCII range.

    // кошка 日本国
    constexpr wchar_t wide_specimen[] = { L"\x043a\x043e\x0448\x043a\x0430 \x65e5\x672c\x56fd" };

    bool test = winrt::to_string(wide_specimen) == to_string(wide_specimen);
    // test is false
    test = winrt::to_string(L"Hello") == to_string(L"Hello");
    // test is true
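For comparison, here is a rough portable sketch of the kind of encoding step that WideCharToMultiByte(CP_UTF8, ...) performs. It is not the Windows implementation. The function name utf16_to_utf8 is hypothetical, std::u16string_view is used as a stand-in for the 16-bit wchar_t of Windows, and replacing unpaired surrogates with U+FFFD is an assumed error policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: encode UTF-16 code units as UTF-8 bytes.
inline std::string utf16_to_utf8(std::u16string_view value)
{
    std::string out;
    for (std::size_t i = 0; i < value.size(); ++i)
    {
        std::uint32_t cp = value[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < value.size()
            && value[i + 1] >= 0xDC00 && value[i + 1] <= 0xDFFF)
        {
            // Valid surrogate pair: combine both units into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (value[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            cp = 0xFFFD; // unpaired surrogate (assumed policy)
        }

        if (cp < 0x80)
        {
            // 1 byte: plain ASCII
            out.push_back(static_cast<char>(cp));
        }
        else if (cp < 0x800)
        {
            // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With this sketch, u"\u043a" becomes the two bytes 0xD0 0xBA, whereas a per-element narrowing conversion keeps only one byte per code unit. That is why the two to_string versions agree for "Hello" and disagree for the non-ASCII specimen.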

Which way should one take? The standard way or the Windows way?

ps: This is actually one very good text on multi byte encoding. It was part of my research.

If the standard fulfills your needs, I'd prefer it for the sake of portability.
– πάντα ῥεῖ
Jul 1 at 21:21

@Dusan - The Windows API goes back to the time before the first Unicode spec was published. Much of the odd parts come from a time when Windows 3 and Windows 95 used multibyte characters but Windows NT started to use Unicode. There was some utility in having a common code base and be able to convert strings at runtime. If you write new programs today, this is not something to be concerned about. Even if some APIs are still available.
– Bo Persson
Jul 1 at 21:50

Every C++ programmer eventually writes his own string class. The people that create operating systems and attend ISO meetings just did it earlier than SO users.
– Hans Passant
Jul 1 at 22:48

@BoPersson thanks for the reply. But cppWINRT, the very latest, very modern C++ lib, uses winrt::to_string exactly as I copy-pasted it above. That is the core of the confusion. What keeps them from adopting the same philosophy as you, or as I did in my to_string?
– Dusan Jovanovic
Jul 2 at 22:54

2 Answers

char on Windows is not UTF-8; it is a (single- or multi-byte) codepage-encoded string. These encodings come from DOS/16-bit Windows and were also the native encoding used on Windows 95/98/ME. Use WideCharToMultiByte(CP_ACP, ...) to create CHAR strings. wchar_t is usually UTF-16 LE on Windows and often a 32-bit type on POSIX, possibly UCS-4.

Technically, Windows uses the CHAR and WCHAR types but the standard library/compilers use the same meaning for its char and wchar_t types.

I don't know if std::string has changed in the newer versions but this is how it used to work.

Only the char8_t, char16_t and char32_t types are required to use Unicode.

Even if everything is Unicode encoded, a simple binary compare might not match because Unicode codepoints can be stored in different forms. You need to normalize to precomposed if you want to match with Windows native strings.
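As a small illustration of the normalization point, the sketch below builds two UTF-8 byte sequences that both render as "é": the precomposed form (U+00E9) and the decomposed form (U+0065 followed by the combining acute accent U+0301). The function names are only illustrative. A plain std::string comparison sees different bytes and reports the two as unequal; matching them would need a normalization step, for example through a Unicode library, which is not shown here.

```cpp
#include <string>

// Precomposed form: the single code point U+00E9, two bytes in UTF-8.
inline std::string precomposed_e_acute()
{
    return "\xC3\xA9";
}

// Decomposed form: 'e' (U+0065) followed by U+0301, three bytes in UTF-8.
inline std::string decomposed_e_acute()
{
    return "e\xCC\x81";
}

// Byte-wise comparison, which is all std::string offers on its own.
inline bool same_bytes(const std::string& a, const std::string& b)
{
    return a == b;
}
```

Here same_bytes(precomposed_e_acute(), decomposed_e_acute()) is false, even though both strings display the same character.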

Windows has a multi-byte codepage for UTF-8 (CP_UTF8), and char is suitable for holding UTF-8 code units, so std::string can hold UTF-8 encoded strings (in fact, this is enforced in the C++11 and later standards via the u8 literal prefix, which encodes character data to UTF-8 using char elements).
– Remy Lebeau
Jul 1 at 23:27

@RemyLebeau Yes, of course a char can hold a UTF-8 code unit. That does not mean that std::string understands UTF-8 in terms of characters vs bytes, and I'm sure all bets are off when you fill the buffer with bytes from an outside source. Also, I'm sure a lot of code relies on fopen(mystr.c_str(), ...) to call CreateFileA on Windows, and only the very latest Windows 10 versions have basic support for CP_ACP == CP_UTF8.
– Anders
Jul 2 at 2:06

std::string doesn't understand UTF-8 any more than std::wstring and std::u16string understand UTF-16, or std::u32string understands UTF-32. They are just containers of elements, it is up to the app to interpret their meaning. And fopen() does call CreateFileA() on Windows (_wfopen() calls CreateFileW()).
– Remy Lebeau
Jul 2 at 4:21

@RemyLebeau And since fopen calls CreateFileA you can't just put a UTF-8 string inside std::string and expect it to work since CreateFileA is expecting a string encoded with a SBCS/DBCS Windows codepage, not UTF-8 in 99.99% of systems.
– Anders
Jul 2 at 8:38

Actually, last time I looked, fopen() calls MultiByteToWideChar() and then passes the resulting wide string to CreateFileW. So ongoing lack of support for UTF-8 in the Windows CRT for this family of functions is just an unbelievable blind spot. MS should have provided an API to set the code page used by these functions ages ago, and not just continue to force us to use CP_ACP. Yuk.
– Paul Sanders
Jul 2 at 15:07

Assuming the applications you are developing in c++ are compiled as native Unicode applications, you would want to use the MultiByte APIs only when reading and writing files / streams where the multibyte codepage of the file has been stipulated (or assumed) in some way.

i.e. Not every application on Windows is written in C++, so these APIs provide an interoperation layer for applications to pass character data around correctly.

I would not expect their existence would impose a burden on a c++ application or suite of c++ applications that prefer to use the std:: string abstractions.
