Interpreting bits in union fields as different datatypes in C/C++




I am trying to access the bits of a union as different data types. For example:


typedef union {
    uint64_t x;
    uint32_t y[2];
} test;

test testdata;
testdata.x = 0xa;
printf("uint64_t: %016lx\nuint32_t: %08x %08x\n",testdata.x,testdata.y[0],testdata.y[1]);
/* %lx expects an unsigned long and %p a void *, so the addresses are cast explicitly */
printf("Addresses:\nuint64_t: %016lx\nuint32_t: %p %p\n",(unsigned long)&testdata.x,(void *)&testdata.y[0],(void *)&testdata.y[1]);



and the output is


uint64_t: 000000000000000a
uint32_t: 0000000a 00000000
Addresses:
uint64_t: 00007ffe09d594e0
uint32_t: 0x7ffe09d594e0 0x7ffe09d594e4



The starting address of 'y' is the same as the starting address of 'x'. Since both fields use the same location, shouldn't the values of 'y' be 00000000 0000000a?



Why is this not happening? How does the internal conversion happen in a union whose fields have different data types?



If we want to retrieve the exact raw bits as uint32_t, in the same order as in the uint64_t, using a union, what needs to be done?



Thank you in advance.



Edit:
As mentioned in the comments, this is undefined behaviour in C++.
How does it work in C? Can we actually do it?









1 Answer



I will first explain what happens in your implementation.



You are doing type punning between a uint64_t value and an array of 2 uint32_t values. According to the result, your system is little endian and accepts that type punning by simply re-interpreting the byte representation. The byte representation of 0x0a as a little endian uint64_t is:


Byte number   0     1     2     3     4     5     6     7
Value         0x0a  0x00  0x00  0x00  0x00  0x00  0x00  0x00



The least significant byte in little endian has the lowest address. It is now evident why the uint32_t[2] representation is { 0x0a, 0x00 }.
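One way to check this layout on a given machine is to copy the object representation into an array of unsigned char, which is allowed in both C and C++. The following is only a minimal sketch; the helper names object_bytes and is_little_endian are hypothetical and do not come from the question.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of v into out,
   lowest address first. */
static void object_bytes(uint64_t v, unsigned char out[8])
{
    memcpy(out, &v, sizeof v);
}

/* Hypothetical helper: returns 1 when the byte at the lowest address
   holds the least significant byte of the value. */
static int is_little_endian(void)
{
    unsigned char b[8];
    object_bytes(UINT64_C(1), b);
    return b[0] == 1;
}
```

On a little endian system, calling object_bytes with 0x0a should place 0x0a in out[0] and zeros in the other seven bytes, matching the table above.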





But what you are doing is only legal in C.



C11 says in 6.5.2.3 Structure and union members:



3 A postfix expression followed by the . operator and an identifier designates a member of
a structure or union object. The value is that of the named member,95) and is an lvalue if
the first expression is an lvalue.



The 95) note says explicitly:



If the member used to read the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called "type
punning"). This might be a trap representation.



So even if notes are not normative, their intent is to make clear how the standard should be interpreted => your code is valid and has defined behaviour on a little endian system that defines the uint64_t and uint32_t types.
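Because the result of the union read depends on byte order, the two halves can instead be obtained in value order (most significant half first) with shifts, which work on values rather than on the object representation. This is a minimal sketch and not part of the original code; the names high_half, low_half and join_halves are hypothetical.

```c
#include <stdint.h>

/* Upper 32 bits of v, independent of the machine's byte order. */
static uint32_t high_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Lower 32 bits of v, independent of the machine's byte order. */
static uint32_t low_half(uint64_t v)
{
    return (uint32_t)(v & UINT32_C(0xFFFFFFFF));
}

/* Rebuild the 64-bit value from its two halves. */
static uint64_t join_halves(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}
```

For x = 0xa, high_half(x) is 0 and low_half(x) is 0xa, so printing them in that order with %08x gives 00000000 0000000a on any endianness.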





C++ is stricter on this point. Draft n4659 for C++17 says in [basic.lval]:



8 If a program attempts to access the stored value of an object through a glvalue of other than one of the
following types the behavior is undefined:56
(8.1) — the dynamic type of the object,
(8.2) — a cv-qualified version of the dynamic type of the object,
(8.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
(8.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,
(8.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type
of the object,
(8.6) — an aggregate or union type that includes one of the aforementioned types among its elements or non-static
data members (including, recursively, an element or non-static data member of a subaggregate or
contained union),
(8.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
(8.8) — a char, unsigned char, or std::byte type.



And note 56 says explicitly:



The intent of this list is to specify those circumstances in which an object may or may not be aliased.



As type punning is never mentioned in the C++ standard, and as the section on structs and unions contains no equivalent of C's re-interpretation rule, reading in C++ the value of a member that is not the one last written invokes undefined behaviour.



Of course, common compiler implementations compile both C and C++, and most of them accept the C idiom even in C++ source, for the same reason that the gcc C++ compiler gladly accepts VLAs in C++ source files. After all, undefined behaviour includes expected results... But you should not rely on that for portable code.
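For code that must compile as both C and C++, a commonly used alternative to the union is to copy the bytes with memcpy, which re-interprets the same object representation without reading an inactive union member. The sketch below is an illustration under that assumption; split_u64 and merge_u32 are hypothetical names, and the order of the two halves still depends on the machine's endianness.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the 8 bytes of x into two uint32_t values.
   y[0] receives the bytes at the lowest addresses, as the union did. */
static void split_u64(uint64_t x, uint32_t y[2])
{
    memcpy(y, &x, sizeof x);
}

/* Hypothetical helper: the inverse copy, rebuilding the uint64_t. */
static uint64_t merge_u32(const uint32_t y[2])
{
    uint64_t x;
    memcpy(&x, y, sizeof x);
    return x;
}
```

On a little endian system split_u64(0xa, y) should leave y[0] == 0xa and y[1] == 0, the same values the union printed in the question.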





Notably, this is one reason why C++ is unsuitable for hardware-related programming, where you often have to type pun through union either between different integer types (example: 32 bit register but CPU is 16 bit) or between some type and the byte type uint8_t (when doing any form of serialization/de-serialization). Not only does this make C++ incredibly cumbersome for hardware-related programming; this also causes C++ to invoke nasty UB bugs when you grab a hardware register map written from C.
– Lundin
Jul 2 at 8:13








I strongly disagree. In practice, this works on all sane compilers. C++ is the only language except C that is suitable for kernel work and hard real-time systems. And you get better, 0-cost abstractions and RAII built into the language.
– Erik Alapää
Jul 2 at 8:25





Since we are reading it as uint32_t, according to the endianness it will read 4 bytes, not 2 bytes, per read, from LSB to MSB of the 'uint64_t', right?
– Rakesh
Jul 2 at 13:36





