Does #pragma pack(n) apply to all the structures in a source file, or does it have to be applied to each structure separately? It's not clear from the manual. In one place it says: "You can use #pragma pack(n) to make sure that any structures with unaligned data are packed." In another: "This pragma aligns members of a structure to the minimum of n"
It applies until any subsequent pragma pack(n) directive within the file is encountered.
Thank you. So if it is the only one, placed at the beginning of a module, it will apply to all the structures in that module, right?
it will apply to all the structures in a module, right
It would indeed. Which is why you rather certainly do not want to do that. Too many people believe that #pragma pack is a good idea. It's not. You're buying a minimal advantage in data size at the price of some serious problems in data handling, not the least of which is a waste of CPU cycles.
you rather certainly do not want to do that
I completely agree, and I am trying to avoid it if possible. But let's consider the following scenario: I need to send a rather large (6 KB) structure with mixed data types over a transmission channel. The structure has to be reconstructed on the receiving end.
Possible solutions:
1. Extract all the data fields into a contiguous buffer and send that, then do the reverse operation on the receiver's end. But that's quite cumbersome.
2. Pack the structure and send it as a whole. The structure on the other end also has to be packed.
3. Send the unpacked structure as a whole, and have the structure on the other end also unpacked. This is the way I would prefer. But how certain can I be that both the sent and the reconstructed structures have exactly the same data alignment (both controllers are of the same type, and the same toolset is used for both)?
2. Pack the structure and send it as a whole.
This is about the most popular wrong reason why people decide to pack structures.
It's wrong because you're only looking at one single aspect of that data structure's usage: sending it over the wire. But how do those data get into that structure? What else are you doing with it? In the end you'll typically pay more in terms of hassle and CPU cycles by working with an always packed structure than you would by packing the data for transport only. For starters you'll have to ensure that every single pointer ever accessing any element of that packed structure has the "packed" attribute, too. This tends to spread the hassle and waste of CPU all across the program.
And that's before you consider other transformations you may have to do at the interface to the outside world anyway (endianness, floating point format, bit field layout, ...). Those would kill the idea of "send it as it is" anyway, and remove what little advantage a packed structure might seem to have.
The moral: serialization is not a valid rationale for packing structures.
serialization is not a valid rationale for packing structures
Thank you, I agree. And what is your opinion of the assumption that structures on both ends will have the same data alignment?
"In the end you'll typically pay more in terms of hassle and CPU cycles by working with an always packed structure than you would by packing the data for transport only"
Been there; done that - never again!
"serialization is not a valid rationale for packing structures"
Agreed.
That is not an assumption that you should ever even think about making!
Even if you do happen to have a case where it is known (not assumed) that the two ends do just happen to have the same data alignment, it is probably still better to design as if they didn't.
If you rely upon it now, it is bound to change - and it will certainly be much harder to redesign everything retrospectively than it would have been to just do it "right" in the first place.
And, again, remember that alignment is not the only issue...
the assumption that structures on both ends will have the same data alignment
That assumption should always be assumed to be wrong.
In a nutshell: C data structures are for the internal use of a C program. They are generally unsuitable for defining any external interface data, and any attempt to abuse them for that causes problems you just don't need.
There's a reason no self-respecting data interchange format is defined in terms of C structs: it simply wouldn't work. Those things are defined in terms of bits, octets and similarly universal items for a reason.
Sorry for my limited technical and English ability, but I still hope to take this opportunity to learn more.
I did something stupid long ago.
```c
struct UART_Packet {
    char Start1;
    char Start2;
    char FromAddress;
    char ToAddress;
    char String[PACKET_SIZE - 10];
    int  CRC32;
    char End1;
    char End2;
};

union {
    struct UART_Packet PacketBUF;
    char StringBUF[PACKET_SIZE];
} UART_Unit_RX, UART_Unit_TX;
```
And encountered a data alignment problem. So I changed my struct to:
```c
struct UART_Packet {
    char Start1;
    char Start2;
    char FromAddress;
    char ToAddress;
    char String[PACKET_SIZE - 10];
    char CRC32[4];
    char End1;
    char End2;
};

union {
    struct UART_Packet PacketBUF;
    char StringBUF[PACKET_SIZE];
} UART_Unit_RX, UART_Unit_TX;
```
It has been working well (?) ever since.
After reading this thread, I realized that there might be more problems hidden. So I tried to find a better packet format definition in C. I found the following links, and got very confused.
en.wikipedia.org/.../Tcphdr en.wikipedia.org/.../Ip_(struct)
How do we use these popular packet format definitions properly? I mean, how do we handle the data alignment issue, and the other issues that I don't really understand, with such C struct definitions?
```c
struct tcphdr {
    unsigned short source;
    unsigned short dest;
    unsigned long  seq;
    unsigned long  ack_seq;
#if __BYTE_ORDER == __LITTLE_ENDIAN
    unsigned short res1:4;
    unsigned short doff:4;
    unsigned short fin:1;
    unsigned short syn:1;
    unsigned short rst:1;
    unsigned short psh:1;
    unsigned short ack:1;
    unsigned short urg:1;
    unsigned short res2:2;
#elif __BYTE_ORDER == __BIG_ENDIAN
    unsigned short doff:4;
    unsigned short res1:4;
    unsigned short res2:2;
    unsigned short urg:1;
    unsigned short ack:1;
    unsigned short psh:1;
    unsigned short rst:1;
    unsigned short syn:1;
    unsigned short fin:1;
#endif
    unsigned short window;
    unsigned short check;
    unsigned short urg_ptr;
};
```
Note that this is yet another problem with raw binary transfer, one that has nothing to do with the alignment, padding, etc. of structures.
In your example, they have a problem with bit fields. They did make sure to consume all 16 bits, precisely to avoid the question of where the compiler would place any padding. But since the bit-field container is larger than one byte, different architectures can still use different byte orders, and hence end up with a different order of the bit fields.
So: 1) Compiler vendors can get away with it, but no one else should really try to transfer bit-field containers larger than 8 bits as raw data between different architectures. Bit fields can work quite well internally, but they were never intended to be shared.
2) As already noted: it's far better to write code to pack/unpack structures than to fight with manual padding, pragmas, etc. Besides the padding issues, there is also the question of byte order for the different data types. And the fact that many of the classical C data types can have different sizes on different architectures. Today, most machines use the standard two's-complement format for integers, which allows simple pack/unpack routines for integers. But it gets worse for floating point - especially when a target emulates floating point and then maybe doesn't use the ISO-standardized formats for float and double.
Hi Per,
Many thanks for your reply.
How about the below example? en.wikipedia.org/.../Ip_(struct)
```c
struct ip {
    unsigned int ip_hl:4;    /* both fields are 4 bits */
    unsigned int ip_v:4;
    uint8_t  ip_tos;
    uint16_t ip_len;
    uint16_t ip_id;
    uint16_t ip_off;
    uint8_t  ip_ttl;
    uint8_t  ip_p;
    uint16_t ip_sum;
    struct in_addr ip_src;
    struct in_addr ip_dst;
};
```
I remember that all of the protocol layers in the TCP/IP suite are defined to be big-endian.
But I just can't understand how struct ip handles the data alignment problem.
The protocol layer may specify big-endian. And not only that - the standard also defines the location of the different bits.
For a normal PC-class program, that normally just means that an IPv4 address stored in a 32-bit integer must be processed with htonl() and ntohl() to make sure that the number ends up as the expected <n>.<n>.<n>.<n>.
But in the end, it's up to the driver layer and the compiler/library to make sure that you get valid structure data.
In your example, ip_hl and ip_v fit in a single byte. And if you look, everything except the 8-bit fields is already 16-bit aligned. There are 12 bytes before you reach the two address structures, so they are 32-bit aligned.
The network card sends out the data byte by byte. So an 8-bit field isn't a problem to send and receive on the other end. It's fields larger than a single byte that are problematic, since the network card can send them out byte by byte but the meaning of the high and low bytes can be swapped. That is why the standard has defined a "network byte order", and why you have functions to perform conditional byte reversals of 16-bit and 32-bit integers. The code calls these functions without knowing whether any byte reversal will take place - the runtime library does the required work.
But no byte swap is needed for an array of bytes - and your examples can be seen as two arrays of two bytes each.
It doesn't!
When used internally - within a program - there is no data alignment problem: the compiler will consistently use the same alignment rules, so there is no problem!
The problem is that you are trying to use this structure to represent the format of data sent externally to the program - on the communication medium. So the solution to this is simple: don't do that!
It's been mentioned before: the process of taking the information out of the program's internal representation (with its specific padding, alignment, byte order, data representation, etc.) and turning it into the required external format for transmission on the communication medium is known as Serialisation.
en.wikipedia.org/.../Serialization
Hans-Bernhard Broeker said,
The reverse process - receiving data from the communication medium and converting it into the system's own internal representation - is, naturally enough, referred to as Deserialisation.
"It doesn't!"
For external transmission on the communication medium, it also doesn't cover the issues of byte ordering, data representation, etc...
So, one more time: