Align structures to 8 bytes for 64-bit platforms
This PR reduces the cost of copying, moving, and creating structures on common 64-bit processors by reordering fields so that less padding is needed under 8-byte alignment.
The smaller a structure or class is, the more likely it is to fit into the CPU cache. Most processors are already 64-bit, so the change should not make anything worse.
Pahole example:
- The comment
/* XXX {n} bytes hole, try to pack */
shows where the layout can be optimized by reordering the fields of a structure or class.
Master branch
struct ScreenManager_ {
    int x1;                      /*  0  4 */
    int y1;                      /*  4  4 */
    int x2;                      /*  8  4 */
    int y2;                      /* 12  4 */
    Vector * panels;             /* 16  8 */
    const char * name;           /* 24  8 */
    int panelCount;              /* 32  4 */
    /* XXX 4 bytes hole, try to pack */
    Header * header;             /* 40  8 */
    Machine * host;              /* 48  8 */
    State * state;               /* 56  8 */
    /* --- cacheline 1 boundary (64 bytes) --- */
    _Bool allowFocusChange;      /* 64  1 */
    /* size: 72, cachelines: 2, members: 11 */
    /* sum members: 61, holes: 1, sum holes: 4 */
    /* padding: 7 */
    /* last cacheline: 8 bytes */
};
This PR
struct ScreenManager_ {
    int x1;                      /*  0  4 */
    int y1;                      /*  4  4 */
    int x2;                      /*  8  4 */
    int y2;                      /* 12  4 */
    _Bool allowFocusChange;      /* 16  1 */
    /* XXX 3 bytes hole, try to pack */
    int panelCount;              /* 20  4 */
    Vector * panels;             /* 24  8 */
    const char * name;           /* 32  8 */
    Header * header;             /* 40  8 */
    Machine * host;              /* 48  8 */
    State * state;               /* 56  8 */
    /* size: 64, cachelines: 1, members: 11 */
    /* sum members: 61, holes: 1, sum holes: 3 */
};
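The effect of the reordering can be reproduced outside the project with a minimal sketch. The struct names `ManagerBefore` and `ManagerAfter`, the opaque pointer target types, and the helper `cachelinesFor` are hypothetical stand-ins, not code from this PR; only the field order is copied from the two pahole listings above. The size checks assume 8-byte pointers, a 4-byte `int`, and 64-byte cache lines, so they are compiled only when pointers are 64 bits wide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opaque types: only the pointer size matters for the layout. */
typedef struct Vector_ Vector;
typedef struct Header_ Header;
typedef struct Machine_ Machine;
typedef struct State_ State;

/* Field order as on the master branch. */
struct ManagerBefore {
   int x1, y1, x2, y2;
   Vector* panels;
   const char* name;
   int panelCount;        /* a 4-byte hole follows so that header is 8-byte aligned */
   Header* header;
   Machine* host;
   State* state;
   bool allowFocusChange; /* 7 bytes of tail padding follow */
};

/* Field order as in this PR: the small members sit together before the pointers. */
struct ManagerAfter {
   int x1, y1, x2, y2;
   bool allowFocusChange; /* a 3-byte hole follows so that panelCount is 4-byte aligned */
   int panelCount;
   Vector* panels;
   const char* name;
   Header* header;
   Machine* host;
   State* state;
};

/* Number of 64-byte cache lines an object of the given size spans,
 * assuming the object starts on a cache-line boundary. */
static inline size_t cachelinesFor(size_t size) {
   return (size + 63) / 64;
}

/* Compile-time layout checks, enabled only when pointers are 64 bits wide. */
#if UINTPTR_MAX == UINT64_MAX
_Static_assert(offsetof(struct ManagerBefore, header) == 40, "hole after panelCount");
_Static_assert(sizeof(struct ManagerBefore) == 72, "master layout spans two cache lines");
_Static_assert(offsetof(struct ManagerAfter, panelCount) == 20, "hole after allowFocusChange");
_Static_assert(sizeof(struct ManagerAfter) == 64, "PR layout fits in one cache line");
#endif
```

Checks of this kind could also be kept next to a struct definition so that a later field addition that reintroduces padding fails at compile time; whether that is worth the maintenance cost is a judgment call for the project.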
Background on the technique:
https://hpc.rz.rptu.de/Tutorials/AVX/alignment.shtml
https://en.wikipedia.org/wiki/Data_structure_alignment
https://stackoverflow.com/a/20882083
https://zijishi.xyz/post/optimization-technique/learning-to-use-data-alignment/
Affected structs:
- ScreenManager 72 to 64 bytes
- Screen/DynamicIterator 24 to 16 bytes
- Meter/DynamicIterator 24 to 16 bytes
- IncMode 152 to 144 bytes
- TraceScreen 64 to 56 bytes
- LinuxProcessTable 120 to 112 bytes
- FunctionBar 40 to 32 bytes
Edited by Herman Semenov