Intro

In my previous post, I discussed an optimization technique that aligns an object to 64 bytes with the keyword alignas(64) so that each object resides in its own cache line. However, before C++17 this may have no effect on dynamically allocated objects. And if you’re not careful, it can slow down the whole program.

What’s Alignment and Why Is It Important?

An object is aligned to N when its memory address is divisible by N. N is the alignment requirement, and it must be a power of 2. For example, a 32-bit integer is typically aligned to 4 bytes. A simple reason alignment matters is that CPUs (usually) need more cycles to access unaligned data. An unaligned access can behave like reading two parts and combining them, which can be a significant performance cost. Some SIMD instructions also require stricter alignment.
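To make the definition concrete, here is a minimal sketch that inspects alignment with alignof and offsetof. The Record type and the helper names isPowerOfTwo and isMultipleOf are made up for illustration; they are not from the previous post.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>   // std::int32_t, std::uintptr_t

// Hypothetical record type, used only for illustration.
struct Record {
    char tag;            // 1 byte
    std::int32_t value;  // the compiler pads before this member so it stays aligned
};

// An alignment requirement is always a power of two.
constexpr bool isPowerOfTwo(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// True when the address p is a multiple of `alignment`.
inline bool isMultipleOf(const void* p, std::size_t alignment) {
    assert(isPowerOfTwo(alignment));  // precondition: a valid alignment value
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Compile-time facts that follow from the definition of alignment.
static_assert(isPowerOfTwo(alignof(Record)), "alignments are powers of two");
static_assert(offsetof(Record, value) % alignof(std::int32_t) == 0,
              "value sits at an aligned offset inside Record");
static_assert(sizeof(Record) % alignof(Record) == 0,
              "every element of a Record array stays aligned");
```

The static_asserts are checked at compile time, so the layout rules are verified without running anything. isMultipleOf is the runtime counterpart for actual addresses.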

Back to Our Problem

The C++ standard calls a type whose alignment requirement is greater than alignof(std::max_align_t) an over-aligned type. Since C++17, the related limit that a plain new is guaranteed to honor is available as the macro __STDCPP_DEFAULT_NEW_ALIGNMENT__. As quoted from cppreference:

std::max_align_t is usually synonymous with the largest scalar type, which is long double on most platforms, and its alignment requirement is either 8 or 16.

If you declare a class as class alignas(64) ShouldBeAligned;, it’s an over-aligned type. The problem is that, until C++17, the standard doesn’t define how dynamic memory allocation handles over-aligned types; the behavior is implementation-defined. Before C++17, over-aligned types are reliably supported only for stack allocation.
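As a quick illustration of the definition above, the following sketch checks at compile time whether a type is over-aligned. CacheLineAligned is a hypothetical stand-in for ShouldBeAligned, and isOverAligned is a helper name I made up. The first static_assert assumes a platform where alignof(std::max_align_t) is smaller than 64, which matches the 8 or 16 quoted above.

```cpp
#include <cstddef>  // std::max_align_t

// Hypothetical stand-in for ShouldBeAligned: one object per 64-byte cache line.
struct alignas(64) CacheLineAligned {
    char payload[64];
};

// A type is over-aligned when it needs more than alignof(std::max_align_t).
template <typename T>
constexpr bool isOverAligned() {
    return alignof(T) > alignof(std::max_align_t);
}

// Assumption: alignof(std::max_align_t) < 64 on the target platform.
static_assert(isOverAligned<CacheLineAligned>(), "alignas(64) exceeds max_align_t here");
static_assert(!isOverAligned<int>(), "fundamental types are never over-aligned");
```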

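You can use the following code snippet to check whether an object is aligned.

#include <cstdint>  // std::uintptr_t

// Returns true when the address held by p is a multiple of Alignment.
template <int Alignment, typename T>
bool isAligned(T* p) {
    return reinterpret_cast<std::uintptr_t>(p) % Alignment == 0;
}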

If you create the object on the stack, it’s totally fine. However, if you allocate it dynamically, the result is probably not what you expect.

ShouldBeAligned correct;
std::cout << isAligned<64>(&correct) << "\n"; // 1

ShouldBeAligned* wrong = new ShouldBeAligned();
std::cout << isAligned<64>(wrong) << "\n"; // 0 or 1, implementation-defined

The optimization trick mentioned at the beginning may silently stop working and lead to false sharing. What’s even worse, there are no warnings about this behavior unless your compiler supports the aligned new feature.

This is a big omission: C11 already provides aligned_alloc, but C++11 still refers to the C99 library.

Solution

So what should you do if you need this before C++17? Here is a less robust solution for POSIX systems, using posix_memalign. The core idea is to first allocate aligned memory and then construct the objects in it using placement new.

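// new (needs <stdlib.h> for posix_memalign/free and <new> for placement new)
void* buf = nullptr;
int ret = posix_memalign(&buf, alignof(ShouldBeAligned), sizeof(ShouldBeAligned));
if (ret == 0) {
  // Construct the object inside the aligned buffer.
  ShouldBeAligned* correct = new (buf) ShouldBeAligned;
  // ... use correct ...
  correct->~ShouldBeAligned();  // destroy explicitly before releasing the memory
  free(buf);
}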

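// new [4]
void* buf = nullptr;
int ret = posix_memalign(&buf, alignof(ShouldBeAligned), sizeof(ShouldBeAligned) * 4);
if (ret == 0) {
  // Construct four objects inside the aligned buffer.
  ShouldBeAligned* correct = new (buf) ShouldBeAligned[4];
  // ... use correct[0] .. correct[3] ...
  for (int i = 0; i < 4; ++i) {
    correct[i].~ShouldBeAligned();  // destroy each element explicitly
  }
  free(buf);
}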

When using containers, you’ll run into a similar problem: std::allocator doesn’t support over-aligned types before C++17 either. This answer on SO provides a solution. You have to define an allocator yourself and pass it as a template argument of the container.

template <typename T, int Alignment>
class AlignedAllocator {
  // ...
  T* allocate(size_t sz);
  void deallocate(T* p, size_t n);
};

std::vector<ShouldBeAligned, AlignedAllocator<ShouldBeAligned, alignof(ShouldBeAligned)>> vec;
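For reference, here is one way such an allocator could be filled in on a POSIX system. This is a minimal sketch of my own, not the code from the SO answer: PosixAlignedAllocator, CacheLineAligned and everyElementIsAligned are hypothetical names, and it assumes posix_memalign is available.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t
#include <new>       // std::bad_alloc
#include <stdlib.h>  // posix_memalign (POSIX), free
#include <vector>

// Hypothetical stand-in for ShouldBeAligned.
struct alignas(64) CacheLineAligned {
    char payload[64];
};

template <typename T, std::size_t Alignment>
class PosixAlignedAllocator {
public:
    // posix_memalign needs a power of two that is a multiple of sizeof(void*).
    static_assert((Alignment & (Alignment - 1)) == 0 && Alignment % sizeof(void*) == 0,
                  "unsupported alignment for posix_memalign");

    using value_type = T;

    // rebind has to be spelled out because Alignment is a non-type parameter.
    template <typename U>
    struct rebind { using other = PosixAlignedAllocator<U, Alignment>; };

    PosixAlignedAllocator() noexcept {}
    template <typename U>
    PosixAlignedAllocator(const PosixAlignedAllocator<U, Alignment>&) noexcept {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = nullptr;
        if (posix_memalign(&p, Alignment, n * sizeof(T)) != 0) throw std::bad_alloc();
        assert(reinterpret_cast<std::uintptr_t>(p) % Alignment == 0);  // sanity check
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept { free(p); }
};

// The allocator is stateless, so any two instances are interchangeable.
template <typename T, typename U, std::size_t A>
bool operator==(const PosixAlignedAllocator<T, A>&, const PosixAlignedAllocator<U, A>&) noexcept {
    return true;
}
template <typename T, typename U, std::size_t A>
bool operator!=(const PosixAlignedAllocator<T, A>&, const PosixAlignedAllocator<U, A>&) noexcept {
    return false;
}

// Usage sketch: checks that every element of the vector sits on a 64-byte boundary.
inline bool everyElementIsAligned() {
    std::vector<CacheLineAligned, PosixAlignedAllocator<CacheLineAligned, 64>> items(4);
    for (const CacheLineAligned& item : items) {
        if (reinterpret_cast<std::uintptr_t>(&item) % 64 != 0) return false;
    }
    return true;
}
```

Because sizeof(CacheLineAligned) is a multiple of 64, aligning the start of the buffer is enough to align every element.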

What’s New in C++17

All these worries are gone once you start using C++17. Users can pass an extra std::align_val_t alignment argument to operator new, and compilers handle over-aligned types both when new is called directly and when they are used with std::allocator. A corresponding aligned allocation function, aligned_alloc, is also introduced. Check this proposal if you’re interested.
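A small sketch of both C++17 paths, assuming the code is compiled as C++17 or later (for example with -std=c++17). CacheLineAligned and the helper function names are hypothetical; std::align_val_t and the aligned overloads of operator new and operator delete come from the standard <new> header.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uintptr_t
#include <memory>    // std::unique_ptr
#include <new>       // std::align_val_t, aligned operator new/delete

// Hypothetical stand-in for ShouldBeAligned.
struct alignas(64) CacheLineAligned {
    char payload[64];
};

// True when the address p is a multiple of `alignment` (a power of two).
inline bool isAlignedTo(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Path 1: a plain new expression. In C++17 the compiler picks the aligned
// overload of operator new for the over-aligned type, and delete matches it.
inline bool plainNewHonorsAlignas() {
    std::unique_ptr<CacheLineAligned> object(new CacheLineAligned());
    return isAlignedTo(object.get(), alignof(CacheLineAligned));
}

// Path 2: request an alignment explicitly through std::align_val_t.
// The same alignment must be passed to operator delete.
inline bool explicitAlignmentIsHonored() {
    const std::size_t alignment = 128;
    void* raw = ::operator new(256, std::align_val_t(alignment));
    const bool aligned = isAlignedTo(raw, alignment);
    ::operator delete(raw, std::align_val_t(alignment));
    return aligned;
}
```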

One thing worth noting: don’t use over-aligned types unless you have to. Whichever method you use to allocate aligned memory, whether posix_memalign, padding on top of a normal malloc, or something else, there will be a performance cost, and it may cause more memory fragmentation.