This post is adapted from my recent technical presentation at my company. We explored the fundamentals and modern approaches to polymorphism in C++, a powerful technique enabling flexibility and scalability in object-oriented design. This write-up aims to dive deeper into the topic, showcasing traditional virtual functions alongside modern alternatives such as the Curiously Recurring Template Pattern (CRTP) and std::variant, and comparing their benefits and trade-offs in real-world applications.

Overview

Polymorphism usually comes in two main forms:

  • Polymorphic Variable: A single symbol that can represent multiple types.
  • Polymorphic Invokable: A single interface that can operate with different types.

In C++, polymorphism is essentially type-based dispatch. Let’s revisit virtual functions to see how they set the stage for polymorphic behavior.

Virtual Functions Recap

Consider a base class Animal with a virtual function const char* Sound() and two derived classes, Cat and Dog:

class Animal {
public:
    virtual const char* Sound() const = 0;
    virtual ~Animal() = default;
    int legs;
};

class Cat : public Animal {
public:
    const char* Sound() const override { return "Meow"; }
    int tails;
};

class Dog : public Animal {
public:
    const char* Sound() const override { return "Woof"; }
};

With virtual functions, you can create polymorphic behavior via pointers or references. A call to Sound() on Animal* will automatically call the correct derived class implementation.

// 1. Polymorphic Variable
const Animal& a = Cat{};
const char* sound = a.Sound();

// 2. Polymorphic Invokable
void Speak(const Animal& a) {
    std::cout << a.Sound() << "\n";
}
Speak(a); // "Meow"

Under the Hood

Each class with virtual functions has a vtable (virtual table), and each instance has a vptr (virtual pointer) pointing to this vtable. The compiler sets this up to ensure calls to virtual functions go to the right function implementation.

It may look like this,

Compilers usually offer options to inspect vtable layouts:

  • clang++: -Xclang -fdump-record-layouts -Xclang -fdump-vtable-layouts
  • g++: -fdump-lang-class

Demo on Godbolt

For more details about how vtable and vptr work, you can refer to my previous post on this topic.

Optimizations: Using final for Devirtualization

Marking functions as final allows the compiler to skip the vtable lookup, potentially inlining the function for better performance.

class Base {
public:
    virtual void VirtualF() = 0;
    virtual void FinalF() final = 0;
};

class Derived : public Base {
public:
    void VirtualF() override;
    void FinalF() final override;
};

void call_virtual(Derived& d) {
    d.VirtualF();
}

void call_final(Derived& d) {
    d.FinalF();
}

The assembly output might look like this:

call_virtual(Derived&):
        mov     rax, QWORD PTR [rdi]
        jmp     [QWORD PTR [rax]]
call_final(Derived&):
        jmp     Derived::FinalF()

When calling VirtualF() on a Derived object, the compiler must perform a vtable lookup, as it can’t determine if it might be a subclass overrides VirtualF. However, since FinalF is marked as final and cannot be overrided, the compiler generates a direct function call to it, bypassing the vtable and eliminating the cost of one indirection.

Demo

Visitor Pattern for Double Dispatch

A classic usage of virtual functions is visitor pattern

Suppose we have a ZooKeeper responsible for feeding different animals. Here’s the example:

class ZooKeeper {
public:
    void Feed(const Animal&); // feed differently
};

void ZooKeeper::Feed(const Animal& animal) {
    if (dynamic_cast<const Cat*>(&animal)) {
        // Feed Cat
    } else if (dynamic_cast<const Dog*>(&animal)) {
        // Feed Dog
    }
}

You can imagine that as the number of animal types grows, multiple dynamic_cast checks will be somehow slow and less readable.

There is where AnimalVisitor comes in,

// forward declaration
class Cat;
class Dog;

class AnimalVisitor {
public:
    virtual void Visit(const Cat&) const = 0;
    virtual void Visit(const Dog&) const = 0;
};

class Animal {
public:
    virtual void Accept(const AnimalVisitor&) const = 0;
};

class Cat : public Animal {
public:
    void Accept(const AnimalVisitor& visitor) const override { visitor.Visit(*this); }
};

class Dog : public Animal {
public:
    void Accept(const AnimalVisitor& visitor) const override { visitor.Visit(*this); }
};

Next, we implement ZooKeeper by inheriting from AnimalVisitor,

// Visitor
class ZooKeepVisitor : public AnimalVisitor {
public:
    void Visit(const Cat& cat) const override {} // Feed a cat
    void Visit(const Dog& dog) const override {} // Feed a dog
};

// Sample Usage
class ZooKeeper: public ZooKeepVisitor {
public:
    void Feed(const Animal& animal) {
        return animal.Accept(*this);
    }
};

When ZooKeeper::Feed is called on an Animal object, it actually invokes two virtual functions, Animal::Accept and ZooKeeperVistor::Visit. CppCon has an excellent talk about this pattern, that’s definitely worth a look.

Collection of Polymorphism Types

To store different subclasses into a single collection, you can use a structure like this,

using Zoo = std::vector<Animal*>;
void SpeakAll(const Zoo& zoo) {
    for (const auto animal: zoo) {
        Speak(*animal);
    }
}

Virtual functions are C++’s native way to achieve polymorphism. However, due to the additional indirection through the vtable, virtual functions cannot be inlined. This extra level of indirection can impact performance in certain contexts, especially in performance-critical code, where frequent virtual calls might slow down execution. Don’t worry, we’ll talk about it later.

std::variant Type-safe Union

C++17 Introduces std::variant, an alternative for storing a value of one type out of several possibilities.

using Animal = std::variant<Cat, Dog>;

When using std::variant, there are typically two ways to access the stored type:

Using index(),

void ZooKeeper::Feed(const Animal& animal) {
    switch (animal.index()) {
        case 0:
            // Feed Cat
            break;
        case 1:
            // Feed Dog
            break;
        default:
            ;
    }
}

Using std::visit (recommended),

void Speak(const Animal& animal)
{
   std::visit([](auto&& a) { 
       std::cout << a.Sound() << "\n"; 
   }, animal);
}

class ZooKeeper {
public:
    void Feed(const Animal& animal) {
        std::visit(*this, animal);
    }
    void operator()(const Cat&) const;
    void operator()(const Dog&) const;
};

Or, the overload diagram,

template<typename ... Ts>
struct Overload : Ts ... { 
    using Ts::operator() ...;
};
template<class... Ts> Overload(Ts...) -> Overload<Ts...>; // CTAD

// inplace visit
class ZooKeeper {
public:
    void Feed(const Animal& animal) {
        std::visit(Overload {
            [](const Cat&) {}, // Feed Cat 
            [](const Dog&) {}, // Feed Dog 
        }, animal);
    }
};

Compared to the virual-based visitor pattern, it has better performance and much simpler.

Diving into the gcc’s source code, we can see that std::visit is implemented using a mechanism somewhat similar to vtables.

std::variant is not the main focus in this post. There’re several interesting discussion about performance differences between virtual functions and std::visit. Here are a few insightful resources:

Static Polymorphism

Template-based polymorphism is not something new.

class Cat {
public:
    const char* Sound() const {
        return "Meow";
    }
};

// Polymorphic invokable?
template <typename Animal>
void Speak(const Animal& a) {
    std::cout << a.Sound() << "\n";
}

CRTP Combination

Combined with CRTP(Curiously Recurring Template Pattern),

template <typename T>
class Animal {
public:
    const char* Sound() const {
        return static_cast<const T&>(*this).SoundImpl();
    }
};

class Cat : public Animal<Cat> {
public:
    const char* SoundImpl() const {
        return "Meow";
    }
};

class Dog : public Animal<Dog> {
public:
    const char* SoundImpl() const {
        return "Woof";
    };
};

template <typename Animal>
void Speak(const Animal& a) {
    std::cout << a.Sound() << "\n";
}

Speak(Cat()); // Animal<Cat>::Sound -> Cat::SoundImpl

Compared to virtual function, call to Speak is determined at compile time, so it can be inlined.

Assembly might look like this,

Assembly
.LC0:
        .string "Meow"
.LC1:
        .string "\n"
void Speak<Cat>(Cat const&):
        sub     rsp, 8
        mov     edx, 4
        mov     esi, OFFSET FLAT:.LC0
        mov     edi, OFFSET FLAT:std::cout
        call    std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
        mov     edx, 1
        mov     esi, OFFSET FLAT:.LC1
        mov     edi, OFFSET FLAT:std::cout
        add     rsp, 8
        jmp     std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)

Diff with virtual functions on Godbolt

C++23 Enforcement

C++23 introduces deducing this, which enhances the flexibility of member functions in templates. The syntax can look like this:

class Animal {
public:
    template <typename Self>
    const char* Sound(this Self&& self) {
        return self.Sound();
    }
};

class Cat : public Animal {
public:
    const char* Sound() const {
        return "Meow";
    }
};

This making static polymorphism even more seamless.

Collection?

One approach to store different types of animals in a single collection is to use std::tuple. Here’s how the structure might look:

template <typename... AnimalTs>
class Zoo {
public:
    struct Index {
        std::size_t type_index;
        std::size_t container_index;
    };

    template <typename Animal>
    Index Add(Animal&& animal);

    void SpeakOne(const Index&);

    void SpeakAll();
private:
    std::tuple<std::vector<AnimalTs>...> animals;
};

To add a new animal to the collection:

template <typename... AnimalTs>
template <typename Animal>
typename Zoo<AnimalTs...>::Index Zoo<AnimalTs...>::Add(Animal&& animal) {
    Index index;
    constexpr std::size_t type_index = detail::index_of_v<std::remove_cv_t<Animal>, AnimalTs...>;
    index.type_index = type_index;
    index.container_index = std::get<type_index>(animals).size();
    std::get<type_index>(animals).push_back(std::forward<Animal>(animal));
    return index;
}

Methods to make individual animals or all animals speak:

template <typename... AnimalTs>
void Zoo<AnimalTs...>::SpeakOne(const Index& index) {
    detail::invoke_at(animals, index.type_index, [container_index=index.container_index](auto& sub_animals) {
        Speak(sub_animals.at(container_index));
    });
}

template <typename... AnimalTs>
void Zoo<AnimalTs...>::SpeakAll() {
    std::apply([](auto&&... sub_animals) {
        ((std::for_each(sub_animals.begin(), sub_animals.end(), 
        [](const auto& animal){ Speak(animal); })), ...);
    }, animals);
}

Demo on Godbolt

For the complete code, please refer to my repository on GitHub.

Benchmark

This design is based on my practical work, where I conducted comprehensive benchmarks. I’ve simplified it and am sharing the results here for your reference.

I measured the cost of virtual functions calls, and the cost of SpeakOne across varying type list lengths.

Benchmark result

Reference