This post is adapted from my recent technical presentation at my company. We explored the fundamentals and modern approaches to polymorphism in C++, a powerful technique enabling flexibility and scalability in object-oriented design. This write-up aims to dive deeper into the topic, showcasing traditional virtual functions alongside modern alternatives such as the Curiously Recurring Template Pattern (CRTP) and std::variant
, and comparing their benefits and trade-offs in real-world applications.
Overview
Polymorphism usually comes in two main forms:
- Polymorphic Variable: A single symbol that can represent multiple types.
- Polymorphic Invokable: A single interface that can operate with different types.
In C++, polymorphism is essentially type-based dispatch. Let’s revisit virtual functions to see how they set the stage for polymorphic behavior.
Virtual Functions Recap
Consider a base class Animal
with a virtual function const char* Sound()
and two derived classes, Cat
and Dog
:
class Animal {
public:
virtual const char* Sound() const = 0;
virtual ~Animal() = default;
int legs;
};
class Cat : public Animal {
public:
const char* Sound() const override { return "Meow"; }
int tails;
};
class Dog : public Animal {
public:
const char* Sound() const override { return "Woof"; }
};
With virtual functions, you can create polymorphic behavior via pointers or references. A call to Sound()
on Animal*
will automatically call the correct derived class implementation.
// 1. Polymorphic Variable
const Animal& a = Cat{};
const char* sound = a.Sound();
// 2. Polymorphic Invokable
void Speak(const Animal& a) {
std::cout << a.Sound() << "\n";
}
Speak(a); // "Meow"
Under the Hood
Each class with virtual functions has a vtable (virtual table), and each instance has a vptr (virtual pointer) pointing to this vtable. The compiler sets this up to ensure calls to virtual functions go to the right function implementation.
It may look like this,
Compilers usually offer options to inspect vtable layouts:
- clang++:
-Xclang -fdump-record-layouts -Xclang -fdump-vtable-layouts
- g++:
-fdump-lang-class
For more details about how vtable and vptr work, you can refer to my previous post on this topic.
Optimizations: Using final
for Devirtualization
Marking functions as final
allows the compiler to skip the vtable lookup, potentially inlining the function for better performance.
class Base {
public:
virtual void VirtualF() = 0;
virtual void FinalF() final = 0;
};
class Derived : public Base {
public:
void VirtualF() override;
void FinalF() final override;
};
void call_virtual(Derived& d) {
d.VirtualF();
}
void call_final(Derived& d) {
d.FinalF();
}
The assembly output might look like this:
call_virtual(Derived&):
mov rax, QWORD PTR [rdi]
jmp [QWORD PTR [rax]]
call_final(Derived&):
jmp Derived::FinalF()
When calling VirtualF()
on a Derived
object, the compiler must perform a vtable lookup, as it can’t determine if it might be a subclass overrides VirtualF
. However, since FinalF
is marked as final
and cannot be overrided, the compiler generates a direct function call to it, bypassing the vtable and eliminating the cost of one indirection.
Visitor Pattern for Double Dispatch
A classic usage of virtual functions is visitor pattern
Suppose we have a ZooKeeper
responsible for feeding different animals. Here’s the example:
class ZooKeeper {
public:
void Feed(const Animal&); // feed differently
};
void ZooKeeper::Feed(const Animal& animal) {
if (dynamic_cast<const Cat*>(&animal)) {
// Feed Cat
} else if (dynamic_cast<const Dog*>(&animal)) {
// Feed Dog
}
}
You can imagine that as the number of animal types grows, multiple dynamic_cast
checks will be somehow slow and less readable.
There is where AnimalVisitor
comes in,
// forward declaration
class Cat;
class Dog;
class AnimalVisitor {
public:
virtual void Visit(const Cat&) const = 0;
virtual void Visit(const Dog&) const = 0;
};
class Animal {
public:
virtual void Accept(const AnimalVisitor&) const = 0;
};
class Cat : public Animal {
public:
void Accept(const AnimalVisitor& visitor) const override { visitor.Visit(*this); }
};
class Dog : public Animal {
public:
void Accept(const AnimalVisitor& visitor) const override { visitor.Visit(*this); }
};
Next, we implement ZooKeeper
by inheriting from AnimalVisitor
,
// Visitor
class ZooKeepVisitor : public AnimalVisitor {
public:
void Visit(const Cat& cat) const override {} // Feed a cat
void Visit(const Dog& dog) const override {} // Feed a dog
};
// Sample Usage
class ZooKeeper: public ZooKeepVisitor {
public:
void Feed(const Animal& animal) {
return animal.Accept(*this);
}
};
When ZooKeeper::Feed
is called on an Animal
object, it actually invokes two virtual functions, Animal::Accept
and ZooKeeperVistor::Visit
.
CppCon has an excellent talk about this pattern, that’s definitely worth a look.
Collection of Polymorphism Types
To store different subclasses into a single collection, you can use a structure like this,
using Zoo = std::vector<Animal*>;
void SpeakAll(const Zoo& zoo) {
for (const auto animal: zoo) {
Speak(*animal);
}
}
Virtual functions are C++’s native way to achieve polymorphism. However, due to the additional indirection through the vtable, virtual functions cannot be inlined. This extra level of indirection can impact performance in certain contexts, especially in performance-critical code, where frequent virtual calls might slow down execution. Don’t worry, we’ll talk about it later.
std::variant
Type-safe Union
C++17 Introduces std::variant
, an alternative for storing a value of one type out of several possibilities.
using Animal = std::variant<Cat, Dog>;
When using std::variant
, there are typically two ways to access the stored type:
Using index()
,
void ZooKeeper::Feed(const Animal& animal) {
switch (animal.index()) {
case 0:
// Feed Cat
break;
case 1:
// Feed Dog
break;
default:
;
}
}
Using std::visit
(recommended),
void Speak(const Animal& animal)
{
std::visit([](auto&& a) {
std::cout << a.Sound() << "\n";
}, animal);
}
class ZooKeeper {
public:
void Feed(const Animal& animal) {
std::visit(*this, animal);
}
void operator()(const Cat&) const;
void operator()(const Dog&) const;
};
Or, the overload diagram,
template<typename ... Ts>
struct Overload : Ts ... {
using Ts::operator() ...;
};
template<class... Ts> Overload(Ts...) -> Overload<Ts...>; // CTAD
// inplace visit
class ZooKeeper {
public:
void Feed(const Animal& animal) {
std::visit(Overload {
[](const Cat&) {}, // Feed Cat
[](const Dog&) {}, // Feed Dog
}, animal);
}
};
Compared to the virual-based visitor pattern, it has better performance and much simpler.
Diving into the gcc’s source code, we can see that std::visit
is implemented using a mechanism somewhat similar to vtables.
std::variant
is not the main focus in this post.
There’re several interesting discussion about performance differences between virtual functions and std::visit
. Here are a few insightful resources:
- https://www.reddit.com/r/cpp/comments/kst2pu/with_stdvariant_you_choose_either_performance_or/
- https://stackoverflow.com/questions/57726401/stdvariant-vs-inheritance-vs-other-ways-performance
Static Polymorphism
Template-based polymorphism is not something new.
class Cat {
public:
const char* Sound() const {
return "Meow";
}
};
// Polymorphic invokable?
template <typename Animal>
void Speak(const Animal& a) {
std::cout << a.Sound() << "\n";
}
CRTP Combination
Combined with CRTP(Curiously Recurring Template Pattern),
template <typename T>
class Animal {
public:
const char* Sound() const {
return static_cast<const T&>(*this).SoundImpl();
}
};
class Cat : public Animal<Cat> {
public:
const char* SoundImpl() const {
return "Meow";
}
};
class Dog : public Animal<Dog> {
public:
const char* SoundImpl() const {
return "Woof";
};
};
template <typename Animal>
void Speak(const Animal& a) {
std::cout << a.Sound() << "\n";
}
Speak(Cat()); // Animal<Cat>::Sound -> Cat::SoundImpl
Compared to virtual function, call to Speak
is determined at compile time, so it can be inlined.
Assembly might look like this,
Assembly
.LC0:
.string "Meow"
.LC1:
.string "\n"
void Speak<Cat>(Cat const&):
sub rsp, 8
mov edx, 4
mov esi, OFFSET FLAT:.LC0
mov edi, OFFSET FLAT:std::cout
call std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
mov edx, 1
mov esi, OFFSET FLAT:.LC1
mov edi, OFFSET FLAT:std::cout
add rsp, 8
jmp std::basic_ostream<char, std::char_traits<char> >& std::__ostream_insert<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*, long)
Diff with virtual functions on Godbolt
C++23 Enforcement
C++23 introduces deducing this
, which enhances the flexibility of member functions in templates. The syntax can look like this:
class Animal {
public:
template <typename Self>
const char* Sound(this Self&& self) {
return self.Sound();
}
};
class Cat : public Animal {
public:
const char* Sound() const {
return "Meow";
}
};
This making static polymorphism even more seamless.
Collection?
One approach to store different types of animals in a single collection is to use std::tuple
. Here’s how the structure might look:
template <typename... AnimalTs>
class Zoo {
public:
struct Index {
std::size_t type_index;
std::size_t container_index;
};
template <typename Animal>
Index Add(Animal&& animal);
void SpeakOne(const Index&);
void SpeakAll();
private:
std::tuple<std::vector<AnimalTs>...> animals;
};
To add a new animal to the collection:
template <typename... AnimalTs>
template <typename Animal>
typename Zoo<AnimalTs...>::Index Zoo<AnimalTs...>::Add(Animal&& animal) {
Index index;
constexpr std::size_t type_index = detail::index_of_v<std::remove_cv_t<Animal>, AnimalTs...>;
index.type_index = type_index;
index.container_index = std::get<type_index>(animals).size();
std::get<type_index>(animals).push_back(std::forward<Animal>(animal));
return index;
}
Methods to make individual animals or all animals speak:
template <typename... AnimalTs>
void Zoo<AnimalTs...>::SpeakOne(const Index& index) {
detail::invoke_at(animals, index.type_index, [container_index=index.container_index](auto& sub_animals) {
Speak(sub_animals.at(container_index));
});
}
template <typename... AnimalTs>
void Zoo<AnimalTs...>::SpeakAll() {
std::apply([](auto&&... sub_animals) {
((std::for_each(sub_animals.begin(), sub_animals.end(),
[](const auto& animal){ Speak(animal); })), ...);
}, animals);
}
For the complete code, please refer to my repository on GitHub.
Benchmark
This design is based on my practical work, where I conducted comprehensive benchmarks. I’ve simplified it and am sharing the results here for your reference.
I measured the cost of virtual functions calls, and the cost of SpeakOne
across varying type list lengths.