Virtual Functions
While learning Java, I came across this line of code:
As someone who knows C++, this is somewhat confusing –
List
is an interface that ArrayList
implements. But if we treat ArrayList
as a list, then when
we call List.add()
we must use dynamic dispatch to find the
right implementation, since the implementation of add
isn’t
on the list interface.
That involves a virtual
function lookup.
In Java, most function calls are virtual
, unless a
method is declared as final
(which means it cannot be
overriden).
By doing this, we get an advantage – if we decide to change our
array
variable from an ArrayList
to another
List
interface conforming type, we can do so without
breaking any code.
In exchange, we cannot use any ArrayList
specific
methods without casting to ArrayList
, which nullifies this
benefit.
We have information that both ArrayList and LinkedList implement the List interface, so we can treat them as such in a collection.
ArrayList<List> lists = new ArrayList<List>();
lists.add(new ArrayList());
lists.add(new LinkedList());
for (List<Integer> l : lists) {
l.add(10); // defer to each lists' implementation for add
}
Since we know that Java uses virtual functions for interface
implementations, everything works as expected. l.add(10)
defers to the implementation of ArrayList
and
LinkedList
, and each list is added a 10
.
A full example of virtual functions might look something like this:
public class Animal {
void move() {
System.out.printf("walk %s\n", this.name());
}
String name() {
return "Animal";
}
public static void main(String[] args) {
Animal animals[] = {new Animal(), new Bird(), new Elephant()};
for (Animal animal : animals) {
animal.move();
}
}
}
class Bird extends Animal {
@Override
void move() {
System.out.printf("fly %s\n", this.name());
}
@Override
String name() {
return "Bird";
}
}
class Elephant extends Animal {
@Override
String name() {
return "Elephant";
}
}
Running this prints this out:
Here we create a base class, Animal
, which has two
methods, name
and move
. This class is
instantiable because it has default implementations for both methods.
The Bird class overrides the move method, since it flies instead of
walks, and the Elephant only overrides the name. When we collect them
into an array and call the move()
method on each animal as
an animal, Java does the virtual function lookup and calls the correct
overriden method as we expect.
It turns out that not every language does this, mainly because there is a runtime cost in dynamic dispatch.
In C++, you must designate a function as virtual
which
labels a function as overridable by an inherited class.
A roughly word for word translation of the above java program would look like this:
#include <stdio.h>
struct Animal {
virtual const char *name() const { return "Animal"; }
virtual void move() const { printf("walk %s\n", name()); }
virtual ~Animal() {}
};
struct Bird : public Animal {
virtual void move() const override { printf("fly %s\n", name()); }
virtual const char *name() const override { return "Bird"; }
};
struct Elephant : public Animal {
virtual const char *name() const override { return "Elephant"; }
};
int main(void) {
const Animal *animals[] = {new Animal(), new Bird(), new Elephant()};
for (const auto &animal : animals) {
animal->move();
}
}
This returns:
In the Java version it isn’t apparent to the programmer that there is
a runtime cost, but C++ puts it front and center. Since we used the
new
keyword, everything is placed on the heap. When we try
to call the move()
method, we have to do it with the
->
operator, which calls an object on the heap, rather
than on the stack.
Let’s say we don’t use the new
operator, and don’t use a
pointer lookup:
int main(void) {
const Animal animals[] = {Animal(), Bird(), Elephant()};
for (const auto &animal : animals) {
animal.move();
}
}
This prints out:
Since we are literally treating each animal as an Animal
type, animal.move()
calls the default Animal
implementation of move. If we choose to have no runtime cost, we get
less desirable behavior. But C++ gives you the choice upfront.
Let’s dig deeper.
C doesn’t have any language support for virtual
functions, but we can still emulate them.
A (very) rough translation of the java program might look like this:
#include <stdio.h>
typedef struct Animal {
const char *name;
void (*move)(const struct Animal *);
} Animal_t;
void fly(const Animal_t *a) { printf("fly %s\n", a->name); }
void walk(const Animal_t *a) { printf("walk %s\n", a->name); }
void animal_move(const Animal_t *a) { a->move ? a->move(a) : walk(a); }
int main(void) {
const Animal_t animals[] = {
{.name = "Animal"}, {.name = "Bird", .move = fly}, {.name = "Elephant"}};
const size_t animals_len = sizeof(animals) / sizeof(Animal_t);
for (int i = 0; i < animals_len; i++) {
const Animal_t animal = animals[i];
animal_move(&animal);
}
}
C actually lets us do some interesting things here: It allows us to
allocate our animals
on the stack. As well, we create an
animal_move
function that asks the struct passed in if it
has a function pointer for move
. If it does, then it defers
to that, otherwise it calls a default implementation. If we do choose to
use a more specific version of move
, then we do have the
pointer lookup cost, but if not, there is no cost.
Predictably, this prints:
Digging a little up, we find that Rust has a similar concept, but it
does away with using keywords like virtual
or
final
to designate dynamic dispatch.
pub trait Animal {
fn name(&self) -> String {
"Animal".to_string()
}
fn act(&self) {
println!("walk {}", self.name());
}
}
struct GenericAnimal {}
impl Animal for GenericAnimal {}
struct Bird {}
impl Animal for Bird {
fn name(&self) -> String {
"Bird".to_string()
}
fn act(&self) {
println!("fly {}", self.name());
}
}
struct Elephant {}
impl Animal for Elephant {
fn name(&self) -> String {
"Elephant".to_string()
}
}
fn main() {
let animals: Vec<&dyn Animal> = vec![&GenericAnimal{}, &Bird{}, &Elephant{}];
for animal in animals {
animal.act();
}
}
We create a trait
of animal (kind of like an interface,
but supercharged) and then we create an instantiable version of it
(GenericAnimal
) that just takes the default implementation.
Then we implement our Bird
and Elephant
and we
collect them into a vector and properly call the method on them. The
compiler assumes that we want dynamic dispatch by default and does it
for us. If not, we can call the parent classes’ method:
This is similar to C++:
In short, here’s a history of virtual functions and their syntax, starting from C and ending with Rust:
In C, there’s a rough way to emulate virtuals, but it takes some effort since it’s not built into the language.
In C++, virtual functions were deemed useful enough to be built into
the language. This led to a terser syntax for overriding, but led to
more cognitive load on the programmer, since they had to choose
virtual
or non-virtual implementations.
In Java, most methods are by default virtual, so the implementation
details are hidden from the programmer. Java allows you to declare
non-overridable methods as final
, so final
methods have no runtime cost, and all @Override
methods
have runtime cost, which is a fair tradeoff.
In Rust, the compiler figures out if you want a virtual
or normal method call through your trait implementations, but allows you
to call the base method through a subclass if you so choose. This allows
for having even less cognitive load than in Java (no
@Override
or final
necessary), but with an
escape hatch to call the base class method in an overriden type (as C++
allows).