Move semantics in rust, C++, and Hylo

Written on 2024-11-29 in 2480 words ✍️.

Motivation

I just finished reading the paper Borrow checking Hylo. It outlines the basic design of Hylo. Hylo is an early-stage programming language, but unlike other contenders to C++, it provides memory safety like rust. Specifically, it merges rust’s borrow checking ideas into the framework of mutable value semantics (MVS) and linear types (LT).

I wanted to illustrate the differences between C++, rust, and Hylo with one specific example.

Example

We construct an example where we …

(1) define a record called Person with two attributes: name and age,
(2) create a person named Dave, and
(3) pass the person to a function show.

Then we ask ourselves the following questions:

1. When we passed Dave to show, did we create a copy?
2. If so, how do we avoid creating a copy?
3. May we use Dave after passing the unique Dave (i.e. not its copy) to show?

I compiled the C++ examples with godbolt with “x86-64 gcc (trunk)” and “-Wall -Wextra -Wno-pessimizing-move -Wno-redundant-move”. I compiled the rust examples on play.rust-lang.org in edition 2021 in Debug mode. I compiled the Hylo examples on godbolt with “Hylo (trunk)”.

C++ example

#include <iostream>
#include <string>
#include <cstdint>

using namespace std;

// (1)
struct Person {
    string name;
    uint8_t age;
};

void show(Person person) {
    cout << person.name << " is " << unsigned(person.age)
         << " years old" << endl;
}

int main() {
    Person p{ "Dave", 42 }; // (2)
    show(p); // (3)
    return 0;
}

1. Yes. You can insert cout << "Person record is at address " << &p << endl; before the call of show, and the same line with &person instead of &p at the beginning of show. The two lines print different memory addresses, so show works on a copy.
2. Replace void show(Person person) with void show(Person& person), or with void show(const Person& person) since show only reads the record. Only the function needs to change. The caller does not have to adapt to it.
3. Yes, you may insert cout << p.name << " is " << unsigned(p.age) << " years old" << endl; after show(p);. This compiles without an error.

rust example

// (1)
struct Person {
    name: String,
    age: u8,
}

fn show(person: Person) {
    println!("{} is {} years old", person.name, person.age);
}

fn main() {
    let p = Person { name: "Dave".to_string(), age: 42 }; // (2)
    show(p); // (3)
}

1. No. Person does not implement the Copy trait, so passing p moves it instead of copying it.
2. (unnecessary)
3. No, inserting println!("{} is {} years old", p.name, p.age); after show(p); would generate the error borrow of moved value: p. Because we “moved” Dave into show, show owns Dave and frees the memory associated with Dave when it returns.

Hylo example

// (1)
type Person: Deinitializable {
    public var name: String
    public var age: UInt8
    public memberwise init
}

fun show(_ person: let Person) {
  print(person.name)
  print(" is ")
  print(person.name)
  print(" years old")
}

public fun main() {
  let p = Person(name: "Dave", age: 42) // (2)
  show(p) // (3)
}

I would like to point out that string interpolation is broken in the trunk version and thus I resorted to multiple calls of print.

1. No, all “copies are explicit by default”. Thus one has to call value.copy() in Hylo explicitly to pass a copy (similar to value.clone() in rust, recognizing that rust distinguishes Copy and Clone).
2. (unnecessary)
3. Yes. let in (_ person: let Person) only gives show immutable access to the value, and the lifetime of p (in main) implicitly extends until the last use of p in main^[1].

Interlude

I think this example illustrates well that all three programming languages answer the provided questions differently. Specifically, I believe we had the following historic development:

1. We had pass-by-value or pass-by-pointer in languages like C. So we can either provide a copy of Dave to a function or we can provide a memory address where the function can find Dave.
2. We accepted that large objects should not be copied (pass-by-pointer) and small objects may be copied (pass-by-value).
3. We learned that working on pointers directly often leads to memory bugs. So we introduced references. We still pass an address, but using the reference inside the function directly operates on Dave. It was an ergonomic advancement. The awkward thing was that the caller loses control over whether Dave or a copy is passed. For the caller, both look the same (in C++).
4. These days we recognize that we have more dimensions to take care of. Besides copying, we want to convey whether Dave is [im]mutable, [un]initialized, and destroyed inside the function (and thus invalid after the call).

As such, we look for new designs. To distinguish these cases, we need to make them more explicit. Rust and Hylo contribute to this. Now we can take a look at additional features in each language.

C++ example with std::move

People really struggle to understand its semantics, but std::move is the only feature of C++ worth mentioning in this context. The technical explanation is that std::move casts an lvalue to an xvalue (a kind of rvalue) and moves nothing by itself, but most people in my programming circle using that language are not even familiar with these terms.

#include <iostream>
#include <string>
#include <cstdint>

using namespace std;

// (1)
struct Person {
    string name;
    uint8_t age;
};

void show(Person person) {
    cout << "Person record is at address " << &person << endl;
    cout << person.name << " is " << unsigned(person.age) << " years old" << endl;
}

int main() {
    Person p{ "Dave", 42 }; // (2)
    cout << "Person record is at address " << &p << endl;
    show(move(p)); // (3)
    cout << p.name << " is " << unsigned(p.age) << " years old" << endl;
    return 0;
}

Its output is

Person record is at address 0x7ffeb9074e30
Person record is at address 0x7ffeb9074e60
Dave is 42 years old
 is 42 years old

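So apparently, move does not prevent the creation of a second Person object: the parameter person of show is a new object at a different address, but it is move-constructed from p instead of copy-constructed. The empty string instead of the expected text “Dave” is very interesting. The move constructor takes over the contents of p.name at the moment of the call (not when show terminates) and leaves p.name in a valid but unspecified state, which here happens to be the empty string. The age is a plain integer, so moving it is the same as copying it, and p.age still reads 42. Recognize that I speak about the behavior I observed with this compiler; the standard does not promise the empty string. Reading p.name afterwards is not undefined behavior, but its value is nothing to rely on. And there is no compilation error.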

If we turn Person person (pass-by-value) into Person& person (pass-by-reference), we get an error. When I said “the caller loses control” in the interlude before, I was not completely right. With the introduction of std::move, we regain some control. If a function expects an object by non-const lvalue reference (Person&), a moved value is not allowed, because std::move(p) is an rvalue and indicates that the programmer is not going to use the object after this call. So what is the purpose of modifying an object (this is the intention of a non-const reference) when it is thrown away (this is the intention of move) anyway? A const Person& or a Person&& parameter would accept the moved value. For Person&, a compilation error is luckily provided:

error: cannot bind non-const lvalue reference of type 'Person&'
to an rvalue of type 'std::remove_reference<Person&>::type' {aka 'Person'}

Advanced rust example

First, we can decide to copy the object (in rust, this is a clone here):

#[derive(Clone)] // recognize the Clone here
struct Person {
    name: String,
    age: u8,
}

fn show(person: Person) {
    println!("{} is {} years old", person.name, person.age);
}

fn main() {
    let p = Person { name: "Dave".to_string(), age: 42 };
    show(p.clone()); // recognize the clone() here
    println!("{} is {} years old", p.name, p.age);
}

Dave is 42 years old
Dave is 42 years old

Great. And what happens if I want to enable access to the object in the function, but still use it later on? I need to borrow it with &:

struct Person {
    name: String,
    age: u8,
}

fn show(person: &Person) { // recognize ‘&’ here
    println!("{} is {} years old", person.name, person.age);
}

fn main() {
    let p = Person { name: "Dave".to_string(), age: 42 };
    show(&p); // recognize ‘&’ here
    println!("{} is {} years old", p.name, p.age);
}

This is what is called borrowing. A component of the compiler called the “borrow checker” verifies that borrowing and the use of borrowed values (like person) do not violate memory safety.

Advanced Hylo example

type Person: Deinitializable {
    public var name: String
    public var age: UInt8
    public memberwise init
}

fun show(_ person: sink Person) { // recognize “sink” here
  print(person.name)
  print(" is ")
  print(person.name)
  print(" years old")
}

public fun main() {
  let p = Person(name: "Dave", age: 42)
  show(p)
  print(p.name)
  print(" is ")
  print(p.name)
  print(" years old")
}

Changing the passing convention from let to sink tells the compiler that show takes ownership of the value, so the caller is not going to use it after the call of show(p). This is obviously not true, because we call print(p.name) after it. So we get an error:

<source>:17.16: error: use of consumed object
  print(p.name)
               ^
<source>:19.16: error: use of consumed object
  print(p.name)
               ^

Similar to rust, we can copy the object, and the program is going to work fine with a sink parameter:

type Person: Deinitializable, Copyable { // recognize “Copyable” here
    public var name: String
    public var age: UInt8
    public memberwise init
}

fun show(_ person: sink Person) {
  print(person.name)
  print(" is ")
  print(person.name)
  print(" years old")
}

public fun main() {
  let p = Person(name: "Dave", age: 42)
  show(p.copy()) // recognize copy() here
  print(p.name)
  print(" is ")
  print(p.name)
  print(" years old")
}

Conclusion

I think before rust, language designers mixed up the various properties values can have, and many incomprehensible designs were the result. rust models the most important memory-related properties through its two call conventions: passing (a move) or borrowing (with & or &mut). And Hylo moves even more properties into the call conventions. Namely, Hylo uses the keywords let, set, sink, and inout. This way Hylo additionally represents e.g. initialization (rust models this with a separate type such as MaybeUninit).

On an abstract level, a program has values which have capabilities (like copyable). In case of rust and Hylo, they are mostly represented by traits. And the programmer needs to write down what expectations a function has in terms of values it receives. And this design space of expectations gains new momentum now. What shall be distinguished and written down? What is just boilerplate and shall be ignored?

I think Hylo contributes nicely here and thus it was worth reading the paper. While writing this article, I learned that the Hylo documentation explicitly addresses differences with C++ and rust.

Finally, I want to mention that Hylo is currently in a very early development stage and it is going to take some years to get such concepts established.

Bonus: Why is a call of unsigned() required in C++?^[2]

1. rust people call this non-lexical lifetimes

2. uint8_t in C++ is usually an alias for unsigned char, so cout prints it as a character (42 is “*” in ASCII) and not as a decimal number. unsigned() converts the value to unsigned int, which cout prints as a number.