Motivation
I just finished reading the paper Borrow checking Hylo. It outlines the basic design of Hylo. Hylo is an early-stage programming language, but unlike other contenders to C++, it provides memory safety like Rust. Specifically, it merges Rust’s borrow checking ideas into the framework of mutable value semantics (MVS) and linear types (LT).
I wanted to illustrate the differences between C++, Rust, and Hylo with one specific example.
Example
We construct an example where we …
- define a record called Person with two attributes: name and age.
- create a person named Dave.
- pass the person to a function show.
Then we ask ourselves the following questions:
- When we passed Dave to show, did we create a copy?
- If so, how do we avoid creating a copy?
- May we use Dave after passing the unique Dave (i.e. not its copy) to show?
I compiled the C++ examples with godbolt with “x86-64 gcc (trunk)” and “-Wall -Wextra -Wno-pessimizing-move -Wno-redundant-move”. I compiled the rust examples on play.rust-lang.org in edition 2021 in Debug mode. I compiled the Hylo examples on godbolt with “Hylo (trunk)”.
C++ example
#include <iostream>
#include <string>
#include <cstdint>
using namespace std;
// (1)
struct Person {
string name;
uint8_t age;
};
void show(Person person) {
cout << person.name << " is " << unsigned(person.age)
<< " years old" << endl;
}
int main() {
Person p{ "Dave", 42 }; // (2)
show(p); // (3)
return 0;
}
- Yes. You can insert cout << "Person record is at address " << &p << endl; before the call of show and the corresponding line (with &person instead of &p) at the beginning of show. This reveals different memory addresses of the record.
- Replace void show(Person person) with void show(Person& person). So only the function needs to change. The caller does not have to adapt to it.
- Yes, you may insert cout << p.name << " is " << unsigned(p.age) << " years old" << endl; after show(p);. This compiles without an error.
rust example
// (1)
struct Person {
name: String,
age: u8,
}
fn show(person: Person) {
println!("{} is {} years old", person.name, person.age);
}
fn main() {
let p = Person { name: "Dave".to_string(), age: 42 }; // (2)
show(p); // (3)
}
- No. Person does not implement the Copy trait.
- (unnecessary)
- No, inserting println!("{} is {} years old", p.name, p.age); after show(p); would generate the error borrow of moved value: p. Because we “moved” Dave into show, show now owns Dave and frees the associated memory when it returns, so p can no longer be used in main.
Hylo example
// (1)
type Person: Deinitializable {
public var name: String
public var age: UInt8
public memberwise init
}
fun show(_ person: let Person) {
print(person.name)
print(" is ")
print(person.name)
print(" years old")
}
public fun main() {
let p = Person(name: "Dave", age: 42) // (2)
show(p) // (3)
}
I would like to point out that string interpolation is broken in the trunk version and thus I resorted to multiple calls of print.
- No, all “copies are explicit by default”. Thus one has to call value.copy() explicitly in Hylo to pass a copy of a value (similar to value.clone() in Rust, keeping in mind that Rust distinguishes Copy and Clone).
- (unnecessary)
- Yes, let in (_ person: let Person) provides immutable access to the value, but the lifetime of p (in main) is implicitly defined until the last use of p in main [1].
Interlude
I think this example illustrates well that all three programming languages answer the provided questions differently. Specifically, I believe we had the following historic development:
- We had pass-by-value or pass-by-pointer in languages like C. So we can either provide a copy of Dave to a function or we can provide a memory address, where the function can find Dave.
- We accept that large objects should not be copied (pass-by-pointer) and small objects may be copied (pass-by-value).
- We learned that working on pointers directly often leads to memory bugs. So we introduced references. We pass by pointer, but using the pointer inside the function will directly operate on Dave. It was an ergonomic advancement. The awkward thing was that the caller loses control over whether Dave or a copy is passed. For the caller, it looks the same (in C++).
- These days we recognize that we have more dimensions to take care of. Besides copying, we want to convey whether Dave is [im]mutable, [un]initialized, and destroyed inside the function (and thus invalid after the call).
As such we look for new designs. To distinguish these cases, we need to make them more explicit. Rust and Hylo contribute to this. Now, we can take a look at additional features in each language.
C++ example with std::move
People really struggle to understand its semantics, but std::move is the only notable feature of C++ in this context. The technical explanation is that std::move casts an lvalue to an xvalue and does not move anything by itself, but most people in my programming circle using that language are not even familiar with these terms.
#include <iostream>
#include <string>
#include <cstdint>
using namespace std;
// (1)
struct Person {
string name;
uint8_t age;
};
void show(Person person) {
cout << "Person record is at address " << &person << endl;
cout << person.name << " is " << unsigned(person.age) << " years old" << endl;
}
int main() {
Person p{ "Dave", 42 }; // (2)
cout << "Person record is at address " << &p << endl;
show(move(p)); // (3)
cout << p.name << " is " << unsigned(p.age) << " years old" << endl;
return 0;
}
Its output is
Person record is at address 0x7ffeb9074e30
Person record is at address 0x7ffeb9074e60
Dave is 42 years old
 is 42 years old
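So apparently, move does not prevent the creation of a second Person object: the parameter of show lives at a different address. It is, however, move-constructed from p rather than copy-constructed. The empty string instead of the expected text “Dave” is very interesting. The move happens when the parameter is initialized, before the body of show runs, and it affects the members individually: the string is moved from, whereas age is a plain integer and is simply copied. Recognize that I speak about the observed behavior on my setup. This is not undefined behavior: the C++ standard leaves a moved-from std::string in a valid but unspecified state, so reading p.name is allowed, but its content is not guaranteed. And there is no compilation error.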
If we turn Person person (pass-by-value) into Person& person (pass-by-reference), we get an error. When I said “the caller loses control” in the interlude before, I was not completely right. With the introduction of std::move, we regain some control. If a function expects an object by non-const lvalue reference, a moved value is not allowed, because a moved value indicates that the programmer is not going to use the object after this call. So what is the purpose of modifying an object (this is the intention of a non-const reference) when it is thrown away (this is the intention of move) anyway? Luckily, a compilation error is provided in this case:
error: cannot bind non-const lvalue reference of type 'Person&' to an rvalue of type 'std::remove_reference<Person&>::type' {aka 'Person'}
Advanced rust example
First, we can decide to copy (in rust, here it is a clone) the object:
#[derive(Clone)] // recognize the Clone here
struct Person {
name: String,
age: u8,
}
fn show(person: Person) {
println!("{} is {} years old", person.name, person.age);
}
fn main() {
let p = Person { name: "Dave".to_string(), age: 42 };
show(p.clone()); // recognize the clone() here
println!("{} is {} years old", p.name, p.age);
}
Dave is 42 years old
Dave is 42 years old
Great. And what happens if I want to enable access to the object in the function, but still use it later on? I need to borrow it with &:
struct Person {
name: String,
age: u8,
}
fn consume(person: &Person) { // recognize ‘&’ here
println!("{} is {} years old", person.name, person.age);
}
fn main() {
let p = Person { name: "Dave".to_string(), age: 42 };
consume(&p); // recognize ‘&’ here
println!("{} is {} years old", p.name, p.age);
}
This is what is called borrowing, and a compiler component called the “borrow checker” needs to check that borrowing and the use of borrowed values (like person) do not violate memory safety.
Advanced Hylo example
type Person: Deinitializable {
public var name: String
public var age: UInt8
public memberwise init
}
fun show(_ person: sink Person) { // recognize “sink” here
print(person.name)
print(" is ")
print(person.name)
print(" years old")
}
public fun main() {
let p = Person(name: "Dave", age: 42)
show(p)
print(p.name)
print(" is ")
print(p.name)
print(" years old")
}
Changing the passing convention from let to sink tells the compiler the value is not going to be used after the call of show(p). This is obviously not true, because we call print(p.name) after it. So we get an error:
<source>:17.16: error: use of consumed object
    print(p.name)
          ^
<source>:19.16: error: use of consumed object
    print(p.name)
          ^
Similar to Rust, we can copy the object, and the program is going to work fine with a sink parameter:
type Person: Deinitializable, Copyable { // recognize “Copyable” here
public var name: String
public var age: UInt8
public memberwise init
}
fun show(_ person: sink Person) {
print(person.name)
print(" is ")
print(person.name)
print(" years old")
}
public fun main() {
let p = Person(name: "Dave", age: 42)
show(p.copy()) // recognize copy() here
print(p.name)
print(" is ")
print(p.name)
print(" years old")
}
Conclusion
I think before Rust, language designers mixed up the various properties these values can have, which resulted in many incomprehensible designs. Rust models the most important memory-related properties through its two call conventions (passing or borrowing). And Hylo moves even more properties into the call conventions. Namely, Hylo uses the keywords let, set, sink, and inout. This way Hylo additionally represents e.g. initialization (Rust models this with a separate type).
On an abstract level, a program has values which have capabilities (like copyable). In case of rust and Hylo, they are mostly represented by traits. And the programmer needs to write down what expectations a function has in terms of values it receives. And this design space of expectations gains new momentum now. What shall be distinguished and written down? What is just boilerplate and shall be ignored?
I think Hylo contributes nicely here and thus it was worth reading the paper. While writing this article, I learned that the Hylo documentation explicitly addresses differences with C++ and rust.
Finally, I want to mention that Hylo is currently in a very early development stage and it is going to take some years to get such concepts established.
Bonus: Why is a call of unsigned() required in C++? [2]