Lecture 2: Borrow Checker

Information below is not for the current semester.

The message of this lesson is that the Rust compiler is your friend. Borrowing and lifetime checking are among the main strengths of the language.

Values

What many languages, including plain C, call “objects” is usually called values in Rust. Contrary to many languages, these values (or objects) are conceptual: they do not have a fixed memory location.

Values don’t imply anything about their memory allocations. By default all Rust values can be moved from one memory location to another. What does that mean?

Rust values come into existence by creation. Depending on the exact way you instantiate, the newly created value gets a memory location. In the simple case you just assign your object to a local variable and get a stack memory location unless optimized out.

The stack and the heap

Your values can indeed change memory location. But that doesn’t happen randomly. A typical reason to change memory location is when a value of type T is prepared on the stack and only then gets “boxed”. Such a Box<T> is just a pointer (similar to std::unique_ptr in C++) to a heap-allocated object.

Moving from one variable to another (or even one memory location to another) stops any access to the original variable/memory. As Box<T> cannot be empty, moving the inner value out of the box (effectively out of the heap location) “consumes” the box. This is enforced by the compiler.
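The moves described above can be sketched as follows. The type Point and the function names are made up for illustration; the commented-out line shows an access the compiler would reject.

```rust
#[derive(Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Prepare a value in a local variable, then move it to the heap.
fn boxed_point() -> Box<Point> {
    let p = Point { x: 1, y: 2 }; // conceptually a stack location
    let b = Box::new(p); // `p` is moved into a heap allocation
    // println!("{:?}", p); // would not compile: `p` was moved
    b
}

// Moving the inner value out of the box consumes the box.
fn unboxed_point(b: Box<Point>) -> Point {
    *b
}

fn main() {
    let b = boxed_point();
    let p = unboxed_point(b);
    // `b` is no longer accessible here: it was moved into `unboxed_point`.
    println!("{} {}", p.x, p.y);
}
```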

Moving is easier than copying or cloning and is available for all objects unless explicitly pinned. Pinning is a very advanced technique used together with unsafe code to provide features otherwise forbidden in Rust, like self-referential structs.

Complex data structures

Moving works for data structures of any complexity as long as they are not self-referential. Fields can still be “boxed” and rely on separate heap allocations. The same applies to String, Vec and similar values. Simple values can implement Copy (through the derive macro, together with Clone) as an alternative to moving. A Copy type cannot contain non-Copy fields.

When copying isn’t trivial, move is still the default implementation of value assignment. Non-trivial copying is enabled by implementing the Clone trait either using the derive macro or explicitly. Cloning is always explicit in the code.
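A minimal sketch of the difference between Copy and Clone. The types Meters and Label are hypothetical and only serve the illustration.

```rust
// All fields are `Copy`, so the type can be `Copy` too.
// `Copy` requires `Clone`, which is why both are derived.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

// Owns a heap-allocated `String`, so it can be `Clone` but not `Copy`.
#[derive(Debug, Clone, PartialEq)]
struct Label {
    text: String,
}

fn copy_then_use_both(m: Meters) -> (Meters, Meters) {
    let n = m; // implicit copy, `m` stays usable
    (m, n)
}

fn clone_then_use_both(l: Label) -> (Label, Label) {
    let k = l.clone(); // explicit clone, allocates a new `String`
    // `let k = l;` would move `l`, and the next line would not compile.
    (l, k)
}

fn main() {
    let (a, b) = copy_then_use_both(Meters(2.5));
    let (c, d) = clone_then_use_both(Label { text: String::from("garage") });
    println!("{:?} {:?} {} {}", a, b, c.text, d.text);
}
```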

Borrowing

Every value has an owner. A local variable holding a value has a lifetime in the code determined by the compiler. The lifetime of a struct field is tied to that of the struct. I cannot stress enough that this whole system is computed statically, at compile time.

Taking a reference to a value or part of that value is considered borrowing. You can only borrow an object that is guaranteed to exist the whole time it is borrowed. In other words the reference must be dropped before the owned value is dropped, all of this checked by the compiler.
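As a small sketch of this rule (the function names are made up): the borrow below ends at its last use, so the owner may be moved afterwards. The commented-out lines show accesses the compiler would reject.

```rust
fn first_char(text: &str) -> Option<char> {
    text.chars().next()
}

fn borrow_then_move() -> (Option<char>, String) {
    let owned = String::from("rust");
    let borrowed: &str = &owned; // the borrow starts here
    let c = first_char(borrowed); // last use of the borrow
    let moved = owned; // fine: nothing borrows `owned` any more
    // println!("{}", borrowed); // would not compile: `owned` was moved while borrowed
    (c, moved)
}

fn main() {
    // A dangling reference is rejected at compile time:
    // let r;
    // { let s = String::from("short-lived"); r = &s; }
    // println!("{}", r); // error: `s` does not live long enough
    println!("{:?}", borrow_then_move());
}
```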

This is a very simple concept that leads to a complex computation with partial ordering constraints expressed right in your code. When checking the correctness of your code, Rust does not look into the functions that you call. Instead it only looks at their signatures and these must exactly express what you need.

Simple lifetime constraints

Whenever you see a “method” that takes a self reference and returns a reference to a field (a typical getter method), you are a victim of lifetime elision. This is a case when Rust assigns the constraints for you and the code looks easy.

struct House {
    garage: Garage,
}

struct Garage {
}

impl House {
    fn garage(&self) -> &Garage {
        &self.garage
    }
}

Once you need to do anything slightly more advanced with object references, you need to learn the details, though. Rust also feels relatively forgiving with immutable references and with values that do not implement Drop. Here is the same method with explicit lifetimes.

impl House {
    fn garage<'a>(&'a self) -> &'a Garage {
        &self.garage
    }
}

In this case <'a> declares a named lifetime variable. It appears both in an argument and in the return type, that is, in covariant and contravariant positions. Look up these terms for broader understanding. For now we need to know that &'a self reads as “self must exist for the whole lifetime 'a”, while -> &'a reads as “the result must not exist outside lifetime 'a”.

That means self is the one who needs to live the whole time the resulting reference lives. It suggests that the result contains a reference to some part of self and the compiler will behave exactly that way.
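What this contract means for the caller can be sketched with a variant of the House example. The cars field and the function count_cars are additions for illustration only.

```rust
struct Garage {
    cars: u32,
}

struct House {
    garage: Garage,
}

impl House {
    // The result borrows from `self` for the lifetime 'a.
    fn garage<'a>(&'a self) -> &'a Garage {
        &self.garage
    }
}

fn count_cars() -> u32 {
    let mut house = House { garage: Garage { cars: 1 } };
    let garage = house.garage(); // `house` is borrowed from here on
    let seen = garage.cars; // last use of `garage`, the borrow ends
    house.garage.cars += 1; // mutation is allowed again
    // println!("{}", garage.cars); // would not compile: `house` was mutated while borrowed
    seen + house.garage.cars
}

fn main() {
    println!("{}", count_cars());
}
```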

Stored references

When you don’t need to store references, e.g. in a struct, lifetime variables tend to be pretty simple. Usually you have just one function argument that needs to be tied to the result. The rest can be simple &T references, which basically translate to &'_ T references where each use of '_ is a unique lifetime.
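A minimal sketch of tying just one argument to the result; the function names are hypothetical. Because the compiler checks callers against the signature only, the second argument may be dropped while the result is still in use.

```rust
// Only `first` is tied to the result; `_second` gets its own anonymous lifetime.
fn pick_first<'a>(first: &'a str, _second: &str) -> &'a str {
    first
}

fn result_outlives_second() -> String {
    let first = String::from("kept");
    let result;
    {
        let second = String::from("temporary");
        result = pick_first(&first, &second);
    } // `second` is dropped here, `result` stays valid
    // Had the signature been `(first: &'a str, _second: &'a str)`, this function
    // would be rejected even though the body of `pick_first` did not change.
    result.to_string()
}

fn main() {
    println!("{}", result_outlives_second());
}
```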

Once you start storing references, things get a bit more complicated.

struct City {
}

struct Person<'a> {
    city: &'a City,
}

impl<'b> Person<'b> {
    fn city<'a>(&'a self) -> &'b City
    where
        'a: 'b
    {
        self.city
    }
}

fn main() {
    let city = City {};
    let person = Person { city: &city };
}

Now the struct Person<'a> is generic over a lifetime, much like a template, and city: &'a City says that it contains a reference to a separately owned city. Therefore a Person<'a> can only exist as long as the city exists, and the compiler has to enforce this for all code.

Accordingly, impl<'b> Person<'b> declares a lifetime variable 'b that the methods in the block can refer to. If no method needs to name it explicitly, the header can be simplified to impl Person<'_>. To understand fn city you have to know that &'a self is shorthand for self: &'a Self, which is &'a Person<'b> after substitution.

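The bound 'a: 'b says that 'a outlives 'b. The type &'a Person<'b> already implies the opposite bound 'b: 'a, because a reference cannot outlive the data it points to. Together the two bounds force 'a and 'b to be the same lifetime, so the result is usable only as long as self stays borrowed. The method would also compile without the where clause, since self.city already has the type &'b City, and the result would then be usable as long as the original City exists.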
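For comparison, here is a sketch of the getter without any where clause. The name field, the value stored in it and the function name are assumptions made for the example. The returned reference depends only on the city, so it stays valid after the person is gone.

```rust
struct City {
    name: String,
}

struct Person<'c> {
    city: &'c City,
}

impl<'c> Person<'c> {
    // `self.city` already has the type `&'c City`; no extra bound is needed.
    fn city(&self) -> &'c City {
        self.city
    }
}

fn city_name_after_person_is_gone() -> String {
    let city = City { name: String::from("Prague") };
    let city_ref;
    {
        let person = Person { city: &city };
        city_ref = person.city();
    } // `person` is dropped here, `city` is still alive
    // With an additional bound tying the borrow of `self` to 'c, the compiler
    // would reject this function because `person` does not live long enough.
    city_ref.name.clone()
}

fn main() {
    println!("{}", city_name_after_person_is_gone());
}
```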

Looks too complicated? We were maybe too precise and specified things that the compiler could figure out. But what if you need to store a reference to a value that already contains a reference?

struct Dog<'a, 'b> {
    owner: &'a Person<'b>,
}

impl<'b, 'c> Dog<'b, 'c> {
    fn owner<'a>(&'a self) -> &'b Person<'c>
    where
        'a: 'b,
        'b: 'c,
    {
        self.owner
    }
}

Notice that a reference to a lifetime-parametrized struct now requires two lifetimes. In many cases the compiler is very permissive and you could squash the lifetime variables into one. But when things get more complex, you might end up constraining the lifetimes too much.
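A sketch of the squashed variant with a single lifetime, assuming only shared references are involved. The name field and the function name are made up. It compiles because the compiler may shorten the inner lifetime to match the outer one; the price is that everything reached through the dog is limited to that shorter lifetime.

```rust
struct City {
    name: String,
}

struct Person<'c> {
    city: &'c City,
}

// One lifetime for both the reference and the referenced Person.
struct Dog<'p> {
    owner: &'p Person<'p>,
}

impl<'p> Dog<'p> {
    fn owner(&self) -> &'p Person<'p> {
        self.owner
    }
}

fn city_of_dog_owner() -> String {
    let city = City { name: String::from("Prague") };
    let person = Person { city: &city };
    // `&person` refers to a Person with a longer city lifetime;
    // it is accepted here with that lifetime shortened to 'p.
    let dog = Dog { owner: &person };
    dog.owner().city.name.clone()
}

fn main() {
    println!("{}", city_of_dog_owner());
}
```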

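This tends to happen with mutable rather than immutable references, and with types that implement Drop. Without a Drop implementation Rust can often treat values as if they were all dropped at once, whereas implementing Drop enforces a strict drop order. That is why you can usually satisfy both 'a: 'b and 'b: 'a, that is, use a single lifetime, as long as Drop is not involved.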

Ownership

Rust ownership can model object relationships only as long as they form a tree, where each object is a value owned by another value, up to a root value that “contains” everything. Vectors or hash maps can be used to express additional relationships. Alternatively, library tools can be used to work around the borrow checker's limitations via shared ownership and interior mutability.
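One possible way to express an additional relationship on top of the ownership tree is to store an index into a vector instead of a reference. This is a sketch under that assumption; Town, its fields and the sample data are hypothetical.

```rust
struct Person {
    name: String,
}

struct Dog {
    name: String,
    owner: usize, // index into `Town::people`, not a reference
}

// The root value owns everything; this is the ownership tree.
struct Town {
    people: Vec<Person>,
    dogs: Vec<Dog>,
}

impl Town {
    // Resolve the dog -> owner relationship at run time via the index.
    fn owner_name(&self, dog_index: usize) -> Option<&str> {
        let dog = self.dogs.get(dog_index)?;
        let owner = self.people.get(dog.owner)?;
        Some(owner.name.as_str())
    }
}

fn sample_town() -> Town {
    Town {
        people: vec![Person { name: String::from("Alice") }],
        dogs: vec![Dog { name: String::from("Rex"), owner: 0 }],
    }
}

fn main() {
    let town = sample_town();
    println!("{} belongs to {:?}", town.dogs[0].name, town.owner_name(0));
}
```

The index is not checked by the borrow checker, so a stale or wrong index shows up as None at run time rather than as a compile error.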

Those are tools that let you carefully and explicitly lift the limitations of the safety model and move some checks to run time, or even amend them with synchronization tools. This is where Mutex or RefCell and Arc or Rc enter the game. For example, a combination of Arc and Mutex provides shared ownership and synchronized mutability. Just remember that with reference-counted smart pointers it is your responsibility to avoid memory leaks due to reference cycles.
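A minimal sketch of both combinations. The first function shares a counter between threads with Arc and Mutex. The second uses Rc and RefCell in a single thread, with a Weak back-reference as one common way to avoid a reference cycle. The types Owner and Dog and the function names are made up for the example.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};
use std::sync::{Arc, Mutex};
use std::thread;

// Shared ownership (Arc) plus synchronized mutability (Mutex).
fn parallel_count(threads: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

struct Owner {
    name: String,
    dogs: RefCell<Vec<Rc<Dog>>>, // interior mutability, checked at run time
}

struct Dog {
    owner: Weak<Owner>, // weak back-reference, so no strong cycle is formed
}

// Returns the strong counts of the owner and the dog,
// and the owner's name as reached through the dog.
fn owner_and_dog() -> (usize, usize, Option<String>) {
    let owner = Rc::new(Owner {
        name: String::from("Alice"),
        dogs: RefCell::new(Vec::new()),
    });
    let dog = Rc::new(Dog { owner: Rc::downgrade(&owner) });
    owner.dogs.borrow_mut().push(Rc::clone(&dog));
    let owner_name = dog.owner.upgrade().map(|o| o.name.clone());
    (Rc::strong_count(&owner), Rc::strong_count(&dog), owner_name)
}

fn main() {
    println!("{}", parallel_count(4));
    println!("{:?}", owner_and_dog());
}
```

Had Dog held an Rc<Owner> instead of a Weak<Owner>, the owner and the dog would keep each other alive and neither would ever be freed.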

Recapitulation

Explicit lifetimes in Rust code appear in different contexts. Lifetime parameters are declared so that they can be used to constrain the actual value lifetimes. They are then used to constrain lifetimes from below and above depending on the context. The constraints are used by the borrow checker to verify that references do not outlive their targets.