Chris Morgan

Rust ownership, the hard way

In order to truly understand Rust, one needs to understand its crucial differentiating feature: ownership. Let’s go through it in detail the hard way.

This article is an introduction to Rust’s concept of ownership. It’s designed for someone who is already a programmer but who is not especially familiar (maybe even not at all familiar) with Rust. It doesn’t attempt to explain all the concepts it deals with, but for the most part it should be clear enough. If you’re a beginner, you may find that you don’t understand various parts of it; in that case, you might like to set the article aside while you go and learn more about them elsewhere. You’ll find more value in coming back to it afterwards.

This concept of ownership is the pivotal part of Rust; it’s the part that makes its combination of efficiency and safety possible. While the rest of Rust is pretty similar to what you’ll find in mainstream languages, the concepts of ownership and lifetimes are different from anything in any mainstream language, so they’re likely to be the part that you’ll spend the longest trying to grok. If you don’t give up but persist, then when it all clicks, working in Rust will be a joy, and you’ll have unlocked a marvellous ability to reason about code. You’ll also probably wish you could transplant a lot of the aspects of Rust’s ownership model to other languages you deal with.

This article deals with the concepts and theory of ownership and lifetimes; it doesn’t provide much in the way of practical usage examples; those you can find elsewhere. But it does explain all the rules that go to make Rust’s ownership model what it is. (It is called “the hard way” for a reason.)

Well, on with the first of our four rules:

  1. Each object can be used exactly once. When you use an object it is moved to the new location and is no longer usable in the old.

Later on we’ll deal with a couple of features that make this model more palatable, but for now we’ll skip them, considering it at the most basic level.

struct A;

fn main() {
    let a = A;
    let b = a;
    let c = a;
}

<anon>:6:9: 6:10 error: use of moved value: `a`
<anon>:6 let c = a;
^
<anon>:5:9: 5:10 note: `a` moved here because it has type `A`, which is moved by default
<anon>:5 let b = a;
^
<anon>:5:10: 5:10 help: use `ref` to override
error: aborting due to previous error

The A instance placed in the slot a is moved to b, rendering a unusable.
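The same rule applies when a value is passed to a function or returned from one. Here is a minimal sketch of that; the type Token and the functions consume and pass_through are hypothetical names chosen for illustration, not anything from the standard library.

```rust
// `Token`, `consume` and `pass_through` are hypothetical names for illustration.
#[derive(Debug, PartialEq)]
struct Token(u32);

// Taking a `Token` by value moves it into the function, which now owns it.
fn consume(t: Token) -> u32 {
    t.0
}

// Returning the value moves it back out to the caller.
fn pass_through(t: Token) -> Token {
    t
}

fn main() {
    let a = Token(1);
    let b = pass_through(a); // `a` is moved in; the result is moved into `b`
    // let c = a;            // would not compile: use of moved value `a`
    let n = consume(b);      // `b` is moved into `consume` and is gone afterwards
    println!("{}", n);
}
```

Uncommenting the let c = a; line should produce the same kind of “use of moved value” error as the example above.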

  2. When an object passes out of scope, it is destroyed and is no longer usable.

All curly braces (known as blocks) introduce a new level of scope, and anything declared inside a block (e.g. the binding x in let x = y;) will live until the end of that block.

struct A;

fn main() {
    {
        let a = A;
    }
    let b = a;
}

This example won’t work, because a has fallen out of scope and been destroyed by the time we try to assign it to b:

<anon>:7:13: 7:14 error: unresolved name `a`
<anon>:7 let b = a;
^
error: aborting due to previous error

When an object passes out of scope and is destroyed, if the type implements Drop, that destructor will be run. I won’t get into the details of destructors here. Types like &T and &mut T (immutable and mutable references) do not have destructors.
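Without going into the details of destructors, a small sketch can at least show when they run. The type Noisy and the function drop_order are hypothetical; Noisy records its name in a shared log when it is destroyed, so the order of destruction can be observed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// `Noisy` is a hypothetical type that logs its name when it is destroyed.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the names in the order in which the destructors ran.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = Noisy { name: "outer", log: log.clone() };
        {
            let _inner = Noisy { name: "inner", log: log.clone() };
        } // `_inner` passes out of scope here, so its destructor runs first
    } // `_outer` passes out of scope here
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```

The inner object is destroyed at the end of the inner block, before the outer one.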

  3. Blocks can produce a value which goes up one level of scope.

As a language feature, this is typically known as expression orientation, as distinct from statement orientation. It’s a fairly simple concept and will seem perfectly natural to users of languages like Ruby, though to users of languages like C++ and Python it may seem a little odd. (I came originally from a Python background; at first I thought it a gimmick useful only for skipping the word return on the last statement of a function, but I rapidly discovered it isn’t a gimmick at all; it’s a very useful feature, for all that it wouldn’t suit a language like Python.)

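The main rule is simple: the last expression in a block is the value that the block produces. (A block is thus an expression too.) There are a couple of other rules to deal with the corner cases: an empty block produces the unit value, (), and so does a block whose final statement ends with a semicolon.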

Here’s a simple example:

let a = {
    // … do anything we like …
    1
};

Here, a ends up storing the value 1. Given these rules, you can see that putting braces around an expression changes nothing, for at each added level the block contains only a single expression, which is then its own value. Hence, let a = { { { { { 1 } } } } }; and let a = 1; are equivalent.
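To make the role of the final expression concrete, here is a short sketch with two hypothetical functions, block_value and block_unit. It relies on the fact that a block ending in a semicolon-terminated statement produces the unit value, ().

```rust
// `block_value` and `block_unit` are hypothetical names for illustration.
fn block_value() -> i32 {
    let a = {
        let x = 21; // `x` exists only inside this block
        x * 2       // no semicolon: this expression is the block's value
    };
    a
}

fn block_unit() {
    let u: () = {
        let mut v = Vec::new();
        v.push(1); // ends with a semicolon, so the block produces `()`
    };
    u
}

fn main() {
    println!("{} {:?}", block_value(), block_unit());
}
```

Note how this combines with the previous rule: x and v are destroyed at the end of their blocks, and only the produced value goes up one level of scope.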

In languages without this feature, the only type of block that produces a value (if you’ll allow my sloppy terminology here) is a function; there, they have the return keyword to fill in this blank. (Rust also has the return keyword to make early return more convenient, but the language would work fine without it—it’d just be harder to write some sorts of code.)

fn foo_the_c_way() -> i32 {
    return 1;
}

fn foo_the_rust_way() -> i32 {
    1
}

Because of this paradigm of expression orientation, the concept of special ternary expressions is not necessary in Rust, for an if expression can do the job just fine:

C++: different forms

auto x = a ? b : c;

sometype x;
if (a) {
    // … do things …
    x = b;
} else {
    // … do things …
    x = c;
}

Python: different forms

x = b if a else c

if a:
    # … do things …
    x = b
else:
    # … do things …
    x = c

Rust: the same form

let x = if a { b } else { c };

let x = if a {
    // … do things …
    b
} else {
    // … do things …
    c
};

  4. All objects have a lifetime which constrains which scopes they may be moved out of.

Now we’re getting into the harder stuff, the biggest thing that’s unusual about Rust: lifetimes. Syntactically, a lifetime in Rust is any identifier with the prefix of a single quotation mark, e.g. 'a. Any name will do for a lifetime, but there is one special lifetime, 'static, which means that an object contains no non-static references. Most of the primitive types (i32, bool, str, [T], &c.) are static; those that are not are the reference types, &'a T and &'a mut T, whose lifetime is limited by 'a.

Lifetime positions in types

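There are four places where a lifetime can appear in a type:

  1. &'a T, an immutable reference;

  2. &'a mut T, a mutable reference;

  3. Type<'a>, a generic lifetime parameter on a type;

  4. Trait + 'a, a lifetime bound on a trait object.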

The first two are, I believe, fairly obvious; it’s clearly unsafe to have a reference to an object that has been freed. (By baking this into the language, we avoid the problem of dangling pointers that languages like C have.)

The third is much the same, as all generic lifetime parameters will, somewhere down the way, be of one of the other types, and so it’s just a way of passing that constraint through the types, like this:

struct Ref<'a, T: 'a>(&'a T);

As you can see, this Ref type is basically just a wrapper around an immutable reference; as the contained field is of that lifetime, clearly the containing structure may not live any longer than it and so it must have that lifetime also. (The T: 'a part is a type bound, saying that the generic type T must live for at least 'a; without this it would not work, for as already discussed it clearly wouldn’t make sense to have a reference living longer than the object it refers to.)
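A brief sketch of how such a wrapper behaves in use. The Ref definition is repeated so the example stands alone; the helper get is a hypothetical addition.

```rust
// The wrapper type from the text, repeated so the example is self-contained.
struct Ref<'a, T: 'a>(&'a T);

// `get` is a hypothetical helper: the reference it returns carries the same
// lifetime 'a as the one stored inside the wrapper.
fn get<'a, T>(r: &Ref<'a, T>) -> &'a T {
    r.0
}

fn main() {
    let value = 5;
    let wrapped = Ref(&value);
    println!("{}", get(&wrapped));

    // This would not compile, because `short` is destroyed while `escaped`,
    // which borrows from it, is still in use:
    //
    // let escaped;
    // {
    //     let short = 5;
    //     escaped = Ref(&short);
    // }
    // println!("{}", get(&escaped));
}
```

The wrapper is bound by the lifetime of what it refers to, exactly as a bare reference would be.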

The fourth and final of these (Trait + 'a) is, I think, the most interesting, and it’s worth spending a short time on trait objects, because the way in which they fit into this arrangement is slightly different from elsewhere.

Trait objects are Rust’s form of safe and convenient dynamic dispatch; they allow you to store arbitrary types that satisfy a trait in a single type. Because of the potential difference in size of the types implementing a trait, trait objects are only usable through a pointer of some form; the owned form is thus Box<Trait>, and for references you can have &Trait and &mut Trait. (Current Rust spells these with the dyn keyword, as Box<dyn Trait>, &dyn Trait and &mut dyn Trait; this article keeps the older bare form.) This is a very brief and incomplete explanation of trait objects; a detailed one is out of scope for this article, and you can find documentation of them elsewhere.

As stated, a trait object may be of a variety of different types; what, then, is the lifetime of a trait object? The answer is that we must specify it, like Box<Trait + 'static>, indicating that the contained object must be 'static, and &'a (Trait + 'a), indicating that the contained object’s lifetime must be at least 'a. Now as it happens, some of these common cases like + 'static on a boxed trait object and the duplication of the 'a in the reference are taken care of as default trait bounds—Box<Trait> is normally equivalent to Box<Trait + 'static> and &'a Trait to &'a (Trait + 'a). RFC 599, on default object bounds, treats the current rules on this subject more precisely.

(Note that just as the lifetime on a trait object can be omitted due to default object bounds, lifetimes on the other three cases can also often be omitted; the current rules of lifetime elision are covered in RFC 141.)
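As a small illustration of elision on plain references, the two hypothetical functions below should have the same signature as far as the compiler is concerned; the first merely leaves the lifetime unwritten.

```rust
// With elision: the compiler fills in the lifetimes.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same signature with the lifetime written out in full.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    println!("{}", first_word("hello world"));
    println!("{}", first_word_explicit("hello world"));
}
```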

But still you may wonder: why do trait objects need a lifetime? I think it’s easiest to just give an example of something that would be bad if it were allowed to compile (which it isn’t):

fn main() {
    let trait_object: &AsRef<str> = {
        let string = "I am a String".to_owned();
        &string
    };
    println!("The string is {:?}", trait_object.as_ref());
}

By the time execution gets to the println! statement, string has been freed, so if this were allowed to compile, trait_object, which points to string, would be pointing to freed memory, which is emphatically a Bad Thing™. To maintain memory safety, the trait object cannot live longer than the string it refers to.
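For contrast, here is a sketch of two arrangements that should compile, written with the dyn keyword that current Rust requires for trait objects. The helpers text and make_owned are hypothetical. In the first case the string outlives the reference to it; in the second the string is moved into a box, so the trait object owns its data and can satisfy a 'static bound.

```rust
// `text` and `make_owned` are hypothetical helpers for illustration.
fn text(obj: &dyn AsRef<str>) -> &str {
    obj.as_ref()
}

// The String is moved into the box, so nothing is left dangling when
// this function returns.
fn make_owned() -> Box<dyn AsRef<str>> {
    let string = "I am a String".to_owned();
    Box::new(string)
}

fn main() {
    // The string lives in the same scope as the trait object referring to it.
    let string = "I am a String".to_owned();
    let trait_object: &dyn AsRef<str> = &string;
    println!("The string is {:?}", text(trait_object));

    let boxed = make_owned();
    println!("The boxed string is {:?}", text(&*boxed));
}
```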

Lifetime positions in generic bounds

As shown very basically above, a lifetime can also appear in type parameter bounds and lifetime parameter bounds. Here are some more examples:

struct Ref<'a, T: SomeTrait + AnotherTrait + 'a>(&'a T);

// Suppose we have something that is reading values from a buffer and storing
// the decoded values, which still contain references to parts of the original
// buffer, into a vector. For maximal generality, we have *two* lifetimes,
// though normally one would suffice. The reference to the vector (and all its
// items) must last at least as long as the buffer, but is not constrained to
// *only* live that long.
struct DecodeResult<'buf, 'decoded: 'buf, T: 'decoded> {
    buffer: &'buf [u8],
    decoded: &'decoded mut Vec<T>,
}

By type parameter bounds we mean the T: 'decoded, stating that the lifetime of the type T must be at least 'decoded.

By lifetime parameter bounds we mean the 'decoded: 'buf, stating that the lifetime 'decoded must be at least as long as (must satisfy the constraint) 'buf. Bounds of this kind are extremely rare in practice.

Really, these sorts of bounds are very similar to the trait object bounds: because generics can be of arbitrary types, you will often need to stipulate a minimal lifetime in order to work with a type. In the DecodeResult example above, T’s lifetime must be constrained to be at least 'decoded, or else the reference in the decoded field would not be valid. Rust could infer this sort of thing, but it would lead to surprises in various areas, so it refrains from the attempt.
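A sketch of DecodeResult in use. To keep it short it makes a simplifying assumption that differs from the scenario in the comment above: the decoded values are plain u16 numbers that own their data, so T borrows nothing from the buffer. The functions decode_u16s and decode_all are hypothetical.

```rust
struct DecodeResult<'buf, 'decoded: 'buf, T: 'decoded> {
    buffer: &'buf [u8],
    decoded: &'decoded mut Vec<T>,
}

// Hypothetical decoder: reads big-endian u16 values from consecutive pairs
// of bytes, ignoring a trailing odd byte.
fn decode_u16s(result: &mut DecodeResult<'_, '_, u16>) {
    for pair in result.buffer.chunks_exact(2) {
        result.decoded.push(u16::from_be_bytes([pair[0], pair[1]]));
    }
}

// Hypothetical driver that owns the output vector.
fn decode_all(bytes: &[u8]) -> Vec<u16> {
    let mut out = Vec::new();
    {
        let mut result = DecodeResult { buffer: bytes, decoded: &mut out };
        decode_u16s(&mut result);
    } // the mutable borrow of `out` ends here
    out
}

fn main() {
    println!("{:?}", decode_all(&[0, 1, 1, 0]));
}
```

Once the DecodeResult has passed out of scope, the vector is no longer borrowed and can be moved out to the caller.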

Exceptions to “each object can be used exactly once”

There; we’ve covered the four basic rules. There remains still one point to explain. Earlier on I posited the simple rule that each object can be used exactly once; as I also said, this isn’t actually universally true. There are two exceptions to this rule.

  1. Copy and move semantics

Objects of types with move semantics can only be used once, but this is not true of types with copy semantics.

An object has copy semantics if its type implements Copy. This means that a value of that type can be duplicated simply by copying the bytes of memory. i32, for example, is Copy because it’s just an arbitrary four bytes of data, entirely self‐contained. Box<T> is not, because it involves a heap allocation that it must own; a simple copy of the bytes would cause two objects to own the data at the same time, breaking all sorts of invariants and probably eating your laundry.

In the first code example’s error, we saw the explanatory note “`a` moved here because it has type `A`, which is moved by default”. This moved by default is a way of saying that A has move semantics. How could we change it? Like this:

#[derive(Clone, Copy)]
struct A;

fn main() {
    let a = A;
    let b = a;
    let c = a;
}

With the addition of the Copy implementation for A, changing it from having move semantics to copy semantics, the code now compiles.

There is one final matter in this exception to the rule worth mentioning: generics. If you work with a generic type, it will have move semantics unless you specify Copy in its bounds. In cases where you wish to copy the value, you might wish to use Clone instead, as it is more general (and all types that implement Copy implement Clone). Don’t worry about efficiency of cloning versus copying, either; unless the writer of the type explicitly went out of their way to do so (which they should never do), cloning and copying will perform identically when optimised.

For more information on this topic, see the Copy docs.
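The point about generics can be sketched with two hypothetical functions, twice_copy and twice_clone, each of which uses its argument twice.

```rust
// `twice_copy` and `twice_clone` are hypothetical helpers for illustration.

// With a `Copy` bound, `x` has copy semantics and may be used twice.
fn twice_copy<T: Copy>(x: T) -> (T, T) {
    (x, x)
}

// With only a `Clone` bound, the duplicate must be made explicitly;
// the original is then moved into the second position.
fn twice_clone<T: Clone>(x: T) -> (T, T) {
    (x.clone(), x)
}

// With no bound at all, this would not compile:
// fn twice<T>(x: T) -> (T, T) { (x, x) } // use of moved value: `x`

fn main() {
    println!("{:?}", twice_copy(3));
    println!("{:?}", twice_clone(String::from("hi")));
}
```

twice_clone accepts String, which twice_copy would reject, and it accepts every Copy type as well.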

  2. Reference reborrowing

A particularly notable example of a type with copy semantics is &T; you can have as many immutable references to an object alive as you like, so long as there are no mutable references to it at the same time. &mut T, however, has move semantics, for you’re not allowed to have more than one mutable reference in scope at a time.

So then, why does this example work?

fn take_ref<T>(_: &T) { }
fn take_mut<T>(_: &mut T) { }
fn main() {
    let mut a = 6;
    {
        let a_ref = &a;
        take_ref(a_ref);
        take_ref(a_ref);
    }
    {
        let a_mut = &mut a;
        take_mut(a_mut);
        // Surely a_mut should have been consumed by the line above?
        take_mut(a_mut);
    }
}

There is another exception in play here: reference reborrowing. Where we pass a_mut to a function, it effectively becomes &mut *a_mut. This does mean that for a time two mutable references to the same thing exist (one in the caller and one in the function called), but at no point are both accessible, so it’s OK.

Note that this exception only applies in situations where the parameter is of the form &mut T; if it is a generic taken by value (e.g. fn f<T>(x: T)) then a mutable reference you pass in will not be reborrowed automatically, and is moved instead. You can still use the &mut *x incantation to manually reborrow it, however.
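A sketch of the difference, using the hypothetical functions take (generic, by value), bump (takes &mut i32) and demo:

```rust
// `take`, `bump` and `demo` are hypothetical names for illustration.
fn take<T>(_: T) {}

fn bump(x: &mut i32) {
    *x += 1;
}

fn demo() -> i32 {
    let mut a = 6;
    let a_mut = &mut a;
    bump(a_mut);       // implicitly reborrowed as `&mut *a_mut`
    bump(a_mut);       // so `a_mut` is still usable here
    take(&mut *a_mut); // manual reborrow for the by-value generic
    bump(a_mut);       // still usable
    take(a_mut);       // no reborrow this time: `a_mut` is moved
    // bump(a_mut);    // would not compile: use of moved value `a_mut`
    a
}

fn main() {
    println!("{}", demo());
}
```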

Summary

Let’s list the main points once again:

  1. Each object can be used exactly once. When you use an object it is moved to the new location and is no longer usable in the old. (Copy and automatic reference reborrowing are the two exceptions that make this more palatable.)

  2. When an object passes out of scope, it is destroyed and is no longer usable.

  3. Blocks can produce a value which goes up one level of scope. (As a language feature, this is known as expression orientation.)

  4. All objects have a lifetime which constrains which scopes they may be moved out of.

These points are all that you need to understand all of Rust’s ownership model. The rules are surprisingly simple, though you can quickly get to fairly complex interactions which make things harder to reason about. Don’t worry if you haven’t understood all of this; if you are trying to learn Rust, go and do some other things for a while and come back to the article again later and you’ll probably find it makes more sense and you can understand how ownership and lifetimes work. When you finally get to that point, you’ll love Rust’s unusual ownership model.