Rust’s data model is built around the idea of safe concurrency. Some of its rules and limitations may not even make sense to you as long as you think in terms of a single-threaded application. A Rust course therefore wouldn’t be complete without discussing concurrency and parallelism. The good news is that writing threaded applications in Rust is much easier than in most other languages.

Let us look at the definition of a data race. A data race occurs when multiple threads of execution access a value and at least one of them is modifying it. That means you can share immutable values without limitation, but mutable access must be exclusive: while one thread holds mutable access, no other thread can safely read or write the data.

Does it remind you of something? Yes, that’s exactly the dichotomy of shared immutable borrows versus exclusive mutable borrows. It was somewhat useful in a single-threaded scenario, but it is in multi-threaded code that it really pays off.

Multi-threaded data model

Global variables are discouraged in Rust. Unlike C++, Rust only supports compile-time initialization of globals, although it can do some magic via lazy initialization that is deferred to the first use of the global object. Passing locally created values by value, or using a smart pointer like Box or Arc, is the preferred way.
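For instance, the standard library’s OnceLock (stable since Rust 1.70) implements exactly this kind of lazy initialization; a minimal sketch, with an illustrative CONFIG global:

```rust
use std::sync::OnceLock;

// The global itself is initialized at compile time to an empty OnceLock;
// the actual String is created lazily, on first access.
static CONFIG: OnceLock<String> = OnceLock::new();

fn config() -> &'static str {
    // get_or_init() runs the closure exactly once, even with many threads.
    CONFIG.get_or_init(|| "default".to_string())
}

fn main() {
    println!("{}", config());
}
```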

There are multiple ways to add concurrency to your Rust application. One is to create threads using the standard library. Another is to use a scheduler library like tokio that distributes tasks over a thread pool. Tasks are different from threads: they are coroutines rather than plain functions, and they are extremely cheap to create and schedule.

You can pass things to newly created threads or tasks by value or by a shared reference using Arc. You can share mutable data using Mutex and pass that using Arc. You can get access to the data by locking the mutex. You cannot access it without locking. Rust guarantees race-free use of shared mutable data in both cases.

A more loosely coupled approach is to share a message queue. You can use channels from std::sync::mpsc or other implementations. That way you can pass things between threads: either by value, by an owning smart pointer like Box, or shared via an Arc smart pointer.

When you use tokio you typically want to use its own implementation of queues and also its I/O subsystem.

Simple threads with moved values

Threads and closures can be combined to form simple threaded functions with input and output values. Once the function is called, the thread is started and the computation (or just waiting, in our example) begins. The caller can wait for and fetch the resulting value using .join().

use std::{thread, time::Duration, time::SystemTime};

fn delayed_value(sleep: u64, value: i32) -> thread::JoinHandle<i32> {
    thread::spawn(move || {
        thread::sleep(Duration::from_secs(sleep));
        value
    })
}

fn main() {
    println!("{:?}", SystemTime::now());
    let result = delayed_value(2, 42).join().unwrap();
    println!("{:?}", SystemTime::now());
    println!("{}", result)
}

It is easy to run multiple threads in parallel and then wait for the results of all of them at once.

fn main() {
    println!("{:?}", SystemTime::now());
    let thread1 = delayed_value(3, 42);
    let thread2 = delayed_value(2, 43);
    let result1 = thread1.join().unwrap();
    let result2 = thread2.join().unwrap();
    println!("{:?}", SystemTime::now());
    println!("{} {}", result1, result2)
}

What if you want to pass a common input value? In the following snippet you can simply pass the same value to two different threads because i32 is a Copy type and therefore its value gets copied into each function’s value argument.

fn main() {
    let value = 42;
    println!("{:?}", SystemTime::now());
    let thread1 = delayed_value(3, value);
    let thread2 = delayed_value(2, value);
    let result1 = thread1.join().unwrap();
    let result2 = thread2.join().unwrap();
    println!("{:?}", SystemTime::now());
    println!("{} {}", result1, result2)
}

You cannot do the same with non-Copy types like String. But you can still copy a string using .clone(), which performs a deep copy of the data structure. Trivial types implement Copy as a plain memory copy; more complex types provide a Clone implementation.
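The difference is visible even without threads; a minimal sketch (the helper name is illustrative):

```rust
fn copy_and_clone() -> (i32, i32, String, String) {
    let n: i32 = 42;
    let m = n;              // bitwise copy: n remains valid afterwards
    let s = String::from("42");
    let t = s.clone();      // explicit deep copy: s remains valid
    // let t = s;           // a plain assignment would move s instead
    (n, m, s, t)
}

fn main() {
    let (n, m, s, t) = copy_and_clone();
    println!("{} {} {} {}", n, m, s, t);
}
```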

use std::{thread, time::Duration, time::SystemTime};

fn delayed_value(sleep: u64, value: String) -> thread::JoinHandle<String> {
    thread::spawn(move || {
        thread::sleep(Duration::from_secs(sleep));
        format!("Answer is {}.", value)
    })
}

fn main() {
    let value = "42".to_string();
    println!("{:?}", SystemTime::now());
    let thread1 = delayed_value(3, value.clone());
    let thread2 = delayed_value(2, value.clone());
    let result1 = thread1.join().unwrap();
    let result2 = thread2.join().unwrap();
    println!("{:?}", SystemTime::now());
    println!("{} {}", result1, result2)
}

Shared immutable data

What if you need to share a data structure rather than clone it? Maybe it’s large, maybe you’ll need to implement shared mutation later. If instead of cloning a String you clone an Arc<String>, the string itself is created only once and each .clone() just increments its reference count.

use std::{thread, time::Duration, time::SystemTime};
use std::sync::Arc;

fn delayed_value(sleep: u64, value: Arc<String>) -> thread::JoinHandle<String> {
    thread::spawn(move || {
        thread::sleep(Duration::from_secs(sleep));
        format!("Answer is {}.", value)
    })
}

fn main() {
    let value = Arc::new("42".to_string());
    println!("{:?}", SystemTime::now());
    let thread1 = delayed_value(3, value.clone());
    let thread2 = delayed_value(2, value.clone());
    let result1 = thread1.join().unwrap();
    let result2 = thread2.join().unwrap();
    println!("{:?}", SystemTime::now());
    println!("{} {}", result1, result2)
}
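That the clones are cheap can be observed with Arc::strong_count, which reports the current number of owners:

```rust
use std::sync::Arc;

fn main() {
    let value = Arc::new("42".to_string());
    assert_eq!(Arc::strong_count(&value), 1);
    let clone = value.clone();      // cheap: only increments the counter
    assert_eq!(Arc::strong_count(&value), 2);
    drop(clone);                    // the counter goes back down
    assert_eq!(Arc::strong_count(&value), 1);
    println!("{}", value);
}
```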

Why don’t we just use a reference? Because it’s not all that easy. thread::spawn() requires a 'static closure, since the spawned thread may outlive its caller. That means we could only pass references with a 'static lifetime, which may not always be what we want.
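That said, if you are on Rust 1.63 or newer, the standard library’s thread::scope does provide scoped threads that may borrow local data, because all of them are joined before the scope returns; a minimal sketch:

```rust
use std::thread;

fn sum_in_parallel(data: &[i32]) -> i32 {
    // The spawned threads may borrow `left` and `right` because
    // thread::scope joins every thread before it returns.
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        let h1 = s.spawn(|| left.iter().sum::<i32>());
        let h2 = s.spawn(|| right.iter().sum::<i32>());
        h1.join().unwrap() + h2.join().unwrap()
    })
}

fn main() {
    println!("{}", sum_in_parallel(&[1, 2, 3, 4]));
}
```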

Shared mutable data

Technically, the borrow checker prevents you from sharing mutable data. But in the real world you’d like to distinguish shareable versus non-shareable rather than mutable versus immutable.

In other languages you often share data together with a mutex that arbitrates concurrent access. Therefore you want a Mutex<T> that is shareable (immutable in Rust terms) but that still provides access to a &mut T. This is called interior mutability: to the borrow checker the Mutex<T> looks immutable, but a locked mutex grants mutable access to its inner value.

use std::{thread, time::Duration, time::SystemTime};
use std::sync::{Arc, Mutex};

fn delayed_value(sleep: u64, value: i32, results: Arc<Mutex<Vec<String>>>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        thread::sleep(Duration::from_secs(sleep));
        results.lock().unwrap().push(format!("Answer is {}.", value));
    })
}

fn main() {
    let results = Arc::new(Mutex::new(Vec::new()));
    println!("{:?}", SystemTime::now());
    let thread1 = delayed_value(3, 42, results.clone());
    let thread2 = delayed_value(2, 43, results.clone());
    thread1.join().unwrap();
    thread2.join().unwrap();
    println!("{:?}", SystemTime::now());
    println!("{:?}", results.lock().unwrap())
}

You might wonder why we need such a complex type as Arc<Mutex<Vec<String>>>. Well, the core value is a Vec<String> so that you can .push() new results to it. It is wrapped in a Mutex<_> so that you can .lock().unwrap() to get exclusive access to it. The .lock() method only fails when another thread has panicked while holding the mutex.
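This poisoning behaviour is easy to demonstrate; a minimal sketch (the poison helper is illustrative):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn poison(mutex: &Arc<Mutex<i32>>) {
    let clone = mutex.clone();
    // The spawned thread panics while holding the lock, poisoning the mutex.
    let _ = thread::spawn(move || {
        let _guard = clone.lock().unwrap();
        panic!("thread died while holding the mutex");
    })
    .join();
}

fn main() {
    let mutex = Arc::new(Mutex::new(0));
    poison(&mutex);
    // Every later .lock() now returns Err; the inner value is still
    // reachable through the PoisonError if you decide to recover it.
    assert!(mutex.lock().is_err());
    println!("poisoned: {}", mutex.lock().is_err());
}
```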

Why don’t we use the Mutex<_> directly? We could use &Mutex<_> to share the mutex across threads, but the threads aren’t scoped and would require a &'static Mutex<_> reference. We already solved the same problem using Arc<_> for immutable data. As Mutex<_> is considered immutable, let’s just do the same for the mutex and pass the data around in an Arc<Mutex<_>>.

Rust’s mutex wrapper feature

Most programming languages treat a mutex as a cooperative tool to synchronize critical sections, which are then used to synchronize access to data. Rust is different. It wraps the data in the mutex so that the data is only accessible inside a critical section. This allows the borrow checker to statically check the correctness of the code.

As long as you wrap your data in Mutex<_> and Arc<_>, it is passed around via .clone() and data access is guarded using .lock(). The data is only available inside a critical section.

// This block is your critical section
{
    let mut data = mutex.lock().unwrap();
    data.do_whatever_you_want();
    data.do_whatever_you_want();
    data.do_whatever_you_want();
}   // the guard is dropped here, unlocking the mutex
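The guard returned by .lock() releases the mutex when it is dropped, so you can also end the critical section early with an explicit drop(); a minimal sketch:

```rust
use std::sync::Mutex;

fn main() {
    let mutex = Mutex::new(vec![1, 2, 3]);

    let mut data = mutex.lock().unwrap();   // critical section starts
    data.push(4);
    drop(data);                             // release the lock explicitly

    // Locking again from the same thread is fine now; without the drop()
    // above this second .lock() would deadlock.
    println!("{:?}", mutex.lock().unwrap());
}
```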

Channels

Sharing data using a mutex isn’t the final answer to all your questions. Locking may have a performance impact, and you need additional tools like Condvar to wait for events.
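For completeness, here is a minimal Condvar sketch (the wait_for_flag helper is illustrative): one thread flips a flag and notifies, the other waits for it.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

// Block until another thread sets the flag and signals the Condvar.
fn wait_for_flag(pair: &(Mutex<bool>, Condvar)) -> bool {
    let (lock, cvar) = pair;
    let mut ready = lock.lock().unwrap();
    while !*ready {
        // wait() atomically unlocks the mutex and sleeps until notified;
        // the loop guards against spurious wakeups.
        ready = cvar.wait(ready).unwrap();
    }
    *ready
}

fn main() {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let pair2 = pair.clone();
    thread::spawn(move || {
        let (lock, cvar) = &*pair2;
        *lock.lock().unwrap() = true;
        cvar.notify_one();
    });
    println!("{}", wait_for_flag(&pair));
}
```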

If you just need to feed a thread with events or messages, a queue (or channel) based approach may serve you better.

use std::{thread, time::Duration, time::SystemTime};
use std::sync::mpsc::{SyncSender, sync_channel};

fn delayed_value(sleep: u64, value: i32, sender: SyncSender<String>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        thread::sleep(Duration::from_secs(sleep));
        sender.send(format!("Answer is {}.", value)).unwrap();
    })
}

fn main() {
    let (sender, receiver) = sync_channel(32);

    let thread1 = delayed_value(3, 42, sender.clone());
    let thread2 = delayed_value(2, 43, sender.clone());

    // Here in the main thread...
    println!("{:?}", SystemTime::now());
    println!("{}", receiver.recv().unwrap());
    println!("{:?}", SystemTime::now());
    println!("{}", receiver.recv().unwrap());
    println!("{:?}", SystemTime::now());

    thread1.join().unwrap();
    thread2.join().unwrap();
}
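One convenient pattern is to drop the original sender and iterate over the receiver, which terminates once every sender clone is gone; a minimal sketch (collect_doubled is an illustrative helper):

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

fn collect_doubled(values: &[i32]) -> Vec<i32> {
    let (sender, receiver) = sync_channel(32);
    for &value in values {
        let sender = sender.clone();
        thread::spawn(move || sender.send(value * 2).unwrap());
    }
    // Drop the original sender so the channel closes once every worker
    // clone is gone; the receiver iterator then terminates.
    drop(sender);
    let mut results: Vec<i32> = receiver.iter().collect();
    results.sort();     // arrival order is nondeterministic
    results
}

fn main() {
    println!("{:?}", collect_doubled(&[42, 43, 44]));
}
```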