Rust’s safety model is built for parallel programming. The language works great with operating system threads. But do you really want to explicitly maintain a set of threads?
There are a few reasons you might opt for concurrency:
- You want to speed up your computations by running code on multiple CPU cores.
- Your application needs to handle I/O events from multiple different sources.
- More subtle performance considerations that we’re not going to discuss.
Parallel computation
Parallel computation is best handled by a set of communicating pieces of code running in parallel. The most common solutions are:
- Multiple running programs talking to each other via inter-process communication mechanisms provided by the operating system.
- A multiprocessing scenario where a single program spawns a few copies of itself that then use inter-process communication.
- A multithreading scenario where lightweight copies are created that share the same memory space and other resources.
Multiprocessing is popular in languages like C (or maybe even C++) where maintaining correct multithreaded operation may prove difficult. Some people might use it to work around the global interpreter lock that prevents parallel thread execution in the standard Python interpreter.
Rust provides thread safety in the form of core language features. In all code that is not marked unsafe, correct access to program data is required and checked by the compiler. More fine-grained synchronization tools are provided by the libraries. Multithreading is therefore a natural choice when working with Rust.
A naive implementation starts new threads or processes whenever they are needed. High-performance applications tend to avoid operating system overhead by creating a fixed number of threads or processes in advance.
Rust supports coroutines (or asynchronous functions) that can be safely executed in different threads. The standard library doesn’t provide a coroutine scheduler but the famous Tokio library does exactly that.
Futures and coroutines
Writing all parallel code as coroutines and letting the Tokio scheduler do the planning is by far the easiest way to perform parallel computation. But that’s not the whole story.
Applications usually spend much more time waiting for I/O than performing heavy computations. In general those two cases can be split and handled separately but asynchronous functions and the Tokio library can be used to solve both cases. We will focus on the I/O case.
For comparison, an application that uses blocking calls would ask the operating system for new data and sleep until the data is available. A non-blocking application would usually wait for data from multiple sources and only sleep when there’s nothing to do.
Futures are essentially results that may not be available yet. An example of a future is the contents of a website we haven’t downloaded yet. An asynchronous function doesn’t run its code when called. Instead it returns a future that will run the code on-demand.
Simple I/O example
First let’s set up Cargo.toml. The full feature set will give you all the I/O tools and macros as well. If you forget it, #[tokio::main] won’t work.
[package]
name = "example"
version = "0.1.0"
edition = "2021"
[dependencies]
tokio = { version = "1", features = ["full"] }
Then let’s put some example code into src/main.rs. Let’s simulate a simplified HTTP client communication. We will explicitly use a single-threaded Tokio runtime.
use tokio::net::TcpStream;
use tokio::io::{AsyncReadExt, AsyncWriteExt};
#[tokio::main(flavor = "current_thread")]
async fn main() {
let target = ("example.net", 80);
let mut stream = TcpStream::connect(target)
.await
.expect("Connection failed.");
stream.write_all(b"GET / HTTP/1.0\r\n\r\n")
.await
.expect("Write failed.");
let mut content = Vec::new();
stream.read_to_end(&mut content)
.await
.expect("Read failed.");
let text = String::from_utf8(content).expect("UTF-8 conversion failed.");
println!("{:?}", text);
}
This isn’t all that different from code using the connect(), write() and read() system calls. You can see three await points that mark where the code might wait for events.
However, seeing exactly where the waiting points in the function are is already a huge difference. Another difference is that each of the await points waits for a future or coroutine provided by Tokio. Please note that the I/O layer and the scheduler depend on each other: you can only use I/O tools compatible with Tokio in a Tokio-based application.
Just like the scheduler is hidden from your eyes, so is the event waiting mechanism. Whenever you wait for an I/O future, an event source is added to Tokio that will later deliver an event and resume execution of the respective coroutine code. This provides the necessary blocking and resuming framework for your coroutines.
Concurrent and parallel execution
Tokio strictly distinguishes concurrency and parallelism. You can run multiple concurrent functions in a single thread using the tokio::join! macro, but only the tokio::spawn() function provides parallel execution.
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;
async fn download(host: &str) -> Result<String, std::io::Error> {
let target = (host, 80);
let mut stream = TcpStream::connect(target).await?;
stream.write_all(b"GET / HTTP/1.0\r\n\r\n").await?;
let mut content = Vec::new();
stream.read_to_end(&mut content).await?;
Ok(String::from_utf8(content).expect("UTF-8 conversion failed."))
}
#[tokio::main]
async fn main() {
let download1 = tokio::spawn(download("example.com"));
let download2 = tokio::spawn(download("example.net"));
let result1 = download1
.await
.expect("First download crashed.")
.expect("First download failed.");
let result2 = download2
.await
.expect("Second download crashed.")
.expect("Second download failed.");
println!("{:?}, {:?}", result1, result2);
}
As you can see, using asynchronous functions as Tokio tasks closely resembles how threads are used in general. Tasks are created and joined just like threads, communicate just like threads, and are distributed by Tokio onto actual operating system threads. In the general case, working with Tokio tasks in I/O applications is easier and more convenient.
The borrow checker
Your best friend and worst enemy is the borrow checker. Whether you’re writing client or server code, you often need to communicate in both directions simultaneously. You might want to enclose the TcpStream in a BufReader for reading but still keep it around for writing. This is not possible with a single stream object.
Tokio, just like the standard library, provides an option to .split() or .into_split() the TcpStream and get separate reader and writer objects. The reader can then be buffered and read line by line using AsyncBufReadExt while you .write_all() your data to the writer. Using .into_split() rather than .split() creates two linked owned objects that are completely independent as far as the static borrow checker is concerned.
use tokio::io::{BufReader, AsyncBufReadExt, AsyncWriteExt};
use tokio::net::{TcpListener, TcpStream};
async fn communicate(mut stream: TcpStream) -> Result<(), std::io::Error> {
let (rx, mut tx) = stream.split();
let mut lines = BufReader::new(rx).lines();
tx.write_all(b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\n").await?;
while let Some(line) = lines.next_line().await? {
if line.is_empty() {
break;
}
tx.write_all(format!("{}\n", line).as_bytes()).await?;
}
Ok(())
}
async fn serve_forever() -> Result<(), std::io::Error> {
let listener = TcpListener::bind(("localhost", 8080)).await?;
loop {
let (stream, _addr) = listener.accept().await?;
communicate(stream).await.ok();
}
}
#[tokio::main]
async fn main() {
serve_forever().await.unwrap();
}
Note that the above code is effectively single-threaded in Tokio even though it runs on the multi-threaded Tokio runtime. That is usually a mistake. What you need is to run communicate() as a separate task, just like download() in the previous example. Then each client gets its own task (effectively an application-level thread) that can be scheduled on the pool of operating system threads managed by Tokio.
This is how you create a threaded server with a fixed-size thread pool out of just a bunch of asynchronous functions or methods. You can pass any data, owned or borrowed, into your asynchronous tasks as long as you understand that it is held by a future object that exists from the moment it is created by calling the coroutine function until it is .await-ed. Tasks are just packaged, separately scheduled futures.
You shouldn’t create structures so complex that you cannot make them work with the borrow checker. If the object interdependence and structure becomes too complex, you can always split it into multiple tasks that communicate via tokio::sync::mpsc. That is very often a better choice than holding data behind a shared tokio::sync::Mutex or std::sync::Mutex.
Notes
Bring your own questions to the next lessons as usual. Homework assignments will start appearing as soon as I get familiar with ReCodEx. We are going to use it this semester.