Re-do the 30 minute intro
This was originally on my blog, so it's incredibly informal. Let's make it better.
parent 8cad720879
commit 6c5bf9f2a4
1 changed files with 500 additions and 359 deletions
859
src/doc/intro.md
|
@ -1,435 +1,576 @@
|
|||
% A 30-minute Introduction to Rust
|
||||
|
||||
Rust is a systems programming language that combines strong compile-time correctness guarantees with fast performance.
|
||||
It improves upon the ideas of other systems languages like C++
|
||||
by providing guaranteed memory safety (no crashes, no data races) and complete control over the lifecycle of memory.
|
||||
Strong memory guarantees make writing correct concurrent Rust code easier than in other languages.
|
||||
This introduction will give you an idea of what Rust is like in about thirty minutes.
|
||||
It expects that you're at least vaguely familiar with a previous 'curly brace' language,
|
||||
but does not require prior experience with systems programming.
|
||||
The concepts are more important than the syntax,
|
||||
so don't worry if you don't get every last detail:
|
||||
the [guide](guide.html) can help you out with that later.
|
||||
Rust is a modern systems programming language focusing on safety and speed. It
|
||||
accomplishes these goals by being memory safe without using garbage collection.
|
||||
|
||||
Let's talk about the most important concept in Rust, "ownership,"
|
||||
and its implications on a task that programmers usually find very difficult: concurrency.
|
||||
This introduction will give you a rough idea of what Rust is like, eliding many
|
||||
details. It does not require prior experience with systems programming, but you
|
||||
may find the syntax easier if you've used a 'curly brace' programming language
|
||||
before, like C or JavaScript. The concepts are more important than the syntax,
|
||||
so don't worry if you don't get every last detail: you can read [the
|
||||
Guide](guide.html) to get a more complete explanation.
|
||||
|
||||
# The power of ownership
|
||||
Because this is about high-level concepts, you don't need to actually install
|
||||
Rust to follow along. If you'd like to anyway, check out [the
|
||||
homepage](http://rust-lang.org) for explanation.
|
||||
|
||||
Ownership is central to Rust,
|
||||
and is the feature from which many of Rust's powerful capabilities are derived.
|
||||
"Ownership" refers to which parts of your code are allowed to read,
|
||||
write, and ultimately release, memory.
|
||||
Let's start by looking at some C++ code:
|
||||
To show off Rust, let's talk about how easy it is to get started with Rust.
|
||||
Then, we'll talk about Rust's most interesting feature, **ownership**, and
|
||||
then discuss how it makes concurrency easier to reason about. Finally,
|
||||
we'll talk about how Rust breaks down the perceived dichotomy between speed
|
||||
and safety.
|
||||
|
||||
```cpp
|
||||
int* dangling(void)
|
||||
{
|
||||
int i = 1234;
|
||||
return &i;
|
||||
}
|
||||
# Tools
|
||||
|
||||
int add_one(void)
|
||||
{
|
||||
int* num = dangling();
|
||||
return *num + 1;
|
||||
Getting started on a new Rust project is incredibly easy, thanks to Rust's
|
||||
package manager, [Cargo](http://crates.io).
|
||||
|
||||
To start a new project with Cargo, use `cargo new`:
|
||||
|
||||
```{bash}
|
||||
$ cargo new hello_world --bin
|
||||
```
|
||||
|
||||
We're passing `--bin` because we're making a binary program: if we
|
||||
were making a library, we'd leave it off.
|
||||
|
||||
Let's check out what Cargo has generated for us:
|
||||
|
||||
```{bash}
$ cd hello_world
$ tree .
.
├── Cargo.toml
└── src
    └── main.rs

1 directory, 2 files
```
|
||||
|
||||
This is all we need to get started. First, let's check out `Cargo.toml`:
|
||||
|
||||
```{toml}
|
||||
[package]
|
||||
|
||||
name = "hello_world"
|
||||
version = "0.0.1"
|
||||
authors = ["Your Name <you@example.com>"]
|
||||
```
|
||||
|
||||
This is called a **manifest**, and it contains all of the metadata that Cargo
|
||||
needs to compile your project.
|
||||
|
||||
Here's what's in `src/main.rs`:
|
||||
|
||||
```{rust}
fn main() {
    println!("Hello, world!")
}
```
|
||||
|
||||
**Note: The above C++ code is deliberately simple and non-idiomatic for the purpose
|
||||
of demonstration. It is not representative of production-quality C++ code.**
|
||||
Cargo generated a 'hello world' for us. We'll talk more about the syntax here
|
||||
later, but that's what Rust code looks like! Let's compile and run it:
|
||||
|
||||
This function allocates an integer on the stack,
|
||||
and stores it in a variable, `i`.
|
||||
It then returns a reference to the variable `i`.
|
||||
There's just one problem:
|
||||
stack memory becomes invalid when the function returns.
|
||||
This means that in the second line of `add_one`,
|
||||
`num` points to some garbage values,
|
||||
and we won't get the effect that we want.
|
||||
While this is a trivial example,
|
||||
it can happen quite often in C++ code.
|
||||
There's a similar problem when memory on the heap is allocated with `malloc` (or `new`),
|
||||
then freed with `free` (or `delete`),
|
||||
yet your code attempts to do something with the pointer to that memory.
|
||||
This problem is called a 'dangling pointer,'
|
||||
and it's not possible to write Rust code that has it.
|
||||
Let's try writing it in Rust:
|
||||
```{bash}
|
||||
$ cargo run
|
||||
Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
|
||||
Running `target/hello_world`
|
||||
Hello, world!
|
||||
```
|
||||
|
||||
```ignore
|
||||
fn dangling() -> &int {
|
||||
let i = 1234;
|
||||
return &i;
|
||||
}
|
||||
Using an external dependency in Rust is incredibly easy. You add a line to
|
||||
your `Cargo.toml`:
|
||||
|
||||
fn add_one() -> int {
|
||||
let num = dangling();
|
||||
return *num + 1;
|
||||
}
|
||||
```{toml}
|
||||
[package]
|
||||
|
||||
name = "hello_world"
|
||||
version = "0.0.1"
|
||||
authors = ["Your Name <someone@example.com>"]
|
||||
|
||||
[dependencies.semver]
|
||||
|
||||
git = "https://github.com/rust-lang/semver.git"
|
||||
```
|
||||
|
||||
You added the `semver` library, which parses version numbers and compares them
|
||||
according to the [SemVer specification](http://semver.org/).
|
||||
|
||||
Now, you can pull in that library using `extern crate` in
|
||||
`main.rs`.
|
||||
|
||||
```{rust,ignore}
|
||||
extern crate semver;
|
||||
|
||||
use semver::Version;
|
||||
|
||||
fn main() {
|
||||
add_one();
|
||||
assert!(Version::parse("1.2.3") == Ok(Version {
|
||||
major: 1u,
|
||||
minor: 2u,
|
||||
patch: 3u,
|
||||
pre: vec!(),
|
||||
build: vec!(),
|
||||
}));
|
||||
|
||||
println!("Versions compared successfully!");
|
||||
}
|
||||
```
|
||||
|
||||
Save this program as `dangling.rs`. When you try to compile this program with `rustc dangling.rs`, you'll get an interesting (and long) error message:
|
||||
Again, we'll discuss the exact details of all of this syntax soon. For now,
|
||||
let's compile and run it:
|
||||
|
||||
```text
|
||||
dangling.rs:3:12: 3:14 error: `i` does not live long enough
|
||||
dangling.rs:3 return &i;
|
||||
^~
|
||||
dangling.rs:1:23: 4:2 note: reference must be valid for the anonymous lifetime #1 defined on the block at 1:22...
|
||||
dangling.rs:1 fn dangling() -> &int {
|
||||
dangling.rs:2 let i = 1234;
|
||||
dangling.rs:3 return &i;
|
||||
dangling.rs:4 }
|
||||
dangling.rs:1:23: 4:2 note: ...but borrowed value is only valid for the block at 1:22
|
||||
dangling.rs:1 fn dangling() -> &int {
|
||||
dangling.rs:2 let i = 1234;
|
||||
dangling.rs:3 return &i;
|
||||
dangling.rs:4 }
|
||||
```{bash}
|
||||
$ cargo run
|
||||
Updating git repository `https://github.com/rust-lang/semver.git`
|
||||
Compiling semver v0.0.1 (https://github.com/rust-lang/semver.git#bf739419)
|
||||
Compiling hello_world v0.0.1 (file:///home/you/projects/hello_world)
|
||||
Running `target/hello_world`
|
||||
Versions compared successfully!
|
||||
```
|
||||
|
||||
Because we only specified a repository without a version, if someone else were
|
||||
to try out our project at a later date, when `semver` was updated, they would
|
||||
get a different, possibly incompatible version. To solve this problem, Cargo
|
||||
produces a file, `Cargo.lock`, which records the versions of any dependencies.
|
||||
This gives us repeatable builds.
|
||||
|
||||
There is a lot more here, and this is a whirlwind tour, but you should feel
|
||||
right at home if you've used tools like [Bundler](http://bundler.io/),
|
||||
[npm](https://www.npmjs.org/), or [pip](https://pip.pypa.io/en/latest/).
|
||||
There are no `Makefile`s or endless `autotools` output here. (Rust's tooling does
|
||||
[play nice with external libraries written in those
|
||||
tools](http://crates.io/native-build.html), if you need to.)
|
||||
|
||||
Enough about tools, let's talk code!
|
||||
|
||||
# Ownership
|
||||
|
||||
Rust's defining feature is 'memory safety without garbage collection.' Let's
|
||||
take a moment to talk about what that means. **Memory safety** means that the
|
||||
programming language eliminates certain kinds of bugs, such as [buffer
|
||||
overflows](http://en.wikipedia.org/wiki/Buffer_overflow) and [dangling
|
||||
pointers](http://en.wikipedia.org/wiki/Dangling_pointer). These problems occur
|
||||
when you have unrestricted access to memory. As an example, here's some Ruby
|
||||
code:
|
||||
|
||||
```{ruby}
v = [];

v.push("Hello");

x = v[0];

v.push("world");

puts x
```
|
||||
|
||||
We make an array, `v`, and then call `push` on it. `push` is a method which
|
||||
adds an element to the end of an array.
|
||||
|
||||
Next, we make a new variable, `x`, that's equal to the first element of
|
||||
the array. Simple, but this is where the 'bug' will appear.
|
||||
|
||||
Let's keep going. We then call `push` again, pushing "world" onto the
|
||||
end of the array. `v` now is `["Hello", "world"]`.
|
||||
|
||||
Finally, we print `x` with the `puts` method. This prints "Hello."
|
||||
|
||||
All good? Let's go over a similar, but subtly different example, in C++:
|
||||
|
||||
```{cpp}
#include<iostream>
#include<vector>
#include<string>

int main() {
    std::vector<std::string> v;

    v.push_back("Hello");

    std::string& x = v[0];

    v.push_back("world");

    std::cout << x;
}
```
|
||||
|
||||
It's a little more verbose due to the static typing, but it's almost the same
|
||||
thing. We make a `std::vector` of `std::string`s, we call `push_back` (same as
|
||||
`push`) on it, take a reference to the first element of the vector, call
|
||||
`push_back` again, and then print out the reference.
|
||||
|
||||
There are two big differences here: one, they're not _exactly_ the same thing,
|
||||
and two...
|
||||
|
||||
```{bash}
$ g++ hello.cpp -Wall -Werror
$ ./a.out
Segmentation fault (core dumped)
```
|
||||
|
||||
A crash! (Note that this is actually system-dependent. Because referring to an
|
||||
invalid reference is undefined behavior, the compiler can do anything,
|
||||
including the right thing!) Even though we compiled with flags to give us as
|
||||
many warnings as possible, and to treat those warnings as errors, we got no
|
||||
errors. When we ran the program, it crashed.
|
||||
|
||||
Why does this happen? When we append to an array, its length changes. Since
|
||||
its length changes, we may need to allocate more memory. In Ruby, this happens
|
||||
as well, we just don't think about it very often. So why does the C++ version
|
||||
segfault when we allocate more memory?
|
||||
|
||||
The answer is that in the C++ version, `x` is a **reference** to the memory
|
||||
location where the first element of the array is stored. But in Ruby, `x` is a
|
||||
standalone value, not connected to the underlying array at all. Let's dig into
|
||||
the details for a moment. Your program has access to memory, provided to it by
|
||||
the operating system. Each location in memory has an address. So when we make
|
||||
our vector, `v`, it's stored in a memory location somewhere:
|
||||
|
||||
| location | name | value |
|----------|------|-------|
| 0x30     | v    |       |
|
||||
|
||||
(Address numbers made up, and in hexadecimal. Those of you with deep C++
|
||||
knowledge, there are some simplifications going on here, like the lack of an
|
||||
allocated length for the vector. This is an introduction.)
|
||||
|
||||
When we push our first string onto the array, we allocate some memory,
|
||||
and `v` refers to it:
|
||||
|
||||
| location | name | value    |
|----------|------|----------|
| 0x30     | v    | 0x18     |
| 0x18     |      | "Hello"  |
|
||||
|
||||
We then make a reference to that first element. A reference is a variable
|
||||
that points to a memory location, so its value is the memory location of
|
||||
the `"Hello"` string:
|
||||
|
||||
| location | name | value    |
|----------|------|----------|
| 0x30     | v    | 0x18     |
| 0x18     |      | "Hello"  |
| 0x14     | x    | 0x18     |
|
||||
|
||||
When we push `"world"` onto the vector with `push_back`, there's no room:
|
||||
we only allocated one element. So, we need to allocate two elements,
|
||||
copy the `"Hello"` string over, and update `v` to point at the new memory. Like this:
|
||||
|
||||
| location | name | value    |
|----------|------|----------|
| 0x30     | v    | 0x08     |
| 0x18     |      | GARBAGE  |
| 0x14     | x    | 0x18     |
| 0x08     |      | "Hello"  |
| 0x04     |      | "world"  |
|
||||
|
||||
Note that `v` now refers to the new list, which has two elements. It's all
|
||||
good. But our `x` didn't get updated! It still points at the old location,
|
||||
which isn't valid anymore. In fact, [the documentation for `push_back` mentions
|
||||
this](http://en.cppreference.com/w/cpp/container/vector/push_back):
|
||||
|
||||
> If the new `size()` is greater than `capacity()` then all iterators and
|
||||
> references (including the past-the-end iterator) are invalidated.
|
||||
|
||||
Finding where these iterators and references are is a difficult problem, and
|
||||
even in this simple case, `g++` can't help us here. While the bug is obvious in
|
||||
this case, in real code, it can be difficult to track down the source of the
|
||||
error.
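
The reallocation step described above can also be watched from the Rust side. What follows is only a minimal sketch, assuming a current stable Rust toolchain (its syntax is newer than the snippets in this introduction); the helper name `grow` is made up for illustration. It records a vector's capacity before and after a push that no longer fits in the existing allocation:

```rust
// Hypothetical helper: returns (capacity before, capacity after, length after).
fn grow() -> (usize, usize, usize) {
    // Ask for room for one element; the allocator is allowed to give us more.
    let mut v: Vec<String> = Vec::with_capacity(1);
    v.push(String::from("Hello"));
    let before = v.capacity();

    // Use up any spare room so that the next push cannot fit in place.
    while v.len() < v.capacity() {
        v.push(String::from("filler"));
    }

    // This push exceeds the capacity, so the vector must move to new memory.
    v.push(String::from("world"));
    (before, v.capacity(), v.len())
}

fn main() {
    let (before, after, len) = grow();
    println!("capacity {} -> {}, length {}", before, after, len);
}
```

The exact capacities depend on the allocator, so the sketch only relies on the capacity growing, not on any particular number.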
|
||||
|
||||
Before we talk about the solution, why didn't our Ruby code have this problem?
|
||||
The semantics are a little more complicated, and explaining Ruby's internals is
|
||||
out of the scope of a guide to Rust. But in a nutshell, Ruby's garbage
|
||||
collector keeps track of references, and makes sure that everything works as
|
||||
you might expect. This comes at an efficiency cost, and the internals are more
|
||||
complex. If you'd really like to dig into the details, [this
|
||||
article](http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values)
|
||||
can give you more information.
|
||||
|
||||
Garbage collection is a valid approach to memory safety, but Rust chooses a
|
||||
different path. Let's examine what the Rust version of this looks like:
|
||||
|
||||
```{rust,ignore}
fn main() {
    let mut v = vec![];

    v.push("Hello");

    let x = &v[0];

    v.push("world");

    println!("{}", x);
}
```
|
||||
|
||||
This looks like a bit of both: fewer type annotations, but we do create new
|
||||
variables with `let`. The method name is `push`, some other stuff is different,
|
||||
but it's pretty close. So what happens when we compile this code? Does Rust
|
||||
print `"Hello"`, or does Rust crash?
|
||||
|
||||
Neither. It refuses to compile:
|
||||
|
||||
```{notrust,ignore}
|
||||
$ cargo run
|
||||
Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
|
||||
main.rs:8:5: 8:6 error: cannot borrow `v` as mutable because it is also borrowed as immutable
|
||||
main.rs:8 v.push("world");
|
||||
^
|
||||
main.rs:6:14: 6:15 note: previous borrow of `v` occurs here; the immutable borrow prevents subsequent moves or mutable borrows of `v` until the borrow ends
|
||||
main.rs:6 let x = &v[0];
|
||||
^
|
||||
main.rs:11:2: 11:2 note: previous borrow ends here
|
||||
main.rs:1 fn main() {
|
||||
...
|
||||
main.rs:11 }
|
||||
^
|
||||
error: aborting due to previous error
|
||||
```
|
||||
|
||||
In order to fully understand this error message,
|
||||
we need to talk about what it means to "own" something.
|
||||
So for now,
|
||||
let's just accept that Rust will not allow us to write code with a dangling pointer,
|
||||
and we'll come back to this code once we understand ownership.
|
||||
When we try to mutate the vector by `push`ing to it a second time, Rust throws
|
||||
an error. It says that we "cannot borrow v as mutable because it is also
|
||||
borrowed as immutable." What's up with "borrowed"?
|
||||
|
||||
Let's forget about programming for a second and talk about books.
|
||||
I like to read physical books,
|
||||
and sometimes I really like one and tell my friends they should read it.
|
||||
While I'm reading my book, I own it: the book is in my possession.
|
||||
When I loan the book out to someone else for a while, they "borrow" it from me.
|
||||
And when you borrow a book, it's yours for a certain period of time,
|
||||
and then you give it back to me, and I own it again. Right?
|
||||
In Rust, the type system encodes the notion of **ownership**. The variable `v`
|
||||
is an "owner" of the vector. When we make a reference to `v`, we let that
|
||||
variable (in this case, `x`) 'borrow' it for a while. Just like if you own a
|
||||
book, and you lend it to me, I'm borrowing the book.
|
||||
|
||||
This concept applies directly to Rust code as well:
|
||||
some code "owns" a particular pointer to memory.
|
||||
It's the sole owner of that pointer.
|
||||
It can also lend that memory out to some other code for a while:
|
||||
that code "borrows" the memory,
|
||||
and it borrows it for a precise period of time,
|
||||
called a "lifetime."
|
||||
So, when I try to modify the vector with the second call to `push`, I need
to own it outright. But `x` is borrowing it. You can't modify something that
you've lent to someone. And so Rust throws an error.
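
To make the lending rule concrete, here is a small sketch in which the borrow is handed back before the vector is modified. It assumes a current stable Rust toolchain, and the helper name `push_after_borrow` is invented for this illustration; it is one way to satisfy the rule, not the only one:

```rust
// Hypothetical helper: borrow an element, give it back, then mutate the vector.
fn push_after_borrow() -> Vec<&'static str> {
    let mut v = vec!["Hello"];
    {
        let x = &v[0]; // `x` borrows from `v` for the length of this block
        println!("{}", x);
    } // the borrow ends here, so `v` is ours to modify again
    v.push("world");
    v
}

fn main() {
    let v = push_after_borrow();
    println!("{:?}", v);
}
```

Because `x` is gone by the time `push` runs, nothing can be left pointing at memory that the vector has moved away from.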
|
||||
|
||||
That's all there is to it.
|
||||
That doesn't seem so hard, right?
|
||||
Let's go back to that error message:
|
||||
`error: 'i' does not live long enough`.
|
||||
We tried to loan out a particular variable, `i`,
|
||||
using a reference (the `&` operator) but Rust knew that the variable would be invalid after the function returns,
|
||||
and so it tells us that:
|
||||
`reference must be valid for the anonymous lifetime #1...`.
|
||||
Neat!
|
||||
So how do we fix this problem? Well, we can make a copy of the element:
|
||||
|
||||
That's a great example for stack memory,
|
||||
but what about heap memory?
|
||||
Rust has a second kind of pointer,
|
||||
an 'owned box',
|
||||
that you can create with the `box` operator.
|
||||
Check it out:
|
||||
|
||||
```
|
||||
|
||||
fn dangling() -> Box<int> {
|
||||
let i = box 1234i;
|
||||
return i;
|
||||
}
|
||||
|
||||
fn add_one() -> int {
|
||||
let num = dangling();
|
||||
return *num + 1;
|
||||
}
|
||||
```
|
||||
|
||||
Now instead of a stack allocated `1234i`,
|
||||
we have a heap allocated `box 1234i`.
|
||||
Whereas `&` borrows a pointer to existing memory,
|
||||
creating an owned box allocates memory on the heap and places a value in it,
|
||||
giving you the sole pointer to that memory.
|
||||
You can roughly compare these two lines:
|
||||
|
||||
```
|
||||
// Rust
|
||||
let i = box 1234i;
|
||||
```
|
||||
|
||||
```cpp
|
||||
// C++
|
||||
int *i = new int;
|
||||
*i = 1234;
|
||||
```
|
||||
|
||||
Rust infers the correct type,
|
||||
allocates the correct amount of memory and sets it to the value you asked for.
|
||||
This means that it's impossible to allocate uninitialized memory:
|
||||
*Rust does not have the concept of null*.
|
||||
Hooray!
|
||||
There's one other difference between this line of Rust and the C++:
|
||||
The Rust compiler also figures out the lifetime of `i`,
|
||||
and then inserts a corresponding `free` call after it's invalid,
|
||||
like a destructor in C++.
|
||||
You get all of the benefits of manually allocated heap memory without having to do all the bookkeeping yourself.
|
||||
Furthermore, all of this checking is done at compile time,
|
||||
so there's no runtime overhead.
|
||||
You'll get (basically) the exact same code that you'd get if you wrote the correct C++,
|
||||
but it's impossible to write the incorrect version, thanks to the compiler.
|
||||
|
||||
You've seen one way that ownership and borrowing are useful to prevent code that would normally be dangerous in a less-strict language,
|
||||
but let's talk about another: concurrency.
|
||||
|
||||
# Owning concurrency
|
||||
|
||||
Concurrency is an incredibly hot topic in the software world right now.
|
||||
It's always been an interesting area of study for computer scientists,
|
||||
but as usage of the Internet explodes,
|
||||
people are looking to improve the number of users a given service can handle.
|
||||
Concurrency is one way of achieving this goal.
|
||||
There is a pretty big drawback to concurrent code, though:
|
||||
it can be hard to reason about, because it is non-deterministic.
|
||||
There are a few different approaches to writing good concurrent code,
|
||||
but let's talk about how Rust's notions of ownership and lifetimes contribute to correct but concurrent code.
|
||||
|
||||
First, let's go over a simple concurrency example.
|
||||
Rust makes it easy to create "tasks",
|
||||
otherwise known as "threads".
|
||||
Typically, tasks do not share memory but instead communicate amongst each other with 'channels', like this:
|
||||
|
||||
```
|
||||
```{rust}
|
||||
fn main() {
|
||||
let numbers = vec![1i, 2i, 3i];
|
||||
let mut v = vec![];
|
||||
|
||||
let (tx, rx) = channel();
|
||||
tx.send(numbers);
|
||||
v.push("Hello");
|
||||
|
||||
spawn(proc() {
|
||||
let numbers = rx.recv();
|
||||
println!("{}", numbers[0]);
|
||||
})
|
||||
let x = v[0].clone();
|
||||
|
||||
v.push("world");
|
||||
|
||||
println!("{}", x);
|
||||
}
|
||||
```
|
||||
|
||||
In this example, we create a boxed array of numbers.
|
||||
We then make a 'channel',
|
||||
Rust's primary means of passing messages between tasks.
|
||||
The `channel` function returns two different ends of the channel:
|
||||
a `Sender` and `Receiver` (commonly abbreviated `tx` and `rx`).
|
||||
The `spawn` function spins up a new task,
|
||||
given a *heap allocated closure* to run.
|
||||
As you can see in the code,
|
||||
we call `tx.send()` from the original task,
|
||||
passing in our boxed array,
|
||||
and we call `rx.recv()` (short for 'receive') inside of the new task:
|
||||
values given to the `Sender` via the `send` method come out the other end via the `recv` method on the `Receiver`.
|
||||
Note the addition of `clone()`. This creates a copy of the element, leaving
|
||||
the original untouched. Now, we no longer have two references to the same
|
||||
memory, and so the compiler is happy. Let's give that a try:
|
||||
|
||||
Now here's the exciting part:
|
||||
because `numbers` is an owned type,
|
||||
when it is sent across the channel,
|
||||
it is actually *moved*,
|
||||
transferring ownership of `numbers` between tasks.
|
||||
This ownership transfer is *very fast* -
|
||||
in this case simply copying a pointer -
|
||||
while also ensuring that the original owning task cannot create data races by continuing to read or write to `numbers` in parallel with the new owner.
|
||||
```{bash}
|
||||
$ cargo run
|
||||
Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
|
||||
Running `target/hello_world`
|
||||
Hello
|
||||
```
|
||||
|
||||
To prove that Rust performs the ownership transfer,
|
||||
try to modify the previous example to continue using the variable `numbers`:
|
||||
Same result. Now, making a copy can be inefficient, so this solution may not be
|
||||
acceptable. There are other ways to get around this problem, but this is a toy
|
||||
example, and because we're in an introduction, we'll leave that for later.
|
||||
|
||||
```ignore
|
||||
The point is, the Rust compiler and its notion of ownership have saved us from a
|
||||
bug that would crash the program. We've achieved safety, at compile time,
|
||||
without needing to rely on a garbage collector to handle our memory.
|
||||
|
||||
# Concurrency
|
||||
|
||||
Rust's ownership model can help in other ways, as well. For example, take
|
||||
concurrency. Concurrency is a big topic, and an important one for any modern
|
||||
programming language. Let's take a look at how ownership can help you write
|
||||
safe concurrent programs.
|
||||
|
||||
Here's an example of a concurrent Rust program:
|
||||
|
||||
```{rust}
|
||||
fn main() {
|
||||
let numbers = vec![1i, 2i, 3i];
|
||||
for _ in range(0u, 10u) {
|
||||
spawn(proc() {
|
||||
println!("Hello, world!");
|
||||
});
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
let (tx, rx) = channel();
|
||||
tx.send(numbers);
|
||||
This program creates ten threads, which all print `Hello, world!`. The `spawn`
|
||||
function takes one argument, a `proc`. 'proc' is short for 'procedure,' and is
|
||||
a form of closure. This closure is executed in a new thread, created by `spawn`
|
||||
itself.
|
||||
|
||||
One common form of problem in concurrent programs is a 'data race.' This occurs
|
||||
when two different threads attempt to access the same location in memory in a
|
||||
non-synchronized way, where at least one of them is a write. If one thread is
|
||||
attempting to read, and one thread is attempting to write, you cannot be sure
|
||||
that your data will not be corrupted. Note the first half of that requirement:
|
||||
two threads that attempt to access the same location in memory. Rust's
|
||||
ownership model can track which pointers own which memory locations, which
|
||||
solves this problem.
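
As a rough illustration of ownership following the data into a thread, the sketch below hands a vector to a worker and waits for its answer. It assumes a current stable Rust toolchain, where `std::thread::spawn` with a `move` closure plays the role that `spawn(proc() { ... })` plays in this introduction; the helper name `sum_in_thread` is hypothetical:

```rust
use std::thread;

// Hypothetical helper: gives `numbers` to a new thread and waits for its result.
fn sum_in_thread(numbers: Vec<i32>) -> i32 {
    // `move` transfers ownership of `numbers` into the closure.
    let handle = thread::spawn(move || numbers.iter().sum::<i32>());
    // Touching `numbers` here would be rejected at compile time: it has moved.
    handle.join().expect("worker thread panicked")
}

fn main() {
    println!("{}", sum_in_thread(vec![1, 2, 3]));
}
```

Only one thread ever owns the vector, so there is no second party that could read or write it at the same time.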
|
||||
|
||||
Let's see an example. This Rust code will not compile:
|
||||
|
||||
```{rust,ignore}
|
||||
fn main() {
|
||||
let mut numbers = vec![1i, 2i, 3i];
|
||||
|
||||
for i in range(0u, 3u) {
|
||||
spawn(proc() {
|
||||
for j in range(0, 3) { numbers[j] += 1 }
|
||||
});
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
It gives us this error:
|
||||
|
||||
```{notrust,ignore}
|
||||
6:71 error: capture of moved value: `numbers`
|
||||
for j in range(0, 3) { numbers[j] += 1 }
|
||||
^~~~~~~
|
||||
7:50 note: `numbers` moved into closure environment here because it has type `proc():Send`, which is non-copyable (perhaps you meant to use clone()?)
|
||||
spawn(proc() {
|
||||
let numbers = rx.recv();
|
||||
println!("{}", numbers[0]);
|
||||
for j in range(0, 3) { numbers[j] += 1 }
|
||||
});
|
||||
|
||||
// Try to print a number from the original task
|
||||
println!("{}", numbers[0]);
|
||||
}
|
||||
6:79 error: cannot assign to immutable dereference (dereference is implicit, due to indexing)
|
||||
for j in range(0, 3) { numbers[j] += 1 }
|
||||
^~~~~~~~~~~~~~~
|
||||
```
|
||||
|
||||
The compiler will produce an error indicating that the value is no longer in scope:
|
||||
It mentions that "numbers moved into closure environment". Because we referred
|
||||
to `numbers` inside of our `proc`, and we create three `proc`s, we would have three
references. Rust detects this and gives us the error: we claim that `numbers`
has ownership, but our code tries to make three owners. This may cause a safety
|
||||
problem, so Rust disallows it.
|
||||
|
||||
```text
|
||||
concurrency.rs:12:20: 12:27 error: use of moved value: 'numbers'
|
||||
concurrency.rs:12 println!("{}", numbers[0]);
|
||||
^~~~~~~
|
||||
```
|
||||
What to do here? Rust has two types that help us: `Arc<T>` and `Mutex<T>`.
|
||||
"Arc" stands for "atomically reference counted." In other words, an Arc will
|
||||
keep track of the number of references to something, and not free the
|
||||
associated resource until the count is zero. The 'atomic' portion refers to an
|
||||
Arc's usage of concurrency primitives to atomically update the count, making it
|
||||
safe across threads. If we use an Arc, we can have all of our references. But, an
|
||||
Arc does not allow mutable borrows of the data it holds, and we want to modify
|
||||
what we're sharing. In this case, we can use a `Mutex<T>` inside of our Arc. A
|
||||
Mutex will synchronize our accesses, so that we can ensure that our mutation
|
||||
doesn't cause a data race.
|
||||
|
||||
Since only one task can own a boxed array at a time,
|
||||
if instead of distributing our `numbers` array to a single task we wanted to distribute it to many tasks,
|
||||
we would need to copy the array for each.
|
||||
Let's see an example that uses the `clone` method to create copies of the data:
|
||||
Here's what using an Arc with a Mutex looks like:
|
||||
|
||||
```{rust}
|
||||
use std::sync::{Arc,Mutex};
|
||||
|
||||
```
|
||||
fn main() {
|
||||
let numbers = vec![1i, 2i, 3i];
|
||||
|
||||
for num in range(0u, 3) {
|
||||
let (tx, rx) = channel();
|
||||
// Use `clone` to send a *copy* of the array
|
||||
tx.send(numbers.clone());
|
||||
let numbers = Arc::new(Mutex::new(vec![1i, 2i, 3i]));
|
||||
|
||||
for i in range(0u, 3u) {
|
||||
let number = numbers.clone();
|
||||
spawn(proc() {
|
||||
let numbers = rx.recv();
|
||||
println!("{:d}", numbers[num]);
|
||||
})
|
||||
let mut array = number.lock();
|
||||
|
||||
(*(*array).get_mut(i)) += 1;
|
||||
|
||||
println!("numbers[{}] is {}", i, (*array)[i]);
|
||||
});
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This is similar to the code we had before,
|
||||
except now we loop three times,
|
||||
making three tasks,
|
||||
and *cloning* `numbers` before sending it.
|
||||
We first have to `use` the appropriate library, and then we wrap our vector in
|
||||
an Arc with the call to `Arc::new()`. Inside of the loop, we make a new
|
||||
reference to the Arc with the `clone()` method. This will increment the
|
||||
reference count. When each new `number` variable binding goes out of scope, it
|
||||
will decrement the count. The `lock()` call will return us a reference to the
|
||||
value inside the Mutex, and block any other calls to `lock()` until said
|
||||
reference goes out of scope.
|
||||
|
||||
However, if we're making a lot of tasks,
|
||||
or if our data is very large,
|
||||
creating a copy for each task requires a lot of work and a lot of extra memory for little benefit.
|
||||
In practice, we might not want to do this because of the cost.
|
||||
Enter `Arc`,
|
||||
an atomically reference counted box ("A.R.C." == "atomically reference counted").
|
||||
`Arc` is the most common way to *share* data between tasks.
|
||||
Here's some code:
|
||||
We can compile and run this program without error, and in fact, see the
|
||||
non-deterministic aspect:
|
||||
|
||||
```{shell}
|
||||
$ cargo run
|
||||
Compiling hello_world v0.0.1 (file:///Users/you/src/hello_world)
|
||||
Running `target/hello_world`
|
||||
numbers[1] is 2
|
||||
numbers[0] is 1
|
||||
numbers[2] is 3
|
||||
$ cargo run
|
||||
Running `target/hello_world`
|
||||
numbers[2] is 3
|
||||
numbers[1] is 2
|
||||
numbers[0] is 1
|
||||
```
|
||||
use std::sync::Arc;
|
||||
|
||||
Each time, we get a slightly different output, because each thread works in a
|
||||
different order. You may not get the same output as this sample, even.
|
||||
|
||||
The important part here is that the Rust compiler was able to use ownership to
|
||||
give us assurance _at compile time_ that we weren't doing something incorrect
|
||||
with regard to concurrency. In order to share ownership, we were forced to be
|
||||
explicit and use a mechanism to ensure that it would be properly handled.
|
||||
|
||||
# Safety _and_ speed
|
||||
|
||||
Safety and speed are often presented as a continuum. At one end, you have
|
||||
maximum speed, but no safety. At the other, you have absolute safety, with no
|
||||
speed. Rust seeks to break out of this mode by introducing safety at compile
|
||||
time, ensuring that you haven't done anything wrong, while compiling to the
|
||||
same low-level code you'd expect without the safety.
|
||||
|
||||
As an example, Rust's ownership system is _entirely_ at compile time. The
|
||||
safety check that makes this an error about moved values:
|
||||
|
||||
```{rust,ignore}
|
||||
fn main() {
|
||||
let numbers = Arc::new(vec![1i, 2i, 3i]);
|
||||
|
||||
for num in range(0u, 3) {
|
||||
let (tx, rx) = channel();
|
||||
tx.send(numbers.clone());
|
||||
let vec = vec![1i, 2, 3];
|
||||
|
||||
for i in range(1u, 3) {
|
||||
spawn(proc() {
|
||||
let numbers = rx.recv();
|
||||
println!("{:d}", (*numbers)[num as uint]);
|
||||
})
|
||||
println!("{}", vec[i]);
|
||||
});
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This is almost exactly the same,
|
||||
except that this time `numbers` is first put into an `Arc`.
|
||||
`Arc::new` creates the `Arc`,
|
||||
`.clone()` makes another `Arc` that refers to the same contents.
|
||||
So we clone the `Arc` for each task,
|
||||
send that clone down the channel,
|
||||
and then use it to print out a number.
|
||||
Now instead of copying an entire array to send it to our multiple tasks we are just copying a pointer (the `Arc`) and *sharing* the array.
|
||||
carries no runtime penalty. And while some of Rust's safety features do have
|
||||
a run-time cost, there's often a way to write your code in such a way that
|
||||
you can remove it. As an example, this is a poor way to iterate through
|
||||
a vector:
|
||||
|
||||
How can this work though?
|
||||
Surely if we're sharing data, we can cause data races if one task writes to the array while others read?
|
||||
```{rust}
|
||||
let vec = vec![1i, 2, 3];
|
||||
|
||||
Well, Rust is super-smart and will only let you put data into an `Arc` that is provably safe to share.
|
||||
In this case, it's safe to share the array *as long as it's immutable*,
|
||||
i.e. many tasks may read the data in parallel as long as none can write.
|
||||
So for this type and many others `Arc` will only give you an immutable view of the data.
|
||||
|
||||
Arcs are great for immutable data,
|
||||
but what about mutable data?
|
||||
Shared mutable state is the bane of the concurrent programmer:
|
||||
you can use a mutex to protect shared mutable state,
|
||||
but if you forget to acquire the mutex, bad things can happen, including crashes.
|
||||
Rust provides mutexes but makes it impossible to use them in a way that subverts memory safety.
|
||||
|
||||
Let's take the same example yet again,
|
||||
and modify it to mutate the shared state:
|
||||
|
||||
```
|
||||
use std::sync::{Arc, Mutex};
|
||||
|
||||
fn main() {
|
||||
let numbers_lock = Arc::new(Mutex::new(vec![1i, 2i, 3i]));
|
||||
|
||||
for num in range(0u, 3) {
|
||||
let (tx, rx) = channel();
|
||||
tx.send(numbers_lock.clone());
|
||||
|
||||
spawn(proc() {
|
||||
let numbers_lock = rx.recv();
|
||||
|
||||
// Take the lock, along with exclusive access to the underlying array
|
||||
let mut numbers = numbers_lock.lock();
|
||||
|
||||
// This is ugly for now because of the need for `get_mut`, but
|
||||
// will be replaced by `numbers[num as uint] += 1`
|
||||
// in the near future.
|
||||
// See: https://github.com/rust-lang/rust/issues/6515
|
||||
*numbers.get_mut(num as uint) += 1;
|
||||
|
||||
println!("{}", (*numbers)[num as uint]);
|
||||
|
||||
// When `numbers` goes out of scope the lock is dropped
|
||||
})
|
||||
}
|
||||
for i in range(1u, vec.len()) {
|
||||
println!("{}", vec[i]);
|
||||
}
|
||||
```
|
||||
|
||||
This example is starting to get more subtle,
|
||||
but it hints at the powerful composability of Rust's concurrent types.
|
||||
This time we've put our array of numbers inside a `Mutex` and then put *that* inside the `Arc`.
|
||||
Like immutable data,
|
||||
`Mutex`es are sharable,
|
||||
but unlike immutable data,
|
||||
data inside a `Mutex` may be mutated as long as the mutex is locked.
|
||||
The reason is that the access of `vec[i]` does bounds checking, to ensure
|
||||
that we don't try to access an invalid index. However, we can remove this
|
||||
while retaining safety. The answer is iterators:
|
||||
|
||||
The `lock` method here returns not your original array or a pointer to it,
|
||||
but a `MutexGuard`,
|
||||
a type that is responsible for releasing the lock when it goes out of scope.
|
||||
This same `MutexGuard` can transparently be treated as if it were the value the `Mutex` contains,
|
||||
as you can see in the subsequent indexing operation that performs the mutation.
|
||||
```{rust}
|
||||
let vec = vec![1i, 2, 3];
|
||||
|
||||
OK, let's stop there before we get too deep.
|
||||
for x in vec.iter() {
|
||||
println!("{}", x);
|
||||
}
|
||||
```
|
||||
|
||||
# A footnote: unsafe
|
||||
This version uses an iterator that yields each element of the vector in turn.
|
||||
Because we have a reference to the element, rather than the whole vector itself,
|
||||
there are no array bounds to check.
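
To make the comparison concrete, here is a small sketch in present-day Rust (not the pre-1.0 syntax shown above) that sums a vector both ways. The function names are invented for illustration. Whether the optimizer also manages to remove the bounds check from the indexed version depends on the code and the compiler; the iterator version simply never asks for one.

```rust
// Indexed version: each `values[i]` is checked against the slice's length.
fn sum_indexed(values: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

// Iterator version: the iterator hands out a reference to each element,
// so there is no index to validate.
fn sum_iter(values: &[i32]) -> i32 {
    let mut total = 0;
    for x in values.iter() {
        total += *x;
    }
    total
}

fn main() {
    let vec = vec![1, 2, 3];
    println!("{} {}", sum_indexed(&vec), sum_iter(&vec));
}
```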
|
||||
|
||||
The Rust compiler and libraries are entirely written in Rust;
|
||||
we say that Rust is "self-hosting".
|
||||
If Rust makes it impossible to unsafely share data between threads,
|
||||
and Rust is written in Rust,
|
||||
then how does it implement concurrent types like `Arc` and `Mutex`?
|
||||
The answer: `unsafe`.
|
||||
# Learning More
|
||||
|
||||
You see, while the Rust compiler is very smart,
|
||||
and saves you from making mistakes you might normally make,
|
||||
it's not an artificial intelligence.
|
||||
Because we're smarter than the compiler -
|
||||
sometimes - we need to override this safe behavior.
|
||||
For this purpose, Rust has an `unsafe` keyword.
|
||||
Within an `unsafe` block,
|
||||
Rust turns off many of its safety checks.
|
||||
If something bad happens to your program,
|
||||
you only have to audit what you've done inside `unsafe`,
|
||||
and not the entire program itself.
|
||||
I hope that this taste of Rust has given you an idea of whether Rust is the right
|
||||
language for you. We talked about Rust's tooling, how encoding ownership into
|
||||
the type system helps you find bugs, how Rust can help you write correct
|
||||
concurrent code, and how you don't have to pay a speed cost for much of this
|
||||
safety.
|
||||
|
||||
If one of the major goals of Rust was safety,
|
||||
why allow that safety to be turned off?
|
||||
Well, there are really only three main reasons to do it:
|
||||
interfacing with external code,
|
||||
such as doing FFI into a C library;
|
||||
performance (in certain cases);
|
||||
and to provide a safe abstraction around operations that normally would not be safe.
|
||||
Our `Arc`s are an example of this last purpose.
|
||||
We can safely hand out multiple pointers to the contents of the `Arc`,
|
||||
because we are sure the data is safe to share.
|
||||
But the Rust compiler can't know that we've made these choices,
|
||||
so _inside_ the implementation of the Arcs,
|
||||
we use `unsafe` blocks to do (normally) dangerous things.
|
||||
But we expose a safe interface,
|
||||
which means that the `Arc`s are impossible to use incorrectly.
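
The same principle can be shown on a much smaller scale. The sketch below is not how `Arc` is implemented; it is a hypothetical function, written in present-day Rust, that uses an `unsafe` block internally while exposing an interface callers cannot misuse. The name `first_and_last` is made up for illustration.

```rust
// Safe to call with any slice: the emptiness check below guarantees that
// the unchecked accesses inside the `unsafe` block stay in bounds.
fn first_and_last(values: &[i32]) -> Option<(i32, i32)> {
    if values.is_empty() {
        return None;
    }
    // Safety: the slice is non-empty, so indices 0 and len - 1 are valid.
    let (first, last) = unsafe {
        (
            *values.get_unchecked(0),
            *values.get_unchecked(values.len() - 1),
        )
    };
    Some((first, last))
}

fn main() {
    println!("{:?}", first_and_last(&[1, 2, 3]));
}
```

If this function ever misbehaved, the `unsafe` block and the check guarding it would be the only places that need auditing.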
|
||||
|
||||
This is how Rust's type system prevents you from making some of the mistakes that make concurrent programming difficult,
|
||||
while still giving you the efficiency of languages such as C++.
|
||||
|
||||
# That's all, folks
|
||||
|
||||
I hope that this taste of Rust has given you an idea of whether Rust is the right language for you.
|
||||
If that's true,
|
||||
I encourage you to check out [the guide](guide.html) for a full,
|
||||
To continue your Rustic education, read [the guide](guide.html) for a more
|
||||
in-depth exploration of Rust's syntax and concepts.
|
||||
|
|