# Datatypes

Rust datatypes are, by default, immutable. The core datatypes of Rust are structural records and 'tags' (tagged unions, algebraic data types).

    type point = {x: float, y: float};
    tag shape {
        circle(point, float);
        rectangle(point, point);
    }
    let my_shape = circle({x: 0.0, y: 0.0}, 10.0);

## Records

Rust record types are written `{field1: TYPE, field2: TYPE [, ...]}`, and record literals are written in the same way, but with expressions instead of types. They are quite similar to C structs, and even laid out the same way in memory (so you can read from a Rust struct in C, and vice-versa).

The dot operator is used to access record fields (`mypoint.x`).

Fields that you want to mutate must be explicitly marked as such. For example:

    type stack = {content: [int], mutable head: uint};

With such a type, you can do `mystack.head += 1u`. When the `mutable` is omitted from the type, such an assignment results in a type error.

To 'update' an immutable record, you use functional record update syntax, by ending a record literal with the keyword `with`:

    let oldpoint = {x: 10f, y: 20f};
    let newpoint = {x: 0f with oldpoint};
    assert newpoint == {x: 0f, y: 20f};

This will create a new struct, copying all the fields from `oldpoint` into it, except for the ones that are explicitly set in the literal.

Rust record types are *structural*. This means that `{x: float, y: float}` is not just a way to define a new type, but is the actual name of the type. Record types can be used without first defining them. If module A defines `type point = {x: float, y: float}`, and module B, without knowing anything about A, defines a function that returns an `{x: float, y: float}`, you can use that return value as a `point` in module A. (Remember that `type` defines an additional name for a type, not an actual new type.)

## Record patterns

Records can be destructured in `alt` patterns.
The basic syntax is `{fieldname: pattern, ...}`, but the pattern for a field can be omitted as a shorthand for simply binding the variable with the same name as the field.

    alt mypoint {
        {x: 0f, y: y_name} { /* Provide sub-patterns for fields */ }
        {x, y}             { /* Simply bind the fields */ }
    }

When you are not interested in all the fields of a record, a record pattern may end with `, _` (as in `{field1, _}`) to indicate that you're ignoring all other fields.

## Tags

Tags [FIXME terminology] are datatypes that have several different representations. For example, the type shown earlier:

    tag shape {
        circle(point, float);
        rectangle(point, point);
    }

A value of this type is either a circle, in which case it contains a point record and a float, or a rectangle, in which case it contains two point records. The run-time representation of such a value includes an identifier of the actual form that it holds, much like the 'tagged union' pattern in C, but with better ergonomics.

The above declaration will define a type `shape` that can be used to refer to such shapes, and two functions, `circle` and `rectangle`, which can be used to construct values of the type (taking arguments of the specified types). So `circle({x: 0f, y: 0f}, 10f)` is the way to create a new circle.

Tag variants do not have to have parameters. This, for example, is equivalent to an `enum` in C:

    tag direction {
        north;
        east;
        south;
        west;
    };

This will define `north`, `east`, `south`, and `west` as constants, all of which have type `direction`.

There is a special case for tags with a single variant. These are used to define new types in such a way that the new name is not just a synonym for an existing type, but its own distinct type.
If you say:

    tag gizmo_id = int;

That is a shorthand for this:

    tag gizmo_id { gizmo_id(int); }

Tag types like this can have their content extracted with the dereference (`*`) unary operator:

    let my_gizmo_id = gizmo_id(10);
    let id_int: int = *my_gizmo_id;

## Tag patterns

For tag types with multiple variants, destructuring is the only way to get at their contents. All variant constructors can be used as patterns, as in this definition of `area`:

    fn area(sh: shape) -> float {
        alt sh {
            circle(_, size) { std::math::pi * size * size }
            rectangle({x, y}, {x: x2, y: y2}) { (x2 - x) * (y2 - y) }
        }
    }

For variants without arguments, you have to write `variantname.` (with a dot at the end) to match them in a pattern. This is to prevent ambiguity between matching a variant name and binding a new variable.

    fn point_from_direction(dir: direction) -> point {
        alt dir {
            north. { {x: 0f, y: 1f} }
            east. { {x: 1f, y: 0f} }
            south. { {x: 0f, y: -1f} }
            west. { {x: -1f, y: 0f} }
        }
    }

## Tuples

Tuples in Rust behave exactly like records, except that their fields do not have names (and thus cannot be accessed with dot notation). Tuples can have any arity except for 0 or 1 (though you may see nil, `()`, as the empty tuple if you like).

    let mytup: (int, int, float) = (10, 20, 30.0);
    alt mytup {
        (a, b, c) { log a + b + (c as int); }
    }

## Pointers

In contrast to a lot of modern languages, record and tag types in Rust are not represented as pointers to allocated memory. They are, like in C and C++, represented directly. This means that if you `let x = {x: 1f, y: 1f};`, you are creating a record on the stack. If you then copy it into a data structure, the whole record is copied, not just a pointer.

For small records like `point`, this is usually still more efficient than allocating memory and going through a pointer. But for big records, or records with mutable fields, it can be useful to have a single copy on the heap, and refer to that through a pointer.

Rust supports several types of pointers.
The simplest is the unsafe pointer, written `*TYPE`, which is a completely unchecked pointer type only used in unsafe code (and thus, in typical Rust code, very rarely). The safe pointer types are `@TYPE` for shared, reference-counted boxes, and `~TYPE`, for uniquely-owned pointers.

All pointer types can be dereferenced with the `*` unary operator.

### Shared boxes

Shared boxes are pointers to heap-allocated, reference counted memory. A cycle collector ensures that circular references do not result in memory leaks.

Creating a shared box is done by simply applying the unary `@` operator to an expression. The result of the expression will be boxed, resulting in a box of the right type. For example:

    let x = @10; // New box, refcount of 1
    let y = x; // Copy the pointer, increase refcount
    // When x and y go out of scope, refcount goes to 0, box is freed

NOTE: We may in the future switch to garbage collection, rather than reference counting, for shared boxes.

Shared boxes never cross task boundaries.

### Unique boxes

In contrast to shared boxes, unique boxes are not reference counted. Instead, it is statically guaranteed that only a single owner of the box exists at any time.

    let x = ~10;
    let y <- x;

This is where the 'move' (`<-`) operator comes in. It is similar to `=`, but it de-initializes its source. Thus, the unique box can move from `x` to `y`, without violating the constraint that it only has a single owner.

NOTE: If you do `y = x` instead, the box will be copied. We should emit a warning for this, or disallow it entirely, but do not currently do so.

Unique boxes, when they do not contain any shared boxes, can be sent to other tasks. The sending task will give up ownership of the box, and won't be able to access it afterwards. The receiving task will become the sole owner of the box.

### Mutability

All pointer types have a mutable variant, written `@mutable TYPE` or `~mutable TYPE`.
Given such a pointer, you can write to its contents by combining the dereference operator with a mutating action.

    fn increase_contents(pt: @mutable int) {
        *pt += 1;
    }

## Vectors

Rust vectors are always heap-allocated and unique. A value of type `[TYPE]` is represented by a pointer to a section of heap memory containing any number of `TYPE` values.

NOTE: This uniqueness is turning out to be quite awkward in practice, and might change.

Vector literals are enclosed in square brackets. Elements are accessed with square brackets as well, using zero-based indices:

    let myvec = [true, false, true, false];
    if myvec[1] { std::io::println("boom"); }

By default, vectors are immutable: you cannot replace their elements. The type written as `[mutable TYPE]` is a vector with mutable elements. Mutable vector literals are written `[mutable]` (empty) or `[mutable 1, 2, 3]` (with elements).

Growing a vector in Rust is not as inefficient as it looks (the `+` operator means concatenation when applied to vector types):

    let myvec = [], i = 0;
    while i < 100 {
        myvec += [i];
        i += 1;
    }

Because a vector is unique, replacing it with a longer one (which is what `+= [i]` does) is indistinguishable from appending to it in-place. Vector representations are optimized to grow logarithmically, so the above code generates about the same amount of copying and reallocation as `push` implementations in most other languages.

## Strings

The `str` type in Rust is represented exactly the same way as a vector of bytes (`[u8]`), except that it is guaranteed to have a trailing null byte (for interoperability with C APIs).

This sequence of bytes is interpreted as a UTF-8 encoded sequence of characters. This has the advantage that UTF-8 encoded I/O (which should really be the goal for modern systems) is very fast, and that strings have, for most intents and purposes, a nicely compact representation. It has the disadvantage that you only get constant-time access by byte, not by character.
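
The byte-versus-character distinction can be illustrated in present-day Rust, whose syntax and standard library differ from the dialect used in this document. This is only a sketch: `byte_at` and `char_at` are hypothetical helper names, and the sample string is arbitrary.

```rust
// Constant-time access to the i-th byte of the UTF-8 encoding.
fn byte_at(s: &str, i: usize) -> u8 {
    s.as_bytes()[i]
}

// Linear-time access to the i-th character: the string has to be
// decoded from the start, because characters vary in encoded width.
fn char_at(s: &str, i: usize) -> Option<char> {
    s.chars().nth(i)
}

fn main() {
    // "na\u{ef}ve" contains one two-byte character (U+00EF).
    let s = "na\u{ef}ve";
    println!("{} bytes, {} characters", s.len(), s.chars().count());
    println!("byte 2 = {:#x}, char 2 = {:?}", byte_at(s, 2), char_at(s, 2));
}
```

Here `byte_at` is a plain slice index, while `char_at` has to walk the string from its beginning.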
A lot of algorithms don't need constant-time indexed access (they iterate over all characters, which `std::str::chars` helps with), and for those that do, many don't need actual characters, and can operate on bytes. For algorithms that do really need to index by character, there's the option to convert your string to a character vector (using `std::str::to_chars`).

Like vectors, strings are always unique. You can wrap them in a shared box to share them. Unlike vectors, there is no mutable variant of strings. They are always immutable.

## Resources

Resources are data types that have a destructor associated with them.

    resource file_desc(fd: int) {
        close_file_desc(fd);
    }

This defines a type `file_desc` and a constructor of the same name, which takes an integer. Values of such a type cannot be copied, and when they are destroyed (by going out of scope, or, when boxed, when their box is cleaned up), their body runs. In the example above, this would cause the given file descriptor to be closed.

NOTE: We're considering alternative approaches for data types with destructors. Resources might go away in the future.
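
In present-day Rust, the role of a resource is played by a type that implements the `Drop` trait. The sketch below assumes modern syntax rather than the dialect used in this document; `FileDesc`, the `closed` log, and `closing_order` are hypothetical names, and the destructor only records the descriptor instead of really closing it.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for the `file_desc` resource. The shared log
// makes the destructor's effect observable without touching real files.
struct FileDesc {
    fd: i32,
    closed: Rc<RefCell<Vec<i32>>>,
}

impl Drop for FileDesc {
    // Runs when the value is destroyed, like the body of a resource.
    fn drop(&mut self) {
        // Assumption: a real implementation would close `self.fd` here.
        self.closed.borrow_mut().push(self.fd);
    }
}

// Creates two descriptors in nested scopes and returns the order in
// which their destructors ran.
fn closing_order() -> Vec<i32> {
    let closed = Rc::new(RefCell::new(Vec::new()));
    {
        let _outer = FileDesc { fd: 3, closed: Rc::clone(&closed) };
        {
            // A boxed value: the destructor runs when the box is freed.
            let _boxed = Box::new(FileDesc { fd: 4, closed: Rc::clone(&closed) });
        } // `_boxed` goes out of scope here.
    } // `_outer` goes out of scope here.
    let order = closed.borrow().clone();
    order
}

fn main() {
    println!("{:?}", closing_order());
}
```

Because `FileDesc` implements neither `Copy` nor `Clone`, its values cannot be duplicated, which mirrors the no-copy rule described above.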