
I don't understand the difference between a slice and reference

I don't understand the difference between a slice and a reference. What is the difference between &String and &str? I read some stuff online that said a reference was a thin pointer and slice is a fat pointer, but I don't know and can't seem to find what those two mean. I know that a slice can coerce into a reference, but how does it do that? What is the Deref trait?

Rafael asked Apr 11 '20 01:04



2 Answers

In Rust, a slice is a contiguous block of homogeneously typed data of varying length.

What does this mean?

  • [u8] is a slice. In memory, this is a block of u8s. The slice itself is the data. Often, though, people refer to &[u8] as a slice. A &[u8] is a pointer to that block of data. That pointer contains two things: the address of the data and the length of the data. Since it contains two things, it is called a fat pointer. A &u8 is also a reference (it can also be thought of as a pointer in this case *), but we already know that whatever it points to is a single u8. Therefore it is a thin pointer: it holds only one thing, the address.

    You are guaranteed that all the data in a [u8] is of type u8.

    Since your [u8] is just defined as a contiguous block of memory of type u8, there's no compile time definition as to how large it is. Hence, we need to store its length in a pointer to it. We also can't put it on the stack (This translates to: we can't have a local variable that is just a [u8] **).
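The thin versus fat distinction can be observed directly by measuring the references. This is only a small sketch: the helper name pointer_sizes is made up for illustration, and the two-word size of a slice reference is what current compilers produce rather than something the answer above states.

```rust
use std::mem::size_of;

/// Hypothetical helper: sizes in bytes of a thin reference and a slice reference.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&u8>(), size_of::<&[u8]>())
}

fn main() {
    let (thin, fat) = pointer_sizes();
    // A thin pointer is one machine word: just the address.
    assert_eq!(thin, size_of::<usize>());
    // A fat pointer is two machine words: the address plus the length.
    assert_eq!(fat, 2 * size_of::<usize>());

    // The length carried by the fat pointer is what `len()` reads back.
    let data = [10u8, 20, 30];
    let slice: &[u8] = &data[..];
    assert_eq!(slice.len(), 3);
    println!("thin = {thin} bytes, fat = {fat} bytes");
}
```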

Expanding:

  • A [T] is a slice of Ts. For any given T, as long as T is itself a sized type ***, we can imagine a type [T].
  • A str is a slice of a string. It is guaranteed to be valid UTF-8 text, and that's everything that separates it from a [u8]. Rust could have dumped the valid UTF-8 guarantee and just defined everything else in str as part of [u8].
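To make the relationship between str and [u8] concrete, here is a minimal sketch. The helper is_valid_str is a name invented for this example; as_bytes and std::str::from_utf8 are the standard library calls doing the work.

```rust
use std::str;

/// Hypothetical helper: true if `bytes` could be viewed as a `str`,
/// i.e. the bytes are valid UTF-8.
fn is_valid_str(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // "h\u{e9}llo" is "hello" with an e-acute as its second character.
    let text: &str = "h\u{e9}llo";
    // A `str` can always be viewed as the bytes it is made of.
    let bytes: &[u8] = text.as_bytes();
    // U+00E9 takes two bytes in UTF-8, so 5 characters occupy 6 bytes.
    assert_eq!(text.chars().count(), 5);
    assert_eq!(bytes.len(), 6);
    // Going from bytes back to `str` is checked, because of the UTF-8 guarantee.
    assert!(is_valid_str(bytes));
    assert!(!is_valid_str(&[0xff, 0xfe]));
}
```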

Well, since you can't own a slice locally ****, you might be wondering how we create slices.

The answer is that we put the data in something with the size already known, and then borrow slices from that.

Take for example:

let my_array: [u32; 3] = [1, 2, 3];

We can slice my_array into a [u32] like so:

let my_slice: [u32] = my_array[..]; // error: the size of [u32] is not known at compile time

But since we can't own a local variable whose size isn't already known, we must put it under a reference:

let my_slice: &[u32] = &my_array[..];

The point of a slice is that it's a very flexible (barring lifetimes) way of working with contiguous blocks of data, no matter where the data comes from. I could just as easily have made my_array a Vec<u32>, which is heap-allocated, and it would still have worked.
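As a rough illustration of that flexibility, the sketch below borrows slices from both a stack array and a heap-allocated Vec and hands them to the same function. The function name total and the variable my_vec are assumptions made for this example.

```rust
/// Hypothetical helper: sums any contiguous run of `u32`s, wherever it is stored.
fn total(values: &[u32]) -> u32 {
    values.iter().sum()
}

fn main() {
    let my_array: [u32; 3] = [1, 2, 3]; // stored inline, on the stack
    let my_vec: Vec<u32> = vec![4, 5, 6, 7]; // elements stored on the heap

    // Borrow the whole array, and only part of the vector.
    let whole: &[u32] = &my_array[..];
    let tail: &[u32] = &my_vec[1..];

    assert_eq!(whole.len(), 3);
    assert_eq!(tail, [5, 6, 7]);
    // The same function accepts both, since both are just `&[u32]`.
    assert_eq!(total(whole), 6);
    assert_eq!(total(tail), 18);
}
```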

What is the difference between &String and &str?

&String is a reference to the whole string object. The string object in Rust is essentially a Vec<u8> whose contents are guaranteed to be valid UTF-8. A Vec contains a pointer to the data it "contains", so your &String could be thought of as a &&str. That is why we can do either of the following:

let my_string: String = "Abc".to_string();

let my_str: &str = &my_string[..]; // As explained previously
// OR
let my_str: &str = &*my_string;

The explanation of this brings me to your last question:

What is the deref trait?

The Deref trait is the trait that describes the dereference (*) operator. As you saw above, I was able to do *my_string. That's because String implements Deref<Target = str>, which allows you to dereference the String into a str. Similarly, I can dereference a Vec<T> into a [T].
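To see what implementing Deref involves, here is a minimal sketch with a made-up newtype. The type Name is hypothetical and exists only for this example; it is not how String itself is implemented.

```rust
use std::ops::Deref;

/// Hypothetical newtype that owns a `String` (illustration only).
struct Name(String);

impl Deref for Name {
    // Dereferencing a `Name` yields a `str`.
    type Target = str;

    fn deref(&self) -> &str {
        // `&self.0` is a `&String`; it coerces to `&str` at the return site.
        &self.0
    }
}

fn main() {
    let name = Name("Abc".to_string());
    // `*name` goes through `Deref::deref`, so `&*name` is a `&str`.
    let explicit: &str = &*name;
    assert_eq!(explicit, "Abc");
    // Method calls also look through `Deref`, so `str` methods are available.
    assert_eq!(name.len(), 3);
    assert!(name.starts_with('A'));
}
```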

Note however, that the Deref trait is used in more places than just where * is used:

let my_string: String = "Abc".to_string();

let my_str: &str = &my_string;

If I try to assign a value of type &T into a place of type &U, then Rust will try to dereference my T, as many times as it takes to get a U, while still keeping at least one reference. Similarly, if I had a &&&&....&&&&T, and I tried to assign it to a &&&&....&&&&U, it would still work.

This is called deref coercion: automatically turning a &T into a &U, where some amount of *T would result in a U.
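A short sketch of deref coercion at a function call, where it shows up most often. The function char_count and the variables boxed and double_ref are names assumed for this example.

```rust
/// Hypothetical helper: counts characters, taking the most general borrowed string type.
fn char_count(text: &str) -> usize {
    text.chars().count()
}

fn main() {
    let my_string: String = "Abc".to_string();
    let boxed: Box<String> = Box::new("Abcd".to_string());
    let double_ref: &&String = &&my_string;

    // &String -> &str: one deref step.
    assert_eq!(char_count(&my_string), 3);
    // &Box<String> -> &String -> &str: two deref steps.
    assert_eq!(char_count(&boxed), 4);
    // &&String -> &String -> &str: the extra reference is peeled off too.
    assert_eq!(char_count(double_ref), 3);
}
```

None of the call sites needs an explicit * or [..]; the compiler inserts the dereferences because the parameter type is known to be &str.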


  • *: Raw pointers *const T and *mut T are the same size as references, but are treated as opaque by the compiler. The compiler doesn't make any guarantees about what is behind a raw pointer, or that they are even correctly aligned. Hence, they are unsafe to dereference. But since the Deref trait defines a deref method which is safe, dereferencing a raw pointer is special, and will not be done automatically either.
  • **: This includes other dynamically sized types too, such as trait objects, and extern types. This also includes structs which contain a dynamically sized type as their last member as well, although these are very difficult to correctly construct, but will become easier in the future with the CoerceUnsized trait. It is possible to invalidate all of this (Except for extern types) with the use of the unsized_locals nightly feature which allows some use of dynamically sized locals.
  • ***: Sized types are all types whose size is known at compile time. You can identify them generically; given a type T, T's size is known at compile time if T: Sized. If T: ?Sized, then its size may not be known at compile time (T: ?Sized is the most flexible requirement for callers since it accepts anything). Since a slice requires the data inside to be contiguous, and homogeneous in size and type, dynamically sized types (that is, types that are not Sized) can't be contained within a slice, an array, or a Vec<T> while maintaining O(1) indexing. While Rust could probably write special code for indexing into a group of dynamically sized types, it currently doesn't.
  • ****: You actually can own a slice, it just has to be under a pointer which owns it. This can be, for example, a Box<[T]> or an Rc<[T]>. These deallocate the slice on their own: a Box when it is dropped, and an Rc when all strong and weak references to it are gone. (With an Rc, the value's destructor is called when all strong references are dropped, but the memory isn't freed until all weak references are gone, too.)
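The owning pointers from the last footnote can be sketched as follows. The helper into_owned_slice is a name invented here; Vec::into_boxed_slice, Rc::from and the From<&str> conversion into Box<str> are standard library functionality.

```rust
use std::rc::Rc;

/// Hypothetical helper: turns a vector into an owned slice.
fn into_owned_slice(values: Vec<u32>) -> Box<[u32]> {
    values.into_boxed_slice()
}

fn main() {
    // Box<[T]> owns its slice and frees it when the Box is dropped.
    let boxed: Box<[u32]> = into_owned_slice(vec![1, 2, 3]);
    assert_eq!(boxed.len(), 3);

    // Rc<[T]> shares ownership of one slice between several handles.
    let shared: Rc<[u32]> = Rc::from(vec![4, 5]);
    let other = Rc::clone(&shared);
    assert_eq!(Rc::strong_count(&shared), 2);
    assert_eq!(&other[..], [4, 5]);

    // The same works for string slices.
    let text: Box<str> = "Abc".into();
    assert_eq!(&*text, "Abc");
}
```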
Optimistic Peach answered Oct 07 '22 20:10


What is a reference

A reference is like a pointer from C (which represents a memory location), but references are never invalid* (e.g. null) and you can't do pointer arithmetic on references. Rust's references are pretty similar to C++'s references. One important motivating reason to use references is to avoid moving or cloning variables. Let's say you have a function that calculates the sum of a vector (note: this is a toy example, the right way to get the sum of a vector is nums.iter().sum()):

fn sum(nums: Vec<u32>) -> Option<u32> {
    if nums.is_empty() {
        return None;
    }
    let mut sum = 0;
    for num in nums {
        sum += num;
    }
    Some(sum) // no trailing semicolon, so this is the return value
}

This function takes the vector by value, so the vector is moved into it and is unusable afterward.

let nums = vec![1, 2, 3, 4, 5];
assert_eq!(sum(nums), Some(15));
assert_eq!(nums[0], 1); // <-- error, nums was moved when we calculated sum

The solution is to pass a reference to the vector:

fn sum(nums: &Vec<u32>) -> Option<u32> {
...
}
let nums = vec![1, 2, 3, 4, 5];
assert_eq!(sum(&nums), Some(15));
assert_eq!(nums[0], 1); // <-- it works!

What is a slice

A slice is "a view into a [contiguous] block of memory represented as a pointer and a length." It can be thought of as a reference to an array (or array-like thing). Part of Rust's safety guarantee is ensuring you don't access elements past the end of your array. To accomplish this, slices are represented internally as a pointer and a length. This makes a slice "fat" compared to an ordinary reference, which carries no length information. Similar to the sum example above, if nums were an array rather than a Vec, you would pass a slice to sum() rather than the array itself.
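As a sketch of that last point, here is the toy sum function rewritten to take a slice. The signature sum(nums: &[u32]) is an assumed variation on the example above, not the original code; it accepts arrays, vectors and sub-ranges alike.

```rust
/// Toy sum over a slice; returns `None` for empty input, like the earlier example.
fn sum(nums: &[u32]) -> Option<u32> {
    if nums.is_empty() {
        return None;
    }
    Some(nums.iter().sum())
}

fn main() {
    let array = [1, 2, 3, 4, 5];
    let vector = vec![1, 2, 3, 4, 5];

    assert_eq!(sum(&array), Some(15)); // &[u32; 5] coerces to &[u32]
    assert_eq!(sum(&vector), Some(15)); // &Vec<u32> coerces to &[u32]
    assert_eq!(sum(&vector[..2]), Some(3)); // only the first two elements
    assert_eq!(sum(&[]), None);

    // Nothing was moved, so both owners are still usable.
    assert_eq!(array[0], 1);
    assert_eq!(vector[4], 5);
}
```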

String vs str

A str is an unsized sequence of UTF-8 encoded bytes (think of an array whose length isn't known at compile time), and an &str is a slice reference to such a sequence. String is essentially a Vec of UTF-8 encoded bytes, and String implements Deref<Target=str>, which means that an &String behaves a lot like (coerces to) an &str. This is similar to how &Vec<u32> behaves like &[u32] (Vec<T> implements Deref<Target=[T]>).


* unless made invalid with unsafe Rust

asky answered Oct 07 '22 20:10