I have a case where multiple threads must update objects stored in a shared vector. However, the vector is very large, and the number of elements to update is relatively small.
In a minimal example, the set of elements to update can be identified by a (hash-)set containing the indices of elements to update. The code could hence look as follows:
let mut big_vector_of_elements = generate_data_vector();
while has_things_to_do() {
    let indices_to_update = compute_indices();
    indices_to_update.par_iter() // Rayon parallel iteration
        .map(|index| big_vector_of_elements[index].mutate())
        .collect()?;
}
This is obviously disallowed in Rust: big_vector_of_elements cannot be borrowed mutably in multiple threads at the same time. However, wrapping each element in, e.g., a Mutex lock seems unnecessary: this specific case would be safe without explicit synchronization. Since the indices come from a set, they are guaranteed to be distinct. No two iterations in the par_iter touch the same element of the vector.
What would be the best way of writing a program that mutates elements in a vector in parallel, where the synchronization is already taken care of by the selection of indices, but where the compiler does not understand the latter?
A near-optimal solution would be to wrap all elements in big_vector_of_elements in some hypothetical UncontendedMutex lock, which would be a variant of Mutex that is ridiculously fast in the uncontended case, and which may take arbitrarily long (or even panic) when contention occurs. Ideally, an UncontendedMutex<T> should also be of the same size and alignment as T, for any T.
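To make the idea concrete, here is a rough sketch of what such a lock could look like. UncontendedMutex and its with method are hypothetical names taken from the description above, not an existing type. This version panics on contention and, because it stores an extra flag, it does not meet the "same size and alignment as T" wish. It uses std::thread::scope rather than Rayon so that it only needs the standard library.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical lock that panics instead of waiting when it is contended.
pub struct UncontendedMutex<T> {
    locked: AtomicBool,
    value: UnsafeCell<T>,
}

// SAFETY: the `locked` flag ensures that at most one caller is inside
// `with` at any time, so the `&mut T` handed out is never aliased.
unsafe impl<T: Send> Sync for UncontendedMutex<T> {}

impl<T> UncontendedMutex<T> {
    pub fn new(value: T) -> Self {
        Self {
            locked: AtomicBool::new(false),
            value: UnsafeCell::new(value),
        }
    }

    /// Runs `f` with exclusive access, panicking if the lock is already held.
    /// If `f` itself panics, the flag stays set and later calls panic too.
    pub fn with<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        let was_locked = self.locked.swap(true, Ordering::Acquire);
        assert!(!was_locked, "UncontendedMutex was contended");
        // SAFETY: we just took the flag, so no other caller is inside `with`.
        let result = f(unsafe { &mut *self.value.get() });
        self.locked.store(false, Ordering::Release);
        result
    }
}

fn main() {
    let data: Vec<UncontendedMutex<u64>> =
        (0..1_000).map(UncontendedMutex::new).collect();
    let indices = [3usize, 500, 42, 999]; // distinct by construction
    std::thread::scope(|s| {
        for &i in &indices {
            let data = &data;
            s.spawn(move || data[i].with(|x| *x += 1));
        }
    });
    assert_eq!(data[42].with(|x| *x), 43);
}
```

The uncontended path is one atomic swap and one atomic store per access, which is cheap but not free.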
Several related questions are answered with "use Rayon's parallel iterator", "use chunks_mut", or "use split_at_mut".
These answers do not seem relevant here, since those solutions imply iterating over the entire big_vector_of_elements and then figuring out, for each element, whether anything needs to be changed. Essentially, this means that such a solution would look as follows:
let mut big_vector_of_elements = generate_data_vector();
while has_things_to_do() {
    let indices_to_update = compute_indices();
    // Visits every element, even though only a few need updating.
    big_vector_of_elements
        .par_iter_mut()
        .enumerate()
        .try_for_each(|(index, element)| {
            if indices_to_update.contains(&index) {
                element.mutate()
            } else {
                Ok(())
            }
        })?;
}
This solution takes time proportional to the size of big_vector_of_elements, whereas the first solution loops only over a number of elements proportional to the size of indices_to_update.
When the compiler can't verify that mutable references to a slice's elements are exclusive, Cell is pretty nice.
You can transform a &mut [T] into a &Cell<[T]> using Cell::from_mut, and then a &Cell<[T]> into a &[Cell<T>] using Cell::as_slice_of_cells. All of this is zero-cost: it's just there to guide the type system.
A &[Cell<T>] is like a &[mut T], if that were possible to write: a shared reference to a slice of mutable elements. What you can do with Cells is limited to reading (for Copy types) or replacing the value; you can't get a reference, mutable or not, to the wrapped elements themselves. Rust also knows that Cell isn't thread-safe (it does not implement Sync). This guarantees that everything is safe, at no dynamic cost.
fn main() {
    use std::cell::Cell;
    let slice: &mut [i32] = &mut [1, 2, 3];
    let cell_slice: &Cell<[i32]> = Cell::from_mut(slice);
    let slice_cell: &[Cell<i32>] = cell_slice.as_slice_of_cells();
    let two = &slice_cell[1];
    let another_two = &slice_cell[1];
    println!("This is 2: {:?}", two);
    println!("This is also 2: {:?}", another_two);
    two.set(42);
    println!("This is now 42!: {:?}", another_two);
}
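As a small follow-up sketch, the same trick lets you hold handles to two arbitrary elements at once and update one from the other. The function name add_into and the (destination, source) pair format are made up for this illustration. Note that this stays on a single thread: since Cell is not Sync, a &[Cell<T>] cannot be shared with Rayon or scoped threads.

```rust
use std::cell::Cell;

/// For each `(dst, src)` pair, adds `data[src]` into `data[dst]`.
/// `add_into` is a hypothetical helper, used only for illustration.
fn add_into(data: &mut [i32], pairs: &[(usize, usize)]) {
    // Zero-cost view of the slice as a slice of cells.
    let cells: &[Cell<i32>] = Cell::from_mut(data).as_slice_of_cells();
    for &(dst, src) in pairs {
        // Both handles are alive at the same time, even when `dst == src`.
        let (d, s) = (&cells[dst], &cells[src]);
        d.set(d.get() + s.get());
    }
}

fn main() {
    let mut data = vec![1, 2, 3];
    add_into(&mut data, &[(0, 2), (1, 1)]);
    assert_eq!(data, [4, 4, 3]);
}
```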
You can sort indices_to_update and extract mutable references by calling split_*_mut.
let len = big_vector_of_elements.len();
while has_things_to_do() {
    let mut tail = big_vector_of_elements.as_mut_slice();
    let mut indices_to_update = compute_indices();
    // I assumed compute_indices() returns unsorted vector
    // to highlight the importance of sorted order
    indices_to_update.sort();
    let mut elems = Vec::new();
    for idx in indices_to_update {
        // cut prefix, so big_vector[idx] will be tail[0]
        tail = tail.split_at_mut(idx - (len - tail.len())).1;
        // extract tail[0]
        let (elem, new_tail) = tail.split_first_mut().unwrap();
        elems.push(elem);
        tail = new_tail;
    }
}
Double check everything in this code; I didn't test it. Then you can call elems.into_par_iter() (or elems.par_iter_mut()) or whatever.
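For reference, here is a self-contained sketch of the same splitting idea as a function. The name disjoint_refs and the explicit offset counter are my own choices, not part of the answer above, and the demo uses std::thread::scope in place of Rayon so that it runs with the standard library alone. It assumes the indices are sorted and distinct and panics otherwise.

```rust
/// Returns mutable references to `data[i]` for each `i` in `indices`.
/// Assumes `indices` is sorted and free of duplicates; panics otherwise,
/// or when an index is out of bounds.
fn disjoint_refs<'a, T>(data: &'a mut [T], indices: &[usize]) -> Vec<&'a mut T> {
    let mut refs = Vec::with_capacity(indices.len());
    let mut tail = data;
    let mut offset = 0; // position of `tail[0]` within the original slice
    for &idx in indices {
        assert!(idx >= offset, "indices must be sorted and distinct");
        // Drop the prefix so that the wanted element becomes `rest[0]`.
        let (_, rest) = std::mem::take(&mut tail).split_at_mut(idx - offset);
        let (elem, rest) = rest.split_first_mut().expect("index out of bounds");
        refs.push(elem);
        tail = rest;
        offset = idx + 1;
    }
    refs
}

fn main() {
    let mut big = vec![0u32; 1_000];
    let mut indices: Vec<usize> = vec![900, 3, 42];
    indices.sort_unstable();
    let refs = disjoint_refs(&mut big, &indices);
    // Each reference goes to its own scoped thread.
    std::thread::scope(|s| {
        for r in refs {
            s.spawn(move || *r += 1);
        }
    });
    assert_eq!(big[42], 1);
    assert_eq!(big.iter().sum::<u32>(), 3);
}
```

The work is proportional to the number of indices (plus the sort), not to the length of the vector, since each split is a constant-time pointer adjustment.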
I think this is a reasonable place to use unsafe code. The logic itself is safe but cannot be checked by the compiler because it relies on knowledge outside of the type system (the contract of BTreeSet, which itself relies on the implementation of Ord and friends for usize).
In this sample, we preemptively bounds check all the indices via range, so each call to add is safe to use. Since we take in a set, we know that all the indices are disjoint, so we aren't introducing mutable aliasing. It's important to get the raw pointer from the slice to avoid aliasing between the slice itself and the returned values.
use std::collections::BTreeSet;
fn uniq_refs<'i, 'd: 'i, T>(
    data: &'d mut [T],
    indices: &'i BTreeSet<usize>,
) -> impl Iterator<Item = &'d mut T> + 'i {
    let start = data.as_mut_ptr();
    let in_bounds_indices = indices.range(0..data.len());
    // I copied this from a Stack Overflow answer
    // without reading the text that explains why this is safe
    in_bounds_indices.map(move |&i| unsafe { &mut *start.add(i) })
}
use std::iter::FromIterator;
fn main() {
    let mut scores = vec![1, 2, 3];
    let selected_scores: Vec<_> = {
        // The set can go out of scope after we have used it.
        let idx = BTreeSet::from_iter(vec![0, 2]);
        uniq_refs(&mut scores, &idx).collect()
    };
    for score in selected_scores {
        *score += 1;
    }
    println!("{:?}", scores);
}
Once you have used this function to find all the separate mutable references, you can use Rayon to modify them in parallel:
use rayon::prelude::*; // 1.0.3
fn example(scores: &mut [i32], indices: &BTreeSet<usize>) {
    let selected_scores: Vec<_> = uniq_refs(scores, indices).collect();
    selected_scores.into_par_iter().for_each(|s| *s *= 2);
    // Or
    uniq_refs(scores, indices).par_bridge().for_each(|s| *s *= 2);
}
You may wish to consider using a bitset instead of a BTreeSet to be more efficient, but this answer uses only the standard library.
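One possible reading of the bitset suggestion, still using only the standard library, is to accept a plain slice of indices and use a Vec<u64> as a hand-rolled bitset to reject duplicates before handing out references. The function name checked_refs is made up for this sketch. Be aware that allocating and zeroing the bitset costs time proportional to data.len() / 64 per call, so this trades away part of the "only touch the selected elements" goal.

```rust
/// Returns one `&mut T` per index, or `None` if any index is out of bounds
/// or appears twice. A `Vec<u64>` serves as a simple bitset.
fn checked_refs<'a, T>(data: &'a mut [T], indices: &[usize]) -> Option<Vec<&'a mut T>> {
    let mut seen = vec![0u64; (data.len() + 63) / 64];
    for &i in indices {
        if i >= data.len() {
            return None;
        }
        let (word, bit) = (i / 64, 1u64 << (i % 64));
        if seen[word] & bit != 0 {
            return None; // a duplicate index would create aliasing `&mut`s
        }
        seen[word] |= bit;
    }
    let start = data.as_mut_ptr();
    // SAFETY: every index was checked to be in bounds and unique, so the
    // returned references are valid and pairwise disjoint.
    Some(indices.iter().map(|&i| unsafe { &mut *start.add(i) }).collect())
}

fn main() {
    let mut scores = vec![1, 2, 3];
    if let Some(refs) = checked_refs(&mut scores, &[2, 0]) {
        for r in refs {
            *r += 1;
        }
    }
    assert_eq!(scores, [2, 2, 4]);
    // Duplicate indices are rejected instead of causing undefined behavior.
    assert!(checked_refs(&mut scores, &[1, 1]).is_none());
}
```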
Since I've been dealing with a similar problem, here's my solution which I don't recommend using unless absolutely necessary:
struct EvilPtr<T> {
    ptr: *mut T,
}
impl<T> EvilPtr<T> {
    // Take the whole slice so the raw pointer is valid for every element,
    // not just the first one.
    fn new(inp: &mut [T]) -> Self {
        EvilPtr { ptr: inp.as_mut_ptr() }
    }
    unsafe fn deref(&self) -> *mut T {
        self.ptr
    }
}
// Only sound if no two threads ever touch the same element.
unsafe impl<T> Sync for EvilPtr<T> {}
unsafe impl<T> Send for EvilPtr<T> {}
Now you can do:
let indices: [usize; 10] = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
let mut arr: [i32; 10] = [0; 10];
let e = EvilPtr::new(&mut arr);
unsafe {
    indices.par_iter().for_each(|x: &usize| {
        *e.deref().add(*x) += *x as i32;
    });
}
println!("{:?}", arr);
If you absolutely need to do this, I recommend you bury it under some user friendly interface, where you can be sure no error can occur.
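As a sketch of what such an interface might look like, the function below keeps the raw pointer private and exposes only a safe signature. The names SharedPtr and update_in_parallel and the threads parameter are hypothetical, and std::thread::scope stands in for Rayon so the example needs no external crate. Taking a BTreeSet guarantees distinct indices, and the largest index is bounds checked up front.

```rust
use std::collections::BTreeSet;
use std::thread;

/// Raw pointer wrapper that may be shared across threads.
struct SharedPtr<T>(*mut T);

// SAFETY: only used inside `update_in_parallel`, which hands each index to
// exactly one thread.
unsafe impl<T: Send> Sync for SharedPtr<T> {}

impl<T> SharedPtr<T> {
    /// SAFETY: `i` must be in bounds and not accessed by any other thread.
    unsafe fn get_mut(&self, i: usize) -> &mut T {
        unsafe { &mut *self.0.add(i) }
    }
}

/// Applies `f` to `data[i]` for every `i` in `indices`, using up to
/// `threads` worker threads. Panics if an index is out of bounds.
fn update_in_parallel<T, F>(data: &mut [T], indices: &BTreeSet<usize>, threads: usize, f: F)
where
    T: Send,
    F: Fn(&mut T) + Sync,
{
    // The set is ordered, so checking the largest index covers all of them.
    if let Some(&max) = indices.iter().next_back() {
        assert!(max < data.len(), "index out of bounds");
    }
    let indices: Vec<usize> = indices.iter().copied().collect();
    let ptr = SharedPtr(data.as_mut_ptr());
    let threads = threads.max(1);
    let chunk_len = ((indices.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for chunk in indices.chunks(chunk_len) {
            let (ptr, f) = (&ptr, &f);
            s.spawn(move || {
                for &i in chunk {
                    // SAFETY: set elements are distinct and in bounds, and
                    // each index belongs to exactly one chunk.
                    f(unsafe { ptr.get_mut(i) });
                }
            });
        }
    });
}

fn main() {
    let mut data = vec![0u32; 100];
    let indices = BTreeSet::from([1, 5, 64, 99]);
    update_in_parallel(&mut data, &indices, 2, |x| *x += 7);
    assert_eq!(data[64], 7);
    assert_eq!(data.iter().sum::<u32>(), 28);
}
```

Callers never see the pointer, so the only way to misuse it is from inside this one function.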