
Is `u32`/`i32` suggested even on limited range number case?

Tags:

rust

Should we use u32/i32 or one of the smaller variants (u8/i8, u16/i16) when dealing with limited-range numbers like "days in month", which ranges from 1 to 31, or "score of a subject", which ranges from 0 to 100? And if not, why not?

Is there any optimization or benefit from the smaller variants (e.g. memory efficiency)?

Abdillah asked Oct 09 '16 02:10


3 Answers

Summary

Correctness should be prioritized over performance, and correctness-wise (for ranges like 1–100) all solutions (u8, u32, ...) are equally bad. The best solution would be to create a new type to benefit from strong typing.

The rest of my answer tries to justify this claim and discusses different ways of creating the new type.

More explanation

Let's take a look at the "score of subject" example: the only legal values are 0–100. I'd argue that correctness-wise, using u8 and u32 is equally bad: in both cases, your variable can hold values that are not legal in your semantic context; that's bad!

And arguing that u8 is better because there are fewer illegal values is like arguing that wrestling a bear is better than walking through New York, because you only have one possibility of dying (blood loss by bear attack) as opposed to the many possibilities of death (car accident, knife attack, drowning, ...) in New York.

So what we want is a type that guarantees to hold only legal values. We want to create a new type that does exactly this. However, there are multiple ways to proceed; each with different advantages and disadvantages.


(A) Make the inner value public

struct ScoreOfSubject(pub u8);

Advantage: at least APIs are easier to understand, because the parameter is already explained by the type. Which is easier to understand:

  • add_record("peter", 75, 47) or
  • add_record("peter", StudentId(75), ScoreOfSubject(47))?

I'd say the latter one ;-)

Disadvantage: we don't actually do any range checking and illegal values can still occur; bad!


(B) Make inner value private and supply a range checking constructor

struct ScoreOfSubject(u8);

impl ScoreOfSubject {
    pub fn new(value: u8) -> Self {
        assert!(value <= 100);
        ScoreOfSubject(value)
    }
    pub fn get(&self) -> u8 { self.0 }
}

Advantage: we enforce legal values with very little code, yeah :)

Disadvantage: working with the type can be annoying. Pretty much every operation requires the programmer to pack & unpack the value.
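
As a variation on (B), here is a minimal sketch of a constructor that reports an illegal value instead of panicking. The name `try_new` and the choice of `Option` as the return type are assumptions made for illustration; they are not part of the original answer.

```rust
/// Newtype for a score in 0..=100. The inner field is private, so the
/// only way to build a value from outside this module is `try_new`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ScoreOfSubject(u8);

impl ScoreOfSubject {
    /// Hypothetical non-panicking constructor: returns `None` for
    /// values above 100 instead of asserting.
    pub fn try_new(value: u8) -> Option<Self> {
        if value <= 100 {
            Some(ScoreOfSubject(value))
        } else {
            None
        }
    }

    /// Unpack the raw value again.
    pub fn get(self) -> u8 {
        self.0
    }
}
```

With this, `ScoreOfSubject::try_new(47)` yields `Some(..)` and `ScoreOfSubject::try_new(101)` yields `None`, so the caller decides how to handle bad input.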


(C) Add a bunch of implementations (in addition to (B))

(the code would impl Add<_>, impl Display and so on)

Advantage: the programmer can use the type and do all useful operations on it directly -- with range checking! This is pretty optimal.
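
To make (C) a bit more concrete, here is a rough sketch of what two such implementations could look like. It assumes that a sum above 100 should panic (via the constructor from (B)) and that a score is printed as "n/100"; both choices are illustrative assumptions, not something the crates mentioned below necessarily do.

```rust
use std::fmt;
use std::ops::Add;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ScoreOfSubject(u8);

impl ScoreOfSubject {
    /// Range-checking constructor as in (B): panics above 100.
    pub fn new(value: u8) -> Self {
        assert!(value <= 100, "score out of range: {}", value);
        ScoreOfSubject(value)
    }

    pub fn get(self) -> u8 {
        self.0
    }
}

/// Adding two scores goes through `new`, so the result is range checked.
impl Add for ScoreOfSubject {
    type Output = ScoreOfSubject;

    fn add(self, rhs: ScoreOfSubject) -> ScoreOfSubject {
        // Both operands are <= 100, so the u8 sum (<= 200) cannot overflow.
        ScoreOfSubject::new(self.0 + rhs.0)
    }
}

/// Assumed display format: "47/100".
impl fmt::Display for ScoreOfSubject {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}/100", self.0)
    }
}
```

Callers can then write `ScoreOfSubject::new(40) + ScoreOfSubject::new(35)` without unpacking, and the range check still runs on the result.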

Please take a look at Matthieu M.'s comment:

[...] generally multiplying scores together, or dividing them, does not produce a score! Strong typing not only enforces valid values, it also enforces valid operations, so that you don't actually divide two scores together to get another score.

I think this is a very important point I failed to make clear before. Strong typing prevents the programmer from executing illegal operations on values (operations that don't make any sense). A good example is the crate cgmath which distinguishes between point and direction vectors, because both support different operations on them. You can find additional explanation here.
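
The point/direction distinction can be sketched with two small types. Note that `Point` and `Direction` below are hypothetical types written for this illustration; they are not cgmath's actual API.

```rust
use std::ops::{Add, Sub};

/// A position in 2D space (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f32,
    pub y: f32,
}

/// A displacement in 2D space (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Direction {
    pub x: f32,
    pub y: f32,
}

/// Point - Point gives the direction from one to the other.
impl Sub for Point {
    type Output = Direction;
    fn sub(self, rhs: Point) -> Direction {
        Direction { x: self.x - rhs.x, y: self.y - rhs.y }
    }
}

/// Point + Direction moves the point.
impl Add<Direction> for Point {
    type Output = Point;
    fn add(self, rhs: Direction) -> Point {
        Point { x: self.x + rhs.x, y: self.y + rhs.y }
    }
}

/// Direction + Direction combines two displacements.
impl Add for Direction {
    type Output = Direction;
    fn add(self, rhs: Direction) -> Direction {
        Direction { x: self.x + rhs.x, y: self.y + rhs.y }
    }
}

// There is deliberately no `impl Add<Point> for Point`:
// adding two positions is meaningless, so `p + q` does not compile.
```

The missing implementation is the whole point: the operation that makes no sense is rejected at compile time rather than caught in review.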

Disadvantage: a lot of code :(

Luckily, the disadvantage can be reduced by the Rust macro/compiler plugin system. There are crates like newtype_derive or bounded_integer that do this kind of code generation for you (disclaimer: I have never worked with them).


But now you say: "you can't be serious? Am I supposed to spend my time writing new types?".

Not necessarily, but if you are working on production code (== at least somewhat important), then my answer is: yes, you should.

Lukas Kalbertodt answered Oct 10 '22 12:10


A no-answer answer: I doubt you would see any difference in benchmarks, unless you do A LOT of arithmetic or process HUGE arrays of numbers.

You should probably just go with the type which makes more sense (no reason to use negatives or have an upper bound in millions for a day of month) and provides the methods you need (e.g. you can't perform abs() directly on an unsigned integer).
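
A small illustration of the "methods you need" point: signed integers have `abs()`, while unsigned integers do not, although they do offer `abs_diff()` for the common "distance between two values" case. The function names below are made up for this sketch.

```rust
/// Distance between two days using a signed type and `abs()`.
pub fn day_gap_signed(a: i32, b: i32) -> i32 {
    (a - b).abs()
}

/// Same distance using an unsigned type: `u8` has no `abs()`,
/// and `a - b` would underflow when b > a, so use `abs_diff()`.
pub fn day_gap_unsigned(a: u8, b: u8) -> u8 {
    a.abs_diff(b)
}
```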

ljedrz answered Oct 10 '22 12:10


There could be major benefits to using smaller types, but you would have to benchmark your application on your target platform to be sure.

The first and most easily realized benefit from the lower memory footprint is better caching. Not only is your data more likely to fit into the cache, but it is also less likely to discard other data in the cache, potentially improving a completely different part of your application. Whether or not this is triggered depends on what memory your application touches and in what order. Do the benchmarks!
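
To put a number on the footprint difference, here is a minimal sketch comparing the size of the same number of scores stored as u8 and as u32. The element count of 1,000 is an arbitrary assumption, and this only measures size; it says nothing about actual cache behaviour, which still needs a benchmark.

```rust
use std::mem::size_of;

/// Arbitrary number of scores chosen for illustration.
pub const SCORES: usize = 1_000;

/// Returns the size in bytes of SCORES elements stored as u8 and as u32.
pub fn footprint() -> (usize, usize) {
    (size_of::<[u8; SCORES]>(), size_of::<[u32; SCORES]>())
}
```

On any Rust target the u32 array is four times the size of the u8 array, so four times as many u8 scores fit into the same amount of cache.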

Network data transfers have an obvious benefit from using smaller types.

Smaller data allows "larger" instructions. A 128-bit SIMD unit can handle four 32-bit values OR sixteen 8-bit values at once, making certain operations 4 times faster. In benchmarks I've made, these instructions did indeed execute 4 times faster, BUT the whole application improved by less than 1%, and the code became more of a mess. Shaping your program to make better use of SIMD can be tricky.

As for the signed/unsigned discussion, unsigned has slightly better properties, which a compiler may or may not take advantage of.

Andreas answered Oct 10 '22 12:10