
Does adding a semicolon at the end of `return` make a difference?

The Rust Guide states that:

The semicolon turns any expression into a statement by throwing away its value and returning unit instead.

I thought I got this concept down until I ran an experiment:

fn print_number(x: i32, y: i32) -> i32 {
    if x + y > 20 { return x }      
    x + y 
}

This compiles fine. Then I added a semicolon at the end of the return line (return x;). As I understand it, this turns the line into a statement, which evaluates to the unit type ().

Nonetheless, the end result is the same.

sargas asked Oct 19 '14 02:10


2 Answers

Normally, every branch in the if expression should have the same type. If the type for some branch is underspecified, the compiler tries to find the single common type:

fn print_number(x: int, y: int) {
  let v = if x + y > 20 {
    3 // this can be either 3u, 3i, 3u8 etc.
  } else {
    x + y // this is always int
  };
  println!("{}", v);
}

In this code, 3 is underspecified but the else branch forces it to have the type of int.
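As a rough illustration in current Rust syntax, here is a minimal sketch of the same unification. It assumes i64 as a stand-in for the old int type, and inferred_width and default_width are hypothetical helper names, not anything from the standard library. It reports the byte width of the inferred type so the effect of the else branch is visible:

```rust
use std::mem::size_of_val;

// Hypothetical helper: returns the byte width of the type inferred for `v`.
fn inferred_width(x: i64, y: i64) -> usize {
    let v = if x + y > 20 {
        3 // unsuffixed literal, unified with the else branch
    } else {
        x + y // always i64
    };
    size_of_val(&v)
}

// With nothing to unify against, an unsuffixed integer literal falls back to i32.
fn default_width() -> usize {
    let w = 3;
    size_of_val(&w)
}

fn main() {
    println!("{}", inferred_width(15, 10));
    println!("{}", default_width());
}
```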

This sounds simple: There is a function that "unifies" two or more types into the common type, or it will give you an error when that's not possible. But what if there were a fail! in the branch?

fn print_number(x: int, y: int) {
  let v = if x + y > 20 {
    fail!("x + y too large") // ???
  } else {
    x + y // this is always int
  };
  println!("{}", v); // uh wait, what's the type of `v`?
}

I'd want fail! not to affect the other branches; it is an exceptional case, after all. Since this pattern is quite common in Rust, the concept of a diverging type has been introduced. No value has a diverging type. (It is also called an "uninhabited type" or "void type" depending on the context. Not to be confused with the "unit type", which has the single value ().) Since the diverging type is naturally a subset of every other type, the compiler concludes that v's type is just that of the else branch, int.
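For readers on current Rust, where fail! has since been renamed panic!, a minimal sketch of the same idea might look like the following. The function name checked_sum is hypothetical, and i32 stands in for the old int:

```rust
// Hypothetical example: the panicking branch has the diverging type `!`,
// so the type of `v` comes from the else branch alone.
fn checked_sum(x: i32, y: i32) -> i32 {
    let v = if x + y > 20 {
        panic!("x + y too large") // diverges, places no constraint on `v`
    } else {
        x + y // always i32
    };
    v
}

fn main() {
    // Small inputs take the else branch and return normally.
    println!("{}", checked_sum(3, 4));
}
```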

A return expression is no different from fail! for the purpose of type checking. It abruptly escapes from the current flow of execution just like fail! (but, thankfully, does not terminate the task). Still, the diverging type does not propagate to the next statement:

fn print_number(x: int, y: int) {
  let v = if x + y > 20 {
    return; // this is diverging
    () // this is implied, even when you omit it
  } else {
    x + y // this is always int
  };
  println!("{}", v); // again, what's the type of `v`?
}

Note that a lone semicolon-terminated statement x; is equivalent to the expression x; (). Normally a; b has the same type as b, so it would be quite strange for x; () to have the type () only when x does not diverge, and to diverge when x does. That's why your original code didn't work.

It is tempting to add special cases like these:

  • Why don't you make x; () diverging when x diverges?
  • Why don't you assume uint for every underspecified integer literal when its type cannot be inferred? (Note: this was the case in the past.)
  • Why don't you automatically find the common supertrait when unifying multiple trait objects?

The truth is that designing a type system is not very hard, but verifying it is much harder, and we want to ensure that Rust's type system is future-proof and long-standing. Some of these may happen if they turn out to be really useful and are proved "correct" for our purposes, but not immediately.

Kang Seonghoon answered Sep 30 '22 18:09


I'm not 100% sure of what I'm saying but it kinda makes sense.

There's another concept coming into play: reachability analysis. The compiler knows that whatever follows a return expression statement is unreachable. For example, if we compile this function:

fn test() -> i32 {
    return 1;
    2
}

We get the following warning:

warning: unreachable expression
 --> src/main.rs:3:5
  |
3 |     2
  |     ^
  |

The compiler can ignore the "true" branch of the if expression if it ends with a return expression and only consider the "false" branch when determining the type of the if expression.

You can also see this behavior with diverging functions. Diverging functions are functions that don't return normally (e.g. they always fail). Try replacing the return expression with the fail! macro (which expands to a call to a diverging function). In fact, return expressions are also considered to be diverging; this is the basis of the aforementioned reachability analysis.
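Here is a small sketch of that with a user-defined diverging function in current Rust. The names bail and clamp_sum are hypothetical, and panic! is the current name of fail!. The `!` return type marks the function as never returning, and the call still diverges with a trailing semicolon:

```rust
// Hypothetical diverging function: the `!` return type says it never returns.
fn bail(msg: &str) -> ! {
    panic!("{}", msg)
}

fn clamp_sum(x: i32, y: i32) -> i32 {
    if x + y > 20 {
        bail("x + y too large"); // trailing semicolon, the block still diverges
    } else {
        x + y
    }
}

fn main() {
    println!("{}", clamp_sum(3, 4));
}
```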

However, if there's an actual () expression after the return statement, you'll get an error. This function:

fn print_number(x: i32, y: i32) -> i32 {
    if x + y > 20 {
        return x;
        ()
    } else {
        x + y
    }
}

gives the following error:

error[E0308]: mismatched types
 --> src/main.rs:4:9
  |
4 |         ()
  |         ^^ expected i32, found ()
  |
  = note: expected type `i32`
             found type `()`

In the end, it seems diverging expressions (which include return expressions) are handled differently by the compiler when they are followed by a semicolon: the statement is still diverging.
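To check this against the question, here is a minimal sketch that puts the two variants side by side. The function names with_semicolon and without_semicolon are mine; the bodies are the function from the question. Both should compile and behave identically:

```rust
// Variant with the semicolon: `return x;` is a statement that still diverges.
fn with_semicolon(x: i32, y: i32) -> i32 {
    if x + y > 20 { return x; }
    x + y
}

// Variant without the semicolon: `return x` is the block's tail expression.
fn without_semicolon(x: i32, y: i32) -> i32 {
    if x + y > 20 { return x }
    x + y
}

fn main() {
    // Compare the two variants on an early-return input and a fall-through input.
    for (x, y) in [(15, 10), (3, 4)] {
        println!("{} {}", with_semicolon(x, y), without_semicolon(x, y));
    }
}
```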

Francis Gagné answered Sep 30 '22 17:09