
Is weak typing a performance increase or a decrease?

When writing interpreted languages, is it faster to have weak typing or strong typing?

I was wondering this because the faster dynamically typed interpreted languages out there (Lua, JavaScript), and in fact most interpreted languages, use weak typing.

But on the other hand, strong typing gives guarantees that weak typing does not, so are there optimization techniques possible with one that aren't possible with the other?


By strongly typed I mean no implicit conversions between types. For example, this would be illegal in a strongly typed language, but (possibly) legal in a weakly typed one: "5" * 2 == 10. JavaScript in particular is notorious for these type conversions.
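
For concreteness, here is a small illustration of the distinction (my example, not part of the original question), using Python, which refuses this particular implicit conversion:

    >>> "5" * 2              # sequence repetition, not numeric multiplication
    '55'
    >>> "5" + 2              # rejected: no implicit str-to-number conversion
    Traceback (most recent call last):
      ...
    TypeError: can only concatenate str (not "int") to str
    >>> int("5") * 2         # the conversion has to be written out explicitly
    10

In JavaScript, by contrast, "5" * 2 evaluates to the number 10, because * coerces both operands to numbers.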

asked Mar 10 '12 by orlp

2 Answers

it seems to me that the question is going to be hard to answer with explicit examples because of the lack of "strongly typed interpreted languages" (using the definitions i understand from the question comments).

i cannot think of any language that is interpreted and does not have implicit conversions. and i think this is for two reasons:

  1. interpreted languages tend not to be statically typed. i think this is because if you are going to implement a statically typed language then, historically, compilation has been relatively easy and gives you a significant performance advantage.

  2. if a language is not statically typed then it is forced into having implicit conversions. the alternative would make life too hard for the programmer (they would have to keep track of types, invisible in the source, to avoid runtime errors).

so, in practice, all interpreted languages are weakly typed. but the question of a performance increase or decrease implies a comparison with some that are not. at least, it does if we want to get into a discussion of different, existing implementation strategies.

now you might reply "well, imagine one". ok. so then you are asking for the performance difference between code that detects the need for a conversion at runtime and code where the programmer has explicitly added the conversion. in that case you are comparing the cost of dynamically detecting the need for a conversion with the cost of calling an explicit conversion function specified by the programmer.
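
a rough sketch of those two strategies (my own hypothetical example in python, not taken from any particular interpreter):

    # hypothetical sketch; the names and the choice of types are illustrative only

    def add_implicit(a, b):
        # the "weak" path: the runtime detects the need for a conversion
        if isinstance(a, str) and isinstance(b, (int, float)):
            a = float(a)
        elif isinstance(b, str) and isinstance(a, (int, float)):
            b = float(b)
        return a + b

    def add_explicit(a, b):
        # the "strong" path: the programmer wrote the conversion, but
        # fail-fast behaviour still requires a type check at runtime
        if not isinstance(a, (int, float)) or not isinstance(b, (int, float)):
            raise TypeError("add_explicit expects numbers")
        return a + b

    add_implicit("5", 2)         # 7.0 -- conversion detected and applied
    add_explicit(int("5"), 2)    # 7   -- conversion supplied by the programmer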

on the face of it, detection is always going to add some overhead (in a [late-]compiled language that can be ameliorated by a jit, but you are asking about interpreters). but if you want fail-fast behaviour (type errors) then even the explicit conversion has to check types. so in practice i imagine the difference is relatively small.

and this feeds back to the original point - since the performance cost of weak typing is low (given all the other constraints/assumptions in the question), and the usability costs of the alternative are high, most (all?) interpreted languages support implicit conversion.

[sorry if i am still not understanding. i am worried i am missing something because the question - and this answer - does not seem interesting...]

[edit: maybe a better way of asking the same(?) thing would be something like "what are the comparative advantages/disadvantages of the various ways that dynamic (late binding?) languages handle type conversion?" because i think there you could argue that python's approach is particularly powerful (expressive), while having similar costs to other interpreted languages (and the question avoids having to argue whether python or any other language is "weakly typed" or not).]
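
to illustrate what i mean by python's approach (my own sketch, not something from the question): each type decides what an operation means for it, via special methods such as __add__, rather than the language applying one fixed implicit-conversion rule:

    # sketch: conversion behaviour lives in the types themselves
    class Celsius:
        def __init__(self, value):
            self.value = value

        def __add__(self, other):
            # define what "Celsius + number" means for this type
            if isinstance(other, (int, float)):
                return Celsius(self.value + other)
            return NotImplemented    # anything else becomes a TypeError

    print((Celsius(20) + 5).value)   # 25
    # Celsius(20) + "5" raises TypeError, because no conversion was defined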

answered Sep 19 '22 by andrew cooke


With strongly typed I mean no implicit conversions between types.

"5" * 2 == 10

The problem is that "weak typing" is not a well-defined term, since there are two very different ways such "implicit conversions" can happen, which have pretty much the opposite effect on performance:

  • The "scripting language way": values have a runtime type, and the language implicitly applies semantic rules to convert between types (such as formatting a binary number as a decimal string) when an operation calls for a different type. This tends to decrease performance, since it (a) requires type information to be present at runtime and (b) requires that this information be checked. Both requirements introduce overhead.
  • The "C way": at runtime, it's all just bytes. If you can convince the compiler to apply an operation that expects a 4-byte integer to a string, then depending on how exactly you do it, either the first 4 bytes of that string will simply be treated as if they were a (probably very large) integer, or you get a buffer overrun. Or demons flying out of your nose. This method requires no overhead and leads to very fast performance (and very spectacular crashes). See the sketch after this list.
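
A rough Python sketch of the two behaviours (my own illustration; Python itself follows neither extreme, and the byte reinterpretation has to be imitated with the struct module):

    import struct

    # "scripting language way": every value carries a runtime type that is
    # checked, and a semantic conversion is applied when the types differ
    def scripting_multiply(a, b):
        if isinstance(a, str):
            a = float(a)              # semantic conversion: "5" -> 5.0
        if isinstance(b, str):
            b = float(b)
        return a * b

    print(scripting_multiply("5", 2))            # 10.0

    # "C way": no checks at all, the same bytes are simply reinterpreted;
    # here the first 4 bytes of a byte string are read as a 32-bit integer
    raw = b"5abc plus whatever else follows"
    print(struct.unpack("<i", raw[:4])[0])       # a (probably large) integer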
answered Sep 19 '22 by Michael Borgwardt