It seems that gcc has some limitation on complex constant folding. Here is an example:
static inline unsigned int DJBHash(const char *str)
{
int i;
unsigned int hash = 5381;
for(i = 0; i < strlen(str); i++)
{
hash = ((hash << 5) + hash) + str[i];
}
return hash;
}
int f(void)
{
return DJBHash("01234567890123456");
}
When running with -O3 optimization level (gcc 4.8), it unfolds the loop in DJBHash nicely and calculates the hash value for that string during compile time.
However, when making the string one character longer return DJBHash("012345678901234567");
it does not fold it any more and generates a loop with a conditional jump instruction.
I would like to fold a literal string in any length into its hash value as a compile time constant.
Could this be done?
My question was about the constant-folding optimizations on gcc (see the title - please do not remove gcc and compiler tags)
Many answers here try to solve the problem with either Templates or constexpr. It's good to know about these options and thanks for posting them for the benefit of all. However, they do not directly answer my question.
Practically, I'm working on a gcc port so I could change and build gcc source code, if needed. But I'm limited to C and I would like to solve this problem in this scope.
Constant folding is an optimization technique that eliminates expressions that calculate a value that can already be determined before code execution. These are typically calculations that only reference constant values or expressions that reference variables whose values are constant.
Python tries to fold every single constant expression present but there are some cases where even though the expression is constant, but Python chooses not to fold it. For example, Python does not fold x = 4 ** 64 while it does fold x = 2 ** 64 .
-O3 : the highest level of optimization possible. It enables optimizations that are expensive in terms of compile time and memory usage. Compiling with -O3 is not a guaranteed way to improve performance, and in fact, in many cases, can slow down a system due to larger binaries and increased memory usage.
GCC has a range of optimization levels, plus individual options to enable or disable particular optimizations. The overall compiler optimization level is controlled by the command line option -On, where n is the required optimization level, as follows: -O0 . (default).
The OP is interested in constant-folding in C, but just for its C++ sibling: in C++14, you can simply put constexpr
in front of both functions, and modify the loop to to compensate for strlen()
not being constexpr
#include<iostream>
static inline constexpr unsigned int DJBHash(const char *str)
{
unsigned int hash = 5381;
for(auto i = 0; i < 512; ++i) {
if (*str == '\0') return hash;
hash = ((hash << 5) + hash) + static_cast<unsigned int>(*str);
}
return hash;
}
constexpr unsigned int f(void)
{
return DJBHash("01234567890123456");
}
int main()
{
constexpr auto h = f();
std::cout << std::hex << h << "\n"; // 88a7b505
}
Live Example using Clang 3.4 SVN with -std=c++1y
.
NOTE: the current Clang implementation does not properly run with a while(*str != '\0')
. Instead, a finite loop of 512 with a return condition inside does the job.
Here's a version using constexpr
. It's slightly different from the others in one respect -- being recursive, it was easiest to hash the string back to front, so to speak. For example, the value it gives for "abc" will be what you'd normally expect from "cba" instead. I don't think this should make any real difference in use, as long as you use one or the other consistently (but given the vagaries of hashing, I could be wrong about that).
It does evaluate at compile time though -- for example, we can use the results as labels in a switch
statement:
#include <iostream>
unsigned constexpr const_hash(char const *input) {
return *input ?
static_cast<unsigned>(*input) + 33 * const_hash(input + 1) :
5381;
}
int main(int argc, char **argv) {
switch (const_hash(argv[1])) {
case const_hash("one"): std::cout << "one"; break;
case const_hash("two"): std::cout << "two"; break;
}
}
Obviously, there could be collisions, so you generally wouldn't want to use it as case statement labels -- I mostly did that to force a situation in which it would fail to compile if the result wasn't a compile-time constant.
Edit: if you care about the hash algorithm being "correct", I guess this is more accurate (with thanks to @Abyx):
unsigned constexpr const_hash(char const *input, unsigned hash = 5381) {
return *input ?
const_hash(input + 1, hash * 33 + static_cast<unsigned>(*input)):
hash;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With