I try to profile a simple c prog using valgrind:
[zsun@nel6005001 ~]$ valgrind --tool=memcheck ./fl.out
==2238== Memcheck, a memory error detector
==2238== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
==2238== Using Valgrind-3.5.0 and LibVEX; rerun with -h for copyright info
==2238== Command: ./fl.out
==2238==
==2238==
==2238== HEAP SUMMARY:
==2238== in use at exit: 1,168 bytes in 1 blocks
==2238== total heap usage: 1 allocs, 0 frees, 1,168 bytes allocated
==2238==
==2238== LEAK SUMMARY:
==2238== definitely lost: 0 bytes in 0 blocks
==2238== indirectly lost: 0 bytes in 0 blocks
==2238== possibly lost: 0 bytes in 0 blocks
==2238== still reachable: 1,168 bytes in 1 blocks
==2238== suppressed: 0 bytes in 0 blocks
==2238== Rerun with --leak-check=full to see details of leaked memory
==2238==
==2238== For counts of detected and suppressed errors, rerun with: -v
==2238== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 12 from 8)
Profiling timer expired
The c code I am trying to profile is the following:
void forloop(void){
int fac=1;
int count=5;
int i,k;
for (i = 1; i <= count; i++){
for(k=1;k<=count;k++){
fac = fac * i;
}
}
}
"Profiling timer expired" shows up, what does it mean? How to solve this problem? thx!
The problem is that you are using valgrind on a program compiled with -pg
. You cannot use valgrind and gprof together. The valgrind manual suggests using OProfile if you are on Linux and need to profile the actual emulation of the program under valgrind.
By the way, this isn't computing factorial.
If you're really trying to find out where the time goes, you could try stackshots. I put an infinite loop around your code and took 10 of them. Here's the code:
6: void forloop(void){
7: int fac=1;
8: int count=5;
9: int i,k;
10:
11: for (i = 1; i <= count; i++){
12: for(k=1;k<=count;k++){
13: fac = fac * i;
14: }
15: }
16: }
17:
18: int main(int argc, char* argv[])
19: {
20: int i;
21: for (;;){
22: forloop();
23: }
24: return 0;
25: }
And here are the stackshots, re-ordered with the most frequent at the top:
forloop() line 12
main() line 23
forloop() line 12 + 21 bytes
main() line 23
forloop() line 12 + 21 bytes
main() line 23
forloop() line 12 + 9 bytes
main() line 23
forloop() line 13 + 7 bytes
main() line 23
forloop() line 13 + 3 bytes
main() line 23
forloop() line 6 + 22 bytes
main() line 23
forloop() line 14
main() line 23
forloop() line 7
main() line 23
forloop() line 11 + 9 bytes
main() line 23
What does this tell you? It says that line 12 consumes about 40% of the time, and line 13 consumes about 20% of the time. It also tells you that line 23 consumes nearly 100% of the time.
That means unrolling the loop at line 12 might potentially give you a speedup factor of 100/(100-40) = 100/60 = 1.67x approximately. Of course there are other ways to speed up this code as well, such as by eliminating the inner loop, if you're really trying to compute factorial.
I'm just pointing this out because it's a bone-simple way to do profiling.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With