I'm trying to profile my application to see if I can reproduce the results from this blog post. I added -D mapred.task.profile=true to the command line and confirmed in the job configuration that it took effect.
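For reference, I'm launching the job roughly like this (the jar and class names below are placeholders rather than my real job, and this assumes the driver goes through ToolRunner so the generic -D option is picked up):

hadoop jar myjob.jar MyDriver -D mapred.task.profile=true /input/path /output/path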
Hadoop: The Definitive Guide says the profile info will appear in the Unix directory I ran the job from. The directory I started from has a file attempt_201305011806_0042_m_000002_0.profile, which has the correct job ID, but there was no mapper #2 (the job ran only one mapper, and it didn't fail). The profile file contains only the header info; there isn't any actual profiling data.
The Hadoop docs say the output will be in the user log directory, but I can't find anything there. If I go into the task logs for the mapper, there is legitimate profiling info under "profile.out logs". My HDFS output directory doesn't have the profiling info at all. Shouldn't the profiling output be in HDFS somewhere?
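In case it matters, the only place I can find the output on disk is inside the task attempt's log directory on the TaskTracker node, which on my setup looks roughly like this (the path follows a default Hadoop 1.x layout and the bracketed names are placeholders, so it may differ elsewhere):

ls $HADOOP_HOME/logs/userlogs/<job_id>/<attempt_id>/
# stderr  stdout  syslog  profile.out (profile.out only appears for profiled attempts)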
Also, it only gives text-based output in the log, but all of the tools I've found to visualize the profile expect the binary hprof format. Any ideas on how I could get a binary profile, or else load a text-based profile into an hprof tool?
I noticed there's a space in
-D mapred.task.profile=true
Is that a typo? If so, just remove it and see what happens. You should also be able to see the profiler files under the user log directory, which is usually where you ran the job from. Also, hprof is the default profiler for Hadoop, so check that you are not overriding it with
-Dmapred.task.profile.params
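As far as I remember, the stock Hadoop 1.x default for that property is something close to this (the %s gets replaced by Hadoop with the per-attempt output file):

-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s

If your visualization tools want a binary hprof file, you could try overriding the params with a binary heap dump instead, for example (untested, and the exact option string is an assumption on my part):

-D "mapred.task.profile.params=-agentlib:hprof=heap=dump,format=b,force=n,thread=y,verbose=n,file=%s"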