Accessing files in the Hadoop distributed cache

Tags:

hadoop

I want to use the distributed cache to give my mappers access to data. In main, I register the file with this call:

DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);

where /user/peter/cacheFile/testCache1 is a file that exists in HDFS.

Then, my setup function looks like this:

public void setup(Context context) throws IOException, InterruptedException{
    Configuration conf = context.getConfiguration();
    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    //etc
}

However, this localFiles array is always null.

I was initially testing on a single-host cluster, but I read that this prevents the distributed cache from working. I then tried a pseudo-distributed setup, but that didn't work either.

I'm using Hadoop 1.0.3.

Thanks, Peter

941

asked Dec 06 '12 15:12

Peter Cogan

1 Answer

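The problem was that I was doing the following:

Configuration conf = new Configuration();
// The Job constructor copies conf at this point.
Job job = new Job(conf, "wordcount");
// Too late: this changes the original conf, not the copy held by the job.
DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);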

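Since the Job constructor makes an internal copy of the conf instance, adding the cache file to conf afterwards has no effect on the job's configuration, which is why getLocalCacheFiles returned null in setup. Instead, the cache file has to be added before the Job is constructed: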

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/peter/cacheFile/testCache1"), conf);
Job job = new Job(conf, "wordcount");

And now it works. Thanks to Harsh on the Hadoop user list for the help.
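To make the ordering issue concrete, here is a minimal sketch that runs without Hadoop on the classpath. ToyConf, ToyJob and CacheOrderSketch are hypothetical stand-ins written only for this illustration; they are not Hadoop classes and only imitate the copy-on-construction behaviour described above. The third variant mirrors registering the file on the configuration returned by job.getConfiguration(), which I would expect to work for the same reason, though I have only tested the reordering shown above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a configuration object; NOT Hadoop's Configuration class.
class ToyConf {
    final List<String> cacheFiles = new ArrayList<>();

    ToyConf() {}

    // Copy constructor: the new object gets its own independent list.
    ToyConf(ToyConf other) {
        cacheFiles.addAll(other.cacheFiles);
    }
}

// Toy stand-in for a job that copies its configuration when constructed.
class ToyJob {
    private final ToyConf conf;

    ToyJob(ToyConf conf) {
        this.conf = new ToyConf(conf); // snapshot is taken here
    }

    ToyConf getConfiguration() {
        return conf;
    }
}

class CacheOrderSketch {
    static final String CACHE_PATH = "/user/peter/cacheFile/testCache1";

    // Register the file after building the job: the job's copy never sees it.
    static List<String> addAfterJob() {
        ToyConf conf = new ToyConf();
        ToyJob job = new ToyJob(conf);
        conf.cacheFiles.add(CACHE_PATH);
        return job.getConfiguration().cacheFiles;
    }

    // Register the file before building the job: the copy includes it.
    static List<String> addBeforeJob() {
        ToyConf conf = new ToyConf();
        conf.cacheFiles.add(CACHE_PATH);
        ToyJob job = new ToyJob(conf);
        return job.getConfiguration().cacheFiles;
    }

    // Register the file on the job's own configuration after construction.
    static List<String> addViaJobConf() {
        ToyJob job = new ToyJob(new ToyConf());
        job.getConfiguration().cacheFiles.add(CACHE_PATH);
        return job.getConfiguration().cacheFiles;
    }

    public static void main(String[] args) {
        System.out.println("added after job:   " + addAfterJob());
        System.out.println("added before job:  " + addBeforeJob());
        System.out.println("added via job conf: " + addViaJobConf());
    }
}
```

In this toy model only the first variant leaves the job without a cache file, which matches the null array I was seeing in setup.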

145

answered Oct 08 '22 10:10

Peter Cogan
