
Hadoop: How to unit test FileSystem

I want to run unit tests, but I need an org.apache.hadoop.fs.FileSystem instance. Is there a mock or any other solution for creating a FileSystem?

asked Nov 29 '11 by zohar

People also ask

How to do unit testing in Hadoop?

Luckily there is a library that makes testing Hadoop fairly easy: MRUnit. MRUnit is based on JUnit and allows unit testing of mappers and reducers, plus limited integration testing of the mapper-reducer interaction along with combiners, custom counters, and partitioners.
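As a sketch of what an MRUnit mapper test looks like: the WordCountMapper class and the expected outputs below are illustrative assumptions, not part of the question; only the MapDriver usage pattern is MRUnit's actual API.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {
    @Test
    public void emitsOnePerWord() throws Exception {
        // WordCountMapper is a hypothetical mapper that emits (word, 1) per token
        MapDriver<LongWritable, Text, Text, IntWritable> driver =
                MapDriver.newMapDriver(new WordCountMapper());
        driver.withInput(new LongWritable(0), new Text("cat cat"))
              .withOutput(new Text("cat"), new IntWritable(1))
              .withOutput(new Text("cat"), new IntWritable(1))
              .runTest(); // fails the test if actual output differs
    }
}
```

The driver runs the mapper in-process, so no cluster (mini or real) is needed for this style of test.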

Should unit tests write files?

There is a general rule to be cautious of writing unit tests that do file I/O, because they tend to be too slow. But there is no absolute prohibition on file I/O in unit tests. A good compromise is to set up a temporary directory before each test and tear it down afterwards, creating all test files and directories inside it.
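A minimal sketch of that pattern using only the JDK (Java 11+ for Files.writeString/readString); the file names are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

public class TempDirExample {
    public static void main(String[] args) throws IOException {
        // Set up: an isolated temporary directory for this test run
        Path baseDir = Files.createTempDirectory("test_io");
        Path data = baseDir.resolve("input.txt");
        Files.writeString(data, "hello");
        System.out.println(Files.readString(data)); // prints "hello"

        // Tear down: delete children before parents, then the directory itself
        try (var walk = Files.walk(baseDir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
        System.out.println(Files.exists(baseDir)); // prints "false"
    }
}
```

Test frameworks offer the same idea built in (JUnit 4's TemporaryFolder rule, JUnit 5's @TempDir), which also handle cleanup when a test throws.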

What is the command to run the HDFS commands?

bin/hdfs dfs -mkdir /geeks    ('/' means an absolute path)
bin/hdfs dfs -mkdir geeks2    (relative path: the folder is created under the home directory)

touchz: creates an empty file.
copyFromLocal (or put): copies files/folders from the local file system to HDFS. This is the most important command.

Where is HDFS data stored?

HDFS has a primary NameNode, which keeps track of where file data is kept in the cluster. HDFS also has multiple DataNodes on a commodity hardware cluster -- typically one per node in a cluster. The DataNodes are generally organized within the same rack in the data center.


2 Answers

If you're using Hadoop 2.0.0 or above, consider using hadoop-minicluster:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>2.5.0</version>
    <scope>test</scope>
</dependency>

With it, you can create a temporary HDFS on your local machine and run your tests against it. A setUp method may look like this:

// assumes fields: File baseDir; MiniDFSCluster hdfsCluster;
baseDir = Files.createTempDirectory("test_hdfs").toFile().getAbsoluteFile();
Configuration conf = new Configuration();
conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath());
MiniDFSCluster.Builder builder = new MiniDFSCluster.Builder(conf);
hdfsCluster = builder.build();

String hdfsURI = "hdfs://localhost:" + hdfsCluster.getNameNodePort() + "/";
DistributedFileSystem fileSystem = hdfsCluster.getFileSystem();

And in a tearDown method you should shut down your mini HDFS cluster and remove the temporary directory:

hdfsCluster.shutdown();
FileUtil.fullyDelete(baseDir);
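Putting the setUp and tearDown snippets above together, a complete test class might look like the following sketch. It assumes JUnit 4 and the hadoop-minicluster dependency; the class name, test name, and HDFS path are illustrative.

```java
import java.io.File;
import java.nio.file.Files;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;

public class HdfsRoundTripTest {
    private File baseDir;
    private MiniDFSCluster hdfsCluster;

    @Before
    public void setUp() throws Exception {
        // Temporary directory backing the mini cluster's storage
        baseDir = Files.createTempDirectory("test_hdfs").toFile().getAbsoluteFile();
        Configuration conf = new Configuration();
        conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath());
        hdfsCluster = new MiniDFSCluster.Builder(conf).build();
    }

    @Test
    public void writeThenRead() throws Exception {
        // A real FileSystem instance, backed by the in-process HDFS
        FileSystem fs = hdfsCluster.getFileSystem();
        Path p = new Path("/tmp/hello.txt");
        try (FSDataOutputStream out = fs.create(p)) {
            out.writeUTF("hello hdfs");
        }
        Assert.assertTrue(fs.exists(p));
    }

    @After
    public void tearDown() {
        hdfsCluster.shutdown();
        FileUtil.fullyDelete(baseDir);
    }
}
```

Because getFileSystem() returns a genuine FileSystem, any code under test that accepts a FileSystem parameter can be exercised against it with no mocking.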
answered Sep 20 '22 by Alexander Tokarev


Take a look at the hadoop-test jar

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-test</artifactId>
    <version>0.20.205.0</version>
</dependency>

It has classes for setting up a MiniDFSCluster and a MiniMRCluster, so you can test without a real Hadoop cluster.
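With the older (pre-2.x) hadoop-test jar, MiniDFSCluster is created via a constructor rather than a builder. A hedged sketch, assuming the 0.20-era API; the data-node count and file path are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class OldApiSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // (conf, numDataNodes, format, racks) -- the 0.20-era constructor
        MiniDFSCluster cluster = new MiniDFSCluster(conf, 1, true, null);
        try {
            FileSystem fs = cluster.getFileSystem();
            fs.create(new Path("/tmp/example")).close();
            System.out.println(fs.exists(new Path("/tmp/example")));
        } finally {
            cluster.shutdown();
        }
    }
}
```

The shape of the test is otherwise the same as with the newer hadoop-minicluster artifact: obtain a real FileSystem from the mini cluster and hand it to the code under test.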

answered Sep 22 '22 by Arnon Rotem-Gal-Oz