I want to run a unit test, but it needs an org.apache.hadoop.fs.FileSystem instance. Is there a mock, or some other way to create a FileSystem for testing?
Luckily there is a library that makes testing Hadoop fairly easy – MRUnit. MRUnit is based on JUnit and allows for the unit testing of mappers, reducers and some limited integration testing of the mapper – reducer interaction along with combiners, custom counters and partitioners.
There is a general rule to be cautious of writing unit tests that do file I/O, because they tend to be too slow. But there is no absolute prohibition on file I/O in unit tests. In your unit tests have a temporary directory set up and torn down, and create test files and directories within that temporary directory.
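The set-up and tear-down pattern described above can be sketched with nothing but the JDK. This is only an illustration: the class name TempDirFixture and its method names are hypothetical, not part of Hadoop or JUnit, and the code under test is assumed to accept a path it can read from or write to.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper: creates and removes a scratch directory for one test. */
class TempDirFixture {
    private Path tempDir;

    /** Call from the test's set-up: creates a fresh, uniquely named directory. */
    public Path setUp() throws IOException {
        tempDir = Files.createTempDirectory("unit_test_");
        return tempDir;
    }

    /** Writes a small UTF-8 test file (and any missing parent directories). */
    public Path createFile(String name, String content) throws IOException {
        Path file = tempDir.resolve(name);
        Files.createDirectories(file.getParent());
        return Files.write(file, content.getBytes(StandardCharsets.UTF_8));
    }

    /** Call from the test's tear-down: deletes the directory and its contents. */
    public void tearDown() throws IOException {
        if (tempDir == null) {
            return;
        }
        try (Stream<Path> paths = Files.walk(tempDir)) {
            // Reverse order so that children are deleted before their parents.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        tempDir = null;
    }
}
```

With JUnit 4 these calls would typically sit in @Before and @After methods. JUnit's own TemporaryFolder rule covers the same ground if you would rather not maintain a helper like this.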
bin/hdfs dfs -mkdir /geeks => the leading '/' makes this an absolute path.
bin/hdfs dfs -mkdir geeks2 => relative path; the folder is created relative to the home directory.
touchz: creates an empty file.
copyFromLocal (or put): copies files or folders from the local file system to HDFS. This is the most important command.
HDFS has a primary NameNode, which keeps track of where file data is kept in the cluster. HDFS also has multiple DataNodes on a commodity hardware cluster -- typically one per node in a cluster. The DataNodes are generally organized within the same rack in the data center.
If you're using Hadoop 2.0.0 or later, consider using hadoop-minicluster:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-minicluster</artifactId>
    <version>2.5.0</version>
    <scope>test</scope>
</dependency>
With it, you can create a temporary HDFS instance on your local machine and run your tests against it. A setUp method may look like this:
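// Needs java.io.File, java.nio.file.Files, org.apache.hadoop.conf.Configuration,
// org.apache.hadoop.hdfs.MiniDFSCluster and org.apache.hadoop.hdfs.DistributedFileSystem.
// baseDir (File) and hdfsCluster (MiniDFSCluster) are fields, so tearDown can reach them.
baseDir = Files.createTempDirectory("test_hdfs").toFile().getAbsoluteFile();
Configuration conf = new Configuration();
// Keep the mini cluster's storage inside the temporary directory
conf.set(MiniDFSCluster.HDFS_MINIDFS_BASEDIR, baseDir.getAbsolutePath());
MiniDFSCluster.Builder builder = new MiniDFSCluster.Builder(conf);
hdfsCluster = builder.build();
// URI for code that looks up its FileSystem from a path or configuration
String hdfsURI = "hdfs://localhost:" + hdfsCluster.getNameNodePort() + "/";
// FileSystem instance backed by the in-process cluster
DistributedFileSystem fileSystem = hdfsCluster.getFileSystem();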
In the tearDown method, shut down the mini HDFS cluster and remove the temporary directory:
hdfsCluster.shutdown();
FileUtil.fullyDelete(baseDir);
Take a look at the hadoop-test jar:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-test</artifactId>
    <version>0.20.205.0</version>
</dependency>
It has classes for setting up a MiniDFSCluster and a MiniMRCluster, so you can test without a running Hadoop cluster.