
hdfs copy multiple files to same target directory

I learned that if you want to copy multiple files from one Hadoop folder to another, it is better to build one big hdfs dfs -cp statement with many arguments than to run several separate hdfs dfs -cp statements. By 'better' I mean that it improves the overall time it takes to copy the files: one command is quicker than several separate -cp commands run one after another.
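As a minimal sketch of what that batching looks like, assuming hypothetical paths /data/in and /data/out: instead of running

hdfs dfs -cp /data/in/a.txt /data/out
hdfs dfs -cp /data/in/b.txt /data/out
hdfs dfs -cp /data/in/c.txt /data/out

the sources are all passed to a single invocation:

hdfs dfs -cp /data/in/a.txt /data/in/b.txt /data/in/c.txt /data/out

Each hdfs dfs invocation starts its own client JVM, so batching the sources avoids that repeated startup cost.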

When I do this and the target directory is the same for all the files I want to copy, I get a warning.

I'm executing the following command:

hdfs dfs -cp -f /path1/file1 /pathx/target /path2/file2 /pathx/target /path3/file3 /pathx/target

After executing it I get the following warning returned:

cp: `/pathx/target' to `/pathx/target/target': is a subdirectory of itself

Although I get this weird warning, the copy itself succeeds as it should. Is this a bug, or am I missing something?

asked Dec 16 '16 by R. Sluiter

People also ask

How do I combine multiple files into one in HDFS?

The Hadoop -getmerge command is used to merge multiple files in HDFS (Hadoop Distributed File System) and put them into one single output file on the local file system. For example, we might want to merge two files present in our HDFS, file1.txt and file2.txt, into a single output file.
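A minimal sketch of that merge, assuming the two files live under a hypothetical HDFS directory /user/demo:

hadoop fs -getmerge /user/demo/file1.txt /user/demo/file2.txt /tmp/output.txt

The last argument is a path on the local file system, not in HDFS.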

How do I copy a file from one HDFS path to another HDFS path?

You can use the cp command in Hadoop. This command is similar to the Linux cp command, and it is used for copying files from one directory to another within the HDFS file system.

How do I copy multiple files in Hadoop?

If you have multiple files in HDFS, the -getmerge option merges all of them into one single file and downloads it to the local file system. Optionally, -nl can be set to add a newline character (LF) at the end of each file.
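A sketch of the -nl variant, assuming a hypothetical HDFS source directory /user/demo/parts:

hadoop fs -getmerge -nl /user/demo/parts /tmp/merged.txt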

Is Distcp faster than CP?

distcp runs a MapReduce job behind the scenes, whereas the cp command just invokes the FileSystem copy call for every file. If there are existing jobs already running, distcp might take more time, depending on the memory/resources consumed by those jobs; in that case cp would be better.
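A minimal distcp sketch, assuming hypothetical NameNode addresses nn1 and nn2; the MapReduce job it launches copies files in parallel, which is what lets it scale better than cp for large copies:

hadoop distcp hdfs://nn1:8020/source/dir hdfs://nn2:8020/target/dir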


1 Answer

Try the following syntax instead, where all the sources are listed first and the single target directory comes last:

hadoop fs -cp /path1/file1 /path2/file2 /path3/file3 /pathx/target
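This also explains the warning in the question: cp treats only the last argument as the destination, so each earlier occurrence of /pathx/target was interpreted as a source, and Hadoop complained about copying the target directory into itself. A sketch of the corrected command including the -f (force overwrite) flag from the question:

hdfs dfs -cp -f /path1/file1 /path2/file2 /path3/file3 /pathx/target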
answered Oct 10 '22 by Luca Natali