I have one big file on HDFS, bigfile.txt. I want to copy the first 100 lines of it into a new file on HDFS. I tried the following command:
hadoop fs -cat /user/billk/bigfile.txt |head -100 /home/billk/sample.txt
It gave me a "cat: unable to write output stream" error. I am on Hadoop 1.
Are there other ways to do this? (Note: copying the first 100 lines to local or to another file on HDFS is fine.)
The Hadoop copyFromLocal command copies a file from your local file system to HDFS (the Hadoop Distributed File System). copyFromLocal has an optional -f switch that replaces a file that already exists in HDFS, so it can be used to update that file.
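For example, something like this should work (the local path /home/billk/sample.txt and the HDFS destination /user/billk/sample.txt are just illustrative, and -f may not be available on very old releases):

# Copy a local file into HDFS, overwriting the destination if it already exists
hadoop fs -copyFromLocal -f /home/billk/sample.txt /user/billk/sample.txt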
To copy a file from the local file system to HDFS, use hadoop fs -put or hdfs dfs -put. With put, specify the local file path you want to copy from, followed by the HDFS path you want to copy to.
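A minimal sketch, again with illustrative paths:

# Source is the local path, destination is the HDFS path
hadoop fs -put /home/billk/sample.txt /user/billk/sample.txt

put can also read from standard input when you pass "-" as the source, which is what the piped solution in the other answer relies on.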
The hadoop fs -getmerge command is useful when you have multiple files in HDFS: it merges all of them into a single file and downloads that file to the local file system. Optionally, -nl can be set to add a newline character (LF) at the end of each merged file.
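For instance (the directory /user/billk/parts/ and the local target are assumptions for illustration):

# Merge every file under an HDFS directory into one local file,
# with -nl adding a newline (LF) after each part
hadoop fs -getmerge -nl /user/billk/parts/ /home/billk/merged.txt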
You can also use the cp command in Hadoop. It is similar to the Linux cp command and copies files from one directory to another within the HDFS file system.
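A quick sketch, reusing the question's file and an assumed destination directory:

# Copy a file from one HDFS location to another without touching the local disk
hadoop fs -cp /user/billk/bigfile.txt /user/billk/backup/bigfile.txt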
Like this -
hadoop fs -cat /user/billk/bigfile.txt | head -100 | hadoop fs -put - /home/billk/sample.txt
I believe the "cat: unable to write output stream" error is just because head closed the stream after it read its limit. See this answer about head for HDFS - https://stackoverflow.com/a/19779388/3438870
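If you want to confirm the message is harmless, one quick check (assuming the paths from the question) is to count what actually landed in the new file:

# cat prints the warning when head closes the pipe, but the first 100 lines
# should still have been written to the destination
hadoop fs -cat /home/billk/sample.txt | wc -l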