
How to copy first few lines of a large file in hadoop to a new file?

Tags:

hadoop

I have one big file in HDFS, bigfile.txt. I want to copy its first 100 lines into a new file on HDFS. I tried the following command:

hadoop fs -cat /user/billk/bigfile.txt |head -100 /home/billk/sample.txt

It gave me a "cat: unable to write output stream" error. I am on Hadoop 1.

Are there other ways to do this? (Note: copying the first 100 lines either to a local file or to another file on HDFS is OK.)

Rolando asked Apr 04 '14 01:04



1 Answer

Like this:

hadoop fs -cat /user/billk/bigfile.txt | head -100 | hadoop fs -put - /home/billk/sample.txt
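Since the question says a local copy is fine too, here is a minimal sketch of that variant. The function name hdfs_head_to_local is made up for illustration and is not part of Hadoop; it only wraps the same hadoop fs -cat and head pipeline, with a plain redirect at the end instead of a second hadoop call.

```shell
# Hypothetical helper (not part of Hadoop): copy the first N lines of an
# HDFS file into a file on the local file system.
hdfs_head_to_local() {
    src=$1    # HDFS path to read from
    count=$2  # number of lines to keep
    dest=$3   # local path to write to
    # head exits after $count lines; the redirect writes them to $dest.
    hadoop fs -cat "$src" | head -n "$count" > "$dest"
}

# Usage (paths taken from the question):
# hdfs_head_to_local /user/billk/bigfile.txt 100 /home/billk/sample.txt
```

Here /home/billk/sample.txt is a path on the local disk, whereas in the -put pipeline the same string names a destination on HDFS.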

I believe the "cat: unable to write output stream" message appears only because head closes the pipe once it has read its limit, so cat has nowhere left to write. See this answer about head for HDFS: https://stackoverflow.com/a/19779388/3438870
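The same behaviour can be reproduced without Hadoop, which suggests the message is harmless. In the sketch below, seq is only a stand-in for hadoop fs -cat as a producer of far more lines than head wants, and the temporary file name comes from mktemp.

```shell
# seq stands in for "hadoop fs -cat": it produces far more lines than needed.
tmpfile=$(mktemp)

# head exits after 100 lines and closes the pipe, cutting the producer short.
# "|| true" keeps the script going if the shell runs with pipefail, where
# the producer being cut short would count as a failure.
seq 1 1000000 | head -n 100 > "$tmpfile" || true

# The output file still holds exactly the first 100 lines.
wc -l < "$tmpfile"
```

The producer is stopped early, but everything head read before exiting has already reached the file, so the sample is complete.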

Scott answered Oct 11 '22 04:10