How to copy the first few lines of a large file in Hadoop to a new file?

Tags:

hadoop

I have one big file in HDFS, bigfile.txt. I want to copy its first 100 lines into a new file on HDFS. I tried the following command:

hadoop fs -cat /user/billk/bigfile.txt |head -100 /home/billk/sample.txt

It gave me a "cat: unable to write output stream" error. I am on Hadoop 1.

Are there other ways to do this? (Note: copying the first 100 lines either to a local file or to another file on HDFS is OK.)

401

asked Apr 04 '14 01:04

1 Answer

Like this -

hadoop fs -cat /user/billk/bigfile.txt | head -n 100 | hadoop fs -put - /home/billk/sample.txt

I believe the "cat: unable to write output stream" message appears only because head closes the stream once it has read its limit, so cat has nowhere left to write. See this answer about head for HDFS: https://stackoverflow.com/a/19779388/3438870
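To make the behaviour above concrete, here is a minimal bash sketch. The first command needs no Hadoop at all: it only shows head stopping after a few lines and closing the pipe on its producer. The function name hdfs_head is hypothetical (it is not part of Hadoop) and simply wraps the pipeline from this answer, assuming the hadoop command is on the PATH and that "-put -" reads from stdin as it does in the command above.

```shell
#!/usr/bin/env bash

# Local demonstration (no Hadoop needed): head exits after 3 lines and
# closes the pipe, so seq is stopped early instead of writing a million lines.
seq 1 1000000 | head -n 3

# Hypothetical helper (the name is illustrative only): copy the first N lines
# of an HDFS file to another HDFS path.
hdfs_head() {
    local src="$1" lines="$2" dest="$3"
    # "-put -" takes its data from stdin. The upstream "fs -cat" may still
    # report "unable to write output stream" once head stops reading.
    hadoop fs -cat "$src" | head -n "$lines" | hadoop fs -put - "$dest"
}

# Usage, commented out so the script also runs without a cluster:
# hdfs_head /user/billk/bigfile.txt 100 /home/billk/sample.txt
```

For a local copy instead, redirecting head's output should be enough: hadoop fs -cat /user/billk/bigfile.txt | head -n 100 > sample.txt. Note that in the command from the question, head is given /home/billk/sample.txt as an input file rather than an output target, so nothing is ever written to it.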

119

answered Oct 11 '22 04:10

Scott

Rolando
