
STORE output to a single CSV?

Tags:

apache-pig

Currently, when I STORE into HDFS, it creates many part files.

Is there any way to store out to a single CSV file?

JasonA asked Mar 28 '12 15:03




1 Answer

You can do this in a few ways:

  • To set the number of reducers for all Pig operations, use the default_parallel property. Note that this forces every reduce step in the script onto a single reducer, which decreases throughput:

    set default_parallel 1;

  • If one of the operations executed before STORE is COGROUP, CROSS, DISTINCT, GROUP, JOIN (inner), JOIN (outer), or ORDER BY, you can add the PARALLEL 1 clause to that operation so it runs with a single reducer:

    b = GROUP a BY grp PARALLEL 1;
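
As a rough sketch, the PARALLEL 1 option might look like this in a complete script. The paths, schema, and alias names here are made up for illustration and are not from the question. Because the GROUP is the last reduce-side operation before STORE, the output directory should end up with a single part file. Note that STORE still writes a directory; the CSV data is the one part file inside it.

```pig
-- Hypothetical input path and schema, for illustration only
a = LOAD 'input/data.tsv' USING PigStorage('\t') AS (grp:chararray, val:int);

-- Run the GROUP with a single reducer so only one part file is written
grouped = GROUP a BY grp PARALLEL 1;

-- The FOREACH runs inside the same reduce phase as the GROUP
counts = FOREACH grouped GENERATE group AS grp, COUNT(a) AS n;

-- PigStorage(',') writes comma-separated fields; the output path is hypothetical
STORE counts INTO 'output/counts_csv' USING PigStorage(',');
```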

See Pig Cookbook - Parallel Features for more information.

Chris White answered Oct 30 '22 11:10