Currently, when I STORE into HDFS, it creates many part files.
Is there any way to store out to a single CSV file?
You can do this in a few ways:
To set the number of reducers for all Pig operations, you can use the default_parallel
property - but this means every single step will use a single reducer, decreasing throughput:
set default_parallel 1;
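For context, here is a minimal sketch of a complete script using this option; the relation names and HDFS paths are hypothetical:

-- Hypothetical script: every MapReduce job Pig launches uses one reducer
set default_parallel 1;
a = LOAD '/data/input' USING PigStorage(',') AS (grp:chararray, val:int);
b = GROUP a BY grp;
c = FOREACH b GENERATE group, SUM(a.val);
STORE c INTO '/data/output' USING PigStorage(',');

Note that the single reducer still writes into the output directory, so you get one part file (e.g. part-r-00000) under /data/output rather than a bare .csv file.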
If the operation executed immediately before the STORE is one of COGROUP, CROSS, DISTINCT, GROUP, JOIN (inner), JOIN (outer), or ORDER BY, then you can append the PARALLEL 1
keyword to that command to make it run with a single reducer:
GROUP a BY grp PARALLEL 1;
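To sketch how this differs from default_parallel: only the operation carrying PARALLEL 1 is pinned to one reducer, while any other reduce-side operations keep their normal parallelism. Relation names and paths are again hypothetical:

-- Hypothetical script: only the GROUP feeding the STORE uses one reducer
a = LOAD '/data/input' USING PigStorage(',') AS (grp:chararray, val:int);
b = GROUP a BY grp PARALLEL 1;
c = FOREACH b GENERATE group, COUNT(a);
STORE c INTO '/data/output' USING PigStorage(',');

This is usually preferable to default_parallel 1, since it serializes only the final reduce stage that produces the stored output.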
See Pig Cookbook - Parallel Features for more information.