
How to store grouped records into multiple files with Pig?

After loading and grouping records, how can I store the grouped records in several files, one per group (that is, one per userid)?

records = LOAD 'input' AS (userid:int, ...);
grouped_records = GROUP records BY userid;

I'm using Apache Pig version 0.8.1-cdh3u3 (rexported)

thomers asked Feb 16 '12 15:02


1 Answer

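Piggybank has a MultiStorage class that does exactly what I want: it splits the records by a specified field (the one at index '0', i.e. userid, in my example) and writes each distinct value to its own subdirectory of the output path. It is applied to the ungrouped records relation, so the GROUP step is not needed: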

STORE records INTO 'output' USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'none', ',');
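
For context, here is a minimal sketch of a complete script around that STORE statement. The jar location and the second field (name:chararray) are hypothetical placeholders, since the original schema is elided; substitute your own. The four MultiStorage arguments are the parent output path, the index of the field to split on, the compression ('none', 'bz2', 'bz' or 'gz') and the field delimiter.

```pig
-- Assumption: adjust this path to wherever piggybank.jar lives on your system.
REGISTER /path/to/piggybank.jar;

-- 'name' is a hypothetical second field; the real schema is not shown in the question.
records = LOAD 'input' AS (userid:int, name:chararray);

-- Split on field 0 (userid), no compression, comma-separated output.
-- No GROUP is needed: MultiStorage routes each record by its key.
STORE records INTO 'output'
    USING org.apache.pig.piggybank.storage.MultiStorage('output', '0', 'none', ',');
```

As far as I can tell from the Piggybank Javadoc, each userid gets its own directory under 'output', and a directory can hold more than one part file when several tasks write records for the same key.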
thomers answered Oct 15 '22 07:10