
Drop single column in Pig

I'm filtering a table by a list of about 20 IDs. Right now my code looks like this:

A = LOAD 'ids.txt' USING PigStorage();
B = LOAD 'massive_table' USING PigStorage();
C = JOIN A BY $0, B BY $0;
D = FOREACH C GENERATE $1, $2, $3, $4, ...
STORE D INTO 'foo' USING PigStorage();

What I don't like is line D, where I have to regenerate a new table to get rid of the joining column by explicitly declaring every single other column I want present (and sometimes that is a lot of columns). I'm wondering if there's something equivalent to:

FILTER B BY $0 IN (A)

or:

DROP $0 FROM C
Lucas asked Dec 15 '22 23:12



2 Answers

This may be similar to this question:

  • How to "update" a column using pig latin

That question references a JIRA ticket, https://issues.apache.org/jira/browse/PIG-1693, which shows how to use the .. notation to denote all the remaining fields:

D = FOREACH C GENERATE $1 .. ;

This assumes Pig 0.9.0 or later.
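Applied to the script from the question, this might look like the sketch below. It assumes ids.txt holds a single column, so the join key coming from A sits at $0 in C and all of B's columns start at $1; if A had more columns, the starting position would need adjusting.

```pig
A = LOAD 'ids.txt' USING PigStorage();
B = LOAD 'massive_table' USING PigStorage();
C = JOIN A BY $0, B BY $0;

-- Assumption: A has exactly one column, so $0 is the id from A.
-- "$1 .." keeps every field from $1 through the last one (B's columns).
D = FOREACH C GENERATE $1 .. ;

STORE D INTO 'foo' USING PigStorage();
```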

Chris White answered Dec 18 '22 13:12



Drop column by number

To drop column $5 (the sixth column, since positions are zero-based), keep everything up to $4 and everything from $6 onward:

D = FOREACH C GENERATE .. $4, $6 .. ;

Drop column by name

Dropping a column when you know only its own name does not appear to be possible. It is possible, however, if you know the names of the columns directly before and after it. To drop the column(s) between colBeforeMyCol and colAfterMyCol:

aliasAfter = FOREACH aliasBefore GENERATE 
             .. colBeforeMyCol, colAfterMyCol ..;
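As a more concrete sketch, here is the same pattern with a declared schema. The file name and the field names (id, name, ssn, city, zip) are hypothetical and chosen only for illustration; ssn stands in for the column to drop.

```pig
-- Hypothetical input file and schema, for illustration only.
aliasBefore = LOAD 'people.txt' USING PigStorage()
              AS (id:int, name:chararray, ssn:chararray, city:chararray, zip:chararray);

-- Keep everything up to and including name, then everything from city onward.
-- ssn, the only column between them, is left out.
aliasAfter = FOREACH aliasBefore GENERATE .. name, city ..;

-- Print the resulting schema to check which columns remain.
DESCRIBE aliasAfter;
```

Note that this only works while name and city are the immediate neighbours of the dropped column; if the schema changes, the range endpoints have to be updated too.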
Dennis Jaheruddin answered Dec 18 '22 12:12
