
Using Pig to load a file

I am very new to Pig and I am having what feels like a very basic problem. I have a line of code that reads:

A = load 'Sites/trial_clustering/shortdocs/*'
      AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray);

where each file is basically a single line of four comma-separated words. However, Pig is not splitting the line into the four words. When I run dump A, I get: (Money, coins, loans, debt,,,). I have tried googling, but I cannot find what format my file needs to be in for Pig to interpret it properly. Please help!

YuliaPro asked Nov 11 '11 19:11


1 Answer

The problem is that Pig's default loader, PigStorage, splits fields on tabs, not commas. As a result, the entire line "Money, coins, loans, debt" lands in your first column, word1, and the other three columns are null. The dump output gives the illusion of multiple columns, but the commas between the words are part of the word1 value; only the trailing commas actually separate fields.

To fix this, tell PigStorage to split on commas instead:

A = LOAD '...' USING PigStorage(',') AS (...);
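Spelled out against the statement from the question, that might look like the sketch below. The path and schema are copied from the question. The alias B and the TRIM step are my own additions, on the assumption that each file really does have a space after every comma, as the dump output suggests; without TRIM, word2 through word4 would each keep a leading space.

```pig
-- Sketch only: path and schema come from the question;
-- the alias B and the TRIM step are illustrative assumptions.
A = LOAD 'Sites/trial_clustering/shortdocs/*'
    USING PigStorage(',')
    AS (word1:chararray, word2:chararray, word3:chararray, word4:chararray);

-- The sample line has a space after each comma, so strip
-- leading and trailing whitespace from every field.
B = FOREACH A GENERATE
        TRIM(word1) AS word1,
        TRIM(word2) AS word2,
        TRIM(word3) AS word3,
        TRIM(word4) AS word4;

DUMP B;
```

If the files contain no spaces after the commas, the FOREACH step can be dropped and A used directly.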
Donald Miner answered Oct 07 '22 00:10