I am currently debugging a Pig script. I'd like to define a tuple directly in the Pig file (instead of using the basic LOAD function).
Is there a way to do it?
I am looking for something like this:
A = ('name#bob','age#29';'name#paul','age#12')
The dump would return:
('bob',29)
('paul',12)
A tuple is an ordered list of data. A tuple has fields, numbered 0 through (number of fields - 1). The entry in a field can be of any data type, or it can be null.
Pig has three complex data types: maps, tuples, and bags. All of these types can contain data of any type, including other complex types. So it is possible to have a map where the value field is a bag, which contains a tuple where one of the fields is a map.
Now load the data from the file student_data.txt into Pig by executing the following Pig Latin statement in the Grunt shell:
grunt> student = LOAD 'hdfs://localhost:9000/pig_data/student_data.txt' USING PigStorage(',') AS (id:int, firstname:chararray, lastname:chararray, phone:chararray, city:chararray);
Currently, Pig maps require the key to be a chararray (string) constant that you supply, not a variable that contains a string. So in map#key, the key has to be a constant string (e.g. map#'keyvalue').
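As a rough illustration of the constant-key rule, here is a minimal sketch. The file name people_maps.txt and the aliases are made up for this example; it assumes each line of the file holds one map in PigStorage's text form, such as [name#bob,age#29].

```pig
-- Hypothetical input file, one map per line, e.g. [name#bob,age#29]
people = LOAD 'people_maps.txt' AS (info:map[]);

-- The key after '#' is a quoted string constant, not a field or variable.
names = FOREACH people GENERATE info#'name' AS name, info#'age' AS age;

DUMP names;
```

Because the map is declared as map[] with no value type, the looked-up values are bytearrays unless you cast them, for example (int)info#'age'.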
It is in fact impossible to do this in Pig as it currently stands. If you just want to debug, create a file in Hadoop and load that. Write the data you want into the file (whatever you would have created manually had it been possible) and upload it. Then load it using Pig.
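For the data in the question, that approach might look like the sketch below. The file name debug_people.txt is an assumption; it stands for a small file you wrote by hand and uploaded, containing the two lines bob,29 and paul,12.

```pig
-- debug_people.txt is a hypothetical hand-written file with the lines:
--   bob,29
--   paul,12
A = LOAD 'debug_people.txt' USING PigStorage(',') AS (name:chararray, age:int);

DUMP A;
```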
The following (dirty) trick does the job:
- Create a file with one empty row and store it in your HDFS.
- Load it: line = load '/user/toto/onelinefile' USING ..
- Create your own data: foreach line generate 'bob' as name, 22 as age;
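Put together, the trick could be sketched as follows. This assumes the one-line file already exists at the path used above and loads it with the default loader, since the original leaves the USING clause unspecified. The second part is my own extension, not part of the original answer: it uses the built-in TOTUPLE, TOBAG and FLATTEN to turn the single input row into the two rows asked for in the question.

```pig
-- Assumes /user/toto/onelinefile exists and contains exactly one (empty) row.
line = LOAD '/user/toto/onelinefile';

-- One input row in, one row of constants out.
one_person = FOREACH line GENERATE 'bob' AS name, 22 AS age;
DUMP one_person;

-- Extension (an assumption, not from the original answer): build a bag of
-- constant tuples and flatten it to get several rows from the single input row.
people = FOREACH line GENERATE
    FLATTEN(TOBAG(TOTUPLE('bob', 29), TOTUPLE('paul', 12)))
    AS (name:chararray, age:int);
DUMP people;
```

The number of output rows depends on the input file having exactly one row; a file with more rows would repeat the generated data once per row.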