
Pig: loading a data file using an external schema file

I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema-file. I tried using

A= LOAD '<file path>' USING PigStorage('\u0001') as '<schema-file path>' 

but get an error.

What is the syntax for correctly loading the file?

The schema file format is something like:

data1 - complex - - - - format - -
data1 event_type - - - - - long - "ends '\001'"
data1 event_id - - - - - varchar(50) - "ends '\001'"
data1 name_format - - - - - varchar(10) - "ends newline"
Shaharg asked Nov 24 '13 10:11


2 Answers

It's possible to load data with a schema file.

When you store your data with the '-schema' flag, PigStorage writes a .pig_schema file to the output path. That file holds the schema as JSON.

You can then use it when loading the data:

B = LOAD '<>' USING PigStorage(',','-schema'); 

You can see the schema by running

describe B;

Check this good post for more details.

This feature is available beginning with Pig 0.10.
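
As a minimal sketch of that round trip: the paths, the field names, and the '\u0001' delimiter below are assumptions borrowed from the question for illustration, not a tested setup.

```pig
-- Script 1: load once with an explicit schema (hypothetical input path).
A = LOAD 'input/events' USING PigStorage('\u0001')
    AS (event_type:long, event_id:chararray, name_format:chararray);

-- Store with '-schema' so PigStorage writes a hidden .pig_schema file
-- next to the data in the output directory.
STORE A INTO 'output/events' USING PigStorage('\u0001', '-schema');

-- Script 2: run separately, after the job above has completed,
-- so that the .pig_schema file already exists when this LOAD is parsed.
B = LOAD 'output/events' USING PigStorage('\u0001', '-schema');
DESCRIBE B;
```

If the schema file was picked up, DESCRIBE B should list the same three fields and types that were given in the AS clause.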

Mzf answered Jan 01 '23 10:01


The AS clause is for specifying the schema directly, not the path to a schema file. The schema goes in parentheses, not in quotes:

 A = LOAD '<file path>' USING PigStorage('\u0001') AS (type:long, id:chararray, nameformat:chararray);

Alternatively, a file named .pig_schema containing the schema and located in your input directory could work as well. I have never tried that, though. It must be a JSON file with the following syntax, where "type" is Pig's numeric data type code (15 is long, 55 is chararray):

{"fields":[
        {"name":"type","type":15,"description":"Fu","schema":null},
        {"name":"id","type":55,"description":"Bar","schema":null},
        {"name":"nameFormat","type":55,"description":"Xu","schema":null}
    ] ,"version":0,"sortKeys":[],"sortKeyOrders":[]}

This file is also generated if you specify the -schema option when storing with PigStorage.
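
If the schema file is written by hand instead, using it could look roughly like the sketch below. The local file name and the HDFS directory are hypothetical, and the sketch assumes the external schema has already been converted to the .pig_schema JSON format shown above; like the approach itself, it is untested here.

```pig
-- Copy the hand-written JSON schema next to the data (hypothetical paths).
fs -put events_schema.json input/events/.pig_schema

-- Load without an AS clause; '-schema' asks PigStorage to read .pig_schema.
A = LOAD 'input/events' USING PigStorage('\u0001', '-schema');
DESCRIBE A;
```

DESCRIBE A is a quick way to confirm whether the field names and types from the JSON file were actually applied.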

Frederic answered Jan 01 '23 08:01