
How to get the value for a variable key from a Pig map?

Is there a way to look up a value in a map with a variable key, using another field as the key? For example, my company data has locale and name fields like this:

 {"en_US", (["en_US" : "English Name"], ["fr_FR" : "French Name"])}

Essentially, I want to get the value from the map using locale as the key, since the key will be different for different locales.

company_data = load '/data' using PigStorage();

final_company_data = FOREACH company_data GENERATE
                                             value.locale as locale,
                                             value.name#locale;

This gives me an error because, as I understand it, retrieving a value from a map requires a literal key, as in value.name#'en_US'. Is there a way to use locale so that it gets substituted with the right value?

Expected output: final_company_data = {"en_US", "English Name"}
TommyT asked Feb 12 '17 20:02





1 Answer

As far as I remember, you can't do that in Pig: the key has to be a static value. So, for example, this should work:

final_company_data = FOREACH company_data GENERATE
                                         value.locale as locale,
                                         value.name#'en_US';

If the key set is not too big, you can try something like this (though it involves a lot of typing):

en = FILTER company_data BY value.locale == 'en_US';
final_company_data_en = FOREACH en GENERATE
                                         value.locale as locale,
                                         value.name#'en_US';
fr = FILTER company_data BY value.locale == 'fr_FR';
final_company_data_fr = FOREACH fr GENERATE
                                         value.locale as locale,
                                         value.name#'fr_FR';

Repeat this for every key, then take the union of all the subsets. This solution is poor and ugly, but it works.
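The final union step might look roughly like the sketch below. It assumes the per-locale relations are named final_company_data_en and final_company_data_fr (illustrative names, one relation per locale) and that they all have the same two fields in the same order:

```pig
-- Sketch only: combine the per-locale subsets back into one relation.
-- Assumes final_company_data_en and final_company_data_fr were each built
-- by filtering on one locale and projecting (locale, name for that locale).
final_company_data = UNION final_company_data_en, final_company_data_fr;
```

Every additional locale needs its own FILTER/FOREACH pair and one more alias in the UNION list.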

bartektartanus answered Sep 20 '22 17:09
