Is there a way to get the value of a map for variable keys, using a field as the key? For example, my company data has locale and name fields like this:
{"en_US", (["en_US" : "English Name"], ["fr_FR" : "French Name"])}
What I essentially want is to look up the map value using locale as the key, since the key will differ from one locale to the next.
company_data = load '/data' using PigStorage();
final_company_data = FOREACH company_data GENERATE
value.locale as locale,
value.name#locale;
The script above gives me an error because, as I understand it, retrieving a value from a map requires a constant key, as in value.name#'en_US'. Is there a way to use the locale field so that it gets substituted with the right value?
Expected output: final_company_data = {"en_US", "English Name"}
Similar to regular Pig parameter substitution, you can define parameters using -param/-param_file on Pig's command line. Each such variable is treated as one of the binding variables when the Pig Latin script is bound. For example, you can invoke a Python script named script.py using: pig -param loadfile=student.txt script.py
I think you can use the %declare command. It defines a single parameter from within the Pig script itself:
%declare DESC 'Database'
A = load 'data' as (name, desc);
B = FILTER A by desc eq '$DESC';
.....
Note that there should be no '$' before DESC in the declaration itself.
Pig has three complex data types: maps, tuples, and bags. All of these types can contain data of any type, including other complex types. So it is possible to have a map where the value field is a bag, which contains a tuple where one of the fields is a map.
As far as I remember, you can't do that in Pig: the key has to be a static value. So, for example, this should work:
final_company_data = FOREACH company_data GENERATE
value.locale as locale,
value.name#'en_US';
If the key set is not too big, you can try something like this (although it involves a lot of typing):
en = FILTER company_data BY value.locale == 'en_US';
final_company_data_en = FOREACH en GENERATE
value.locale as locale,
value.name#'en_US';
fr = FILTER company_data BY value.locale == 'fr_FR';
final_company_data_fr = FOREACH fr GENERATE
value.locale as locale,
value.name#'fr_FR';
Do this for every key, then take the union of all the subsets:
final_company_data = UNION final_company_data_en, final_company_data_fr;
This solution is ugly, but it works.
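If enumerating every locale is impractical, another option that may be worth trying is a small user-defined function that does the lookup at run time, since a UDF can receive both the map and the key as ordinary arguments. The following is only a sketch, assuming the UDF is registered as a Jython script and that the map values arrive as strings; the file name lookup_udf.py, the namespace lookup, and the function name map_value are made up for illustration and do not come from the question.

```python
# lookup_udf.py: hypothetical UDF file (the file and function names are assumptions).
#
# Assumed usage from Pig Latin:
#   REGISTER 'lookup_udf.py' USING jython AS lookup;
#   final_company_data = FOREACH company_data GENERATE
#       value.locale AS locale,
#       lookup.map_value(value.name, value.locale) AS name;

try:
    # Pig is expected to provide this decorator to Jython UDF scripts.
    outputSchema
except NameError:
    # Fallback so the file can also be imported and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema('name:chararray')
def map_value(m, key):
    """Return the value stored under key in map m, or None if either is missing."""
    if m is None or key is None:
        return None
    # A Pig map is assumed to arrive here as a Python dict.
    return m.get(key)
```

Outside Pig, map_value({"en_US": "English Name"}, "en_US") returns "English Name", and a locale with no entry in the map yields None, which Pig would see as a null.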