I have a text file that I want to load into Pig. The file has one name per row, but the data contains errors (special characters and stray digits), something like this:
Ja@@$s000on
J@@a%^ke
T!!ina
Mel@ani
I want to remove the special characters from all the names using a regex in Pig, so that the final output is:
Jason
Jake
Tina
Melani
Can someone please tell me the regex that will do this job in Pig?
Please also write out the command, as I have been unable to get REGEX_EXTRACT or REGEX_EXTRACT_ALL to work.
Also, can someone explain the significance of the number 1 that we pass to this function as an argument after the regex?
Any help would be highly appreciated.
You can use REPLACE with a regex that matches every run of characters other than letters and whitespace, and replace each match with an empty string.
input.txt:
Ja@@$s000on
J@@a%^ke T!!ina Mel@ani
Pig script:
A = LOAD 'input.txt' AS (line:chararray);
-- Remove every run of characters that is not a letter or whitespace.
B = FOREACH A GENERATE REPLACE(line, '([^a-zA-Z\\s]+)', '');
DUMP B;
Output:
(Jason)
(Jake Tina Melani)
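Pig's regex functions follow Java regular-expression syntax, so the pattern can be checked outside Pig with a small Java sketch. The class NameCleaner and its helpers clean and extractGroup are hypothetical names made up for this illustration and are not part of Pig. The second helper also shows what the trailing number in REGEX_EXTRACT means: it is the index of the capture group to return, so 1 selects the first parenthesized group. The "Jake-42" input is an invented example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameCleaner {
    // Same pattern as the Pig script: one or more characters that are
    // neither ASCII letters nor whitespace.
    private static final Pattern NOT_LETTER_OR_SPACE =
            Pattern.compile("[^a-zA-Z\\s]+");

    // Hypothetical helper mirroring REPLACE(line, regex, '') in the script.
    static String clean(String line) {
        return NOT_LETTER_OR_SPACE.matcher(line).replaceAll("");
    }

    // Hypothetical helper illustrating the group index: returns the text
    // captured by group `index`, or null if the regex does not match.
    static String extractGroup(String input, String regex, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        System.out.println(clean("Ja@@$s000on"));             // Jason
        System.out.println(clean("J@@a%^ke T!!ina Mel@ani")); // Jake Tina Melani
        // Group 1 is the first parenthesized part, group 2 the second.
        System.out.println(extractGroup("Jake-42", "([a-zA-Z]+)-(\\d+)", 1)); // Jake
        System.out.println(extractGroup("Jake-42", "([a-zA-Z]+)-(\\d+)", 2)); // 42
    }
}
```

Because the character class keeps only a-z and A-Z, digits are stripped as well, and so would accented letters be. If names may contain those, the class would need to be widened.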