Unable to Remove Special Characters In Pig

Tags:

apache-pig

I have a text file that I want to load into Pig. The file has one name per row, but the data contains errors in the form of special characters, something like this:

Ja@@$s000on   
J@@a%^ke
T!!ina
Mel@ani

I want to remove the special characters from all the names using a regex, which is one way I found to do the job in Pig, so that the final output is:

Jason
Jake
Tina
Melani

Can someone please tell me the regex that will do this job in Pig? Please also write the command that will do it, as I am unable to get the REGEX_EXTRACT and REGEX_EXTRACT_ALL functions to work.
Also, can someone explain the significance of the number 1 that we pass to this function as an argument after the regex?

Any help would be highly appreciated.

CodeReaper asked Feb 11 '23 23:02


1 Answer

You can use REPLACE with a regex to solve this problem: replace every run of characters that is not a letter or whitespace with an empty string.

input.txt:  
Ja@@$s000on  
J@@a%^ke T!!ina Mel@ani  

PigScript:
A = LOAD 'input.txt' AS (line:chararray);  
-- Remove every run of characters that is not a letter or whitespace.  
B = FOREACH A GENERATE REPLACE(line, '([^a-zA-Z\\s]+)', '');  
DUMP B;  

Output:  
(Jason)  
(Jake Tina Melani)  
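
On the number 1 from the question: in REGEX_EXTRACT(string, regex, index) the last argument picks which capture group of the match is returned, so 1 means the first parenthesized group. The Python sketch below is not Pig code. It is only an offline way to see both ideas, assuming Python's `re` module handles these simple patterns the same way as the Java regex engine behind Pig. The helper names `strip_non_letters` and `extract_group` are hypothetical and made up for this illustration.

```python
import re

# Same character class as the Pig REPLACE call: anything that is not
# an ASCII letter or whitespace. (Assumption: Python and Java regex
# agree on this simple pattern.)
NON_LETTERS = re.compile(r"[^a-zA-Z\s]+")


def strip_non_letters(line):
    """Delete every run of non-letter, non-whitespace characters."""
    return NON_LETTERS.sub("", line)


def extract_group(text, pattern, index):
    """Return capture group `index` of the first match, or None if no match."""
    match = re.search(pattern, text)
    return match.group(index) if match else None


if __name__ == "__main__":
    # Names taken from the question's sample data.
    for raw in ["Ja@@$s000on", "J@@a%^ke", "T!!ina", "Mel@ani"]:
        print(strip_non_letters(raw))
    # Index 1 selects the first parenthesized group, index 2 the second.
    print(extract_group("192.168.1.5:8020", r"(.*):(.*)", 1))
```

This also shows why REPLACE is the better fit here: extraction returns one group of a match, while the cleanup needs to delete every unwanted run wherever it occurs in the name.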
Sivasakthi Jayaraman answered Feb 19 '23 05:02