
Python: apply a lambda function to a CSV file (large files)

I want to apply the function hideEmail to a specific column of my CSV file (a large file) using Python.

Example of the function:

import re

def hideEmail(email):
    # Hide the email: replace every character except '@' and '.' with 'x'
    text = re.sub(r'[^@.]', 'x', email)
    return text

CSV file (large file, > 1 GB):

    id;Name;firstName;email;profession
    100;toto;tata;[email protected];developer
    101;titi;tete;[email protected];doctor
    ..
    ..

Med_siraj asked Feb 28 '21 14:02


1 Answer

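Load the CSV data into a DataFrame (the sample file uses ';' as its separator):

import pandas as pd
df = pd.read_csv(r'/path/to/csv', sep=';')

Then you can use pd.Series.str.replace directly on the email column. Pass regex=True explicitly, because since pandas 2.0 the pattern is treated as a literal string by default:

df['email'] = df['email'].astype(str).str.replace(r'[^@.]', 'x', regex=True)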
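
Since the file is larger than 1 GB, reading it in one go may not fit in memory. Below is a minimal sketch of a chunked variant, assuming the column is named email and the separator is ';' as in the sample. The function name mask_email_column and the chunk size are illustrative choices, not part of the original question:

```python
import pandas as pd


def mask_email_column(src, dst, column="email", sep=";", chunksize=100_000):
    """Mask one column of a large CSV, reading it chunk by chunk."""
    # dtype=str and keep_default_na=False keep every value as plain text,
    # so the other columns are written back unchanged.
    reader = pd.read_csv(src, sep=sep, dtype=str,
                         keep_default_na=False, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        # Same regex as hideEmail: everything except '@' and '.' becomes 'x'.
        chunk[column] = chunk[column].str.replace(r"[^@.]", "x", regex=True)
        # Write the header with the first chunk only, then append.
        chunk.to_csv(dst, sep=sep, index=False,
                     mode="w" if i == 0 else "a", header=(i == 0))
```

Calling mask_email_column('/path/to/csv', '/path/to/new_file.csv') would then write a masked copy while holding only one chunk in memory at a time.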

That said, if all you want to do is change a large CSV file, pandas is probably overkill. You might have a look at sed. Here's one example, which replaces the word characters on either side of the first @ on each line with xxx:

sed -E 's/(\w+)@(\w+)/xxx@xxx/' /path/to/file.csv > /path/to/new_file.csv
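
If you would rather stay in Python and reuse hideEmail as written, the standard csv module can stream the file row by row. This is only a sketch under some assumptions: the file has a header row containing email, every row has the same number of fields, and the encoding is UTF-8. The name mask_csv_column is hypothetical:

```python
import csv
import re


def hideEmail(email):
    # Replace every character except '@' and '.' with 'x'.
    return re.sub(r"[^@.]", "x", email)


def mask_csv_column(src, dst, column="email", delimiter=";"):
    """Copy src to dst one row at a time, masking a single column."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin, delimiter=delimiter)
        writer = csv.writer(fout, delimiter=delimiter, lineterminator="\n")
        header = next(reader)          # first row holds the column names
        idx = header.index(column)     # position of the column to mask
        writer.writerow(header)
        for row in reader:
            row[idx] = hideEmail(row[idx])
            writer.writerow(row)
```

Only one row is held in memory at a time, so the file size does not matter much here.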
fsl answered Nov 14 '22 21:11