I have a dataframe containing keywords and value in all columns. See the example below.
I want to apply regex to all the columns. So I use for loop and apply the regex:
for i in range (1,maxExtended_Keywords):
temp = 'extdkey_' + str(i)
Extended_Keywords[temp] = Extended_Keywords[temp].str.extract(":(.*)",expand=True)
And I get the desired final result. No issues there.
However, just curios is there a pythonic way to apply regex to entire dataframe instead of using for loop and applying to column wise.
Thanks,
By using df[], loc[], iloc[] and get() you can select multiple columns from pandas DataFrame.
Pandas replace() is a very rich function that is used to replace a string, regex, dictionary, list, and series from the DataFrame. The values of the DataFrame can be replaced with other values dynamically. It is capable of working with the Python regex(regular expression). It differs from updating with . loc or .
Use pandas.DataFrame.replace
with regex=True
df.replace('^.*:\s*(.*)', r'\1', regex=True)
Notice that my pattern uses parentheses to capture the part after the ':'
and uses a raw string r'\1'
to reference that capture group.
df = pd.DataFrame([
[np.nan, 'thing1: hello'],
['thing2: world', np.nan]
], columns=['extdkey1', 'extdkey2'])
df
extdkey1 extdkey2
0 NaN thing1: hello
1 thing2: world NaN
df.replace('^.*:\s*(.*)', r'\1', regex=True)
extdkey1 extdkey2
0 NaN hello
1 world NaN
You can use applymap, it will apply some function for each element in dataframe, for this problem you can do this:
func = lambda x: re.findall('^.*:\s*(.*)', x)[0] if re.findall('^.*:\s*(.*)', str(x)) else x
df.applymap(func)
Caution: Avoid to use applymap for huge dataframes due to efficiency issue.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With