Pythonic way of applying regex to all columns of dataframe

I have a DataFrame in which every column holds keywords and values. See the example below.

Input DataFrame

I want to apply a regex to all the columns, so I loop over them and apply the regex to each one:

for i in range(1, maxExtended_Keywords):
    temp = 'extdkey_' + str(i)
    # expand=False returns a Series, which is assigned back to the same column
    Extended_Keywords[temp] = Extended_Keywords[temp].str.extract(r":(.*)", expand=False)

This gives me the desired final result, so there are no issues there.

Desired output

However, I am curious: is there a Pythonic way to apply the regex to the entire DataFrame at once, instead of looping and applying it column by column?

Thanks,

asked Apr 13 '18 19:04

prasadav

2 Answers

Use pandas.DataFrame.replace with regex=True

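df.replace(r'^.*:\s*(.*)', r'\1', regex=True)

Notice that my pattern uses parentheses to capture the part after the ':' and that the replacement is the raw string r'\1', which refers back to that capture group. The pattern is written as a raw string too, so that \s reaches the regex engine unchanged.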

MCVE

import numpy as np
import pandas as pd

df = pd.DataFrame([
    [np.nan, 'thing1: hello'],
    ['thing2: world', np.nan]
], columns=['extdkey1', 'extdkey2'])

df

        extdkey1       extdkey2
0            NaN  thing1: hello
1  thing2: world            NaN

df.replace(r'^.*:\s*(.*)', r'\1', regex=True)

  extdkey1 extdkey2
0      NaN    hello
1    world      NaN
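One thing worth checking before swapping the loop for replace: the two patterns are not quite equivalent. The sketch below uses made-up sample values (the 'a: b: c' and 'no colon here' cells are my own assumptions, not from the question) to show that the greedy '.*:' matches up to the last colon, that '[^:]*:' is one way to stop at the first colon instead, and that str.extract returns NaN for cells without a match while replace leaves them as they are.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one value contains two colons, one has none
df = pd.DataFrame({
    'extdkey1': ['thing1: hello', 'a: b: c'],
    'extdkey2': ['no colon here', np.nan],
})

# Greedy '.*:' runs to the LAST colon, so 'a: b: c' becomes 'c'
last_colon = df.replace(r'^.*:\s*(.*)', r'\1', regex=True)

# '[^:]*:' stops at the FIRST colon, so 'a: b: c' becomes 'b: c'
first_colon = df.replace(r'^[^:]*:\s*(.*)', r'\1', regex=True)

# str.extract (as used in the question's loop) yields NaN when nothing
# matches, whereas replace leaves 'no colon here' untouched
extracted = df['extdkey2'].str.extract(r':(.*)', expand=False)

print(last_colon)
print(first_colon)
print(extracted)
```

Which variant is right depends on whether the values themselves can contain colons, which the question does not say.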

answered Sep 28 '22 10:09

piRSquared

You can use applymap, which applies a function to every element of the DataFrame. For this problem you can do this:

import re
pat = re.compile(r'^.*:\s*(.*)')
func = lambda x: pat.findall(x)[0] if isinstance(x, str) and pat.findall(x) else x
df.applymap(func)

Caution: avoid applymap on huge DataFrames. It calls a Python function once per element, so it is slow.
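If the per-element cost is a concern, one possible middle ground is to keep str.extract but let DataFrame.apply hand it one column at a time, so the regex runs once per column rather than once per cell. This is only a sketch: it reuses the MCVE data from the other answer and assumes every column contains at least some strings, since the .str accessor raises on a column that is entirely NaN. Also, as far as I know, recent pandas versions (2.1 and later) deprecate applymap in favour of DataFrame.map, so check the version you are running.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    [np.nan, 'thing1: hello'],
    ['thing2: world', np.nan]
], columns=['extdkey1', 'extdkey2'])

# apply passes each column to the lambda as a Series, so str.extract is
# called once per column instead of once per cell.
# Note: cells without a ':' come back as NaN here, unlike with replace/applymap.
result = df.apply(lambda col: col.str.extract(r':\s*(.*)', expand=False))

print(result)
```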

answered Sep 28 '22 09:09

romulomadu

Tags: python, regex, pandas