
Creating New Column In Pandas Dataframe Using Regex [duplicate]

I have a column in a pandas df of type object that I want to parse to get the first number in the string, and create a new column containing that number as an int.

For example:

Existing df

    col
    'foo 12 bar 8'
    'bar 3 foo'
    'bar 32bar 98'

Desired df

    col               col1
    'foo 12 bar 8'    12
    'bar 3 foo'       3
    'bar 32bar 98'    32

I have code that works on any individual cell in the column:

    int(re.search(r'\d+', df.iloc[0]['col']).group())

The above code works fine and returns 12 as it should. But when I try to create a new column using the whole series:

    df['col1'] = int(re.search(r'\d+', df['col']).group())

I get the following error:

    TypeError: expected string or bytes-like object

I tried wrapping str() around df['col'], which got rid of the error but yielded all 0s in col1.

I've also tried converting col to a list of strings and iterating through the list, which only yields the same error. Does anyone know what I'm doing wrong? Help would be much appreciated.
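
Both failing attempts can be reproduced in a short sketch, assuming a frame built from the sample rows above. `re.search` expects a single string, so passing the whole Series raises the TypeError. `str(df['col'])` returns the printed form of the entire column, which begins with the index label 0, so the first match is 0 and that one scalar is broadcast to every row.

```python
import re
import pandas as pd

# Sample frame mirroring the example rows in the question
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# re.search needs one string; a whole Series is rejected with TypeError
try:
    re.search(r"\d+", df["col"])
    raised = False
except TypeError:
    raised = True

# str(Series) is the printed form of the whole column, starting with index label 0
as_text = str(df["col"])
first_match = int(re.search(r"\d+", as_text).group())

# Assigning that single scalar broadcasts it to every row
df["col1"] = first_match
```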

asked Sep 21 '17 18:09 by Cam8593




1 Answer

This will do the trick:

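    import re

    # Take the first run of digits from each string and convert it to int
    # (re.search returns None for a string without digits, so this assumes every row has one)
    search = []
    for value in df['col']:
        search.append(int(re.search(r'\d+', value).group()))

    df['col1'] = search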

The output looks like this:

                col  col1
    0  foo 12 bar 8    12
    1     bar 3 foo     3
    2  bar 32bar 98    32
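
The same column can also be built without an explicit loop. The following is a minimal sketch using the pandas string accessor `Series.str.extract`, assuming every row contains at least one digit; the sample frame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Sample frame mirroring the example rows in the question
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# Capture the first run of digits in each row; expand=False returns a Series
first_digits = df["col"].str.extract(r"(\d+)", expand=False)

# Rows with no digits would be NaN here, so astype(int) assumes every row matched
df["col1"] = first_digits.astype(int)
```

If some rows might contain no digits, the `astype(int)` step would fail on the missing values, so those rows would need to be filled or dropped first.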
answered Oct 02 '22 00:10 by Albo