I have a column in a pandas df of type object that I want to parse to get the first number in the string, and create a new column containing that number as an int.
For example:
Existing df
col
'foo 12 bar 8'
'bar 3 foo'
'bar 32bar 98'
Desired df
col col1
'foo 12 bar 8' 12
'bar 3 foo' 3
'bar 32bar 98' 32
I have code that works on any individual cell in the column Series:
int(re.search(r'\d+', df.iloc[0]['col']).group())
The above code works fine and returns 12 as it should. But when I try to create a new column using the whole series:
df['col1'] = int(re.search(r'\d+', df['col']).group())
I get the following Error:
TypeError: expected string or bytes-like object
I tried wrapping str() around df['col'], which got rid of the error but yielded all 0's in col1.
I've also tried converting col to a list of strings and iterating through the list, which only yields the same error. Does anyone know what I'm doing wrong? Help would be much appreciated.
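A short sketch of why both attempts fail, assuming a df built from the example values above: re.search() accepts a single string, so passing the whole Series raises the TypeError. str(df['col']) does not join the cell values; it returns the printed representation of the Series, which begins with the row index "0". The first number found is therefore always 0, and assigning that scalar to a column broadcasts it to every row.

```python
import re
import pandas as pd

# Assumed: a DataFrame reproducing the example from the question.
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# re.search() needs one string; a whole Series raises TypeError.
try:
    re.search(r"\d+", df["col"])
    raised = False
except TypeError:
    raised = True

# str(Series) is its printed form ("0    foo 12 bar 8\n1 ..."),
# so the first run of digits is the index label 0.
text = str(df["col"])
first = int(re.search(r"\d+", text).group())

# Assigning a scalar to a column broadcasts it to every row.
df["col1"] = first
print(raised, first, df["col1"].tolist())
```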
A regular expression (regex) is a sequence of characters that defines a search pattern. To filter rows in pandas by regex, you can use the str.match() method.
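A minimal filtering sketch, again assuming a df built from the example values. Note that str.match() anchors the pattern at the start of each string (like re.match), while str.contains() searches anywhere in it (like re.search); the patterns used here are illustrative only.

```python
import pandas as pd

# Assumed: the example DataFrame from the question.
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# str.match: pattern must match at the beginning of the value.
starts_with_bar = df[df["col"].str.match(r"bar")]

# str.contains: pattern may match anywhere (regex=True by default).
has_two_digit_number = df[df["col"].str.contains(r"\d{2}")]

print(starts_with_bar["col"].tolist())
print(has_two_digit_number["col"].tolist())
```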
To add a column to a pandas DataFrame, you can simply assign values: df['YourColumn'] = [1, 2, 3, 4]. Importantly, the data you add must have the same length as the other columns. If you want to add multiple columns, you can use the assign() method: df = df.assign(Newcol1=YourData1, Newcol2=YourData2).
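A small sketch of both approaches, assuming the example df; the hard-coded list values and the column names col2 and col3 are hypothetical. It also shows the length requirement: a list with the wrong number of elements is rejected with a ValueError.

```python
import pandas as pd

# Assumed: the example DataFrame from the question.
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# Direct assignment: exactly one value per row (3 rows here).
df["col1"] = [12, 3, 32]

# A list of the wrong length raises ValueError.
try:
    df["bad"] = [1, 2]
    length_error = False
except ValueError:
    length_error = True

# assign() returns a new DataFrame and leaves df unchanged; a callable
# receives the DataFrame, so a column can be derived from an existing one,
# and a scalar is broadcast to every row.
df2 = df.assign(col2=lambda d: d["col1"] * 2, col3="x")
print(length_error)
print(df2)
```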
Using the apply() method: if you need to apply a function over an existing column to compute values that will be added as a new column in the existing DataFrame, apply() should do the trick (Series.apply() for a single column, DataFrame.apply() to work across rows or columns).
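Applied to this question, a sketch could look like the following, assuming the example df. The helper name first_int is hypothetical. Series.apply() calls it once per cell, so re.search() always receives a single string; returning None when no digits are present is an assumption about how such rows should be handled.

```python
import re
import pandas as pd

# Assumed: the example DataFrame from the question.
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

def first_int(text):
    """Hypothetical helper: first run of digits in text as an int, or None."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None

# Series.apply runs first_int on each cell individually.
df["col1"] = df["col"].apply(first_int)
print(df)
```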
This will do the trick:
import re

search = []
for values in df['col']:
    # take the first run of digits in each string and convert it to int
    search.append(int(re.search(r'\d+', values).group()))
df['col1'] = search
The output looks like this:
col col1
0 foo 12 bar 8 12
1 bar 3 foo 3
2 bar 32bar 98 32
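As an alternative to the loop (a different technique than the answer above, not the answerer's method), pandas has a vectorized string method, str.extract(), that returns the first match of a capture group for each row. A minimal sketch, assuming the example df:

```python
import pandas as pd

# Assumed: the example DataFrame from the question.
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# The capture group (\d+) selects the first run of digits in each row;
# expand=False returns a Series rather than a one-column DataFrame.
df["col1"] = df["col"].str.extract(r"(\d+)", expand=False).astype(int)
print(df)
```

If some rows may contain no digits, str.extract() yields NaN for them and astype(int) will fail; in that case something like pd.to_numeric(..., errors="coerce") followed by astype("Int64") keeps the missing values as <NA>.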