Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas extract numbers from column into new columns

Tags:

python

pandas

I currently have this df where the rect column is all strings. I need to extract the x, y, w and h from it into separate columns. The dataset is very large so I need an efficient approach

df['rect'].head()
0    <Rect (120,168),260 by 120>
1    <Rect (120,168),260 by 120>
2    <Rect (120,168),260 by 120>
3    <Rect (120,168),260 by 120>
4    <Rect (120,168),260 by 120>

So far this solution works however it's very messy as you can see

df[['x', 'y', 'w', 'h']] = df['rect'].str.replace('<Rect \(', '').str.replace('\),', ',').str.replace(' by ', ',').str.replace('>', '').str.split(',', n=3, expand=True)

Is there a better way? Possibly a regex approach

like image 610
Kenan Avatar asked Sep 04 '18 17:09

Kenan


People also ask

How do I extract a specific value from a DataFrame Pandas?

You can access a single value from a DataFrame in two ways. Method 1: DataFrame.at[index, column_name] property returns a single value present in the row represented by the index and in the column represented by the column name. Method 2: Or you can use DataFrame.

How do you only get numeric values from a column in Python?

To select columns that are only of numeric datatype from a Pandas DataFrame, call DataFrame. select_dtypes() method and pass np. number or 'number' as argument for include parameter.


2 Answers

Using extractall

df[['x', 'y', 'w', 'h']] = df['rect'].str.extractall('(\d+)').unstack().loc[:,0]
Out[267]: 
match    0    1    2    3
0      120  168  260  120
1      120  168  260  120
2      120  168  260  120
3      120  168  260  120
4      120  168  260  120
like image 157
BENY Avatar answered Oct 06 '22 00:10

BENY


Inline

Produce a copy

df.assign(**dict(zip('xywh', df.rect.str.findall('\d+').str)))

                          rect    x    y    w    h
0  <Rect (120,168),260 by 120>  120  168  260  120
1  <Rect (120,168),260 by 120>  120  168  260  120
2  <Rect (120,168),260 by 120>  120  168  260  120
3  <Rect (120,168),260 by 120>  120  168  260  120
4  <Rect (120,168),260 by 120>  120  168  260  120

Or just reassign to df

df = df.assign(**dict(zip('xywh', df.rect.str.findall('\d+').str)))

df

                          rect    x    y    w    h
0  <Rect (120,168),260 by 120>  120  168  260  120
1  <Rect (120,168),260 by 120>  120  168  260  120
2  <Rect (120,168),260 by 120>  120  168  260  120
3  <Rect (120,168),260 by 120>  120  168  260  120
4  <Rect (120,168),260 by 120>  120  168  260  120

Inplace

Modify existing df

df[[*'xywh']] = pd.DataFrame(df.rect.str.findall('\d+').tolist())

df

                          rect    x    y    w    h
0  <Rect (120,168),260 by 120>  120  168  260  120
1  <Rect (120,168),260 by 120>  120  168  260  120
2  <Rect (120,168),260 by 120>  120  168  260  120
3  <Rect (120,168),260 by 120>  120  168  260  120
4  <Rect (120,168),260 by 120>  120  168  260  120
like image 36
piRSquared Avatar answered Oct 06 '22 00:10

piRSquared