I have a DataFrame df whose rect column contains only strings. I need to extract x, y, w and h from it into separate columns. The dataset is very large, so I need an efficient approach.
df['rect'].head()
0 <Rect (120,168),260 by 120>
1 <Rect (120,168),260 by 120>
2 <Rect (120,168),260 by 120>
3 <Rect (120,168),260 by 120>
4 <Rect (120,168),260 by 120>
So far this solution works, but as you can see it is very messy:
df[['x', 'y', 'w', 'h']] = df['rect'].str.replace(r'<Rect \(', '', regex=True).str.replace(r'\),', ',', regex=True).str.replace(' by ', ',', regex=False).str.replace('>', '', regex=False).str.split(',', n=3, expand=True)
Is there a better way, possibly a regex approach?
Using extractall
df[['x', 'y', 'w', 'h']] = df['rect'].str.extractall(r'(\d+)').unstack().loc[:,0]
Out[267]:
match 0 1 2 3
0 120 168 260 120
1 120 168 260 120
2 120 168 260 120
3 120 168 260 120
4 120 168 260 120
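If you would rather name the columns in the pattern itself, a rough sketch with str.extract and named groups could look like the following. The sample frame is made up to mirror the question, and the astype(int) step assumes every row matches the pattern (a non-matching row would give NaN and make the cast fail).

```python
import pandas as pd

# Sample data mirroring the format shown in the question
df = pd.DataFrame({'rect': ['<Rect (120,168),260 by 120>'] * 5})

# Each named group becomes a column of the extracted frame
pattern = r'<Rect \((?P<x>\d+),(?P<y>\d+)\),(?P<w>\d+) by (?P<h>\d+)>'

# Assumption: every row matches, so the cast to int is safe
parts = df['rect'].str.extract(pattern).astype(int)

# join aligns on the index, so a non-default index is handled as well
result = df.join(parts)
print(result)
```

Unlike the digit-only patterns, this one also checks the overall shape of the string, and the new columns come out as integers rather than strings.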
Produce a copy
# zip(*...) transposes the per-row lists into one tuple per column
df.assign(**dict(zip('xywh', zip(*df.rect.str.findall(r'\d+')))))
rect x y w h
0 <Rect (120,168),260 by 120> 120 168 260 120
1 <Rect (120,168),260 by 120> 120 168 260 120
2 <Rect (120,168),260 by 120> 120 168 260 120
3 <Rect (120,168),260 by 120> 120 168 260 120
4 <Rect (120,168),260 by 120> 120 168 260 120
Or just reassign to df
df = df.assign(**dict(zip('xywh', zip(*df.rect.str.findall(r'\d+')))))
df
rect x y w h
0 <Rect (120,168),260 by 120> 120 168 260 120
1 <Rect (120,168),260 by 120> 120 168 260 120
2 <Rect (120,168),260 by 120> 120 168 260 120
3 <Rect (120,168),260 by 120> 120 168 260 120
4 <Rect (120,168),260 by 120> 120 168 260 120
Modify existing df
# index=df.index keeps the new columns aligned when df has a non-default index
df[[*'xywh']] = pd.DataFrame(df.rect.str.findall(r'\d+').tolist(), index=df.index)
df
rect x y w h
0 <Rect (120,168),260 by 120> 120 168 260 120
1 <Rect (120,168),260 by 120> 120 168 260 120
2 <Rect (120,168),260 by 120> 120 168 260 120
3 <Rect (120,168),260 by 120> 120 168 260 120
4 <Rect (120,168),260 by 120> 120 168 260 120
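Since the question is about a very large dataset, it is probably worth timing the candidates on your own data before picking one. Below is a minimal sketch of how that could be done with timeit. The helper names with_extract and with_findall and the row count are made up for illustration, and no particular winner is implied.

```python
import re
import timeit

import pandas as pd

# Synthetic data; the row count here is arbitrary
df = pd.DataFrame({'rect': ['<Rect (120,168),260 by 120>'] * 10_000})

def with_extract(frame):
    # pandas string method with four capture groups
    return frame['rect'].str.extract(r'\((\d+),(\d+)\),(\d+) by (\d+)')

def with_findall(frame):
    # Plain-Python loop using a precompiled pattern
    digits = re.compile(r'\d+')
    rows = [digits.findall(s) for s in frame['rect']]
    return pd.DataFrame(rows, index=frame.index)

# Sanity check: both variants should yield the same values
same = with_extract(df).to_numpy().tolist() == with_findall(df).to_numpy().tolist()
print('identical results:', same)

for fn in (with_extract, with_findall):
    seconds = timeit.timeit(lambda: fn(df), number=5)
    print(f'{fn.__name__}: {seconds:.3f}s for 5 runs')
```

Timings depend on the pandas version and the data, so treat the numbers as a guide for your own setup only.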