How to select values in between strings and place in column of dataframe using regex in python

Question

I have a large dataframe containing a column titled "Comment".

Within each comment I need to pull out three values (duty cycle, gas, and pressure) and place them into separate columns. A typical comment looks like this:

"Data collection START for Duty Cycle: 0, Gas: Vacuum Pressure: 0.000028 Torr"

Currently I am using .split and .tolist to parse the string:

# Split each comment on whitespace and name the resulting columns
df1 = pd.DataFrame(eventsDf.comment.str.split().tolist(), columns="0 0 0 0 0 0 dutyCycle 0 Gas 0 Pressure 0".split())

# Join the DataFrames
eventsDf = pd.concat([eventsDf, df1], axis=1)

# Drop the columns that are not needed
eventsDf.drop(['comment', '0'], axis=1, inplace=True)

I find this method rather "hacky": if the structure of the comment changes, my code becomes useless. Can anyone show me a more efficient and robust way to do this? Thank you so much!

piRSquared · Accepted Answer

Use str.extract with a regex. Each named group, written (?P<name>...), becomes a column in the result.

regex = r'Duty Cycle: (?P<Duty_Cycle>\d+), Gas: (?P<Gas>\w+) Pressure: (?P<Pressure>\S+) Torr'
df1 = eventsDf.comment.str.extract(regex, expand=True)
df1
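For anyone who wants to try this end to end, here is a minimal sketch. The sample rows are made up for illustration (only the first one comes from the question), and the slightly looser pattern with \s* and an optional comma is my own assumption about how the comment format might drift, not something the question specifies. The numeric conversion at the end is also an optional extra.

```python
import pandas as pd

# Hypothetical sample data; only the first comment is taken from the question
eventsDf = pd.DataFrame({
    "comment": [
        "Data collection START for Duty Cycle: 0, Gas: Vacuum Pressure: 0.000028 Torr",
        "Data collection START for Duty Cycle: 50, Gas: Argon Pressure: 1.5e-3 Torr",
        "Unrelated note with no measurements",
    ]
})

# \s* and \s+ tolerate spacing changes; ,? makes each comma optional.
# This looser pattern is an assumption, not part of the original answer.
regex = (
    r"Duty Cycle:\s*(?P<Duty_Cycle>\d+),?\s*"
    r"Gas:\s*(?P<Gas>\w+),?\s*"
    r"Pressure:\s*(?P<Pressure>\S+)\s+Torr"
)

# One column per named group; rows that do not match are filled with NaN
df1 = eventsDf["comment"].str.extract(regex, expand=True)

# Extracted values are strings, so convert the numeric ones explicitly
df1["Duty_Cycle"] = pd.to_numeric(df1["Duty_Cycle"])
df1["Pressure"] = pd.to_numeric(df1["Pressure"])

# Swap the raw comment column for the extracted columns
result = pd.concat([eventsDf.drop(columns="comment"), df1], axis=1)
print(result)
```

Because the pattern keys on the labels ("Duty Cycle:", "Gas:", "Pressure:") rather than on word positions, extra words before or after the measurements do not break it. Rows that do not match come back as NaN instead of raising an error, so they are easy to find with df1.isna().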


Tags: python, regex, pandas, dataframe

Alex Rosa
