How to extract values between strings into DataFrame columns using regex in Python

I have a large DataFrame containing a column named "comment".

From each comment I need to pull out three values (duty cycle, gas, and pressure) and place them in separate columns. A typical comment looks like this:

"Data collection START for Duty Cycle: 0, Gas: Vacuum Pressure: 0.000028 Torr"

Currently I am using .split and .tolist to parse the string:

# split each comment on whitespace and name the resulting columns
df1 = pd.DataFrame(eventsDf.comment.str.split().tolist(), columns="0 0 0 0 0 0 dutyCycle 0 Gas 0 Pressure 0".split())

# join the DataFrames
eventsDf = pd.concat([eventsDf, df1], axis=1)

# drop the columns that are not needed
eventsDf.drop(['comment', '0'], axis=1, inplace=True)

This method feels rather hacky: if the structure of the comment ever changes, my code breaks. Can anyone show me a more efficient and robust way to do this? Thank you so much!

Alex Rosa asked Oct 31 '22 00:10


1 Answer

Use str.extract with a regex. Each named group in the pattern becomes a column in the result.

regex = r'Duty Cycle: (?P<Duty_Cycle>\d+), Gas: (?P<Gas>\w+) Pressure: (?P<Pressure>\S+) Torr'
df1 = eventsDf.comment.str.extract(regex, expand=True)
df1
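
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample rows (including the "Argon" row and the non-matching row) are made up for illustration and are not from the question; the numeric conversion and the final concat are my assumptions about what you would want next, not part of the original snippet.

```python
import pandas as pd

# Hypothetical sample data mirroring the comment format from the question.
eventsDf = pd.DataFrame({
    "comment": [
        "Data collection START for Duty Cycle: 0, Gas: Vacuum Pressure: 0.000028 Torr",
        "Data collection START for Duty Cycle: 50, Gas: Argon Pressure: 1.5 Torr",
        "Unrelated comment",
    ]
})

# Named groups become the column names of the extracted frame.
regex = (
    r"Duty Cycle: (?P<Duty_Cycle>\d+), "
    r"Gas: (?P<Gas>\w+) "
    r"Pressure: (?P<Pressure>\S+) Torr"
)

extracted = eventsDf["comment"].str.extract(regex, expand=True)

# str.extract returns strings (NaN where the pattern does not match),
# so convert the numeric columns explicitly.
extracted["Duty_Cycle"] = pd.to_numeric(extracted["Duty_Cycle"])
extracted["Pressure"] = pd.to_numeric(extracted["Pressure"])

# Attach the new columns and drop the raw comment column.
result = pd.concat([eventsDf.drop(columns="comment"), extracted], axis=1)
print(result)
```

Rows that do not match the pattern come back as NaN in every extracted column rather than raising an error, so a change in the comment format shows up as missing values you can check for.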

piRSquared answered Nov 09 '22 12:11