I have a large DataFrame containing a column titled "Comment".
Within that column I need to pull out three values (duty cycle, gas, and pressure) and place each into a separate column. A typical comment looks like this:
"Data collection START for Duty Cycle: 0, Gas: Vacuum Pressure: 0.000028 Torr"
Currently I am using .split and .tolist to parse the string:
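import pandas as pd
# split each comment on whitespace and give every token its own column;
# index=eventsDf.index keeps rows aligned if eventsDf has a non-default index
df1 = pd.DataFrame(eventsDf.comment.str.split().tolist(),
                   columns="0 0 0 0 0 0 dutyCycle 0 Gas 0 Pressure 0".split(),
                   index=eventsDf.index)
# join the new columns onto the original DataFrame
eventsDf = pd.concat([eventsDf, df1], axis=1)
# drop the original comment column and every placeholder column named '0'
eventsDf.drop(['comment', '0'], axis=1, inplace=True)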
I find this method rather "hacky": if the structure of the comment ever changes, my code becomes useless. Can anyone show me a more efficient and robust way to do this? Thank you so much!
Use str.extract with a regex:
import pandas as pd
# each named group (?P<name>...) becomes a column of the result
regex = r'Duty Cycle: (?P<Duty_Cycle>\d+), Gas: (?P<Gas>\w+) Pressure: (?P<Pressure>\S+) Torr'
df1 = eventsDf.comment.str.extract(regex, expand=True)
df1
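If the concern is that the comment layout might change, one variation (a swapped-in approach, not part of the answer above) is to match each field with its own small pattern, so the fields can appear in any order and surrounding text is ignored. The sketch below assumes the labels "Duty Cycle:", "Gas:" and "Pressure: ... Torr" stay the same; the second sample comment and the `patterns` dict are hypothetical. It also converts the extracted strings to numbers and joins the result back onto the original frame.

```python
import pandas as pd

# Hypothetical sample data standing in for the real eventsDf; only the
# first comment comes from the question, the second is made up.
eventsDf = pd.DataFrame({
    "comment": [
        "Data collection START for Duty Cycle: 0, Gas: Vacuum Pressure: 0.000028 Torr",
        "Data collection START for Duty Cycle: 50, Gas: Argon Pressure: 1.5e-3 Torr",
    ]
})

# Assumption: each field is identified by its label, so every pattern can be
# matched independently of the others (order and extra text don't matter).
patterns = {
    "Duty_Cycle": r"Duty Cycle:\s*(\d+)",
    "Gas": r"Gas:\s*(\w+)",
    "Pressure": r"Pressure:\s*(\S+)\s*Torr",
}

# One extract call per field; with a single group and expand=False each call
# returns a Series, holding NaN wherever a comment lacks that field.
fields = pd.DataFrame(
    {name: eventsDf["comment"].str.extract(pat, expand=False)
     for name, pat in patterns.items()}
)

# str.extract always returns strings, so convert the numeric fields.
fields["Duty_Cycle"] = pd.to_numeric(fields["Duty_Cycle"])
fields["Pressure"] = pd.to_numeric(fields["Pressure"])

# Join on the shared index and drop the original text column.
eventsDf = eventsDf.join(fields).drop(columns="comment")
print(eventsDf)
```

Because each pattern stands alone, a comment that gains a new field or reorders the existing ones still parses; a comment missing a field simply gets NaN in that column instead of breaking the whole row.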