This is my DataFrame, posdf:
tradingsymbol
0 XYZ2061820500PE
1 XYZ20JUN21000PE
2 ABC20JUN100CE
3 ABC20JUN102.5PE
4 ABC20JUN92.5PE
4 XYZ20JUNFUT
I extract the ABC and XYZ prefix into its own column like this:
posdf['symbol'] = posdf['tradingsymbol'].str.extract(r'^(\D+)', expand=False)
I cannot figure out a general way to extract the following columns:
strike type Expiry
0 20500 PE 20618
1 21000 PE 20JUN
2 100 CE 20JUN
3 102.5 PE 20JUN
4 92.5 PE 20JUN
4 NA FUT 20JUN
type is 2 to 3 characters long.
Expiry is always 5 characters. It can also take a form such as 20O18, 20N18 or 20D18.
Added rows where type can be 3 characters, following Sammy's comment.
Use Series.str.extract with a regex pattern that has one named group per column:
df1 = posdf['tradingsymbol'].str.extract(
r'(?P<expiry>\d{5}|\d{2}\w{3})(?P<strike>\d+(?:\.\d+)?)?(?P<type>\w+)')
df1 = df1[['strike', 'type', 'expiry']]
Result:
# print(df1)
strike type expiry
0 20500 PE 20618
1 21000 PE 20JUN
2 100 CE 20JUN
3 102.5 PE 20JUN
4 92.5 PE 20JUN
4 NaN FUT 20JUN
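For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same pattern. The names `parts` and `result`, and the `pd.to_numeric` step that turns the strike into a float column, are my own additions and not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
posdf = pd.DataFrame({
    "tradingsymbol": [
        "XYZ2061820500PE",
        "XYZ20JUN21000PE",
        "ABC20JUN100CE",
        "ABC20JUN102.5PE",
        "ABC20JUN92.5PE",
        "XYZ20JUNFUT",
    ]
})

# Same pattern as in the answer: expiry, optional strike, then type.
pattern = r"(?P<expiry>\d{5}|\d{2}\w{3})(?P<strike>\d+(?:\.\d+)?)?(?P<type>\w+)"
parts = posdf["tradingsymbol"].str.extract(pattern)

# Assumption: a numeric strike is wanted; rows without a strike stay NaN.
parts["strike"] = pd.to_numeric(parts["strike"])

# Attach the extracted columns to the original frame.
result = posdf.join(parts[["strike", "type", "expiry"]])
print(result)
```

Because the strike group is optional, futures rows simply come back with a missing strike instead of failing to match.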
If the strike is always numeric, you can extract all four columns in one pass:
posdf[['Symbol','Expiry','Strike','Type']] = posdf['tradingsymbol'].str.extract(r'^(\D+)(.{5})([0-9.]*)([a-zA-Z]{2,3})', expand=True)
Result:
tradingsymbol Symbol Expiry Strike Type
0 XYZ2061820500PE XYZ 20618 20500 PE
1 XYZ20JUN21000PE XYZ 20JUN 21000 PE
2 ABC20JUN100CE ABC 20JUN 100 CE
3 ABC20JUN102.5PE ABC 20JUN 102.5 PE
4 ABC20JUN92.5PE ABC 20JUN 92.5 PE
4 XYZ20JUNFUT XYZ 20JUN FUT
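Note that with this pattern the strike group matches an empty string for futures, so the column holds text rather than numbers. A small sketch of one way to handle that, assuming a float column is wanted: the `pd.to_numeric` call is my addition, and the symbol `XYZ20O1820500PE` is a made-up example built from the 20O18 expiry form mentioned in the question.

```python
import pandas as pd

posdf = pd.DataFrame({
    "tradingsymbol": [
        "XYZ2061820500PE",
        "ABC20JUN102.5PE",
        "XYZ20JUNFUT",
        "XYZ20O1820500PE",  # hypothetical symbol using the 20O18 expiry form
    ]
})

# Same pattern as above, written as a raw string.
pattern = r"^(\D+)(.{5})([0-9.]*)([a-zA-Z]{2,3})"
cols = ["Symbol", "Expiry", "Strike", "Type"]
posdf[cols] = posdf["tradingsymbol"].str.extract(pattern, expand=True)

# Empty strikes (futures) cannot be parsed as numbers, so coerce them to NaN.
posdf["Strike"] = pd.to_numeric(posdf["Strike"], errors="coerce")
print(posdf)
```

Since the expiry group is just any 5 characters, the 20O18 style of expiry needs no special handling here.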