I'm trying to repurpose this function from using split to using str.extract (regex) instead.
def bull_lev(x):
    # Second-to-last whitespace-separated token, e.g. "X3" -> "3"
    spl = x.rsplit(None, 2)[-2].strip("Xx")
    if spl.isdigit():
        return "+" + spl + "00"
    return "+100"

def bear_lev(x):
    spl = x.rsplit(None, 2)[-2].strip("Xx")
    if spl.isdigit():
        return "-" + spl + "00"
    return "-100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x)
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")
I am using pandas for DataFrame handling:
import pandas as pd
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
Desired output:
name                    leverage
"BULL AXP UN X3 VON"    "+300"
"BEAR ESTOX 12x S"      "-1200"
Faulty regex attempt for "BULL":
def bull_lev(x):
    #spl = x.rsplit(None, 2)[-2].strip("Xx")
    spl = x.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).strip("x")
    if spl.str.isdigit():
        return "+" + spl + "00"
    return "+100"

df["leverage"] = df["name"].map(lambda x: bull_lev(x)
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")
Produces error:
Traceback (most recent call last):
  File "toolkit.py", line 128, in <module>
    df["leverage"] = df["name"].map(lambda x: bull_lev(x)
  File "/Python/Virtual/py2710/lib/python2.7/site-packages/pandas/core/series.py", line 2016, in map
    mapped = map_f(values, arg)
  File "pandas/src/inference.pyx", line 1061, in pandas.lib.map_infer (pandas/lib.c:58435)
  File "toolkit.py", line 129, in <lambda>
    if "BULL" in x else bear_lev(x) if "BEAR" in x else "+100")
  File "toolkit.py", line 123, in bear_lev
    spl = x.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).strip("x")
AttributeError: 'str' object has no attribute 'str'
I am assuming this is due to str.extract capturing a list while split works directly with the string?
You can handle the positive case using the following:
In [150]:
import re
df['fundleverage'] = '+' + df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).str.strip('Xx') + '00'
df
Out[150]:
                 name fundleverage
0  BULL AXP UN X3 VON         +300
1    BULL ESTOX X12 S        +1200
You can use np.where to handle both outcomes (a leverage figure was found, or it was not) in a one-liner:
In [151]:
import numpy as np
df['fundleverage'] = np.where(df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).str.strip('Xx').str.isdigit(), '+' + df['name'].str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).str.strip('Xx') + '00', '+100')
df
Out[151]:
                 name fundleverage
0  BULL AXP UN X3 VON         +300
1    BULL ESTOX X12 S        +1200
So the above uses the vectorised str methods strip, extract and isdigit to achieve what you want.
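As for the AttributeError in the question: Series.map calls the function once per element, so x is a plain Python str, which has no .str accessor (that accessor only exists on a Series or Index). If you would rather keep the per-row map style, a minimal sketch with the standard re module could look like the following. The names LEV_RE and leverage are made up for illustration, and the pattern is a variant of the one above that captures only the digits.

```python
import re

import pandas as pd

# Hypothetical pattern: same idea as (X\d+|\d+X)\s, but the groups hold only the digits.
LEV_RE = re.compile(r"(?:X(\d+)|(\d+)X)\s", flags=re.IGNORECASE)


def leverage(name):
    """Per-element version: `name` is a plain str, so use re, not .str."""
    if "BULL" in name:
        sign = "+"
    elif "BEAR" in name:
        sign = "-"
    else:
        return "+100"  # same fallback as the lambda in the question
    m = LEV_RE.search(name)
    if m is None:
        return sign + "100"  # no leverage token found
    digits = m.group(1) or m.group(2)  # whichever alternative matched
    return sign + digits + "00"


df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
df["leverage"] = df["name"].map(leverage)
```

This is usually slower than the vectorised str methods on large frames, but it keeps the logic in one ordinary function.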
Update
Since you changed your requirements (for future reference, please avoid doing that after a question has been answered), you can mask the df for the bull and bear cases:
In [189]:
import re
import numpy as np
import pandas as pd
df = pd.DataFrame(["BULL AXP UN X3 VON", "BEAR ESTOX 12x S"], columns=["name"])
# Names of the bull rows and the bear rows, selected case-insensitively
bull_mask_name = df.loc[df['name'].str.contains('bull', case=False), 'name']
bear_mask_name = df.loc[df['name'].str.contains('bear', case=False), 'name']
# strip('Xx') removes the marker in either case, since the regex matches case-insensitively
df.loc[df['name'].str.contains('bull', case=False), 'fundleverage'] = np.where(bull_mask_name.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).str.strip('Xx').str.isdigit(), '+' + bull_mask_name.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).str.strip('Xx') + '00', '+100')
df.loc[df['name'].str.contains('bear', case=False), 'fundleverage'] = np.where(bear_mask_name.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).str.strip('Xx').str.isdigit(), '-' + bear_mask_name.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE).str.strip('Xx') + '00', '-100')
df
Out[189]:
                 name fundleverage
0  BULL AXP UN X3 VON         +300
1    BEAR ESTOX 12x S        -1200
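The two masked assignments can also be folded into a single pass so that the regex runs only once. This is only a sketch under some assumptions: it relies on expand=False (available in newer pandas releases, where str.extract otherwise returns a DataFrame rather than a Series), it follows the question's rule that names containing neither BULL nor BEAR get "+100", and the extra sample rows and variable names are mine.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame(
    ["BULL AXP UN X3 VON", "BEAR ESTOX 12x S", "BEAR NO LEVERAGE", "INDEX FUND X2 A"],
    columns=["name"],
)

names = df["name"]
is_bull = names.str.contains("bull", case=False)
is_bear = names.str.contains("bear", case=False) & ~is_bull  # bull wins if both appear

# Run the regex once; expand=False gives a Series for the single capture group.
digits = (
    names.str.extract(r"(X\d+|\d+X)\s", flags=re.IGNORECASE, expand=False)
    .str.strip("Xx")  # drop the X marker in either case
    .fillna("1")      # no leverage token found -> default multiple of 1
)
# Rows that are neither bull nor bear always fall back to "+100".
digits = digits.where(is_bull | is_bear, "1")

df["fundleverage"] = np.where(is_bear, "-" + digits + "00", "+" + digits + "00")
```

Compared with the masked version, every row ends up with a value, including rows that are neither bull nor bear.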