I have created a dataframe that has a bunch of characteristics. I want to create a new column that selects all rows between two specific rows (which are going to be inputs).
Say that the dataframe is the following:
data = {'currency': ['Euro', 'Euro', 'Euro', 'Dollar', 'Dollar', 'Yen',
'Yen', 'Yen', 'Pound', 'Pound', 'Pound, 'Pesos',
'Pesos'],
'cost': [34, 67, 32, 29, 48, 123, 23, 45, 78, 86, 23, 45, 67]}
df = pd.DataFrame(data, columns = ['currency', 'cost'])
df
df table:
I want to add a new column that assigns a 1 when a condition is met. In my case, the condition is all rows between two specific currencies. For example, say that I want all the currencies between 'Dollar' and 'Pound'. My guess is I have to create a mask and use it as a conditional, that will say select all rows between the first 'Dollar' row and the last 'Pound' row (i.e. rows 3-10).
I have problems creating that mask though, as the currencies are selected alphabetically:
mask = (df['currency'] >= 'Dollar') & (df['currency'] <= 'Pound')
The above creates a new column with T in all currencies, except 'Yen'. I can see why the above is failing, but cannot think of a way of doing what I want.
Note: The same names of the currencies will come in groups, e.g. 'Pounds' cannot be in rows 4-5 and then 8-10.
General solution working for duplicated index too:
a = df['currency'].eq('Dollar').cumsum()
b = df['currency'].eq('Pound').iloc[::-1].cumsum()
df['new'] = a.mul(b).clip_upper(1)
Alternative working for unique index:
a = df['currency'].eq('Dollar').idxmax()
b = df['currency'].eq('Pound').iloc[::-1].idxmax()
df['new'] = 0
df.loc[a:b, 'new'] = 1
print (df)
currency cost new
0 Euro 34 0
1 Euro 67 0
2 Euro 32 0
3 Dollar 29 1
4 Dollar 48 1
5 Yen 123 1
6 Yen 23 1
7 Yen 45 1
8 Pound 78 1
9 Pound 86 1
10 Pound 23 1
11 Pesos 45 0
12 Pesos 67 0
Explanation:
Series.eq
what is same as ==
first cumsum
[::-1]
mul
and replace non 0
to 1
by clip_upper
Second solution use idxmax
for first index value and set 1
by loc
Using Numpy's accumulation on logical or
cumor = np.logical_or.accumulate
c = df.currency.values
d = c == 'Dollar'
p = c == 'Pound'
df.assign(new=(cumor(d) & cumor(p[::-1])[::-1]).astype(np.uint))
currency cost new
0 Euro 34 0
1 Euro 67 0
2 Euro 32 0
3 Dollar 29 1
4 Dollar 48 1
5 Yen 123 1
6 Yen 23 1
7 Yen 45 1
8 Pound 78 1
9 Pound 86 1
10 Pound 23 1
11 Pesos 45 0
12 Pesos 67 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With