I am trying to construct a new column that is 1 the first time a given value of the column "type" appears together with a specific value of the column "xx", and 0 in every other case.
The original dataframe (df) I am working with is:
import numpy as np
import pandas as pd

idx = [np.array(['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18',
                 'Apr-18', 'Apr-18', 'May-18', 'Jun-18', 'Jun-18', 'Jun-18',
                 'Jul-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18',
                 'Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18']),
       np.array(['A', 'B', 'B', 'A', 'B', 'C', 'A', 'B', 'B', 'A', 'B', 'C',
                 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'])]
data = [{'xx': 1000}, {'xx': 1000}, {'xx': 1200}, {'xx': 800}, {'xx': 800},
        {'xx': 800}, {'xx': 1000}, {'xx': 1000}, {'xx': 800}, {'xx': 1200},
        {'xx': 1200}, {'xx': 1200}, {'xx': 1000}, {'xx': 1000}, {'xx': 1000},
        {'xx': 1600}, {'xx': 1600}, {'xx': 1000}, {'xx': 800}, {'xx': 800},
        {'xx': 1000}, {'xx': 1600}, {'xx': 1600}]
df = pd.DataFrame(data, index=idx, columns=['xx'])
df.index.names = ['date', 'type']
df = df.reset_index()
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
df = df.set_index(['date', 'type'])
df['xx'] = df.xx.astype('float')
The result I am looking for is:
                     xx   yy
date       type
2018-01-01 A     1000.0  1.0
           B     1000.0  1.0
2018-02-01 B     1200.0  1.0
2018-03-01 A      800.0  1.0
           B      800.0  1.0
           C      800.0  1.0
2018-04-01 A     1000.0  0.0
           B     1000.0  0.0
2018-05-01 B      800.0  0.0
2018-06-01 A     1200.0  1.0
           B     1200.0  0.0
           C     1200.0  1.0
2018-07-01 A     1000.0  0.0
2018-08-01 B     1000.0  0.0
           C     1000.0  1.0
2018-09-01 A     1600.0  1.0
           B     1600.0  1.0
2018-10-01 C     1000.0  0.0
           A      800.0  0.0
           B      800.0  0.0
2018-11-01 A     1000.0  0.0
2018-12-01 B     1600.0  0.0
           C     1600.0  1.0
I tried the following code, but it doesn't work (it gives an error message):
df['yy'] = df.assign(zz=(df.groupby(['type','xx']).first())).zz.transform(lambda x: 1)
The error message says
ValueError: Wrong number of items passed 0, placement implies 1.
I tried other methods, such as nth(0), but they don't work either. Any suggestion on how to solve this problem is very welcome.
Try:
df['yy'] = (df.groupby(level=1).xx
              .apply(lambda x: (~x.duplicated()).astype(int)))
df['yy'] outputs:
date        type
2018-01-01  A       1
            B       1
2018-02-01  B       1
2018-03-01  A       1
            B       1
            C       1
2018-04-01  A       0
            B       0
2018-05-01  B       0
2018-06-01  A       1
            B       0
            C       1
2018-07-01  A       0
2018-08-01  B       0
            C       1
2018-09-01  A       1
            B       1
2018-10-01  C       0
            A       0
            B       0
2018-11-01  A       0
2018-12-01  B       0
            C       1
Name: yy, dtype: int32
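This works because Series.duplicated marks every occurrence after the first one as True, so ~x.duplicated() is True exactly on first occurrences, and grouping on the type level makes "first" mean first within each type. The original attempt most likely fails because xx is consumed as a grouping key: groupby(['type','xx']).first() then has no columns left to aggregate, so an empty (zero-column) frame gets passed to assign, hence "Wrong number of items passed 0". A minimal sketch of the duplicated behaviour this relies on, using a toy Series rather than the question's data:

import pandas as pd

# xx values for a single type, in date order
s = pd.Series([1000.0, 800.0, 1000.0, 1200.0])

# duplicated() is False on the first occurrence of a value and True on
# every repeat, so ~duplicated() flags "first time this value is seen"
print((~s.duplicated()).astype(int).tolist())  # [1, 1, 0, 1]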
Using groupby + cumcount + astype:
df['yy'] = df.reset_index().groupby(['type','xx']).cumcount().eq(0).astype(int).values
Result:
                     xx  yy
date       type
2018-01-01 A     1000.0   1
           B     1000.0   1
2018-02-01 B     1200.0   1
2018-03-01 A      800.0   1
           B      800.0   1
           C      800.0   1
2018-04-01 A     1000.0   0
           B     1000.0   0
2018-05-01 B      800.0   0
2018-06-01 A     1200.0   1
           B     1200.0   0
           C     1200.0   1
2018-07-01 A     1000.0   0
2018-08-01 B     1000.0   0
           C     1000.0   1
2018-09-01 A     1600.0   1
           B     1600.0   1
2018-10-01 C     1000.0   0
           A      800.0   0
           B      800.0   0
2018-11-01 A     1000.0   0
2018-12-01 B     1600.0   0
           C     1600.0   1
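As a side note on why this works: cumcount numbers the rows of each (type, xx) group from zero, so eq(0) is True only on a group's first row, and .values assigns the result positionally, since reset_index() gave the intermediate frame a different index than df. A minimal sketch of cumcount on a toy frame (made-up data, not the question's):

import pandas as pd

toy = pd.DataFrame({'type': ['A', 'B', 'A', 'A'],
                    'xx':   [1000, 1000, 1000, 800]})

# cumcount() returns 0 for the first row of each (type, xx) group,
# 1 for the second, and so on
print(toy.groupby(['type', 'xx']).cumcount().tolist())  # [0, 0, 1, 0]

# eq(0) therefore flags exactly the first occurrence of each pair
print(toy.groupby(['type', 'xx']).cumcount().eq(0).astype(int).tolist())  # [1, 1, 0, 1]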
Using duplicated: zip together the type level of the index and the xx column into a pandas.Series, because I want to use the pandas.Series.duplicated method, then use numpy.where to choose between 0 and 1.
Note: This does not use groupby and consequently should be more efficient.
s = pd.Series([*zip(df.index.get_level_values('type'), df.xx)])
df.assign(id=np.where(s.duplicated(), 0, 1))
                     xx  id
date       type
2018-01-01 A     1000.0   1
           B     1000.0   1
2018-02-01 B     1200.0   1
2018-03-01 A      800.0   1
           B      800.0   1
           C      800.0   1
2018-04-01 A     1000.0   0
           B     1000.0   0
2018-05-01 B      800.0   0
2018-06-01 A     1200.0   1
           B     1200.0   0
           C     1200.0   1
2018-07-01 A     1000.0   0
2018-08-01 B     1000.0   0
           C     1000.0   1
2018-09-01 A     1600.0   1
           B     1600.0   1
2018-10-01 C     1000.0   0
           A      800.0   0
           B      800.0   0
2018-11-01 A     1000.0   0
2018-12-01 B     1600.0   0
           C     1600.0   1
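The trick here is that each element of s is a (type, xx) tuple, and Series.duplicated compares whole tuples, so a row only counts as a duplicate when that exact pair has appeared before. A minimal sketch with made-up pairs:

import pandas as pd

# each element is a (type, xx) pair; duplicated() treats the tuple as
# one hashable value, so both components must match to count as a repeat
pairs = pd.Series([('A', 1000), ('B', 1000), ('A', 1000), ('A', 800)])
print(pairs.duplicated().tolist())  # [False, False, True, False]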
IIUC (if I understand correctly):
idx = df.groupby([df.index.get_level_values(1), df.xx]).head(1).index
df.loc[:, 'new'] = 0
df.loc[idx, 'new'] = 1
df
Out[869]:
                     xx  new
date       type
2018-01-01 A     1000.0    1
           B     1000.0    1
2018-02-01 B     1200.0    1
2018-03-01 A      800.0    1
           B      800.0    1
           C      800.0    1
2018-04-01 A     1000.0    0
           B     1000.0    0
2018-05-01 B      800.0    0
2018-06-01 A     1200.0    1
           B     1200.0    0
           C     1200.0    1
2018-07-01 A     1000.0    0
2018-08-01 B     1000.0    0
           C     1000.0    1
2018-09-01 A     1600.0    1
           B     1600.0    1
2018-10-01 C     1000.0    0
           A      800.0    0
           B      800.0    0
2018-11-01 A     1000.0    0
2018-12-01 B     1600.0    0
           C     1600.0    1
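Here head(1) keeps the first row of every (type, xx) group while preserving each row's original (date, type) index label, so those labels can then be fed to .loc to write the 1s. A minimal sketch of that pattern on a toy frame (hypothetical labels r1 to r3):

import pandas as pd

toy = pd.DataFrame({'type': ['A', 'B', 'A'],
                    'xx':   [1000, 1000, 1000]},
                   index=['r1', 'r2', 'r3'])

# head(1) returns each group's first row with its original index label
first = toy.groupby(['type', 'xx']).head(1)
print(first.index.tolist())  # ['r1', 'r2']

# mark those rows with 1 and everything else with 0
toy['new'] = 0
toy.loc[first.index, 'new'] = 1
print(toy['new'].tolist())  # [1, 1, 0]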