I have the following column:
column
0 10
1 10
2 8
3 8
4 6
5 6
My goal is to find the total number of unique values (3 in this case) and create a new column that looks like this:
new_column
0 3
1 3
2 2
3 2
4 1
5 1
The numbering starts from the number of unique values (3). The same number is repeated while the current row has the same value as the previous row in the original column, and it decreases by one each time the value changes. Every unique value in the original column has the same number of rows (2 rows per unique value in this case).
My solution was to group by the original column and build a new list like this:
i = 1
new_time = []
for j, v in df.groupby('column'):
    new_time.append([i] * 2)
    i = i + 1
Then I'd flatten the list and sort it in decreasing order. Is there a simpler solution?
Thanks.
pd.factorize
i, u = pd.factorize(df.column)
df.assign(new=len(u) - i)
column new
0 10 3
1 10 3
2 8 2
3 8 2
4 6 1
5 6 1
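To see why this works, it can help to look at the two values pd.factorize returns. The sketch below rebuilds the sample frame from the question (the names codes, uniques and result are only illustrative) and shows the intermediate arrays:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"column": [10, 10, 8, 8, 6, 6]})

# factorize labels each distinct value with an integer code,
# numbered in order of first appearance
codes, uniques = pd.factorize(df["column"])
# codes   -> [0, 0, 1, 1, 2, 2]
# uniques -> [10, 8, 6]

# Subtracting the codes from the number of uniques flips the numbering
result = df.assign(new=len(uniques) - codes)
print(result)
```

Because the codes follow first appearance rather than sorted order, the column does not need to be sorted for this to number the values from top to bottom.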
dict.setdefault
d = {}
for k in df.column:
    # setdefault only stores a value the first time a key is seen
    d.setdefault(k, len(d))
df.assign(new=len(d) - df.column.map(d))
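As a self-contained sketch of the same idea on the question's sample data (first_seen is just an illustrative name for the dictionary called d above):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"column": [10, 10, 8, 8, 6, 6]})

# Map each distinct value to the position at which it was first seen
first_seen = {}
for value in df["column"]:
    # setdefault leaves existing keys untouched, so repeats keep their first position
    first_seen.setdefault(value, len(first_seen))
# first_seen -> {10: 0, 8: 1, 6: 2}

# Translate every row to its position, then count down from the number of distinct values
result = df.assign(new=len(first_seen) - df["column"].map(first_seen))
print(result)
```

This produces the same numbering as the pd.factorize version; the dictionary plays the role of the codes.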
Use GroupBy.ngroup with ascending=False:
df.groupby('column', sort=False).ngroup(ascending=False)+1
0 3
1 3
2 2
3 2
4 1
5 1
dtype: int64
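A short sketch of what the two arguments do, again using the question's sample data (forward and backward are illustrative names). With sort=False the groups are numbered in order of first appearance, and ascending=False reverses that numbering:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"column": [10, 10, 8, 8, 6, 6]})

# sort=False keeps the groups in order of first appearance: 10, 8, 6
grouped = df.groupby("column", sort=False)

forward = grouped.ngroup()                   # 0, 0, 1, 1, 2, 2
backward = grouped.ngroup(ascending=False)   # 2, 2, 1, 1, 0, 0

# ngroup is zero-based, so add 1 to count down from the number of groups
new_column = backward + 1
print(new_column)
```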
For a DataFrame that looks like this,
df = pd.DataFrame({'column': [10, 10, 8, 8, 10, 10]})
...where only consecutive values are to be grouped, you'll need to modify your grouper:
(df.groupby(df['column'].ne(df['column'].shift()).cumsum(), sort=False)
.ngroup(ascending=False)
.add(1))
0 3
1 3
2 2
3 2
4 1
5 1
dtype: int64
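The grouper can be broken into steps to see what it does. In this sketch, changed and run_id are illustrative names for the intermediate Series, and the last lines show what plain grouping by value would give on the same data for comparison:

```python
import pandas as pd

# Data where the value 10 appears in two separate runs
df = pd.DataFrame({"column": [10, 10, 8, 8, 10, 10]})

# True wherever a row differs from the row above it (the first row always counts)
changed = df["column"].ne(df["column"].shift())
# changed -> True, False, True, False, True, False

# A running total of the change flags gives one id per consecutive run
run_id = changed.cumsum()
# run_id -> 1, 1, 2, 2, 3, 3

# Group by run instead of by value, then number the runs in reverse
by_run = df.groupby(run_id, sort=False).ngroup(ascending=False).add(1)

# Grouping by value merges the two runs of 10 into one group
by_value = df.groupby("column", sort=False).ngroup(ascending=False).add(1)

print(by_run.tolist())    # [3, 3, 2, 2, 1, 1]
print(by_value.tolist())  # [2, 2, 1, 1, 2, 2]
```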