I have some data in a pandas dataframe that looks like this;
CAR_TYPE MILEAGE
FORD 100
FORD 100
FORD 200
FORD 300
VW 100
VW 150
VW 150
VW 300
I want to 'factorize' the data to return a unique ID for each pair. However I want the unique ID to 'reset' to zero for seperate car makes. At present my factorization using the following;
df['CAR_ID']=pd.factorize(pd.lib.fast_zip([df.CAR_TYPE.values, df.MILEAGE.values]))[0]
df.CAR_ID=df[['CAR_ID', 'CAR_TYPE']].astype(str).apply(lambda x: ''.join(x), axis=1)
Gives me something like
CAR_TYPE MILEAGE CAR_ID
FORD 100 FORD0
FORD 100 FORD0
FORD 200 FORD1
FORD 300 FORD2
VW 100 VW3
VW 150 VW4
VW 150 VW4
VW 300 VW5
Ideally I'd like
CAR_TYPE MILEAGE IDEAL_CAR_ID
FORD 100 FORD0
FORD 100 FORD0
FORD 200 FORD1
FORD 300 FORD2
VW 100 VW0
VW 150 VW1
VW 150 VW1
VW 300 VW2
Apologies for the relatively dumb question, at wits after a long day. I know its something that could be solved with a stack/unstack, reset_index/set_index.
You can use groupby
with rank
if values in MILEAGE
are sorted per group:
a = df.groupby(['CAR_TYPE'])['MILEAGE'].rank(method='dense') \
.sub(1).astype(int).astype(str)
df['IDEAL_CAR_ID'] = df['CAR_TYPE'].add(a)
print (df)
CAR_TYPE MILEAGE IDEAL_CAR_ID
0 FORD 100 FORD0
1 FORD 100 FORD0
2 FORD 200 FORD1
3 FORD 300 FORD2
4 VW 100 VW0
5 VW 150 VW1
6 VW 150 VW1
7 VW 300 VW2
Another solution with factorize
:
a = df.groupby(['CAR_TYPE'])['MILEAGE'] \
.transform(lambda x: pd.factorize(x)[0]).astype(str)
df['IDEAL_CAR_ID'] = df['CAR_TYPE'].add(a)
print (df)
CAR_TYPE MILEAGE IDEAL_CAR_ID
0 FORD 100 FORD0
1 FORD 100 FORD0
2 FORD 200 FORD1
3 FORD 300 FORD2
4 VW 100 VW0
5 VW 150 VW1
6 VW 150 VW1
7 VW 300 VW2
Different outputs if column is not sorted:
print (df)
CAR_TYPE MILEAGE
0 FORD 500
1 FORD 500
2 FORD 200
3 FORD 300
4 VW 100
5 VW 150
6 VW 150
7 VW 300
a = df.groupby(['CAR_TYPE'])['MILEAGE'].rank(method='dense') \
.sub(1).astype(int).astype(str)
df['IDEAL_CAR_ID'] = df['CAR_TYPE'].add(a)
print (df)
CAR_TYPE MILEAGE IDEAL_CAR_ID
0 FORD 500 FORD2
1 FORD 500 FORD2
2 FORD 200 FORD0
3 FORD 300 FORD1
4 VW 100 VW0
5 VW 150 VW1
6 VW 150 VW1
7 VW 300 VW2
a = df.groupby(['CAR_TYPE'])['MILEAGE'] \
.transform(lambda x: pd.factorize(x)[0]).astype(str)
df['IDEAL_CAR_ID'] = df['CAR_TYPE'].add(a)
print (df)
CAR_TYPE MILEAGE IDEAL_CAR_ID
0 FORD 500 FORD0
1 FORD 500 FORD0
2 FORD 200 FORD1
3 FORD 300 FORD2
4 VW 100 VW0
5 VW 150 VW1
6 VW 150 VW1
7 VW 300 VW2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With