Assign Unique Values According to Distinct Column Values

Question

I know the question title is a little ambiguous.

My goal is to create a global key column in my DataFrame based on two columns plus a unique value.

For example:

CountryCode | Accident
   AFG          Car
   AFG          Bike
   AFG          Car
   AFG          Plane
   USA          Car
   USA          Bike
   UK           Car

Let Car = 01, Bike = 02, Plane = 03

My desired global key format is [Accident][CountryCode][UniqueValue].

UniqueValue is the running count of rows that share the same [Accident][CountryCode] combination.

So if Accident = Car, CountryCode = AFG and it is the first occurrence, the global key would be 01AFG01.
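As a worked example of that format, here is a small sketch of a key builder. `make_key` is a hypothetical helper name, and the two-digit zero padding is an assumption based on the 01/02 examples (a count of 100 or more would simply print more digits).

```python
def make_key(accident_code: str, country_code: str, occurrence: int) -> str:
    """Build [Accident][CountryCode][UniqueValue], zero-padding the count to 2 digits."""
    return f"{accident_code}{country_code}{occurrence:02d}"


# First Car accident (code '01') in AFG
print(make_key("01", "AFG", 1))  # 01AFG01
```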

The desired dataframe would look like this:

CountryCode | Accident | GlobalKey
   AFG          Car        01AFG01
   AFG          Bike       02AFG01
   AFG          Car        01AFG02
   AFG          Plane      01AFG03
   USA          Car        01USA01
   USA          Bike       01USA02
   UK           Car        01UK01

I have tried using a for loop to concatenate the accident number and CountryCode.

For example:

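import pandas as pd

df = pd.DataFrame({'CountryCode': ['AFG', 'AFG', 'AFG', 'AFG', 'USA', 'USA', 'UK'],
                   'Accident': ['Car', 'Bike', 'Car', 'Plane', 'Car', 'Bike', 'Car']})

globalKey = []

for x in range(len(df)):  # loop over every row, not just the first 6
    country = df.loc[x, 'CountryCode']
    accident = df.loc[x, 'Accident']
    if accident == 'Car':
        number = '01'
    elif accident == 'Bike':
        number = '02'
    elif accident == 'Plane':
        number = '03'
    # Concatenate the accident number and the country code
    subKey = number + country
    # Append to the list
    globalKey.append(subKey)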

This code gives me keys like 01AFG and 02AFG based on the values I assign, but I also want to append a unique value that counts the occurrences of each CountryCode and Accident combination.
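If you want to keep a loop, one way to add the counter is to keep a running count per (Accident, CountryCode) pair in a `collections.Counter`. This is a minimal sketch, assuming the column names `CountryCode` and `Accident` from the example table and the Car = 01, Bike = 02, Plane = 03 mapping stated above; the vectorized answers below are usually preferable in pandas.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "CountryCode": ["AFG", "AFG", "AFG", "AFG", "USA", "USA", "UK"],
    "Accident": ["Car", "Bike", "Car", "Plane", "Car", "Bike", "Car"],
})
accident_codes = {"Car": "01", "Bike": "02", "Plane": "03"}

seen = Counter()  # running count per (Accident, CountryCode) pair
global_keys = []
for accident, country in zip(df["Accident"], df["CountryCode"]):
    seen[(accident, country)] += 1
    n = seen[(accident, country)]
    # [Accident][CountryCode][UniqueValue], count padded to 2 digits
    global_keys.append(f"{accident_codes[accident]}{country}{n:02d}")

df["GlobalKey"] = global_keys
print(df)
```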

I am stuck with the loop approach. I think there should be a better way to do this using the map function in pandas.

Thanks for your help, guys! Much appreciated!

Thanos · Accepted Answer

You can use cumcount to achieve this in a few steps, like this:

In [1]: df = pd.DataFrame({'Country':['AFG','AFG','AFG','AFG','USA','USA','UK'], 'Accident':['Car','Bike','Car','Plane','Car','Bike','Car']})

In [2]: df
Out[2]: 
  Accident Country
0      Car     AFG
1     Bike     AFG
2      Car     AFG
3    Plane     AFG
4      Car     USA
5     Bike     USA
6      Car      UK

## Create a column to keep incremental values for `Country`
In [3]: df['cumcount'] = df.groupby('Country').cumcount()

In [4]: df
Out[4]: 
  Accident Country  cumcount
0      Car     AFG         0
1     Bike     AFG         1
2      Car     AFG         2
3    Plane     AFG         3
4      Car     USA         0
5     Bike     USA         1
6      Car      UK         0

## Create a column to keep incremental values for combination of `Country`,`Accident`
In [5]: df['cumcount_type'] = df.groupby(['Country','Accident']).cumcount()

In [6]: df
Out[6]: 
  Accident Country  cumcount  cumcount_type
0      Car     AFG         0              0
1     Bike     AFG         1              0
2      Car     AFG         2              1
3    Plane     AFG         3              0
4      Car     USA         0              0
5     Bike     USA         1              0
6      Car      UK         0              0

From there, you can concatenate the accident code, Country and cumcount_type to build the key you're after.

You may want to add 1 to each count, since cumcount starts counting at 0; it depends on whether you want the unique value to start at 0 or 1.
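A minimal sketch of that last step, reusing Thanos's `Country` column name. Only the per-(Country, Accident) count is needed for the [Accident][CountryCode][UniqueValue] format; the `codes` dictionary is an assumption taken from the question's Car/Bike/Plane mapping.

```python
import pandas as pd

df = pd.DataFrame({'Country': ['AFG', 'AFG', 'AFG', 'AFG', 'USA', 'USA', 'UK'],
                   'Accident': ['Car', 'Bike', 'Car', 'Plane', 'Car', 'Bike', 'Car']})

# cumcount() numbers rows within each (Country, Accident) group starting at 0,
# so add 1 to start counting from 1 instead
df['cumcount_type'] = df.groupby(['Country', 'Accident']).cumcount() + 1

# Accident codes from the question: Car = 01, Bike = 02, Plane = 03
codes = {'Car': '01', 'Bike': '02', 'Plane': '03'}
df['GlobalKey'] = (df['Accident'].map(codes)
                   + df['Country']
                   + df['cumcount_type'].astype(str).str.zfill(2))
print(df)
```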

I hope this helps.

danio · Answer

First of all, avoid for loops if you can. For example, you can do the Accident-to-code mapping with:

df['AccidentCode'] = df['Accident'].map({'Car': '01', 'Bike': '02', 'Plane': '03'})
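`Series.map` with a dict looks each value up in the dictionary, and any value missing from the dict becomes NaN instead of raising an error. The `'Boat'` entry below is a hypothetical unmapped category added only to show that behavior, so it may be worth checking for NaN before building keys.

```python
import pandas as pd

accidents = pd.Series(['Car', 'Bike', 'Plane', 'Boat'])  # 'Boat' is a hypothetical unmapped value
codes = accidents.map({'Car': '01', 'Bike': '02', 'Plane': '03'})
print(codes.tolist())  # ['01', '02', '03', nan]

# List any categories that did not get a code
unmapped = accidents[codes.isna()].unique()
print(unmapped)
```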

To get the unique count, use GroupBy.cumcount, as Thanos has shown:

df['CA_ID'] = df.groupby(['CountryCode', 'Accident']).cumcount() + 1

And then to put them all together into a unique key:

df['NewKey'] = df['AccidentCode'] + df['CountryCode'] + df['CA_ID'].map('{:0>2}'.format)
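The format spec `'{:0>2}'` pads on the left with `0` (fill character `0`, right alignment `>`, minimum width `2`). A quick check of what it does, compared with an equivalent f-string and `str.zfill`:

```python
pad = '{:0>2}'.format

print(pad(1))    # 01
print(pad(12))   # 12
print(pad(123))  # 123 (width is a minimum, so longer counts are not truncated)

# Equivalent zero-padding for non-negative integers
n = 7
assert pad(n) == f'{n:02d}' == str(n).zfill(2) == '07'
```

In pandas, `df['CA_ID'].astype(str).str.zfill(2)` produces the same strings as the `.map('{:0>2}'.format)` call.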

which gives:

  CountryCode Accident GlobalKey AccidentCode  CA_ID   NewKey
0         AFG      Car   01AFG01           01      1  01AFG01
1         AFG     Bike   02AFG01           02      1  02AFG01
2         AFG      Car   01AFG02           01      2  01AFG02
3         AFG    Plane   01AFG03           03      1  03AFG01
4         USA      Car   01USA01           01      1  01USA01
5         USA     Bike   01USA02           02      1  02USA01
6          UK      Car    01UK01           01      1   01UK01

Tags: python, pandas, dataframe, group-by

Phurich.P
