I know the question title is a little ambiguous.
My goal is to add a global key column to my data frame, built from two existing columns plus a unique value.
For example:
CountryCode | Accident
AFG Car
AFG Bike
AFG Car
AFG Plane
USA Car
USA Bike
UK Car
Let Car = 01, Bike = 02, Plane = 03.
My desired global key format is [Accident][CountryCode][UniqueValue].
The unique value is a running count of rows that share the same [Accident][CountryCode] combination,
so if Accident = Car and CountryCode = AFG and it is the first occurrence, the global key would be 01AFG01.
The desired dataframe would look like this:
CountryCode | Accident | GlobalKey
AFG Car 01AFG01
AFG Bike 02AFG01
AFG Car 01AFG02
AFG Plane 01AFG03
USA Car 01USA01
USA Bike 01USA02
UK Car 01UK01
I have tried running a for loop to concatenate the accident number and the CountryCode,
for example:
globalKey = []
for x in range(0,6):
    string = df.iloc[x, 1]
    string2 = df.iloc[x, 2]
    if string2 == 'Car':
        number = '01'
    elif string2 == 'Bike':
        number = '02'
    elif string2 == 'Plane':
        number = '03'
    #Concat the number of accident and Country Code
    subKey = number + string
    #Append to the list
    globalKey.append(subKey)
This code gives me something like 01AFG, 02AFG based on the values I assign, but I also want to append a unique value by counting the occurrences of each CountryCode and Accident combination.
I am stuck with the code above. I think there should be a better way to do it using the map function in pandas.
Thanks for helping, guys! Much appreciated!
You can try with cumcount to achieve this in a number of steps, like this:
In [1]: import pandas as pd
   ...: df = pd.DataFrame({'Country':['AFG','AFG','AFG','AFG','USA','USA','UK'], 'Accident':['Car','Bike','Car','Plane','Car','Bike','Car']})
In [2]: df
Out[2]:
Accident Country
0 Car AFG
1 Bike AFG
2 Car AFG
3 Plane AFG
4 Car USA
5 Bike USA
6 Car UK
## Create a column to keep incremental values for `Country`
In [3]: df['cumcount'] = df.groupby('Country').cumcount()
In [4]: df
Out[4]:
Accident Country cumcount
0 Car AFG 0
1 Bike AFG 1
2 Car AFG 2
3 Plane AFG 3
4 Car USA 0
5 Bike USA 1
6 Car UK 0
## Create a column to keep incremental values for combination of `Country`,`Accident`
In [5]: df['cumcount_type'] = df.groupby(['Country','Accident']).cumcount()
In [6]: df
Out[6]:
Accident Country cumcount cumcount_type
0 Car AFG 0 0
1 Bike AFG 1 0
2 Car AFG 2 1
3 Plane AFG 3 0
4 Car USA 0 0
5 Bike USA 1 0
6 Car UK 0 0
From that point on you can concatenate these values to build the key. For the format in the question, cumcount_type (the counter per Country and Accident combination) is the one that corresponds to the unique value, combined with the accident code and Country.
Note that cumcount starts at 0, so add 1 to the counts if you want the numbering to start from 1.
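As a rough sketch of that last concatenation step, assuming the accident codes from the question (Car = 01, Bike = 02, Plane = 03) and a two-digit zero-padded counter; the column name GlobalKey and the dict name accident_codes are only illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['AFG', 'AFG', 'AFG', 'AFG', 'USA', 'USA', 'UK'],
    'Accident': ['Car', 'Bike', 'Car', 'Plane', 'Car', 'Bike', 'Car'],
})

# Accident codes taken from the question (assumed to be the full list)
accident_codes = {'Car': '01', 'Bike': '02', 'Plane': '03'}

# Counter per (Country, Accident) pair; + 1 so the first occurrence is 1, not 0
df['cumcount_type'] = df.groupby(['Country', 'Accident']).cumcount() + 1

# Join code + country + zero-padded counter into one string column
df['GlobalKey'] = (
    df['Accident'].map(accident_codes)
    + df['Country']
    + df['cumcount_type'].astype(str).str.zfill(2)
)

print(df['GlobalKey'].tolist())
```

The per-country cumcount column is not used here, since the requested format only counts within each Country and Accident pair.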
I hope this helps.
First of all, don't use for loops if you can help it. For example, you can do your Accident to code mapping with:
df['AccidentCode'] = df['Accident'].map({'Car': '01', 'Bike': '02', 'Plane': '03'})
To get the unique code, Thanos has shown how to do that using GroupBy.cumcount:
df['CA_ID'] = df.groupby(['CountryCode', 'Accident']).cumcount() + 1
And then to put them all together into a unique key:
df['NewKey'] = df['AccidentCode'] + df['CountryCode'] + df['CA_ID'].map('{:0>2}'.format)
which gives:
CountryCode Accident GlobalKey AccidentCode CA_ID NewKey
0 AFG Car 01AFG01 01 1 01AFG01
1 AFG Bike 02AFG01 02 1 02AFG01
2 AFG Car 01AFG02 01 2 01AFG02
3 AFG Plane 01AFG03 03 1 03AFG01
4 USA Car 01USA01 01 1 01USA01
5 USA Bike 01USA02 02 1 02USA01
6 UK Car 01UK01 01 1 01UK01
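If you need this in more than one place, the three steps could be wrapped in a small function. This is only a sketch: add_global_key, ACCIDENT_CODES and the width parameter are hypothetical names, and the explicit check for unmapped accident types is an addition of mine (map leaves NaN for values missing from the dict, which would otherwise silently produce NaN keys).

```python
import pandas as pd

# Codes from the question; extend this dict if more accident types appear
ACCIDENT_CODES = {'Car': '01', 'Bike': '02', 'Plane': '03'}


def add_global_key(df, codes=ACCIDENT_CODES, width=2):
    """Return a copy of df with a GlobalKey column (hypothetical helper)."""
    out = df.copy()
    code = out['Accident'].map(codes)
    # map() yields NaN for accident types not in the dict, so fail loudly
    if code.isna().any():
        missing = out.loc[code.isna(), 'Accident'].unique().tolist()
        raise ValueError(f'no code defined for: {missing}')
    # 1-based running count within each (CountryCode, Accident) pair
    counter = out.groupby(['CountryCode', 'Accident']).cumcount() + 1
    # Zero-pad the counter on the left to `width` characters
    out['GlobalKey'] = code + out['CountryCode'] + counter.astype(str).str.zfill(width)
    return out


df = pd.DataFrame({
    'CountryCode': ['AFG', 'AFG', 'AFG', 'AFG', 'USA', 'USA', 'UK'],
    'Accident': ['Car', 'Bike', 'Car', 'Plane', 'Car', 'Bike', 'Car'],
})
result = add_global_key(df)
print(result)
```

Keep in mind that a two-digit counter only stays fixed-width up to 99 occurrences per pair; beyond that the key simply gets longer unless you pass a larger width.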