My DataFrame has a string in the first column, and a number in the second one:
GEOSTRING IDactivity
9 wydm2p01uk0fd2z 2
10 wydm86pg6r3jyrg 2
11 wydm2p01uk0fd2z 2
12 wydm80xfxm9j22v 2
39 wydm9w92j538xze 4
40 wydm8km72gbyuvf 4
41 wydm86pg6r3jyrg 4
42 wydm8mzt874p1v5 4
43 wydm8mzmpz5gkt8 5
44 wydm86pg6r3jyrg 5
45 wydm8w1q8bjfpcj 5
46 wydm8w1q8bjfpcj 5
What I want to do is to manipulate this DataFrame in order to have a list object that contains a string, made out of the 5th character for each "GEOSTRING" value, for each different "IDactivity" value. So in this case, I have 3 different "IDactivity" values, and I will have in my list object 3 strings that look like this:
['2828', '9888','8888']
where again, the symbols you see in each string, are the 5th value of each "GEOSTRING" value.
What I'm asking is a solution, or an approach, that doesn't involve a too complicated for
loop and have it as efficient as possible since I have to manipulate lots of data. I'd like it to be clean and fast.
I hope it's clear enough.
this can be done easily as follows as a one liner: (considered to be pretty fast too)
result = df.groupby('IDactivity')['GEOSTRING'].apply(lambda x:''.join(x.str[4])).tolist()
this groups the dataframe by values of IDactivity
then select from each corresponding string of GEOSTRING
column the 5th element (index 4) and joins it with the other corresponding strings. Finally we add tolist()
method to get the output as list not pandas Series.
output:
['2828', '9888', '8888']
Documentation:
pandas.groupby
pandas.apply
Here's a solution involving a temp column, and taking inspiration for the key operation from this answer:
# create a temp column with the character we want from each string
dframe['Temp'] = dframe['GEOSTRING'].apply(lambda x: x[4])
# groupby ID and then concatenate using a sneaky call to .sum()
dframe.groupby('IDactivity')['Temp'].sum().tolist()
Result:
['2828', '9888', '8888']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With