I need to create a Python list object (or any object) out of a pandas DataFrame by grouping pieces of values from different rows

Question

My DataFrame has a string in the first column and a number in the second:

            GEOSTRING  IDactivity
9     wydm2p01uk0fd2z           2
10    wydm86pg6r3jyrg           2
11    wydm2p01uk0fd2z           2
12    wydm80xfxm9j22v           2
39    wydm9w92j538xze           4
40    wydm8km72gbyuvf           4
41    wydm86pg6r3jyrg           4
42    wydm8mzt874p1v5           4
43    wydm8mzmpz5gkt8           5
44    wydm86pg6r3jyrg           5
45    wydm8w1q8bjfpcj           5
46    wydm8w1q8bjfpcj           5

I want to transform this DataFrame into a list that contains, for each distinct "IDactivity" value, one string made of the 5th character of every "GEOSTRING" value in that group. In this case there are 3 distinct "IDactivity" values, so the list should contain 3 strings, like this:

['2828', '9888','8888']

where each character in a string is the 5th character of the corresponding "GEOSTRING" value.

I'm looking for a solution, or an approach, that avoids an overly complicated for loop and is as efficient as possible, since I have to process a lot of data. I'd like it to be clean and fast.
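For anyone who wants to reproduce the example, here is a minimal sketch that rebuilds the sample DataFrame from the printout above (index values copied from it) and computes the expected output with a plain Python loop, as a reference point for the pandas approaches in the answers. The helper name `fifth_chars_by_group` is hypothetical, not something from the question.

```python
import pandas as pd

# Sample data copied from the question's printout
df = pd.DataFrame(
    {
        "GEOSTRING": [
            "wydm2p01uk0fd2z", "wydm86pg6r3jyrg", "wydm2p01uk0fd2z", "wydm80xfxm9j22v",
            "wydm9w92j538xze", "wydm8km72gbyuvf", "wydm86pg6r3jyrg", "wydm8mzt874p1v5",
            "wydm8mzmpz5gkt8", "wydm86pg6r3jyrg", "wydm8w1q8bjfpcj", "wydm8w1q8bjfpcj",
        ],
        "IDactivity": [2, 2, 2, 2, 4, 4, 4, 4, 5, 5, 5, 5],
    },
    index=[9, 10, 11, 12, 39, 40, 41, 42, 43, 44, 45, 46],
)


def fifth_chars_by_group(frame):
    """Hypothetical reference implementation: a plain loop, no pandas grouping."""
    groups = {}  # IDactivity -> list of characters, kept in row order
    for geostring, activity in zip(frame["GEOSTRING"], frame["IDactivity"]):
        groups.setdefault(activity, []).append(geostring[4])  # 5th character
    # Sort by IDactivity so the order matches pandas' default groupby(sort=True)
    return ["".join(chars) for _, chars in sorted(groups.items())]


print(fifth_chars_by_group(df))  # ['2828', '9888', '8888']
```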

I hope it's clear enough.

Rayhane Mama · Accepted Answer

This can be done as a one-liner, which should also be reasonably fast:

# take the 5th character (index 4) of each GEOSTRING and concatenate them per IDactivity
result = df.groupby('IDactivity')['GEOSTRING'].apply(lambda x: ''.join(x.str[4])).tolist()

This groups the DataFrame by the values of IDactivity, then, for each group, takes the 5th character (index 4) of every string in the GEOSTRING column and joins those characters into a single string. Finally, tolist() converts the resulting pandas Series into a plain Python list.

Output:

['2828', '9888', '8888']
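The one-liner can be tried end to end with a sketch like the following, which rebuilds the sample data by hand. It also shows a variant (an alternative, not part of the original answer) that extracts the 5th character for the whole column once with the vectorized `.str[4]` accessor and then joins per group with `agg(''.join)`, so `.str` is not called separately inside every group. Whether the variant is actually faster on real data is an assumption worth measuring.

```python
import pandas as pd

# Sample data from the question (the index is omitted; the grouping does not depend on it)
df = pd.DataFrame({
    "GEOSTRING": [
        "wydm2p01uk0fd2z", "wydm86pg6r3jyrg", "wydm2p01uk0fd2z", "wydm80xfxm9j22v",
        "wydm9w92j538xze", "wydm8km72gbyuvf", "wydm86pg6r3jyrg", "wydm8mzt874p1v5",
        "wydm8mzmpz5gkt8", "wydm86pg6r3jyrg", "wydm8w1q8bjfpcj", "wydm8w1q8bjfpcj",
    ],
    "IDactivity": [2, 2, 2, 2, 4, 4, 4, 4, 5, 5, 5, 5],
})

# Accepted answer: slice inside each group, then join the characters
result = df.groupby("IDactivity")["GEOSTRING"].apply(lambda x: "".join(x.str[4])).tolist()

# Variant: slice the whole column once, then group the characters and join them
chars = df["GEOSTRING"].str[4]
variant = chars.groupby(df["IDactivity"]).agg("".join).tolist()

print(result)   # ['2828', '9888', '8888']
print(variant)  # ['2828', '9888', '8888']
```

Both keep the rows' original order inside each group, and groupby sorts the group keys by default, which is why the strings come out in IDactivity order 2, 4, 5.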

Documentation:

pandas.groupby
pandas.apply

cmaher · Answer

Here's a solution that uses a temporary column, taking inspiration for the key operation from this answer:

# create a temp column with the character we want from each string
dframe['Temp'] = dframe['GEOSTRING'].apply(lambda x: x[4])

# groupby ID and then concatenate using a sneaky call to .sum()
dframe.groupby('IDactivity')['Temp'].sum().tolist()

Result:

['2828', '9888', '8888']
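The .sum() trick works because, for a column holding plain Python strings (object dtype), summing applies +, and + on strings means concatenation. Below is a minimal sketch of the same idea that avoids adding a Temp column to the original frame by using assign(), which returns a new DataFrame, and uses the vectorized .str[4] instead of a Python lambda. The explicit .astype(object) is an assumption added for safety: it keeps the characters as plain Python strings so that sum() takes the concatenation path regardless of which string dtype the installed pandas version infers.

```python
import pandas as pd

# Sample data from the question
dframe = pd.DataFrame({
    "GEOSTRING": [
        "wydm2p01uk0fd2z", "wydm86pg6r3jyrg", "wydm2p01uk0fd2z", "wydm80xfxm9j22v",
        "wydm9w92j538xze", "wydm8km72gbyuvf", "wydm86pg6r3jyrg", "wydm8mzt874p1v5",
        "wydm8mzmpz5gkt8", "wydm86pg6r3jyrg", "wydm8w1q8bjfpcj", "wydm8w1q8bjfpcj",
    ],
    "IDactivity": [2, 2, 2, 2, 4, 4, 4, 4, 5, 5, 5, 5],
})

# assign() returns a copy with the extra column, so dframe itself is left unchanged;
# astype(object) keeps plain Python strings so that sum() concatenates them with +
with_temp = dframe.assign(Temp=dframe["GEOSTRING"].str[4].astype(object))

# Within each group, sum() adds the strings in row order, i.e. concatenates them
result = with_temp.groupby("IDactivity")["Temp"].sum().tolist()

print(result)                    # ['2828', '9888', '8888']
print("Temp" in dframe.columns)  # False
```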

Tags: python, list, pandas, dataframe

zampero
