Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to iterate over pandas DataFrameGroupBy and select all entries per grouped variable for specific column?

Let's assume, there is a table like this:

Id | Type | Guid

I perform on such a table the following operation:

df = df.groupby('Id')

Now I would like to iterate through first n rows and for each specific Id as a list print all the corresponding entries from column Guid. Please, help me with a solution.

like image 305
Server Khalilov Avatar asked May 06 '17 23:05

Server Khalilov


People also ask

How do I iterate over a group in pandas DataFrame?

groupby() to Iterate over Data frame Groups. DataFrame. groupby() function in Python is used to split the data into groups based on some criteria.

How do I select only certain columns in pandas?

To select a single column, use square brackets [] with the column name of the column of interest.

How do I iterate through a pandas column?

One simple way to iterate over columns of pandas DataFrame is by using for loop. You can use column-labels to run the for loop over the pandas DataFrame using the get item syntax ([]) . Yields below output. The values() function is used to extract the object elements as a list.

What is the fastest way to iterate over pandas DataFrame?

Vectorization is always the first and best choice. You can convert the data frame to NumPy array or into dictionary format to speed up the iteration workflow. Iterating through the key-value pair of dictionaries comes out to be the fastest way with around 280x times speed up for 20 million records.


1 Answers

I think I would do it like this:

Create some data for testing

df = pd.DataFrame({'Id':np.random.randint(1,10,100),'Type':np.random.choice(list('ABCD'),100),'Guid':np.random.randint(10000,99999,100)})

print(df.head()
   Id Type   Guid
0   2    A  89247
1   4    B  39262
2   3    C  45522
3   1    B  99724
4   4    C  51322

Choose n for number of records to return and groupby

n = 5
df_groups = df.groupby('Id')

Iterate through df_group with for loop and print

for name,group in df_groups:
    print('ID: ' + str(name))
    print(group.head(n))
    print("\n")

Output:

ID: 1
    Id Type   Guid
3    1    B  99724
5    1    B  74182
37   1    D  49219
47   1    B  81464
65   1    C  84925


ID: 2
    Id Type   Guid
0    2    A  89247
6    2    A  16499
7    2    A  79956
34   2    C  56393
40   2    A  49883
.
.
.

EDIT To print all the Guids in a list for each ID you can use the following:

for name,group in df_groups:
    print('ID: ' + str(name))
    print(group.Guid.tolist())
    print("\n")

Output:

ID: 1
[99724, 74182, 49219, 81464, 84925, 67834, 43275, 35743, 36478, 94662, 21183]


ID: 2
[89247, 16499, 79956, 56393, 49883, 97633, 11768, 14639, 88591, 31263, 98729]


ID: 3
[45522, 13971, 75882, 96489, 58414, 22051, 80304, 46144, 22481, 11278, 84622, 61145]


ID: 4
[39262, 51322, 76930, 83740, 60152, 90735, 42039, 22114, 76077, 83234, 96134, 93559, 87903, 98199, 76096, 64378]


ID: 5
[13444, 55762, 13206, 94768, 19665, 75761, 90755, 45737, 23506, 89345, 94912, 81200, 91868]
.
.
.
like image 149
Scott Boston Avatar answered Sep 28 '22 17:09

Scott Boston