
Grouping columns by unique values in Python

I have a data set with two columns and I need to change it from this format:

10  1 
10  5
10  3
11  5
11  4
12  6
12  2

to this:

10  1  5  3
11  5  4
12  6  2

I need every unique value in the first column to be on its own row.

I am a beginner with Python and, beyond reading in my text file, I'm at a loss for how to proceed.


2 Answers

You can use a pandas DataFrame.

import pandas as pd

df = pd.DataFrame({'A':[10,10,10,11,11,12,12],'B':[1,5,3,5,4,6,2]})
print(df)

Output:

    A  B
0  10  1
1  10  5
2  10  3
3  11  5
4  11  4
5  12  6
6  12  2

Let's use groupby and join:

df.groupby('A')['B'].apply(lambda x:' '.join(x.astype(str)))

Output:

A
10    1 5 3
11      5 4
12      6 2
Name: B, dtype: object
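The result above is a pandas Series rather than the plain lines the question asks for. A minimal sketch of one way to go from a whitespace-separated text file to that exact layout follows. The `io.StringIO` buffer is a stand-in for the real file, and the column names `A` and `B` are carried over from the example above, not taken from the asker's data.

```python
import io

import pandas as pd

# Stand-in for the asker's text file (assumption: two whitespace-separated columns).
raw = io.StringIO("10  1\n10  5\n10  3\n11  5\n11  4\n12  6\n12  2\n")

# sep=r"\s+" splits on any run of whitespace; header=None because the file has no header row.
df = pd.read_csv(raw, sep=r"\s+", header=None, names=["A", "B"])

# sort=False keeps the groups in the order they first appear in the file.
grouped = df.groupby("A", sort=False)["B"].apply(lambda s: "  ".join(s.astype(str)))

# Prefix each joined group with its key to build the requested rows.
lines = [f"{key}  {values}" for key, values in grouped.items()]
print("\n".join(lines))
```

To read from disk instead, pass the file path to `pd.read_csv` in place of `raw`.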
Scott Boston answered Dec 02 '25 19:12


Using collections.defaultdict (a dict subclass that creates a default value for missing keys):

import collections

d = collections.defaultdict(list)
with open('yourfile.txt', 'r') as f:
    for line in f:                 # processing each line
        parts = line.split()
        if len(parts) != 2:        # skipping blank or malformed lines
            continue
        k, v = parts
        d[k].append(v)             # accumulating values for the same 1st column
# keys are strings, so sorted() orders them lexicographically ('10' < '9')
for k, v in sorted(d.items()):     # outputting grouped sequences
    print('%s  %s' % (k, '  '.join(v)))

The output:

10  1  5  3
11  5  4
12  6  2
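If the rows should come out in the order the keys first appear in the file rather than sorted, a plain dict is enough, since dicts preserve insertion order in Python 3.7+. The sketch below assumes exactly two whitespace-separated fields per line. The function name `group_lines` and the `sample` list are hypothetical, chosen for illustration only.

```python
def group_lines(lines):
    """Group second-column values under each first-column key, in first-seen order."""
    groups = {}  # plain dicts keep insertion order in Python 3.7+
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        groups.setdefault(key, []).append(value)
    # One output row per key: the key followed by all of its values.
    return ["  ".join([key] + values) for key, values in groups.items()]


# Hypothetical sample mirroring the data in the question.
sample = ["10  1 ", "10  5", "10  3", "11  5", "11  4", "12  6", "12  2"]
result = group_lines(sample)
print("\n".join(result))
```

An open file object can be passed directly in place of `sample`, because iterating over a file yields its lines.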
RomanPerekhrest answered Dec 02 '25 19:12