Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to {pivot|denormalize|manipulate} CSV table in Python

Sorry for the messy title, I didn't know how to phrase this question well.

Let's say I have a table in which the first three columns are foo bar and baz. Then there are some number of arbitrary columns after. I want to manipulate the table such that these arbitrary columns are all collapsed under on column, called num.

An example makes this clearer

foo, bar, baz, 100, 101, 102, 103, 104,
1,   1,   1,  10,  11,  12,  13,  14,
1,   1,   2,  15,  16,  17,  18,  19,
1,   2,   1,  20,  21,  22,  23,  24,

This input should transpose to:

num, foo, bar, baz, value,
100,   1,    1,  1,    10,
100,   1,    1,  2,    15,
100,   1,    2,  1,    20,
101,   1,    1,  1,    11,
101,   1,    1,  2,    16,
101,   1,    2,  1,    21,
102,   1,    1,  1,    12,
102,   1,    1,  2,    17,
102,   1,    2,  1,    22,
...

I wrote a custom Python solution that does this. It wasn't hard, but I feel like this is reinventing the wheel.

Is there a way to accomplish this with a library like pandas or some other table manipulation module?

like image 451
Noah Eisen Avatar asked Jun 27 '18 17:06

Noah Eisen


2 Answers

Using pandas.melt:

res = pd.melt(df, id_vars=['foo', 'bar', 'baz'])

print(res)

    foo  bar  baz variable  value
0     1    1    1      100     10
1     1    1    2      100     15
2     1    2    1      100     20
3     1    1    1      101     11
4     1    1    2      101     16
5     1    2    1      101     21
6     1    1    1      102     12
7     1    1    2      102     17
8     1    2    1      102     22
9     1    1    1      103     13
10    1    1    2      103     18
11    1    2    1      103     23
12    1    1    1      104     14
13    1    1    2      104     19
14    1    2    1      104     24
like image 82
jpp Avatar answered Sep 20 '22 14:09

jpp


cols = np.concatenate([['num'], df.columns[:3], ['value']])
pd.DataFrame([
    [a, b, c, d, e]
    for b, c, d, *x in df.values
    for a, e in zip(df.columns[3:], x)
]).set_axis(cols, axis=1, inplace=False).sort_values('num')

    num  foo  bar  baz  value
0   100    1    1    1     10
5   100    1    1    2     15
10  100    1    2    1     20
1   101    1    1    1     11
6   101    1    1    2     16
11  101    1    2    1     21
2   102    1    1    1     12
7   102    1    1    2     17
12  102    1    2    1     22
3   103    1    1    1     13
8   103    1    1    2     18
13  103    1    2    1     23
4   104    1    1    1     14
9   104    1    1    2     19
14  104    1    2    1     24
like image 31
piRSquared Avatar answered Sep 16 '22 14:09

piRSquared