
How do I flatten a pandas dataframe keeping index and column names

Tags: python, pandas

I have a pandas dataframe as

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), index=list('EFGHIJ'), columns=list('ABCD'))
df
          A         B         C         D
E -0.585995  1.325598 -1.172405 -2.810322
F -2.282079 -1.203231 -0.304155 -0.119221
G -0.739126  1.114628  0.381701 -0.485394
H  1.162010 -1.472594  1.767941  1.450582
I  0.119481  0.097139 -0.091432 -0.415333
J  1.266389  0.875473  1.787459 -1.149971

How can I flatten this DataFrame while keeping the column and index labels, as here:

E A -0.585995
E B 1.325598
E C -1.172405
E D -2.810322
F A ...
F B ...
...
...
J D -1.149971

It doesn't matter what order the values occur in.

df.values.flatten() flattens the values into a 1D array, but then I lose the index and column labels.

JoshuaBox asked May 09 '17 16:05




1 Answer

Use stack + reset_index:

df = df.stack().reset_index()
df.columns = ['a','b','c']
print (df)
    a  b         c
0   E  A -0.585995
1   E  B  1.325598
2   E  C -1.172405
3   E  D -2.810322
4   F  A -2.282079
5   F  B -1.203231
6   F  C -0.304155
7   F  D -0.119221
8   G  A -0.739126
9   G  B  1.114628
10  G  C  0.381701
11  G  D -0.485394
12  H  A  1.162010
13  H  B -1.472594
14  H  C  1.767941
15  H  D  1.450582
16  I  A  0.119481
17  I  B  0.097139
18  I  C -0.091432
19  I  D -0.415333
20  J  A  1.266389
21  J  B  0.875473
22  J  C  1.787459
23  J  D -1.149971
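The column names can also be set in the same chain instead of assigning df.columns afterwards. This is a minimal sketch, assuming the E..J / A..D labels from the question; the seeded random data and the name flat are only illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same labels as the question (values differ).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((6, 4)),
                  index=list('EFGHIJ'), columns=list('ABCD'))

# stack() returns a Series with a two-level MultiIndex (row label, column label).
# rename_axis names the two levels, and reset_index(name=...) names the values.
flat = df.stack().rename_axis(['a', 'b']).reset_index(name='c')
print(flat.head())
```

Writing to a new name (flat here) keeps the original wide df available, whereas the snippet above overwrites it.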

NumPy solution with numpy.tile + numpy.repeat + numpy.ravel:

b = np.tile(df.columns, len(df.index))
a = np.repeat(df.index, len(df.columns))
c = df.values.ravel()

df = pd.DataFrame({'a':a, 'b':b, 'c':c})
print (df)
    a  b         c
0   E  A -0.585995
1   E  B  1.325598
2   E  C -1.172405
3   E  D -2.810322
4   F  A -2.282079
5   F  B -1.203231
6   F  C -0.304155
7   F  D -0.119221
8   G  A -0.739126
9   G  B  1.114628
10  G  C  0.381701
11  G  D -0.485394
12  H  A  1.162010
13  H  B -1.472594
14  H  C  1.767941
15  H  D  1.450582
16  I  A  0.119481
17  I  B  0.097139
18  I  C -0.091432
19  I  D -0.415333
20  J  A  1.266389
21  J  B  0.875473
22  J  C  1.787459
23  J  D -1.149971
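The NumPy version relies on ravel() reading the values row by row, so each index label is repeated once per column while the column labels cycle. A small sketch that checks both approaches line up, assuming data without missing values; the names stacked and tiled are illustrative.

```python
import numpy as np
import pandas as pd

# Deterministic values make it easy to see which cell ends up where.
df = pd.DataFrame(np.arange(24.0).reshape(6, 4),
                  index=list('EFGHIJ'), columns=list('ABCD'))

# Approach 1: stack + reset_index
stacked = df.stack().reset_index()
stacked.columns = ['a', 'b', 'c']

# Approach 2: repeat the row labels, tile the column labels, ravel the values
tiled = pd.DataFrame({
    'a': np.repeat(df.index, len(df.columns)),   # E E E E F F F F ...
    'b': np.tile(df.columns, len(df.index)),     # A B C D A B C D ...
    'c': df.values.ravel(),                      # row-major order
})

# Compare column by column as plain lists to sidestep dtype differences.
same = all(list(stacked[col]) == list(tiled[col]) for col in ['a', 'b', 'c'])
print(same)
```

If the frame contains NaN, the two results may differ in length, because stack can drop missing values depending on the pandas version, while the NumPy approach always keeps every cell.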

Timings:

In [103]: %timeit (df.stack().reset_index())
1000 loops, best of 3: 1.26 ms per loop

In [104]: %timeit (pd.DataFrame({'a':np.repeat(df.index, len(df.columns)), 'b':np.tile(df.columns, len(df.index)), 'c':df.values.ravel()}))
1000 loops, best of 3: 436 µs per loop
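The %timeit lines above need IPython. To repeat the comparison in a plain script, something like the following sketch with the standard-library timeit module should work; the helper names via_stack and via_numpy are hypothetical, and the absolute numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4),
                  index=list('EFGHIJ'), columns=list('ABCD'))

def via_stack():
    # pandas approach from the answer
    return df.stack().reset_index()

def via_numpy():
    # NumPy approach from the answer
    return pd.DataFrame({
        'a': np.repeat(df.index, len(df.columns)),
        'b': np.tile(df.columns, len(df.index)),
        'c': df.values.ravel(),
    })

n = 200  # number of calls per measurement
t_stack = timeit.timeit(via_stack, number=n) / n
t_numpy = timeit.timeit(via_numpy, number=n) / n
print(f"stack: {t_stack * 1e6:.0f} µs per call")
print(f"numpy: {t_numpy * 1e6:.0f} µs per call")
```

On a 6x4 frame the measurement mostly reflects fixed overhead, so it is worth rerunning on data of realistic size before picking one approach for speed.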
jezrael answered Sep 20 '22 22:09