I have a pandas DataFrame:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(6,4),index=list('EFGHIJ'),columns=list('ABCD'))
df
A B C D
E -0.585995 1.325598 -1.172405 -2.810322
F -2.282079 -1.203231 -0.304155 -0.119221
G -0.739126 1.114628 0.381701 -0.485394
H 1.162010 -1.472594 1.767941 1.450582
I 0.119481 0.097139 -0.091432 -0.415333
J 1.266389 0.875473 1.787459 -1.149971
How can I flatten this DataFrame while keeping the index and column labels, like this:
E A -0.585995
E B 1.325598
E C -1.172405
E D -2.810322
F A ...
F B ...
...
...
J D -1.149971
It doesn't matter what order the values occur in...
df.values.flatten() flattens the values into a 1D array, but then I lose the index and column labels...
One way to flatten a pandas DataFrame is with NumPy. First convert the DataFrame to an array with to_numpy(), then call the array's flatten() method (numpy.ndarray.flatten()). This returns the values only; the index and column labels are not carried along.
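As a minimal sketch of the to_numpy() + flatten() route described above (the small frame and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative frame; the labels mimic the question, the values are invented.
df = pd.DataFrame(
    np.arange(6).reshape(3, 2),
    index=list("EFG"),
    columns=list("AB"),
)

# to_numpy() gives a 2-D ndarray; flatten() copies it into 1-D, row by row.
flat = df.to_numpy().flatten()
print(flat)  # [0 1 2 3 4 5]
```

Note that flat no longer knows which row or column each value came from, which is exactly the problem raised in the question.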
pandas has two main sort methods; see the pandas documentation for sort_values and sort_index for details on their parameters. sort_values() sorts a DataFrame by one or more columns. sort_index() sorts a DataFrame by its row index.
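Since the order of the flattened rows may vary by method, here is a small sketch of how the two sort methods could be applied to a long-format result (the frame long_df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical long-format frame, deliberately out of order.
long_df = pd.DataFrame(
    {"a": ["F", "E", "E"], "b": ["A", "B", "A"], "c": [0.3, 1.2, -0.5]}
)

# Sort by the label columns: first by 'a', then by 'b' within each 'a'.
by_labels = long_df.sort_values(["a", "b"])

# Sort by the row index, which restores the original row order here.
by_index = by_labels.sort_index()

print(by_labels)
print(by_index)
```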
Options for flattening hierarchical labels. Flatten columns: use get_level_values(), use to_flat_index(), or join the column labels. Flatten rows: flatten all levels.
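The options listed above could look roughly like this on a frame with two-level column labels. The frames wide and rows_df and their labels are invented for the example, and using reset_index() for the row case is an assumption about what "flatten all levels" refers to:

```python
import pandas as pd

# Hypothetical frame with two-level column labels.
cols = pd.MultiIndex.from_tuples([("x", "min"), ("x", "max"), ("y", "min")])
wide = pd.DataFrame([[1, 2, 3]], columns=cols)

# Keep only one level of the column labels.
top = wide.columns.get_level_values(0)

# Turn the MultiIndex into a flat Index of tuples.
tuples = wide.columns.to_flat_index()

# Join the levels of each column label into a single string.
joined = ["_".join(pair) for pair in wide.columns.to_flat_index()]

# Rows: reset_index() moves every index level into ordinary columns.
rows_df = pd.DataFrame(
    {"v": [1, 2]},
    index=pd.MultiIndex.from_tuples([("E", "A"), ("E", "B")], names=["a", "b"]),
)
rows = rows_df.reset_index()

print(list(top))
print(list(tuples))
print(joined)
print(rows)
```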
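Use stack + reset_index:
# stack() moves the column labels into an inner index level (one value per row);
# reset_index() then turns both index levels into ordinary columns.
df = df.stack().reset_index()
df.columns = ['a','b','c']
print(df)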
a b c
0 E A -0.585995
1 E B 1.325598
2 E C -1.172405
3 E D -2.810322
4 F A -2.282079
5 F B -1.203231
6 F C -0.304155
7 F D -0.119221
8 G A -0.739126
9 G B 1.114628
10 G C 0.381701
11 G D -0.485394
12 H A 1.162010
13 H B -1.472594
14 H C 1.767941
15 H D 1.450582
16 I A 0.119481
17 I B 0.097139
18 I C -0.091432
19 I D -0.415333
20 J A 1.266389
21 J B 0.875473
22 J C 1.787459
23 J D -1.149971
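A variant of the same idea names the columns in one chain instead of assigning df.columns afterwards. This is only a sketch on a small made-up frame; the names 'a', 'b' and 'c' are taken from the answer above:

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values invented).
df = pd.DataFrame(
    np.arange(6.0).reshape(3, 2),
    index=list("EFG"),
    columns=list("AB"),
)

long_df = (
    df.rename_axis(index="a", columns="b")  # name the two axes up front
      .stack()                              # Series indexed by (a, b)
      .reset_index(name="c")                # levels become columns, values go to 'c'
)
print(long_df)
```

One thing to keep in mind: depending on the pandas version and options, stack() may drop rows whose value is NaN, so this is best checked on data with missing values.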
NumPy solution with numpy.tile + numpy.repeat + numpy.ravel:
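# tile repeats the whole list of column labels once per row: A B C D A B C D ...
b = np.tile(df.columns, len(df.index))
# repeat repeats each index label once per column: E E E E F F F F ...
a = np.repeat(df.index, len(df.columns))
# ravel flattens the values row by row, matching the label order above
c = df.values.ravel()
df = pd.DataFrame({'a':a, 'b':b, 'c':c})
print(df)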
a b c
0 E A -0.585995
1 E B 1.325598
2 E C -1.172405
3 E D -2.810322
4 F A -2.282079
5 F B -1.203231
6 F C -0.304155
7 F D -0.119221
8 G A -0.739126
9 G B 1.114628
10 G C 0.381701
11 G D -0.485394
12 H A 1.162010
13 H B -1.472594
14 H C 1.767941
15 H D 1.450582
16 I A 0.119481
17 I B 0.097139
18 I C -0.091432
19 I D -0.415333
20 J A 1.266389
21 J B 0.875473
22 J C 1.787459
23 J D -1.149971
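The tile/repeat/ravel steps can be wrapped in a small helper. The function name flatten_with_labels is hypothetical, and this is a sketch rather than a drop-in replacement for stack():

```python
import numpy as np
import pandas as pd


def flatten_with_labels(frame):
    """Return a long-format frame with columns a (index), b (column), c (value)."""
    n_rows, n_cols = frame.shape
    return pd.DataFrame(
        {
            # Each index label is repeated once per column.
            "a": np.repeat(frame.index.to_numpy(), n_cols),
            # The full run of column labels is repeated once per row.
            "b": np.tile(frame.columns.to_numpy(), n_rows),
            # Values are read row by row, so they line up with the labels.
            "c": frame.to_numpy().ravel(),
        }
    )


# Illustrative input (values invented).
df = pd.DataFrame(
    np.arange(6.0).reshape(3, 2),
    index=list("EFG"),
    columns=list("AB"),
)
out = flatten_with_labels(df)
print(out)
```

Unlike the snippet above, this leaves the original df untouched because the result is bound to a new name.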
Timings:
In [103]: %timeit (df.stack().reset_index())
1000 loops, best of 3: 1.26 ms per loop
In [104]: %timeit (pd.DataFrame({'a':np.repeat(df.index, len(df.columns)), 'b':np.tile(df.columns, len(df.index)), 'c':df.values.ravel()}))
1000 loops, best of 3: 436 µs per loop
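The timings above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The helper names with_stack and with_numpy are made up here, and absolute numbers will differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Same shape and labels as in the question; the random values are arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((6, 4)),
    index=list("EFGHIJ"),
    columns=list("ABCD"),
)


def with_stack():
    return df.stack().reset_index()


def with_numpy():
    return pd.DataFrame(
        {
            "a": np.repeat(df.index, len(df.columns)),
            "b": np.tile(df.columns, len(df.index)),
            "c": df.values.ravel(),
        }
    )


n = 100  # number of calls per measurement
t_stack = timeit.timeit(with_stack, number=n)
t_numpy = timeit.timeit(with_numpy, number=n)
print(f"stack: {t_stack / n * 1e6:.0f} µs per call")
print(f"numpy: {t_numpy / n * 1e6:.0f} µs per call")
```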