I have a pandas DataFrame with a datetime index, like this:
A B C A_1 B_1
2017-07-01 00:00:00 1 34 e 9 0
2017-07-01 00:05:00 2 34 e 92 2
2017-07-01 00:10:00 3 34 e 23 3
2017-07-01 00:15:00 4 34 e 2 5
2017-07-01 00:20:00 5 34 e 4 3
I want to split it and concatenate the parts along axis=0, so that the result looks like this:
C REQ _1
2017-07-01 00:00:00 e 1 9
2017-07-01 00:05:00 e 2 92
2017-07-01 00:10:00 e 3 23
2017-07-01 00:15:00 e 4 2
2017-07-01 00:20:00 e 5 4
2017-07-01 00:00:00 e 34 0
2017-07-01 00:05:00 e 34 2
2017-07-01 00:10:00 e 34 3
2017-07-01 00:15:00 e 34 5
2017-07-01 00:20:00 e 34 3
Currently I have to do it like this:
First, select df[['C', 'A', 'A_1']] and df[['C', 'B', 'B_1']]. Then rename the columns so they match, and concat the results.
This is complicated. Is there a built-in pandas method that does this, or any faster approach? I have thousands of columns to concatenate to get the final result.
Use pandas.concat() to concatenate two or more pandas DataFrames along rows or columns. Concatenating two DataFrames along rows creates a new DataFrame containing all rows of both; in effect, it appends one DataFrame to the other.
Columns from different DataFrames can also be joined with concat(). The first argument is the list of DataFrames to combine. axis: 0 refers to the row axis and 1 refers to the column axis. join: the type of join ('inner' or 'outer').
Sometimes, to analyze a DataFrame more precisely, we need to split it into two or more parts. pandas lets you split a DataFrame by column index, row index, column values, and so on.
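As a rough illustration of the split-and-concat approach described in the question, here is a minimal sketch. It rebuilds the sample data from the question (the construction with pd.date_range is an assumption about how the frame was created), and the names part_a, part_b, and result are hypothetical.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed construction).
idx = pd.date_range("2017-07-01", periods=5, freq="5min")
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "B": [34, 34, 34, 34, 34],
        "C": ["e"] * 5,
        "A_1": [9, 92, 23, 2, 4],
        "B_1": [0, 2, 3, 5, 3],
    },
    index=idx,
)

# Split by column labels, then give both parts identical column names.
part_a = df[["C", "A", "A_1"]].rename(columns={"A": "REQ", "A_1": "_1"})
part_b = df[["C", "B", "B_1"]].rename(columns={"B": "REQ", "B_1": "_1"})

# axis=0 stacks the parts on top of each other; the datetime index is kept,
# so each timestamp appears once per part.
result = pd.concat([part_a, part_b], axis=0)
print(result)
```

This works for two column groups, but it needs one select-and-rename step per group, which is why it becomes tedious with thousands of columns.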
After some research: lreshape is not well documented, and pd.wide_to_long, which is in the current API, does the same as lreshape with more flexibility.
https://github.com/pandas-dev/pandas/issues/2567
https://github.com/pandas-dev/pandas/issues/15003
Let's use the documented API method:
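import pandas as pd

# df1 is the DataFrame from the question; rename its columns to <stub>_<suffix> form
dict1 = {'A':'REQ_A1','B':'REQ_B1','A_1':'Value_A1','B_1':'Value_B1'}
df2 = df1.rename(columns=dict1)
# suffix is a regex; '.+' matches the multi-character suffixes 'A1' and 'B1'
(pd.wide_to_long(df2.reset_index(),['REQ','Value'],i='index',j='C',sep='_',suffix='.+')
   .rename_axis(['index','dropme'])
   .reset_index()
   .drop('dropme', axis=1)
   .rename(columns={'Value':'_1'}))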
Output:
index C REQ _1
0 2017-07-01 00:00:00 e 1 9
1 2017-07-01 00:05:00 e 2 92
2 2017-07-01 00:10:00 e 3 23
3 2017-07-01 00:15:00 e 4 2
4 2017-07-01 00:20:00 e 5 4
5 2017-07-01 00:00:00 e 34 0
6 2017-07-01 00:05:00 e 34 2
7 2017-07-01 00:10:00 e 34 3
8 2017-07-01 00:15:00 e 34 5
9 2017-07-01 00:20:00 e 34 3
Use pd.lreshape:
d = {'REQ': ['A', 'B'], '_1': ['A_1', 'B_1']}
df_out = (pd.lreshape(df.reset_index(), d).set_index('index'))
Output:
C REQ _1
index
2017-07-01 00:00:00 e 1 9
2017-07-01 00:05:00 e 2 92
2017-07-01 00:10:00 e 3 23
2017-07-01 00:15:00 e 4 2
2017-07-01 00:20:00 e 5 4
2017-07-01 00:00:00 e 34 0
2017-07-01 00:05:00 e 34 2
2017-07-01 00:10:00 e 34 3
2017-07-01 00:15:00 e 34 5
2017-07-01 00:20:00 e 34 3
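Since the question mentions thousands of columns, the groups dictionary for pd.lreshape could be built programmatically instead of typed by hand. The sketch below assumes that every paired column follows the naming pattern X and X_1, as in the sample data; the names suffix, paired, bases, and groups are hypothetical, and the sample frame is rebuilt with an assumed construction.

```python
import pandas as pd

# Rebuild the sample frame from the question (assumed construction).
idx = pd.date_range("2017-07-01", periods=5, freq="5min")
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "B": [34, 34, 34, 34, 34],
        "C": ["e"] * 5,
        "A_1": [9, 92, 23, 2, 4],
        "B_1": [0, 2, 3, 5, 3],
    },
    index=idx,
)

# Assumption: every column named "X_1" has a partner column named "X".
suffix = "_1"
paired = [c for c in df.columns if c.endswith(suffix)]   # ['A_1', 'B_1']
bases = [c[: -len(suffix)] for c in paired]              # ['A', 'B']

# Same shape as the hand-written dictionary: target column -> source columns.
groups = {"REQ": bases, "_1": paired}

df_out = pd.lreshape(df.reset_index(), groups).set_index("index")
print(df_out)
```

Columns that appear in neither list (here C) are treated as identifier columns and repeated for each group. If the real column names follow a different pattern, the two list comprehensions would need to be adapted.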