How to split and concat a pandas DataFrame

I have a pandas DataFrame with a datetime index, like this:

                     A   B  C  A_1  B_1
2017-07-01 00:00:00  1  34  e    9    0
2017-07-01 00:05:00  2  34  e   92    2
2017-07-01 00:10:00  3  34  e   23    3
2017-07-01 00:15:00  4  34  e    2    5
2017-07-01 00:20:00  5  34  e    4    3

I want to split it into column groups and concatenate them along axis=0, so that the result looks like this:

                     C  REQ  _1
2017-07-01 00:00:00  e    1   9
2017-07-01 00:05:00  e    2  92
2017-07-01 00:10:00  e    3  23
2017-07-01 00:15:00  e    4   2
2017-07-01 00:20:00  e    5   4
2017-07-01 00:00:00  e   34   0
2017-07-01 00:05:00  e   34   2
2017-07-01 00:10:00  e   34   3
2017-07-01 00:15:00  e   34   5
2017-07-01 00:20:00  e   34   3

Currently I do it like this: first select df[['C', 'A', 'A_1']] and df[['C', 'B', 'B_1']], then rename the columns of each piece to a common set of names, and finally concat the results.
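For reference, the manual approach described above might look like the following sketch. The frame is rebuilt from the sample data shown earlier; the names `part_a`, `part_b` and `result` are only illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question (5-minute datetime index).
idx = pd.date_range("2017-07-01", periods=5, freq="5min")
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "B": [34] * 5,
        "C": ["e"] * 5,
        "A_1": [9, 92, 23, 2, 4],
        "B_1": [0, 2, 3, 5, 3],
    },
    index=idx,
)

# Select each column group and rename it to the common target names.
part_a = df[["C", "A", "A_1"]].rename(columns={"A": "REQ", "A_1": "_1"})
part_b = df[["C", "B", "B_1"]].rename(columns={"B": "REQ", "B_1": "_1"})

# Stack the pieces on top of each other (axis=0 keeps the datetime index).
result = pd.concat([part_a, part_b], axis=0)
print(result)
```

This works, but every additional column pair needs its own select-and-rename step, which is what makes it unwieldy for thousands of columns.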

This is complicated. Is there a built-in method in pandas to do this, or any faster approach? I have thousands of columns to concatenate to get the final result.

J.Dan asked Aug 18 '17 03:08



1 Answer

EDIT

After doing some research: lreshape is not well documented, whereas pd.wide_to_long, which is part of the current API, does the same thing as lreshape with more flexibility.

https://github.com/pandas-dev/pandas/issues/2567

https://github.com/pandas-dev/pandas/issues/15003

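Let's use the documented API method:

import pandas as pd

# Rename every column to <stub>_<suffix>: stubs 'REQ' and 'Value', suffixes 'A1' and 'B1'
dict1 = {'A':'REQ_A1','B':'REQ_B1','A_1':'Value_A1','B_1':'Value_B1'}

df2 = df.rename(columns=dict1)

# suffix=r'\w+' matches the whole suffix ('A1', 'B1'); a bare '.' matches only one character
# j='C' names the suffix index level, which is renamed to 'dropme' and dropped below
(pd.wide_to_long(df2.reset_index(),['REQ','Value'],i='index',j='C',sep='_',suffix=r'\w+')
  .rename_axis(['index','dropme'])
  .reset_index()
  .drop('dropme', axis=1)
  .rename(columns={'Value':'_1'}))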

Output:

                 index  C  REQ  _1
0  2017-07-01 00:00:00  e    1   9
1  2017-07-01 00:05:00  e    2  92
2  2017-07-01 00:10:00  e    3  23
3  2017-07-01 00:15:00  e    4   2
4  2017-07-01 00:20:00  e    5   4
5  2017-07-01 00:00:00  e   34   0
6  2017-07-01 00:05:00  e   34   2
7  2017-07-01 00:10:00  e   34   3
8  2017-07-01 00:15:00  e   34   5
9  2017-07-01 00:20:00  e   34   3

Original answer, using pd.lreshape:

# Map each output column to the list of input columns stacked under it
d = {'REQ': ['A', 'B'], '_1': ['A_1', 'B_1']}
df_out = pd.lreshape(df.reset_index(), d).set_index('index')

Output:

                     C  REQ  _1
index                          
2017-07-01 00:00:00  e    1   9
2017-07-01 00:05:00  e    2  92
2017-07-01 00:10:00  e    3  23
2017-07-01 00:15:00  e    4   2
2017-07-01 00:20:00  e    5   4
2017-07-01 00:00:00  e   34   0
2017-07-01 00:05:00  e   34   2
2017-07-01 00:10:00  e   34   3
2017-07-01 00:15:00  e   34   5
2017-07-01 00:20:00  e   34   3
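With thousands of columns, the groups dict would not be typed by hand. The sketch below builds it from the column names, under the assumption (not stated in the question) that every value column `X` has a companion column named `X_1` and that `C` is the only identifier column. The names `bases`, `groups` and `out` are illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question (5-minute datetime index).
idx = pd.date_range("2017-07-01", periods=5, freq="5min")
df = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "B": [34] * 5,
        "C": ["e"] * 5,
        "A_1": [9, 92, 23, 2, 4],
        "B_1": [0, 2, 3, 5, 3],
    },
    index=idx,
)

# Assumption: 'C' is the only id column and each base column X pairs with X_1.
bases = [c for c in df.columns if c != "C" and not c.endswith("_1")]
groups = {"REQ": bases, "_1": [f"{c}_1" for c in bases]}

# reset_index() turns the unnamed datetime index into a column called 'index',
# so it is carried along as an id column and can be restored afterwards.
out = pd.lreshape(df.reset_index(), groups).set_index("index")
print(out)
```

If the real column names follow a different pattern, only the two lines that build `bases` and `groups` need to change.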
Scott Boston answered Oct 07 '22 01:10