
Column order in pandas.concat

I run the following:

data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
frames = [data1, data2]
data = pd.concat(frames)
data


   a    b
0   2   1
1   2   1
2   2   1
0   2   1
1   2   1
2   2   1

The columns of data come out in alphabetical order. Why is that, and how can I keep the original order?

Edward asked Aug 19 '16 20:08




4 Answers

You are creating DataFrames out of dictionaries. Dictionaries are unordered (before Python 3.7 the language did not guarantee any key order), which means the keys do not have a specific order. So

d1 = {'key_a': 'val_a', 'key_b': 'val_b'}

and

d2 = {'key_b': 'val_b', 'key_a': 'val_a'}

compare equal, because dictionary equality ignores key order.
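
As a quick check of that comparison, here is a minimal sketch. It assumes Python 3.7 or later, where dicts keep insertion order; on older interpreters the key order is not guaranteed. The variable names same_content, keys_d1 and keys_d2 are only illustrative.

```python
# Assumes Python 3.7+, where dicts remember insertion order.
d1 = {'key_a': 'val_a', 'key_b': 'val_b'}
d2 = {'key_b': 'val_b', 'key_a': 'val_a'}

# Equality compares key/value pairs and ignores the order of the keys.
same_content = d1 == d2

# Iterating a dict yields its keys, here in insertion order.
keys_d1 = list(d1)
keys_d2 = list(d2)

print(same_content, keys_d1, keys_d2)
```

So the two dictionaries hold the same content even though iterating them can yield the keys in a different order.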

In addition, I assume that pandas sorts the dictionary's keys in ascending order by default when it builds the DataFrame (unfortunately I did not find anything in the docs to confirm that assumption), which leads to the behavior you encountered.

So the fix is to reorder the columns in your DataFrame. You can do this as follows:

import pandas as pd

data1 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
data2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
frames = [data1, data2]
data = pd.concat(frames)

print(data)

cols = ['b' , 'a']
data = data[cols]

print(data)

albert answered Oct 16 '22 08:10


Starting from version 0.23.0, you can prevent the concat() method from sorting the columns of the returned DataFrame. For example:

df1 = pd.DataFrame({ 'a' : [1, 1, 1], 'b' : [2, 2, 2]})
df2 = pd.DataFrame({ 'b' : [1, 1, 1], 'a' : [2, 2, 2]})
df = pd.concat([df1, df2], sort=False)

At the time, a future version of pandas was announced to stop sorting by default; since pandas 1.0.0, sort=False is the default.
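
To make the effect of the sort parameter visible, here is a small sketch. It assumes pandas 0.23 or later on Python 3.6+, so that the DataFrame constructor keeps the dict key order. The frames left and right are made-up sample data, chosen so their columns differ and concat has to align them.

```python
import pandas as pd

# Hypothetical frames whose columns differ, so concat has to align them.
left = pd.DataFrame({'b': [1, 1], 'a': [2, 2]})
right = pd.DataFrame({'a': [3, 3], 'c': [4, 4]})

# sort=False keeps columns in order of first appearance across the frames.
unsorted_result = pd.concat([left, right], sort=False)

# sort=True sorts the column labels of the result.
sorted_result = pd.concat([left, right], sort=True)

print(list(unsorted_result.columns))
print(list(sorted_result.columns))
```

With sort=False the columns should come out as b, a, c, while sort=True should give a, b, c.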

Michael H. answered Oct 16 '22 08:10


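import pandas as pd

def concat_ordered_columns(frames):
    # Collect column names in order of first appearance across all frames.
    columns_ordered = []
    for frame in frames:
        columns_ordered.extend(x for x in frame.columns if x not in columns_ordered)
    final_df = pd.concat(frames)
    # Reselect the columns so the result follows that first-seen order.
    return final_df[columns_ordered]

# Usage (df_a, df_b and df_c are existing DataFrames)
dfs = [df_a, df_b, df_c]
full_df = concat_ordered_columns(dfs)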

This should work.
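
For a version that runs on its own, here is a compact variant of the same idea. It is a sketch, not the code above: it swaps the manual loop for dict.fromkeys, which drops duplicates while keeping first-seen order (assuming Python 3.7+), and passes sort=False (assuming pandas 0.23 or later). The function name concat_keep_first_seen_order and the sample frames are hypothetical.

```python
import pandas as pd

def concat_keep_first_seen_order(frames):
    # dict.fromkeys drops duplicate labels while keeping first-seen order.
    ordered = list(dict.fromkeys(col for frame in frames for col in frame.columns))
    # Concatenate, then select the columns in that order.
    return pd.concat(frames, sort=False)[ordered]

# Hypothetical sample frames standing in for df_a and df_b.
df_a = pd.DataFrame({'b': [1, 1], 'a': [2, 2]})
df_b = pd.DataFrame({'c': [3, 3], 'a': [4, 4]})

full_df = concat_keep_first_seen_order([df_a, df_b])
print(list(full_df.columns))
```

Columns that are missing from a frame are filled with NaN for that frame's rows.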

Philip Zelitchenko answered Oct 16 '22 10:10


You can create the original DataFrames with OrderedDicts:

from collections import OrderedDict
import pandas as pd

odict = OrderedDict()
odict['b'] = [1, 1, 1]
odict['a'] = [2, 2, 2]
data1 = pd.DataFrame(odict)
data2 = pd.DataFrame(odict)
frames = [data1, data2]
data = pd.concat(frames)
data


    b    a
0   1    2
1   1    2
2   1    2
0   1    2
1   1    2
2   1    2

mohrtw answered Oct 16 '22 08:10