
How to create a DataFrame while preserving order of the columns?

Tags: python, pandas

How can I create a DataFrame from multiple numpy arrays, Pandas Series, or Pandas DataFrames while preserving the order of the columns?

For example, I have these two numpy arrays and I want to combine them as a Pandas DataFrame.

foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )

If I do this, the bar column would come first because dict doesn't preserve order.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } )

    bar foo
0   4   1
1   5   2
2   6   3

I can do this, but it gets tedious when I need to combine many variables.

pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) }, columns = [ 'foo', 'bar' ] )

EDIT: Is there a way to specify the variables to be joined and to organize the column order in one operation? That is, I don't mind using multiple lines to complete the entire operation, but I'd rather not have to specify the variables to be joined multiple times (since I will be changing the code a lot and this is pretty error-prone).

EDIT2: One more point. If I want to add or remove one of the variables to be joined, I only want to add/remove in one place.

ceiling cat asked Apr 11 '16 03:04




3 Answers

Original Solution: Incorrect Usage of collections.OrderedDict

In my original solution, I proposed using OrderedDict from the collections module in Python's standard library.

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )
   foo  bar
0    1    4
1    2    5
2    3    6

Right Solution: Passing Key-Value Tuple Pairs for Order Preservation

However, as noted, passing a normal dictionary to OrderedDict may still not preserve the order: in Python versions before 3.7 a plain dict does not guarantee insertion order, so the order is already lost when the dictionary is constructed. A workaround is to build the OrderedDict from a list of key-value tuple pairs, as suggested in this SO post:

>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )
   a  c  b
0  1  7  4
1  2  8  5
2  3  9  6

>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
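
To address the question's wish to name each variable in only one place, the tuple-pair approach could be arranged around a single list. This is a minimal sketch, not part of the original answer; the names `pairs` and `df` are purely illustrative.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

# Illustrative list (not from the original answer): each column is
# named exactly once, in the order it should appear.
pairs = [("a", a), ("b", b), ("c", c)]

# An OrderedDict built from a sequence of pairs keeps that sequence's order.
df = pd.DataFrame(OrderedDict(pairs))
print(list(df.columns))  # ['a', 'b', 'c']
```

Adding or removing a column then means editing only the `pairs` list.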
Eddo Hintoso answered Oct 09 '22 21:10


Use the columns keyword when creating the DataFrame:

pd.DataFrame({'foo': foo, 'bar': bar}, columns=['foo', 'bar']) 

Also, note that you don't need to create the Series.
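
As a small self-contained sketch of this answer (the reversed key order in the dict is chosen here only for illustration), the `columns` argument decides the final order no matter how the dict itself is ordered:

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])

# Plain numpy arrays are accepted as dict values; no pd.Series wrapper is needed.
# The dict lists 'bar' first on purpose; `columns` still puts 'foo' first.
df = pd.DataFrame({"bar": bar, "foo": foo}, columns=["foo", "bar"])
print(list(df.columns))  # ['foo', 'bar']
```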

blokeley answered Oct 09 '22 20:10


To preserve column order, pass your numpy arrays as a list of tuples to DataFrame.from_items:

>>> df = pd.DataFrame.from_items([('foo', foo), ('bar', bar)])

   foo  bar
0    1    4
1    2    5
2    3    6

Update

From pandas 0.23, from_items is deprecated (it was later removed in pandas 1.0). Instead, pass the numpy arrays using from_dict, which takes the items as a dictionary:

>>> from collections import OrderedDict
>>> df = pd.DataFrame.from_dict(OrderedDict(zip(['foo', 'bar'], [foo, bar])))

From Python 3.7, you can depend on insertion order being preserved (see https://mail.python.org/pipermail/python-dev/2017-December/151283.html), so:

>>> df = pd.DataFrame.from_dict(dict(zip(['foo', 'bar'], [foo, bar])))

or simply:

>>> df = pd.DataFrame(dict(zip(['foo', 'bar'], [foo, bar])))
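
Under the same assumption (Python 3.7+ and pandas 0.23 or later), a plain dict literal may be enough, which keeps each variable named in a single place. This is a hedged sketch rather than part of the original answer; the name `data` is illustrative.

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])

# Assumes Python 3.7+, where dicts keep insertion order.
# Each column appears once; reorder, add, or remove entries here only.
data = {"foo": foo, "bar": bar}

df = pd.DataFrame(data)
print(list(df.columns))  # ['foo', 'bar']
```

On older interpreters the OrderedDict variant above remains the safer choice.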
Vidhya G answered Oct 09 '22 21:10