How can I create a DataFrame from multiple numpy arrays, Pandas Series, or Pandas DataFrames while preserving the order of the columns?
For example, I have these two numpy arrays and I want to combine them as a Pandas DataFrame.
foo = np.array( [ 1, 2, 3 ] )
bar = np.array( [ 4, 5, 6 ] )
If I do this, the bar column would come first because dict doesn't preserve order.
pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } )
bar foo
0 4 1
1 5 2
2 6 3
I can do this, but it gets tedious when I need to combine many variables.
pd.DataFrame( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) }, columns = [ 'foo', 'bar' ] )
EDIT: Is there a way to specify the variables to be joined and to organize the column order in one operation? That is, I don't mind using multiple lines to complete the entire operation, but I'd rather not have to specify the variables to be joined multiple times (since I will be changing the code a lot and this is pretty error-prone).
EDIT2: One more point. If I want to add or remove one of the variables to be joined, I only want to add/remove in one place.
Pandas. DataFrame doesn't preserve the column order when converting from a DataFrames.
Answer. Yes, by default, concatenating dataframes will preserve their row order. The order of the dataframes to concatenate will be the order of the result dataframe.
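The same ordering behavior applies when concatenating column-wise. As a minimal sketch, assuming the inputs share the same default index, pd.concat with axis=1 lays the columns out in the order the objects are listed. The names extra, x, and y are made up for illustration.

```python
import numpy as np
import pandas as pd

# Named Series become columns labeled by their name.
foo = pd.Series(np.array([1, 2, 3]), name='foo')
bar = pd.Series(np.array([4, 5, 6]), name='bar')

# A hypothetical extra DataFrame to show mixing Series and DataFrames.
extra = pd.DataFrame({'x': [7, 8, 9], 'y': [10, 11, 12]})

# Columns appear in the order the objects are passed in the list.
combined = pd.concat([foo, bar, extra], axis=1)
print(list(combined.columns))
```

Because the list passed to pd.concat is the only place the inputs are named, adding or removing an input is a one-line change.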
Reorder columns using Pandas. Another way to reorder columns is to use the reindex() method, passing the desired column order through its columns= parameter.
You need to create a new list of your columns in the desired order, then use df = df[cols] to rearrange the columns in this new order.
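Both reordering approaches can be sketched as follows. This is only an illustration that reuses the foo and bar arrays from the question; the variable names by_selection and by_reindex are made up here.

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])

# Start with the columns in an order we do not want.
df = pd.DataFrame({'bar': bar, 'foo': foo})

cols = ['foo', 'bar']  # desired column order

# Option 1: select the columns with a list of labels.
by_selection = df[cols]

# Option 2: reindex along the column axis.
by_reindex = df.reindex(columns=cols)

print(list(by_selection.columns), list(by_reindex.columns))
```

One difference worth knowing: df[cols] raises a KeyError for a label that does not exist, whereas reindex creates that column filled with NaN.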
collections.OrderedDict
In my original solution, I proposed using OrderedDict from the collections module in Python's standard library.
>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> foo = np.array( [ 1, 2, 3 ] )
>>> bar = np.array( [ 4, 5, 6 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'foo': pd.Series(foo), 'bar': pd.Series(bar) } ) )
   foo  bar
0    1    4
1    2    5
2    3    6
However, as noted, if a normal dictionary is passed to OrderedDict, the order may still not be preserved, because the order is already lost when the dictionary itself is constructed (on Python versions before 3.7). A workaround is to convert a list of key-value tuple pairs into an OrderedDict, as suggested in this SO post:
>>> import numpy as np
>>> import pandas as pd
>>> from collections import OrderedDict
>>>
>>> a = np.array( [ 1, 2, 3 ] )
>>> b = np.array( [ 4, 5, 6 ] )
>>> c = np.array( [ 7, 8, 9 ] )
>>>
>>> pd.DataFrame( OrderedDict( { 'a': pd.Series(a), 'b': pd.Series(b), 'c': pd.Series(c) } ) )
   a  c  b
0  1  7  4
1  2  8  5
2  3  9  6
>>> pd.DataFrame( OrderedDict( (('a', pd.Series(a)), ('b', pd.Series(b)), ('c', pd.Series(c))) ) )
   a  b  c
0  1  4  7
1  2  5  8
2  3  6  9
Use the columns keyword when creating the DataFrame:
pd.DataFrame({'foo': foo, 'bar': bar}, columns=['foo', 'bar'])
Also, note that you don't need to create the Series.
To preserve column order, pass your numpy arrays as a list of tuples to DataFrame.from_items:
>>> df = pd.DataFrame.from_items([('foo', foo), ('bar', bar)])
>>> df
foo bar
0 1 4
1 2 5
2 3 6
Update
From pandas 0.23, from_items is deprecated (it was later removed in pandas 1.0), so pass the numpy arrays using from_dict instead. To use from_dict, you need to pass the items as a dictionary:
>>> from collections import OrderedDict
>>> df = pd.DataFrame.from_dict(OrderedDict(zip(['foo', 'bar'], [foo, bar])))
From Python 3.7 you can depend on insertion order being preserved (see https://mail.python.org/pipermail/python-dev/2017-December/151283.html), so:
>>> df = pd.DataFrame.from_dict(dict(zip(['foo', 'bar'], [foo, bar])))
or simply:
>>> df = pd.DataFrame(dict(zip(['foo', 'bar'], [foo, bar])))
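To address the "only one place" requirement from the question's edits, one possible arrangement is to keep the names and data together in a single list of pairs and build the DataFrame from that. This is a sketch that assumes Python 3.7+ (for dict insertion order); the list name columns and the extra baz Series are made up for illustration.

```python
import numpy as np
import pandas as pd

foo = np.array([1, 2, 3])
bar = np.array([4, 5, 6])
baz = pd.Series([7, 8, 9])  # hypothetical third input, to show mixing types

# The single place where names, data, and column order are defined.
# Adding or removing a column means editing exactly one line here.
columns = [
    ('foo', foo),
    ('bar', bar),
    ('baz', baz),
]

# dict() keeps insertion order on Python 3.7+, so the column order
# follows the order of the list above.
df = pd.DataFrame(dict(columns))
print(df)
```

On older Python versions, replacing dict(columns) with OrderedDict(columns) should give the same column order.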