
How to assign columns while ignoring index alignment

Tags: python, pandas

Say I have two DataFrames, x and y, in pandas. I would like to fill a column in x with the result of sorting a column in y. I tried this:

x['foo']  = y['bar'].order(ascending=False)

but it didn't work. I suspect this is because pandas aligns the indices of x and y (which have the same set of index labels) during the assignment.

How can I have pandas fill x['foo'] with a column from another DataFrame, ignoring index alignment?

Amelio Vazquez-Reina asked Apr 12 '13 19:04


2 Answers

The simplest way I can think of to get pandas to ignore the indices is to give it something without indices to ignore. Starting from

>>> x = pd.DataFrame({"foo": [10,20,30]},index=[1,2,0])
>>> y = pd.DataFrame({"bar": [33,11,22]},index=[0,1,2])
>>> x
   foo
1   10
2   20
0   30
>>> y
   bar
0   33
1   11
2   22

We have the usual aligned approach:

>>> x["foo"] = y["bar"].order(ascending=False)
>>> x
   foo
1   11
2   22
0   33

Or an unaligned one, by setting x["foo"] to a list:

>>> x["foo"] = y["bar"].order(ascending=False).tolist()
>>> x
   foo
1   33
2   22
0   11
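
For readers on a pandas version where `Series.order` is no longer available, the same contrast can be sketched as a standalone script using `sort_values` instead. This is a minimal sketch rather than the original answer's code; the names `sorted_bar`, `aligned` and `unaligned` are illustrative and not part of the question.

```python
import pandas as pd

x = pd.DataFrame({"foo": [10, 20, 30]}, index=[1, 2, 0])
y = pd.DataFrame({"bar": [33, 11, 22]}, index=[0, 1, 2])

# Sorting keeps each value attached to its original index label:
# label 0 -> 33, label 2 -> 22, label 1 -> 11.
sorted_bar = y["bar"].sort_values(ascending=False)

# Aligned: assigning a Series matches values to rows by index label,
# so the sort order is effectively undone.
aligned = x.copy()
aligned["foo"] = sorted_bar

# Unaligned: a plain list carries no index, so values are assigned by position.
unaligned = x.copy()
unaligned["foo"] = sorted_bar.tolist()

print(aligned["foo"].tolist())    # [11, 22, 33]
print(unaligned["foo"].tolist())  # [33, 22, 11]
```

The row order of x (index 1, 2, 0) is unchanged in both cases; only the way values are matched to rows differs.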
DSM answered Sep 26 '22 15:09


I tried the code, but the order() method has been deprecated (and since removed from pandas), which is no surprise since the original question is quite old. We now use sort_values() to achieve the same result. A further refinement is to use to_numpy(), since it is slightly faster than to_list() and could be useful for large DataFrames (.values is faster still, but to_numpy() is recommended for production code, as explained here: https://stackoverflow.com/a/54324513/4909087).

>>> x["foo"] = y["bar"].sort_values(ascending=False).to_numpy()
>>> x
   foo
1   33
2   22
0   11

>>> %timeit x["foo"] = y["bar"].sort_values(ascending=False).to_list()
165 µs ± 965 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit x["foo"] = y["bar"].sort_values(ascending=False).to_numpy()
136 µs ± 421 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit x["foo"] = y["bar"].sort_values(ascending=False).values
129 µs ± 826 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
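
The three variants timed above differ in speed but should all assign by position. The sketch below checks that on the small example frames, and also shows that a positional assignment needs exactly as many values as the target has rows. The names `sorted_bar`, `results`, `short` and `length_error` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

x = pd.DataFrame({"foo": [10, 20, 30]}, index=[1, 2, 0])
y = pd.DataFrame({"bar": [33, 11, 22]}, index=[0, 1, 2])

sorted_bar = y["bar"].sort_values(ascending=False)

# Each conversion drops the index, so assignment falls back to position.
results = {}
for name, values in [
    ("to_list", sorted_bar.to_list()),
    ("to_numpy", sorted_bar.to_numpy()),
    ("values", sorted_bar.values),
]:
    target = x.copy()
    target["foo"] = values
    results[name] = target["foo"].tolist()

print(results)

# Without an index to align on, the lengths must match exactly;
# a shorter array raises ValueError instead of being padded with NaN.
short = x.copy()
try:
    short["foo"] = sorted_bar.to_numpy()[:2]
    length_error = None
except ValueError as exc:
    length_error = exc

print(type(length_error).__name__)
```

The length check is a useful safety net: an aligned assignment silently fills missing labels with NaN, whereas a positional one fails loudly when the shapes disagree.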
Iqigai answered Sep 25 '22 15:09