Python Pandas User Warning: Sorting because non-concatenation axis is not aligned

2 Answers

tl;dr:

concat and append currently sort the non-concatenation index (e.g. the columns, if you're adding rows) when the columns don't match. Starting with pandas 0.23 this generates a warning; pass sort=True to silence it and keep the sorting, or sort=False to silence it and skip the sorting. In the future the default will change to not sorting, so it's best to pass sort=True or sort=False explicitly now or, better yet, make sure your non-concatenation indices match.
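To make the two options concrete, here is a minimal sketch. The frames df1 and df2 are made-up illustrations, not the asker's data; the second option assumes both frames contain exactly the same column names, just in a different order.

```python
import pandas as pd

# Made-up frames with the same columns in a different order.
df1 = pd.DataFrame({"b": [0, 8], "a": [1, 2]})  # column order: b, a
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]})  # column order: a, b

# Option 1: state the intent explicitly, so the result does not
# depend on whichever default the installed pandas version uses.
explicit = pd.concat([df1, df2], sort=False)

# Option 2: reorder df2's columns to match df1 before concatenating,
# so the non-concatenation axis is already aligned.
aligned = pd.concat([df1, df2[df1.columns]])

print(list(explicit.columns))
print(list(aligned.columns))
```

Both results keep df1's column order; which option reads better is mostly a matter of taste.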

The warning is new in pandas 0.23.0:

In a future version of pandas pandas.concat() and DataFrame.append() will no longer sort the non-concatenation axis when it is not already aligned. The current behavior is the same as the previous (sorting), but now a warning is issued when sort is not specified and the non-concatenation axis is not aligned.

More information from the linked, very old GitHub issue (comment by smcinerney):

When concat'ing DataFrames, the column names get alphanumerically sorted if there are any differences between them. If they're identical across DataFrames, they don't get sorted.

This sort is undocumented and unwanted. Certainly the default behavior should be no-sort.

Some time later, the sort parameter was added to pandas.concat and DataFrame.append:

sort : boolean, default None

Sort non-concatenation axis if it is not already aligned when join is 'outer'. The current default of sorting is deprecated and will change to not-sorting in a future version of pandas.

Explicitly pass sort=True to silence the warning and sort. Explicitly pass sort=False to silence the warning and not sort.

This has no effect when join='inner', which already preserves the order of the non-concatenation axis.
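As a rough illustration of the join='inner' case mentioned in the docs excerpt, the sketch below uses made-up frames that share only columns a and b. It shows only which columns survive and makes no claim about their order.

```python
import pandas as pd

# Made-up frames: only columns "a" and "b" exist in both.
df1 = pd.DataFrame({"b": [0, 8], "a": [1, 2], "e": [5, 0]})
df2 = pd.DataFrame({"c": [2, 8], "b": [7, 3], "a": [4, 5], "d": [7, 0]})

# join="inner" keeps only the columns present in every frame,
# so no NaN padding is needed.
inner = pd.concat([df1, df2], join="inner")

print(sorted(inner.columns))
print(inner.shape)
```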

So if both DataFrames have the same columns in the same order, there is no warning and no sorting:

import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['a', 'b'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])

print(pd.concat([df1, df2]))
   a  b
0  1  0
1  2  8
0  4  7
1  5  3

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['b', 'a'])

print(pd.concat([df1, df2]))
   b  a
0  0  1
1  8  2
0  7  4
1  3  5

But if the DataFrames have different columns, or the same columns in a different order, pandas issues a warning unless the sort parameter is set explicitly (sort=None is the default value):

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8]}, columns=['b', 'a'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3]}, columns=['a', 'b'])

print(pd.concat([df1, df2]))

FutureWarning: Sorting because non-concatenation axis is not aligned.

   a  b
0  1  0
1  2  8
0  4  7
1  5  3

print(pd.concat([df1, df2], sort=True))
   a  b
0  1  0
1  2  8
0  4  7
1  5  3

print(pd.concat([df1, df2], sort=False))
   b  a
0  0  1
1  8  2
0  7  4
1  3  5

If the DataFrames have different columns but share some of them, the shared columns are matched by name (columns a and b from df1 with a and b from df2 in the example below). Columns that exist in only one DataFrame are filled with missing values (NaN) for the rows that come from the other.

Lastly, if you pass sort=True, the columns are sorted alphanumerically. If sort=False and the second DataFrame has columns that are not in the first, they are appended at the end with no sorting:

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], 'e': [5, 0]},
                   columns=['b', 'a', 'e'])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], 'c': [2, 8], 'd': [7, 0]},
                   columns=['c', 'b', 'a', 'd'])

print(pd.concat([df1, df2]))

FutureWarning: Sorting because non-concatenation axis is not aligned.

   a  b    c    d    e
0  1  0  NaN  NaN  5.0
1  2  8  NaN  NaN  0.0
0  4  7  2.0  7.0  NaN
1  5  3  8.0  0.0  NaN

print(pd.concat([df1, df2], sort=True))
   a  b    c    d    e
0  1  0  NaN  NaN  5.0
1  2  8  NaN  NaN  0.0
0  4  7  2.0  7.0  NaN
1  5  3  8.0  0.0  NaN

print(pd.concat([df1, df2], sort=False))
   b  a    e    c    d
0  0  1  5.0  NaN  NaN
1  8  2  0.0  NaN  NaN
0  7  4  NaN  2.0  7.0
1  3  5  NaN  8.0  0.0
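If you want to check the resulting column order in code rather than by eye, something like the sketch below may help. It reuses the same example frames. The reindex step at the end is an extra suggestion of mine, not something from the pandas warning text, for the case where you want neither the sorted nor the first-seen order; the order chosen there is arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [0, 8], "e": [5, 0]},
                   columns=["b", "a", "e"])
df2 = pd.DataFrame({"a": [4, 5], "b": [7, 3], "c": [2, 8], "d": [7, 0]},
                   columns=["c", "b", "a", "d"])

# sort=True: the union of the columns, sorted alphanumerically.
sorted_cols = list(pd.concat([df1, df2], sort=True).columns)

# sort=False: df1's columns first, then df2's new columns in df2's order.
unsorted_cols = list(pd.concat([df1, df2], sort=False).columns)

# Any other order can be imposed afterwards with reindex.
custom = pd.concat([df1, df2], sort=False).reindex(
    columns=["e", "a", "b", "c", "d"]
)

print(sorted_cols)
print(unsorted_cols)
print(list(custom.columns))
```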

In your code:

placement_by_video_summary = (
    placement_by_video_summary.drop(placement_by_video_summary_new.index)
                              .append(placement_by_video_summary_new, sort=True)
                              .sort_index()
)
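One caveat for readers on newer versions: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the chained call has to be written with pd.concat there. A minimal sketch of the same drop / add / sort_index pattern follows. The frames summary and summary_new and their columns are hypothetical stand-ins for placement_by_video_summary and placement_by_video_summary_new, whose real contents are not shown in the question.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's two frames.
summary = pd.DataFrame(
    {"views": [10, 20, 30], "clicks": [1, 2, 3]},
    index=["v1", "v2", "v3"],
)
summary_new = pd.DataFrame(
    {"clicks": [5, 9], "views": [25, 40]},  # same columns, different order
    index=["v2", "v3"],
)

# Drop the rows that are being replaced, add the new rows, then sort by index.
# sort=True sorts the columns, matching the sort=True in the append call.
updated = pd.concat(
    [summary.drop(summary_new.index), summary_new],
    sort=True,
).sort_index()

print(updated)
```

As in the original chain, drop raises a KeyError if summary_new contains index labels that are missing from summary.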

answered Sep 19 '22 16:09

jezrael

jezrael's answer is good, but it did not answer a question I had: will getting the "sort" flag wrong mess up my data in any way? Apparently not. Either way the values stay under the right column names, and only the column order differs.

from pandas import DataFrame, concat

a = DataFrame([{'a':1,      'c':2,'d':3      }])
b = DataFrame([{'a':4,'b':5,      'd':6,'e':7}])

>>> concat([a,b],sort=False)
   a    c  d    b    e
0  1  2.0  3  NaN  NaN
0  4  NaN  6  5.0  7.0

>>> concat([a,b],sort=True)
   a    b    c  d    e
0  1  NaN  2.0  3  NaN
0  4  5.0  NaN  6  7.0
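The same point can be checked programmatically instead of by comparing the printed tables. This is just a sketch using the two frames above; the names unsorted_result and sorted_result are mine.

```python
import pandas as pd

a = pd.DataFrame([{"a": 1, "c": 2, "d": 3}])
b = pd.DataFrame([{"a": 4, "b": 5, "d": 6, "e": 7}])

unsorted_result = pd.concat([a, b], sort=False)
sorted_result = pd.concat([a, b], sort=True)

# Put the unsorted result's columns into the sorted order, then compare.
# DataFrame.equals treats NaNs in the same position as equal.
same_data = unsorted_result[sorted_result.columns].equals(sorted_result)

print(same_data)
```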

answered Sep 22 '22 16:09

RLC

Tags:

python

pandas

dharmendra mishra
