Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas align() function : illustrative example

Tags:

python

pandas

I came across this line of code

app_train_poly, app_test_poly = app_train_poly.align(app_test_poly, join = 'inner', axis = 1) 

here app_train_poly and app_test_poly are the pandas dataframe.

I know that with align() you are able to perform some sort of combining of the two dataframes but I am not able to visualize how does it actually work.

I searched the documentation but could not find any illustrative example.

like image 802
ashunigion Avatar asked Aug 02 '18 03:08

ashunigion


People also ask

What does pandas align do?

Pandas Align basically helps to align the two dataframes have the same row and/or column configuration and as per their documentation it Align two objects on their axes with the specified join method for each axis Index.

How do I left Align in pandas?

In order to align columns to left in pandas dataframe, we use the dataframe. style. set_properties() function.

How do you align columns in Python?

Use the just() Function to Print With Column Alignment in Python. In Python, we have different methods for string alignment. We can align the strings using the ljust() , rjust , and center() functions. Using these, we can align the rows appropriately based on our requirements.


1 Answers

You are on the right track, except that DataFrame.align doesn't combine two dataframes, rather it aligns them so that the two dataframes have the same row and/or column configuration. Let's try an example:

Initialising two dataframes with some descriptive column names and toy data:

df1 = pd.DataFrame([[1,2,3,4], [6,7,8,9]], columns=['D', 'B', 'E', 'A'], index=[1,2]) df2 = pd.DataFrame([[10,20,30,40], [60,70,80,90], [600,700,800,900]], columns=['A', 'B', 'C', 'D'], index=[2,3,4]) 

Now, let's view these data frames by themselves:

print(df1) 
   D  B  E  A 1  1  2  3  4 2  6  7  8  9 
print(df2) 
     A    B    C    D 2   10   20   30   40 3   60   70   80   90 4  600  700  800  900 

Let's align these two dataframes, aligning by columns (axis=1), and performing an outer join on column labels (join='outer'):

a1, a2 = df1.align(df2, join='outer', axis=1) print(a1) print(a2) 
   A  B   C  D  E 1  4  2 NaN  1  3 2  9  7 NaN  6  8      A    B    C    D   E 2   10   20   30   40 NaN 3   60   70   80   90 NaN 4  600  700  800  900 NaN 

A few things to notice here:

  • The columns in df1 have been rearranged so they align with the columns in df2.
  • There is a column labelled 'C' that has been added to df1, and a column labelled 'E' that has been added to df2. These columns have been filled with NaN. This is because we performed an outer join on the column labels.
  • None of the values inside the DataFrames have been altered.
  • Note that the row labels are not aligned; df2 has rows 3 and 4, whereas df1 does not. This is because we requested alignment on columns (axis=1).

What happens if we align on both rows and columns, but change the join parameter to 'right'?

a1, a2 = df1.align(df2, join='right', axis=None) print(a1) print(a2) 
     A    B   C    D 2  9.0  7.0 NaN  6.0 3  NaN  NaN NaN  NaN 4  NaN  NaN NaN  NaN      A    B    C    D 2   10   20   30   40 3   60   70   80   90 4  600  700  800  900 

Note that:

  • Only the columns and rows that are found in the "right" dataframe (df2) are retained. Column 'E' is no longer present. This is because we made a right join on both the column and row labels.
  • Rows with labels 3 and 4 have been added to df1, filled with Nan. This is because we requested alignment on both rows and columns (axis=None).
  • Row labels are now aligned as well as column labels.
  • Again, note that none of the actual values within the dataframes have been altered.

Finally, let's have a look at the code in the question, with join='inner' and axis=1:

a1, a2 = df1.align(df2, join='inner', axis=1) print(a1) print(a2) 
   D  B  A 1  1  2  4 2  6  7  9      D    B    A 2   40   20   10 3   90   70   60 4  900  700  600 
  • Only column labels are aligned (axis=1).
  • Only column labels that are present in both df1 and df2 are retained (join='inner').

In summary, use DataFrame.align() when you want to make sure the arrangement of rows and/or columns is the same between two dataframes, without altering any of the data contained within the two dataframes.

like image 84
Andrew Guy Avatar answered Oct 08 '22 11:10

Andrew Guy