I came across this line of code
app_train_poly, app_test_poly = app_train_poly.align(app_test_poly, join = 'inner', axis = 1)
here app_train_poly and app_test_poly are the pandas dataframe.
I know that with align() you are able to perform some sort of combining of the two dataframes but I am not able to visualize how does it actually work.
I searched the documentation but could not find any illustrative example.
Pandas Align basically helps to align the two dataframes have the same row and/or column configuration and as per their documentation it Align two objects on their axes with the specified join method for each axis Index.
In order to align columns to left in pandas dataframe, we use the dataframe. style. set_properties() function.
Use the just() Function to Print With Column Alignment in Python. In Python, we have different methods for string alignment. We can align the strings using the ljust() , rjust , and center() functions. Using these, we can align the rows appropriately based on our requirements.
You are on the right track, except that DataFrame.align
doesn't combine two dataframes, rather it aligns them so that the two dataframes have the same row and/or column configuration. Let's try an example:
Initialising two dataframes with some descriptive column names and toy data:
df1 = pd.DataFrame([[1,2,3,4], [6,7,8,9]], columns=['D', 'B', 'E', 'A'], index=[1,2]) df2 = pd.DataFrame([[10,20,30,40], [60,70,80,90], [600,700,800,900]], columns=['A', 'B', 'C', 'D'], index=[2,3,4])
Now, let's view these data frames by themselves:
print(df1)
D B E A 1 1 2 3 4 2 6 7 8 9
print(df2)
A B C D 2 10 20 30 40 3 60 70 80 90 4 600 700 800 900
Let's align these two dataframes, aligning by columns (axis=1
), and performing an outer join on column labels (join='outer'
):
a1, a2 = df1.align(df2, join='outer', axis=1) print(a1) print(a2)
A B C D E 1 4 2 NaN 1 3 2 9 7 NaN 6 8 A B C D E 2 10 20 30 40 NaN 3 60 70 80 90 NaN 4 600 700 800 900 NaN
A few things to notice here:
df1
have been rearranged so they align with the columns in df2
.'C'
that has been added to df1
, and a column labelled 'E'
that has been added to df2
. These columns have been filled with NaN
. This is because we performed an outer join on the column labels.df2
has rows 3
and 4
, whereas df1
does not. This is because we requested alignment on columns (axis=1
).What happens if we align on both rows and columns, but change the join
parameter to 'right'
?
a1, a2 = df1.align(df2, join='right', axis=None) print(a1) print(a2)
A B C D 2 9.0 7.0 NaN 6.0 3 NaN NaN NaN NaN 4 NaN NaN NaN NaN A B C D 2 10 20 30 40 3 60 70 80 90 4 600 700 800 900
Note that:
df2
) are retained. Column 'E'
is no longer present. This is because we made a right join on both the column and row labels.3
and 4
have been added to df1
, filled with Nan
. This is because we requested alignment on both rows and columns (axis=None
).Finally, let's have a look at the code in the question, with join='inner'
and axis=1
:
a1, a2 = df1.align(df2, join='inner', axis=1) print(a1) print(a2)
D B A 1 1 2 4 2 6 7 9 D B A 2 40 20 10 3 90 70 60 4 900 700 600
axis=1
).df1
and df2
are retained (join='inner'
).In summary, use DataFrame.align()
when you want to make sure the arrangement of rows and/or columns is the same between two dataframes, without altering any of the data contained within the two dataframes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With