I have a DataFrame of the following form:
a b c
0 1 4 6
1 3 2 4
2 4 1 5
And I have a list of column names that I need to use to create a new DataFrame using the columns of the first DataFrame that correspond to each label. For example, if my list of columns is ['a', 'b', 'b', 'a', 'c'], the resulting DataFrame should be:
a b b a c
0 1 4 4 1 6
1 3 2 2 3 4
2 4 1 1 4 5
I've been trying to figure out a fast way of performing this operations because I'm dealing with extremly large DataFrames and I don't think looping is a reasonable option.
You can extract a column of pandas DataFrame based on another value by using the DataFrame. query() method. The query() is used to query the columns of a DataFrame with a boolean expression. The blow example returns a Courses column where the Fee column value matches with 25000.
You can create a new DataFrame of a specific column by using DataFrame. assign() method. The assign() method assign new columns to a DataFrame, returning a new object (a copy) with the new columns added to the original ones.
Different column names are specified for merges in Pandas using the “left_on” and “right_on” parameters, instead of using only the “on” parameter. Merging dataframes with different names for the joining variable is achieved using the left_on and right_on arguments to the pandas merge function.
You can just use the list to select them:
In [44]:
cols = ['a', 'b', 'b', 'a', 'c']
df[cols]
Out[44]:
a b b a c
0 1 4 4 1 6
1 3 2 2 3 4
2 4 1 1 4 5
[3 rows x 5 columns]
So no need for a loop, once you have created your dataframe df
then using a list of column names will just index them and create the df you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With