
Concatenate a list of pandas dataframes together

People also ask

How do I concatenate a list of DataFrames in pandas?

The simplest way to use concat() is to pass it a list of DataFrames, for example [df1, df2]. By default it concatenates vertically along axis 0 and preserves all existing index labels, so the result can contain duplicate labels. If you want the concatenation to ignore the existing indices, set ignore_index=True.
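A minimal sketch of the two behaviours described above, using two small made-up frames (df1 and df2 here are hypothetical sample data):

```python
import pandas as pd

# Two small frames that share the same columns (hypothetical sample data)
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "b": [7, 8]})

# Default: stack along axis 0 and keep the original row labels
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```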

How do I append a list to a DataFrame?

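Use loc[] to append the new list to a DataFrame. With df.loc[index] = list you assign the list as the row at the given index label; if that label already exists, the row is overwritten rather than appended. To add the row at the end of a DataFrame that has the default integer index, use len(df) as the label, since it is one past the last existing label.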
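A short sketch of appending a row with loc, assuming the DataFrame still has its default 0..n-1 index (the column names and values are made up for illustration). If the index has been reordered or has gaps, len(df) may collide with an existing label and overwrite that row instead.

```python
import pandas as pd

df = pd.DataFrame({"name": ["x", "y"], "score": [1, 2]})

# With the default index (0..n-1), len(df) is the next unused label,
# so assigning a list to df.loc[len(df)] adds it as a new last row.
df.loc[len(df)] = ["z", 3]
print(df)
```

For many rows, collecting them first and building one DataFrame (or one pd.concat call) is usually faster than enlarging the frame row by row.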

How do I merge 6 pandas DataFrames?

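The pandas merge() function combines DataFrames with database-style joins and is available both as pandas.merge() and as the DataFrame.merge() method. Each call joins exactly two DataFrames, so to merge six of them you chain the calls or fold the list with functools.reduce.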
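A sketch of folding six DataFrames into one with reduce, assuming (hypothetically) that every frame carries a shared key column named "id":

```python
from functools import reduce

import pandas as pd

# Six hypothetical frames that all share a key column "id"
frames = [
    pd.DataFrame({"id": [1, 2, 3], f"col{i}": [i * 10 + 1, i * 10 + 2, i * 10 + 3]})
    for i in range(6)
]

# merge() joins two frames per call, so fold the list pairwise:
# ((f0 with f1) with f2) ... joined on the shared "id" column.
merged = reduce(lambda left, right: pd.merge(left, right, on="id", how="outer"), frames)
print(merged.shape)  # (3, 7)
```

how="outer" keeps ids that appear in only some frames; use how="inner" to keep only ids present in all of them.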

How do I concatenate a series in pandas?

With pandas.concat() you can combine pandas objects, such as several Series, along a particular axis (row-wise or column-wise) to create a DataFrame. concat() takes several parameters; for this scenario, pass the list of Series to combine and set axis=1 so that each Series becomes a column instead of being stacked as rows.
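A minimal sketch with two hypothetical named Series; their names become the column labels of the result:

```python
import pandas as pd

# Two hypothetical Series; their names become the column labels
s1 = pd.Series([1, 2, 3], name="first")
s2 = pd.Series([4, 5, 6], name="second")

# axis=1 places each Series side by side as a column, aligned on the index
df = pd.concat([s1, s2], axis=1)
print(df.columns.tolist())  # ['first', 'second']
print(df.shape)             # (3, 2)
```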


Given that all the dataframes have the same columns, you can simply concat them:

import pandas as pd
df = pd.concat(list_of_dataframes)

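If the dataframes DO NOT all have the same columns, pd.concat still works: the result contains the union of all columns, and NaN fills every cell where a dataframe lacks a column. Pass join="inner" instead to keep only the columns that all of them share:

df = pd.concat(df_list, ignore_index=True)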
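A sketch comparing the default outer join with join="inner", using a hypothetical df_list whose frames share only column "a":

```python
import pandas as pd

# Hypothetical frames with partially overlapping columns
df_list = [
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}),
    pd.DataFrame({"a": [5], "c": [6]}),
]

# Default join="outer": union of columns, missing cells become NaN
outer = pd.concat(df_list, ignore_index=True)
print(outer.columns.tolist())  # ['a', 'b', 'c']

# join="inner": keep only the columns every frame has
inner = pd.concat(df_list, ignore_index=True, join="inner")
print(inner.columns.tolist())  # ['a']
```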

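You can also fold the list with functional programming. Note that this performs a full outer join on the columns the dataframes share, so rows with matching values are combined rather than stacked, and merge raises an error if two dataframes have no column in common:

import pandas as pd
from functools import reduce
df = reduce(lambda df1, df2: df1.merge(df2, how="outer"), mydfs)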
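A sketch showing how the reduce/merge approach differs from stacking, using a hypothetical mydfs whose frames share only a "key" column:

```python
from functools import reduce

import pandas as pd

# Hypothetical frames that share only the "key" column
mydfs = [
    pd.DataFrame({"key": [1, 2], "x": ["a", "b"]}),
    pd.DataFrame({"key": [2, 3], "y": ["c", "d"]}),
    pd.DataFrame({"key": [1, 3], "z": ["e", "f"]}),
]

# Each step is a full outer join on the shared column ("key"),
# so rows with the same key are combined into one row.
result = reduce(lambda df1, df2: df1.merge(df2, how="outer"), mydfs)
print(result)

# Stacking the same frames keeps every input row instead
stacked = pd.concat(mydfs, ignore_index=True)
print(stacked.shape)  # (6, 4)
```

The merged result has one row per distinct key (3 rows), while pd.concat keeps all 6 input rows.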

Just to add a few more details:

Example:

import pandas as pd
list1 = [df1, df2, df3]
  • Row-wise concatenation, ignoring indexes

    pd.concat(list1, axis=0, ignore_index=True)
    

    Note: If the column names differ, the result contains the union of all columns, and NaN is inserted wherever a dataframe lacks one of them.

  • Column-wise concatenation, keeping column names

    pd.concat(list1, axis=1, ignore_index=False)
    

    If ignore_index=True, the column labels are replaced by the integers 0 to n-1, where n is the total number of columns in the result (a name that appears in several dataframes is counted once per occurrence).
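A sketch of the column-wise case with two hypothetical frames that both have a column "a", showing that ignore_index=True numbers every resulting column, duplicates included:

```python
import pandas as pd

# Hypothetical frames; both contain a column named "a"
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [5, 6], "c": [7, 8]})
list1 = [df1, df2]

# axis=1 places frames side by side, aligned on the row index;
# ignore_index=False keeps the original (here duplicated) column labels
kept = pd.concat(list1, axis=1, ignore_index=False)
print(kept.columns.tolist())  # ['a', 'b', 'a', 'c']

# ignore_index=True relabels every resulting column 0..n-1 (n = 4 here)
renamed = pd.concat(list1, axis=1, ignore_index=True)
print(renamed.columns.tolist())  # [0, 1, 2, 3]
```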