More efficient way to add columns with same string values in multiple dataframes with loops or lambdas?

Question

I want to add a new column, Category, in each of my 8 similar dataframes. The values in this column are the same, they are also the df name, like df1_p8 in this example. I have used:

In:     df61_p8.insert(3,"Category","df61_p8", True)
# or simply, df61_p8['Category']='df61_p8'

Out:
        code    violation_description                                            Category
89491   9-1-503 Defective or obstructed duct system one- building                df61_p8
102045  9-1-503 Defective or obstructed duct system one- building                df61_p8
103369  9-1-503 Defective or obstructed duct system one- building                df61_p8
130440  9-1-502 Failure to maintain at least one (1) elevator                    df61_p8
132446  9-1-503 Defective or obstructed duct system one-  building               df61_p8

Ultimately, I want to append/concat these 8 dataframes into one dataframe.

I wonder if there is more efficient way to do it, rather than using .insert one by one on each dataframe. Something like loops or lambdas.. As a beginner, I am not sure how to apply them in my case? thank you.

append_alldfs = []
x=[df61_p1,df61_p2,df61_p3,df61_p4,df61_p5,df61_p6,df61_p7,df61_p8]
lambdafunc = lambda x: x.insert(3,"Category","x",True)

Scott Boston · Accepted Answer

You can use pd.concat with keys parameter then reset_index:

pd.concat([df0,df1,df2,df3], keys=['df0', 'df1', 'df2', 'df3']).reset_index(level=0)

MCVE:

df0  = pd.DataFrame(np.ones((3,3)), columns=[*'ABC'])
df1  = pd.DataFrame(np.zeros((3,3)), columns=[*'ABC'])
df2  = pd.DataFrame(np.zeros((3,3))+3, columns=[*'ABC'])
df3  = pd.DataFrame(np.zeros((3,3))+4, columns=[*'ABC'])

df_out = pd.concat([df0,df1,df2,df3], keys=['df0', 'df1', 'df2', 'df3']).reset_index(level=0)
df_out

Output:

  level_0    A    B    C
0     df0  1.0  1.0  1.0
1     df0  1.0  1.0  1.0
2     df0  1.0  1.0  1.0
0     df1  0.0  0.0  0.0
1     df1  0.0  0.0  0.0
2     df1  0.0  0.0  0.0
0     df2  3.0  3.0  3.0
1     df2  3.0  3.0  3.0
2     df2  3.0  3.0  3.0
0     df3  4.0  4.0  4.0
1     df3  4.0  4.0  4.0
2     df3  4.0  4.0  4.0

Mukul Ram · Answer

Keep it simple and explicit.

for col_val, df in [
   ('df61_p1', df61_p1),
   ('df61_p2', df61_p2),
   ('df61_p3', df61_p3),
   ('df61_p4', df61_p4),
   ('df61_p5', df61_p5),
   ('df61_p6', df61_p6),
   ('df61_p7', df61_p7),
   ('df61_p8', df61_p8),
]:
    df['Category'] = col_val

While there are certainly more 'meta-programming-ey' ways of accomplishing the same task, these are usually quite convoluted and more complicated to understand and refactor.

Given the structure of this code, however, I imagine that there are ways you could get rid of this problem before you even get to this point.

For example, at what point did those dataframes get split up? Perhaps by never using separate DataFrames in the first place [keep the original dataframe together/concat at beginning] (and using apply, groupby, pivot and melt operations as needed), you can avoid this problem altogether.

4.Pi.n · Answer

def add_column(df, col_name, col_value):
  return df.insert(loc=-1, column=col_name, value=col_value, allow_duplicates = False)

df_list = [........]
col_name = ... 
col_value = .... # copy column (Category) values

res = map(lambda df: add_column(df, col_name, col_value), df_list)
list(res)

More efficient way to add columns with same string values in multiple dataframes with loops or lambdas?

Tags:

python

loops

python-3.x

pandas

lambda

Bluetail

3 Answers

Scott Boston

Mukul Ram

4.Pi.n

Recent Activity

Donate For Us

More efficient way to add columns with same string values in multiple dataframes with loops or lambdas?

Tags:

python

loops

python-3.x

pandas

lambda

Bluetail

3 Answers

Scott Boston

Mukul Ram

4.Pi.n

Related questions

Recent Activity

Donate For Us