I want to add a new column, Category, in each of my 8 similar dataframes.
The values in this column are the same, they are also the df name, like df1_p8
in this example.
I have used:
In: df61_p8.insert(3,"Category","df61_p8", True)
# or simply, df61_p8['Category']='df61_p8'
Out:
code violation_description Category
89491 9-1-503 Defective or obstructed duct system one- building df61_p8
102045 9-1-503 Defective or obstructed duct system one- building df61_p8
103369 9-1-503 Defective or obstructed duct system one- building df61_p8
130440 9-1-502 Failure to maintain at least one (1) elevator df61_p8
132446 9-1-503 Defective or obstructed duct system one- building df61_p8
Ultimately, I want to append/concat these 8 dataframes into one dataframe.
I wonder if there is more efficient way to do it, rather than using .insert one by one on each dataframe. Something like loops or lambdas.. As a beginner, I am not sure how to apply them in my case? thank you.
append_alldfs = []
x=[df61_p1,df61_p2,df61_p3,df61_p4,df61_p5,df61_p6,df61_p7,df61_p8]
lambdafunc = lambda x: x.insert(3,"Category","x",True)
You can use pd.concat
with keys
parameter then reset_index
:
pd.concat([df0,df1,df2,df3], keys=['df0', 'df1', 'df2', 'df3']).reset_index(level=0)
MCVE:
df0 = pd.DataFrame(np.ones((3,3)), columns=[*'ABC'])
df1 = pd.DataFrame(np.zeros((3,3)), columns=[*'ABC'])
df2 = pd.DataFrame(np.zeros((3,3))+3, columns=[*'ABC'])
df3 = pd.DataFrame(np.zeros((3,3))+4, columns=[*'ABC'])
df_out = pd.concat([df0,df1,df2,df3], keys=['df0', 'df1', 'df2', 'df3']).reset_index(level=0)
df_out
Output:
level_0 A B C
0 df0 1.0 1.0 1.0
1 df0 1.0 1.0 1.0
2 df0 1.0 1.0 1.0
0 df1 0.0 0.0 0.0
1 df1 0.0 0.0 0.0
2 df1 0.0 0.0 0.0
0 df2 3.0 3.0 3.0
1 df2 3.0 3.0 3.0
2 df2 3.0 3.0 3.0
0 df3 4.0 4.0 4.0
1 df3 4.0 4.0 4.0
2 df3 4.0 4.0 4.0
Keep it simple and explicit.
for col_val, df in [
('df61_p1', df61_p1),
('df61_p2', df61_p2),
('df61_p3', df61_p3),
('df61_p4', df61_p4),
('df61_p5', df61_p5),
('df61_p6', df61_p6),
('df61_p7', df61_p7),
('df61_p8', df61_p8),
]:
df['Category'] = col_val
While there are certainly more 'meta-programming-ey' ways of accomplishing the same task, these are usually quite convoluted and more complicated to understand and refactor.
Given the structure of this code, however, I imagine that there are ways you could get rid of this problem before you even get to this point.
For example, at what point did those dataframes get split up? Perhaps by never using separate DataFrames in the first place [keep the original dataframe together/concat at beginning] (and using apply, groupby, pivot and melt operations as needed), you can avoid this problem altogether.
def add_column(df, col_name, col_value):
return df.insert(loc=-1, column=col_name, value=col_value, allow_duplicates = False)
df_list = [........]
col_name = ...
col_value = .... # copy column (Category) values
res = map(lambda df: add_column(df, col_name, col_value), df_list)
list(res)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With