
How do I add a column to my DataFrame that says which sheet each row came from? (Python)

I am working with an Excel workbook that has five sheets, and I want to use four of them. I can load them like this:

import pandas as pd
df = pd.read_excel('***.xls', sheet_name=['a', 'b', 'c', 'd'])

But now I would like to add a column that says which sheet each row came from, and I am not sure how to do this. I tried something like this:

for name, frame in df.items():
    frame['Sheet'] = name
    df = df.append(frame, ignore_index=True)

but I was getting the following error:

AttributeError: 'collections.OrderedDict' object has no attribute 'append'
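When sheet_name is a list, read_excel returns a dict mapping each sheet name to its own DataFrame (an OrderedDict in the pandas version used here), and a dict has no append method. A minimal sketch that keeps the loop from the question: add the column to each frame, collect the frames in a plain list, and concatenate once at the end. The in-memory `sheets` dict and its column `x` are hypothetical stand-ins for the workbook, so the example runs without an Excel file.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., sheet_name=['a', 'b', ...]):
# a dict mapping each sheet name to that sheet's DataFrame.
sheets = {
    'a': pd.DataFrame({'x': [1, 2]}),
    'b': pd.DataFrame({'x': [3]}),
}

frames = []
for name, frame in sheets.items():
    frame['Sheet'] = name   # tag every row of this sheet with its name
    frames.append(frame)    # list.append, not dict/DataFrame append

# One concat at the end; DataFrame.append was removed in pandas 2.0 anyway.
df = pd.concat(frames, ignore_index=True)
print(df)
```

Collecting the pieces and calling pd.concat once is also cheaper than growing a DataFrame row block by row block.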

Any help would be greatly appreciated. Thank you in advance!

Let's say this is what my data looks like after I concatenate the sheets:

df = pd.concat(pd.read_excel('***.xls', sheet_name=['a', 'b', 'c', 'd'],
                             header=1), ignore_index=True, sort=False)
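The concat call above passes ignore_index=True, which throws away the dict keys, and those keys are exactly the sheet names. A sketch of the opposite approach, assuming the same dict-of-DataFrames input: keep the keys as an index level, name that level with concat's `names` parameter, and move it into a column. The `sheets` dict and column `x` below are hypothetical sample data.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by
# pd.read_excel(..., sheet_name=['a', 'b', 'c', 'd'], header=1).
sheets = {
    'a': pd.DataFrame({'x': [1, 2]}),
    'b': pd.DataFrame({'x': [3]}),
}

df = (
    pd.concat(sheets, names=['Sheet'])  # dict keys become an index level named 'Sheet'
    .reset_index(level='Sheet')         # move the sheet name into a column
    .reset_index(drop=True)             # drop the leftover per-sheet row numbers
)
print(df)
```

This avoids touching each frame individually; the trade-off is that the new column lands first rather than last.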

[Image: concatenated data]

My goal is to add a column that says which sheet each row came from, like so:

[Image: concatenated data with a sheet-name column]

Hopefully that makes clear what I am going for.

(Edit) I would also like to know how to do this if I wanted to use all the sheets in a workbook without listing each sheet name individually. Thanks!

asked Dec 13 '19 19:12 by jpk



People also ask

How do I add a column to a Dataframe in Python?

You can use the assign() method to add a new column at the end of a pandas DataFrame: df = df.assign(col_name=[value1, value2, value3, ...]). To add a new column at a specific position, use the insert() method instead: df.insert(loc, 'col_name', values).
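A small sketch of both methods on a toy DataFrame (the column names `a`, `b` and `first` are made up for illustration). Note the difference in style: assign() returns a new DataFrame, while insert() modifies the existing one in place.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2]})

# assign() returns a new DataFrame with the column appended at the end.
df = df.assign(b=[3, 4])

# insert() works in place and puts the column at position loc (here 0).
df.insert(0, 'first', ['x', 'y'])

print(df.columns.tolist())  # ['first', 'a', 'b']
```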

How to append the rows of another DataFrame to an existing DataFrame?

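The DataFrame.append() method appended the rows of another DataFrame to an existing one. If the other DataFrame had an extra column, a new column with that name was created and the missing values were filled with NaN. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; in current code, use pd.concat([df1, df2], ignore_index=True) instead.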
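A minimal sketch of the pd.concat replacement, using two hypothetical frames where only the second has column `b`. The extra column is created and the rows from `df1` get NaN in it, matching what append() used to do.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2]})
df2 = pd.DataFrame({'a': [3], 'b': ['new']})

# Equivalent of the removed df1.append(df2, ignore_index=True).
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```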

How to create a Dataframe from a row in pandas?

Rows represent records (tuples) and columns represent attributes. You can create a DataFrame with the pandas.DataFrame() constructor. You can also create one from a dictionary without specifying columns or an index: the keys become the column names and a default integer index is used.
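A short sketch of the constructor with the two input shapes mentioned above; the `id`/`name` columns and values are illustrative only. A single row still has to be wrapped in a list so pandas treats it as one record rather than as a column.

```python
import pandas as pd

# From a dictionary: keys become column names, index defaults to 0..n-1.
from_dict = pd.DataFrame({'id': [1, 2], 'name': ['x', 'y']})

# From rows: each tuple is one record; column names are passed separately.
rows = [(1, 'x'), (2, 'y')]
from_rows = pd.DataFrame(rows, columns=['id', 'name'])

# A DataFrame built from a single row.
one_row = pd.DataFrame([rows[0]], columns=['id', 'name'])

print(from_dict.equals(from_rows))  # True
```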

How do I change the name of a column in a Dataframe?

For any DataFrame, say df, you can set all the column names at once by assigning a list to the df.columns attribute; the list must contain exactly one name per column. For example, to name the columns 'A', 'B', 'C' and 'D', use: df.columns = ['A', 'B', 'C', 'D']
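A sketch showing the df.columns assignment alongside DataFrame.rename(), which is the usual choice when only some columns need new names. The data and the new name `alpha` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]])  # columns default to 0, 1, 2, 3

# Replace every column name at once; the list length must match.
df.columns = ['A', 'B', 'C', 'D']

# Rename selected columns only; rename() returns a new DataFrame.
df = df.rename(columns={'A': 'alpha'})

print(df.columns.tolist())  # ['alpha', 'B', 'C', 'D']
```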


1 Answer

IIUC, try DataFrame.assign in a list comprehension:

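import pandas as pd

sheets = ['a', 'b', 'c', 'd']

# Read each sheet, tag its rows with the sheet name, then stack them
df = pd.concat([pd.read_excel('***.xls', sheet_name=s)
                .assign(sheet_name=s) for s in sheets],
               ignore_index=True)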

Update

If you want to use all the sheets and add a sheet-name column, you could do:

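import pandas as pd

# ExcelFile opens the workbook once and lists its sheet names
workbook = pd.ExcelFile('***.xls')
sheets = workbook.sheet_names

df = pd.concat([pd.read_excel(workbook, sheet_name=s)
                .assign(sheet_name=s) for s in sheets],
               ignore_index=True)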
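Another option for the all-sheets case: passing sheet_name=None to read_excel loads every sheet into a dict keyed by sheet name, so the names never have to be listed. The sketch below wraps the assign-and-concat step in a hypothetical helper, `stack_all_sheets`, and runs it on an in-memory dict so it works without an Excel file; the real-workbook call is shown as a comment.

```python
import pandas as pd

def stack_all_sheets(sheets):
    """Combine a {sheet name: DataFrame} dict into one frame with a sheet_name column."""
    return pd.concat(
        [frame.assign(sheet_name=name) for name, frame in sheets.items()],
        ignore_index=True,
    )

# With a real workbook, sheet_name=None returns a dict of every sheet:
# df = stack_all_sheets(pd.read_excel('***.xls', sheet_name=None))

# Hypothetical in-memory sheets so the sketch runs on its own.
demo = {'a': pd.DataFrame({'x': [1]}), 'b': pd.DataFrame({'x': [2, 3]})}
df = stack_all_sheets(demo)
print(df)
```

This reads the file in a single call, at the cost of loading every sheet, including any you might later discard.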
answered Sep 27 '22 22:09 by Chris Adams
