I am working with an Excel workbook that has five sheets, and I want to use four of them. I can load them like this:
df = pd.read_excel('***.xls', sheet_name=['a', 'b', 'c', 'd'])
But now I would like to add a column that says which sheet each row came from, and I am not sure how to do this. I tried something like this:
for name, frame in df.items():
    frame['Sheet'] = name
    df = df.append(frame, ignore_index=True)
but I was getting the following error:
AttributeError: 'collections.OrderedDict' object has no attribute 'append'
Any help would be greatly appreciated. Thank you in advance!
Let's say this is what my data looks like after I concat the sheets:
df = pd.concat(pd.read_excel('***.xls', sheet_name=['a', 'b', 'c', 'd'],
                             header=1), ignore_index=True, sort=False)
Concat data
My goal is to add a column that says what sheet each row was from, like so...
Concat data with sheet name row
Hopefully that helps you understand what I am trying to go for.
(Edit) I would also like to know how to do this if I wanted to use all the sheets in the workbook without listing each sheet name individually. Thanks!
How to Add a Column to a Pandas DataFrame: You can use the assign() function to add a new column to the end of a pandas DataFrame: df = df.assign(col_name=[value1, value2, value3, ...]). You can also use the insert() function to add a new column at a specific location in a pandas DataFrame.
The append method adds the rows of another DataFrame to an existing DataFrame. If the other DataFrame has an extra column, a new column with that name is created. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; pd.concat() is the replacement.
Rows represent the records (tuples) and columns represent the attributes. We can create a DataFrame with the pandas.DataFrame() constructor. We can also create a DataFrame from a dictionary, omitting the columns and index arguments.
For any DataFrame, say df, you can set or change the column names by assigning a list of names to the df.columns attribute. For example, if you want the column names to be 'A', 'B', 'C', 'D', use this: df.columns = ['A', 'B', 'C', 'D']
IIUC, try DataFrame.assign in a list comprehension:
sheets = ['a', 'b', 'c', 'd']
df = pd.concat([pd.read_excel('***.xls', sheet_name=s)
                  .assign(sheet_name=s) for s in sheets])
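For what it's worth, the AttributeError in the question comes from the fact that pd.read_excel returns a dict of DataFrames (keyed by sheet name) when sheet_name is a list, and a dict has no append method. A minimal sketch of the original loop, repaired, could look like the following. The small frames dict is a made-up stand-in for the real workbook, and the names frames, parts and combined are only illustrative:

```python
import pandas as pd

# Stand-in for pd.read_excel('***.xls', sheet_name=['a', 'b']), which
# returns a dict mapping each sheet name to its DataFrame.
# The data here is invented purely for illustration.
frames = {
    'a': pd.DataFrame({'x': [1, 2]}),
    'b': pd.DataFrame({'x': [3]}),
}

# Collect the labelled frames in a plain list instead of appending to the dict.
parts = []
for name, frame in frames.items():
    parts.append(frame.assign(Sheet=name))  # assign returns a labelled copy

# Stack everything once at the end and renumber the rows.
combined = pd.concat(parts, ignore_index=True, sort=False)
print(combined)
```

Building a list and calling pd.concat once is generally cheaper than growing a DataFrame inside the loop.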
If you want to use all sheets and assign a column of sheetname, you could do:
workbook = pd.ExcelFile('***.xls')
sheets = workbook.sheet_names
df = pd.concat([pd.read_excel(workbook, sheet_name=s)
                  .assign(sheet_name=s) for s in sheets])
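Another option, sketched here under the assumption that the sheets share the same columns: passing sheet_name=None to pd.read_excel loads every sheet into a dict, and pd.concat accepts that dict directly, using the keys as an outer index level. The sheets dict below is invented stand-in data, and the level name 'row' is just a placeholder:

```python
import pandas as pd

# Stand-in for pd.read_excel('***.xls', sheet_name=None), which returns
# a dict of {sheet name: DataFrame} covering every sheet in the workbook.
# The data here is invented purely for illustration.
sheets = {
    'a': pd.DataFrame({'x': [1, 2]}),
    'b': pd.DataFrame({'x': [3]}),
}

df_all = (
    # The dict keys become the outer index level, named 'sheet_name'.
    pd.concat(sheets, names=['sheet_name', 'row'])
      # Move the sheet name out of the index into a regular column.
      .reset_index(level='sheet_name')
      # Drop the leftover per-sheet row numbers and renumber from 0.
      .reset_index(drop=True)
)
print(df_all)
```

This avoids listing the sheet names at all, at the cost of putting the sheet_name column first instead of last.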