I would like to read an Excel sheet into Pandas DataFrame. However, there are merged Excel cells as well as Null rows (full/partial NaN
filled), as shown below. To clarify, John H. has made an order to purchase all the albums from "The Bodyguard" to "Red Pill Blues".
When I read this Excel sheet into a Pandas DataFrame, the Excel data does not get transferred correctly. Pandas considers a merged cell as one cell. The DataFrame looks like the following: (Note: Values in () are the desired values that I would like to have there)
Please note that the last row does not contain merged cells; it only carries a value for Artist
column.
df.index = pd.Series(df.index).fillna(method='ffill')
However, the NaN
values remain. What strategy or method could I use to populate the DataFrame correctly? Is there a Pandas method of unmerging the cells and duplicating the corresponding contents?
To read an excel file as a DataFrame, use the pandas read_excel() method. You can read the first sheet, specific sheets, multiple sheets or all sheets. Pandas converts this to the DataFrame structure, which is a tabular like structure.
We can use the pandas module read_excel() function to read the excel file data into a DataFrame object.
You can merge the columns using the pop() method. In this, you are popping the values of “age1” columns and filling it with the popped values of the other columns “revised_age“. You will get the output below.
The referenced link you attempted needed to forward fill only the index column. For your use case, you need to fillna
for all dataframe columns. So, simply forward fill entire dataframe:
df = pd.read_excel("Input.xlsx")
print(df)
# Order_ID Customer_name Album_Name Artist Quantity
# 0 NaN NaN RadioShake NaN NaN
# 1 1.0 John H. The Bodyguard Whitney Houston 2.0
# 2 NaN NaN Lemonade Beyonce 1.0
# 3 NaN NaN The Thrill Of It All Sam Smith 2.0
# 4 NaN NaN Thriller Michael Jackson 11.0
# 5 NaN NaN Divide Ed Sheeran 4.0
# 6 NaN NaN Reputation Taylor Swift 3.0
# 7 NaN NaN Red Pill Blues Maroon 5 5.0
df = df.fillna(method='ffill')
print(df)
# Order_ID Customer_name Album_Name Artist Quantity
# 0 NaN NaN RadioShake NaN NaN
# 1 1.0 John H. The Bodyguard Whitney Houston 2.0
# 2 1.0 John H. Lemonade Beyonce 1.0
# 3 1.0 John H. The Thrill Of It All Sam Smith 2.0
# 4 1.0 John H. Thriller Michael Jackson 11.0
# 5 1.0 John H. Divide Ed Sheeran 4.0
# 6 1.0 John H. Reputation Taylor Swift 3.0
# 7 1.0 John H. Red Pill Blues Maroon 5 5.0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With