Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to read merged Excel cells with NaN into Pandas DataFrame

I would like to read an Excel sheet into Pandas DataFrame. However, there are merged Excel cells as well as Null rows (full/partial NaN filled), as shown below. To clarify, John H. has made an order to purchase all the albums from "The Bodyguard" to "Red Pill Blues".

Excel sheet capture

When I read this Excel sheet into a Pandas DataFrame, the Excel data does not get transferred correctly. Pandas considers a merged cell as one cell. The DataFrame looks like the following: (Note: Values in () are the desired values that I would like to have there)

Dataframe capture

Please note that the last row does not contain merged cells; it only carries a value for Artist column.


EDIT: I did try the following to forward-fill in the NaN values:(Pandas: Reading Excel with merged cells)
df.index = pd.Series(df.index).fillna(method='ffill')  

However, the NaN values remain. What strategy or method could I use to populate the DataFrame correctly? Is there a Pandas method of unmerging the cells and duplicating the corresponding contents?

like image 712
CPU Avatar asked Dec 15 '17 14:12

CPU


People also ask

How do I read an excel file to pandas DataFrame?

To read an excel file as a DataFrame, use the pandas read_excel() method. You can read the first sheet, specific sheets, multiple sheets or all sheets. Pandas converts this to the DataFrame structure, which is a tabular like structure.

Which method is used to read data from excel files in pandas?

We can use the pandas module read_excel() function to read the excel file data into a DataFrame object.

How do you merge excel columns in pandas?

You can merge the columns using the pop() method. In this, you are popping the values of “age1” columns and filling it with the popped values of the other columns “revised_age“. You will get the output below.


1 Answers

The referenced link you attempted needed to forward fill only the index column. For your use case, you need to fillna for all dataframe columns. So, simply forward fill entire dataframe:

df = pd.read_excel("Input.xlsx")
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       NaN           NaN              Lemonade          Beyonce       1.0
# 3       NaN           NaN  The Thrill Of It All        Sam Smith       2.0
# 4       NaN           NaN              Thriller  Michael Jackson      11.0
# 5       NaN           NaN                Divide       Ed Sheeran       4.0
# 6       NaN           NaN            Reputation     Taylor Swift       3.0
# 7       NaN           NaN        Red Pill Blues         Maroon 5       5.0

df = df.fillna(method='ffill')
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       1.0       John H.              Lemonade          Beyonce       1.0
# 3       1.0       John H.  The Thrill Of It All        Sam Smith       2.0
# 4       1.0       John H.              Thriller  Michael Jackson      11.0
# 5       1.0       John H.                Divide       Ed Sheeran       4.0
# 6       1.0       John H.            Reputation     Taylor Swift       3.0
# 7       1.0       John H.        Red Pill Blues         Maroon 5       5.0
like image 195
Parfait Avatar answered Oct 16 '22 12:10

Parfait