How to read merged Excel cells with NaN into Pandas DataFrame

Tags:

I would like to read an Excel sheet into Pandas DataFrame. However, there are merged Excel cells as well as Null rows (full/partial NaN filled), as shown below. To clarify, John H. has made an order to purchase all the albums from "The Bodyguard" to "Red Pill Blues".

Excel sheet capture

When I read this Excel sheet into a Pandas DataFrame, the Excel data does not get transferred correctly. Pandas considers a merged cell as one cell. The DataFrame looks like the following: (Note: Values in () are the desired values that I would like to have there)

Dataframe capture

Please note that the last row does not contain merged cells; it only carries a value for Artist column.

EDIT: I did try the following to forward-fill in the NaN values:(Pandas: Reading Excel with merged cells)

Click to copy

df.index = pd.Series(df.index).fillna(method='ffill')

However, the NaN values remain. What strategy or method could I use to populate the DataFrame correctly? Is there a Pandas method of unmerging the cells and duplicating the corresponding contents?

712

asked Dec 15 '17 14:12

CPU

1 Answers

The referenced link you attempted needed to forward fill only the index column. For your use case, you need to fillna for all dataframe columns. So, simply forward fill entire dataframe:

Click to copy

df = pd.read_excel("Input.xlsx")
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       NaN           NaN              Lemonade          Beyonce       1.0
# 3       NaN           NaN  The Thrill Of It All        Sam Smith       2.0
# 4       NaN           NaN              Thriller  Michael Jackson      11.0
# 5       NaN           NaN                Divide       Ed Sheeran       4.0
# 6       NaN           NaN            Reputation     Taylor Swift       3.0
# 7       NaN           NaN        Red Pill Blues         Maroon 5       5.0

df = df.fillna(method='ffill')
print(df)

#    Order_ID Customer_name            Album_Name           Artist  Quantity
# 0       NaN           NaN            RadioShake              NaN       NaN
# 1       1.0       John H.         The Bodyguard  Whitney Houston       2.0
# 2       1.0       John H.              Lemonade          Beyonce       1.0
# 3       1.0       John H.  The Thrill Of It All        Sam Smith       2.0
# 4       1.0       John H.              Thriller  Michael Jackson      11.0
# 5       1.0       John H.                Divide       Ed Sheeran       4.0
# 6       1.0       John H.            Reputation     Taylor Swift       3.0
# 7       1.0       John H.        Red Pill Blues         Maroon 5       5.0

195

answered Oct 16 '22 12:10

Parfait

Related questions
                            
                                Flask-SQLAlchemy check if table exists in database
                            
                                Date axis in heatmap seaborn
                            
                                In Python Docstrings, What Does `:obj:` do?
                            
                                Drawing fewer plots than specified in matplotlib subplots
                            
                                Python Pandas Group By Error 'Index' object has no attribute 'labels'
                            
                                Using python's networkX to compute personalized page rank
                            
                                why plt.show() shows one extra blank figure
                            
                                PCA memory error in Sklearn: Alternative Dim Reduction?
                            
                                None dimension raise ValueError in batch_norm with Tensorflow
                            
                                Why does argparse include default value for optional argument even when argument is specified?
                            
                                AttributeError: module 'pandas' has no attribute 'read_csv' Python3.5
                            
                                Is there any python "utf-8" string constant?
                            
                                pyparsing nestedExpr and nested parentheses
                            
                                How to access serializer.data on ListSerializer parent class in DRF?
                            
                                how to control output from fbprophet?
                            
                                Python 3 UnicodeDecodeError - How do I debug UnicodeDecodeError?
                            
                                Stream files to kafka using airflow
                            
                                How to check the channel order of an image?
                            
                                elasticsearch-dsl aggregations returns only 10 results. How to change this
                            
                                Django test runner failing with "relation does not exist" error

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to read merged Excel cells with NaN into Pandas DataFrame

Tags:

python

python-3.x

pandas

excel

CPU

People also ask

1 Answers

Parfait

Recent Activity

Donate For Us