Python Pandas read_excel dtype str replace nan by blank ('') when reading or when writing via to_csv

Tags: python, pandas, csv, excel, nan

Python version: Python 2.7.13 :: Anaconda custom (64-bit)
Pandas version: pandas 0.20.2

Hello,

I have a fairly simple requirement: I would like to read an Excel file and write a specific sheet to a CSV file. Blank cells in the source Excel file should stay blank in the CSV file. However, my blank cells are always written as nan (without quotes) to the output file.

I read the Excel file with:

read_excel(xlsx, sheetname='sheet1', dtype = str)

I am specifying dtype because some columns contain numbers that should be treated as strings (otherwise they might lose leading 0s, etc.). In other words, I would like to read the exact value of every cell.

Now I write the output .csv file via to_csv(output_file,index=False,mode='wb',sep=',',encoding='utf-8')

However, the resulting CSV file contains nan for every blank cell of the Excel file.

What am I missing? I already tried .fillna('', inplace=True), but it seems to do nothing to my data. I also tried adding the parameter na_rep='' to to_csv, without success.

Thanks for any help!

Addendum: here is a reproducible example.

First create a new Excel file with 3 columns and the following content:

COLUMNA | COLUMNB | COLUMNC
01      |         | test
02      | test    |
03      |         | test

(I saved this Excel file to c:\test.xlsx. Note that the 1st and 3rd rows of column B, as well as the 2nd row of column C, are blank/empty.)

Now here is my code:

import pandas as pd
xlsx = pd.ExcelFile('c:\\test.xlsx')
df = pd.read_excel(xlsx, sheetname='Sheet1', dtype = str)
df.fillna('', inplace=True)
df.to_csv('c:\\test.csv', index=False,mode='wb',sep=',',encoding='utf-8', na_rep ='')

My result is:
COLUMNA,COLUMNB,COLUMNC
01,nan,test
02,test,nan
03,nan,test

My desired result would be:
COLUMNA,COLUMNB,COLUMNC
01,,test
02,test,
03,,test

asked Jul 17 '17 15:07

panda

1 Answer

Since dtype=str has turned the blank cells into the literal string 'nan' (which is why neither fillna nor na_rep='' has any effect), you can use the replace function:

import pandas as pd
df = pd.DataFrame({'Col1' : ['nan', 'foo', 'bar', 'baz', 'nan', 'test']})
df.replace('nan', '')

   Col1
0      
1   foo
2   bar
3   baz
4      
5  test

Every cell whose value is exactly the string 'nan' is replaced by the empty string ''. replace returns a new DataFrame instead of modifying df in place, so make sure you assign the result back:

df = df.replace('nan', '')

You can then write it to your file using to_csv.
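The whole fix can be sketched end to end. This is a minimal sketch, assuming Python 3: the three-row frame is hypothetical sample data typed in to match the example sheet, with the blank cells written directly as the string 'nan' to mimic what read_excel(..., dtype=str) returned in the question (so no Excel file or reader engine is needed), and the CSV goes to an in-memory io.StringIO buffer instead of c:\test.csv.

```python
import io

import pandas as pd

# Hypothetical sample data matching the example sheet. The blank cells are
# written as the string "nan" to mimic what read_excel(..., dtype=str)
# returned in the question; they are NOT real missing values.
df = pd.DataFrame({
    "COLUMNA": ["01", "02", "03"],
    "COLUMNB": ["nan", "test", "nan"],
    "COLUMNC": ["test", "nan", "test"],
})

# fillna would do nothing here, because nothing is actually missing.
# replace swaps every cell equal to "nan" for "" and returns a new frame,
# so the result is assigned back.
df = df.replace("nan", "")

# Write to an in-memory buffer instead of c:\test.csv so the sketch is
# self-contained; index=False drops the row labels as in the question.
buf = io.StringIO()
df.to_csv(buf, index=False)
print(buf.getvalue())
```

The buffer should hold the desired result from the question. One caveat: replace matches whole cell values, so a cell whose genuine content is the text nan would be blanked as well.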

If your frame contains actual NaN values (not the string 'nan'), use fillna instead:

df = df.fillna('')
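To see why fillna and na_rep made no difference in the question, compare a real missing value with the literal string 'nan'. This is a minimal sketch on hypothetical data (an id column plus one value column): fillna only fills the real NaN, and na_rep only controls how real NaN is written. Note that na_rep already defaults to '' in to_csv, so a real NaN would have come out blank anyway; seeing nan in the output means the cells were strings.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: row "02" holds a real missing value (np.nan),
# row "03" holds the literal three-character string "nan".
df = pd.DataFrame({"id": ["01", "02", "03"], "col": ["a", np.nan, "nan"]})

# fillna only fills real NaN; the string "nan" is left untouched.
filled = df.fillna("")
print(filled["col"].tolist())  # ['a', '', 'nan']

# na_rep (default '') likewise only controls how real NaN is written.
buf = io.StringIO()
df.to_csv(buf, index=False, na_rep="")
print(buf.getvalue().splitlines())  # ['id,col', '01,a', '02,', '03,nan']
```

Two columns are used on purpose: with a single column, Python's csv writer quotes a row that consists of one empty field as "", which would obscure the comparison.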

answered Oct 22 '22 18:10

cs95
