Get pandas.read_csv to read empty values as empty string instead of nan

Tags:

python

pandas

csv



I was still confused after reading the other answers and comments. But the answer now seems simpler, so here you go.

Since Pandas version 0.9 (from 2012), you can read your csv with empty cells interpreted as empty strings by simply setting keep_default_na=False:

pd.read_csv('test.csv', keep_default_na=False)
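To see the difference without a file on disk, here is a minimal sketch that reads the same text twice. The sample data and the variable names (`csv_text`, `default_df`, `kept_df`) are made up for illustration and stand in for test.csv.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for test.csv;
# the second row has an empty "city" cell.
csv_text = "name,city\nalice,Paris\nbob,\n"

# Default behaviour: the empty cell is parsed as NaN.
default_df = pd.read_csv(io.StringIO(csv_text))

# With keep_default_na=False the empty cell stays an empty string.
kept_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(default_df["city"].isna().tolist())  # [False, True]
print(kept_df["city"].tolist())            # ['Paris', '']
```

Note that with keep_default_na=False and no na_values, strings such as 'NA' or 'null' are also kept as plain text.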

This issue is more clearly explained in

  • More consistent na_values handling in read_csv · Issue #1657 · pandas-dev/pandas

That was fixed on Aug 19, 2012 for Pandas version 0.9 in

  • BUG: more consistent na_values #1657 · pandas-dev/pandas@d9abf68

I added a ticket to add an option of some sort here:

https://github.com/pydata/pandas/issues/1450

In the meantime, result.fillna('') should do what you want
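As a rough sketch of the fillna('') workaround, assuming `result` is the DataFrame returned by read_csv (the sample data here is made up for illustration):

```python
import io
import pandas as pd

# Hypothetical sample data: one empty cell and one literal "NA" cell.
csv_text = "name,city\nalice,Paris\nbob,\ncarol,NA\n"

# Default parsing turns both the empty cell and "NA" into NaN.
result = pd.read_csv(io.StringIO(csv_text))

# Replace every NaN with an empty string after the fact.
filled = result.fillna("")

print(filled["city"].tolist())  # ['Paris', '', '']
```

One caveat with this approach: it cannot tell a cell that was empty from a cell that contained 'NA' in the file, since both are already NaN by the time fillna runs.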

EDIT: in the development version (to be 0.8.0 final) if you specify an empty list of na_values, empty strings will stay empty strings in the result


We have a simple argument in Pandas read_csv() for this:

Use:

df = pd.read_csv('test.csv', na_filter=False)
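A small sketch of what na_filter=False does, using made-up in-memory data instead of test.csv. It turns off missing-value detection entirely, so markers such as 'NA' come through as ordinary strings too.

```python
import io
import pandas as pd

# Hypothetical sample data: a literal "NA" cell and an empty cell.
csv_text = "name,city\nalice,NA\nbob,\ncarol,Paris\n"

# na_filter=False disables all NA detection for this read.
df = pd.read_csv(io.StringIO(csv_text), na_filter=False)

print(df["city"].tolist())      # ['NA', '', 'Paris']
print(df["city"].isna().any())  # False
```

If you still want some markers (say 'NA') to be read as missing while empty cells stay empty, na_filter=False is too blunt, and a custom na_values list is the better fit.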

The strings that read_csv() treats as missing values by default can be inspected directly:

import pandas
default_missing = pandas._libs.parsers.STR_NA_VALUES
print(default_missing)

The output

{'', '<NA>', 'nan', '1.#QNAN', 'NA', 'null', 'n/a', '-nan', '1.#IND', '#N/A N/A', 'N/A', 'NULL', 'NaN', '-1.#IND', '-1.#QNAN', '#NA', '#N/A', '-NaN'}

With that set you can opt out of individual values, for example so that empty strings are no longer treated as missing.

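import pandas

# Copy the defaults so the library's own set is not mutated.
default_missing = set(pandas._libs.parsers.STR_NA_VALUES)
# set.remove() returns None and raises KeyError for absent values,
# so use discard() and do not reassign ('na' is not in the defaults).
default_missing.discard('')
default_missing.discard('na')

with open('test.csv', 'r') as csv_file:
    # keep_default_na=False makes na_values replace the defaults instead of adding to them.
    df = pandas.read_csv(csv_file, na_values=list(default_missing), keep_default_na=False)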
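To try the opt-out idea on in-memory data, here is a minimal sketch. It assumes that keep_default_na=False is passed together with na_values, so the custom list replaces the defaults rather than being added to them. The short list in `custom_na` and the sample data are made up for illustration; it avoids the private STR_NA_VALUES attribute.

```python
import io
import pandas as pd

# Hypothetical, explicit list of markers that should count as missing.
# The empty string is deliberately left out.
custom_na = ["NA", "N/A", "null", "NaN"]

# Hypothetical sample data: a literal "NA" cell and an empty cell.
csv_text = "name,city\nalice,NA\nbob,\ncarol,Paris\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    na_values=custom_na,
    keep_default_na=False,  # use only custom_na, not the built-in defaults
)

print(df["city"].isna().tolist())  # [True, False, False]
print(df["city"].iloc[1] == "")    # True
```

This applies to text columns; in a column that pandas parses as numeric, an empty cell cannot be stored as an empty string.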