Pandas: ValueError: cannot convert float NaN to integer

Tags: python, pandas, csv

I get ValueError: cannot convert float NaN to integer for the following code:

import pandas
df = pandas.read_csv('zoom11.csv')
df[['x']] = df[['x']].astype(int)

- "x" is definitely a column in the CSV file, but I cannot spot any float NaN in the file, and I don't understand what the error message means.
- When I read the column as a string, it has values like -1, 0, 1, ... 2000, which all look like perfectly good integers to me.
- When I read the column as float, it loads. The values then show as -1.0, 0.0, etc., and I still see no NaNs.
- I tried error_bad_lines=False and the dtype parameter in read_csv, to no avail. Loading just aborts with the same exception.
- The file is not small (10+ million rows), so I cannot inspect it manually. When I extract a small part from the top of the file there is no error, but it happens with the full file. So something in the file causes it, but I cannot tell what.
- Logically the CSV should not have missing values, but even if there is some garbage, I would be OK with skipping those rows, or at least identifying them. I do not see a way to scan through the file and report conversion errors.

Update: Using the hints in comments/answers I got my data clean with this:

# x contained NaN
df = df[~df['x'].isnull()]
# y contained some other garbage, so the null check was not enough
df = df[df['y'].str.isnumeric()]
# final conversion now worked
df[['x']] = df[['x']].astype(int)
df[['y']] = df[['y']].astype(int)
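
One caveat with the str.isnumeric() filter above: it tests whether every character in the string is numeric, so a signed value such as "-1" is rejected because of the minus sign. Since the data reportedly contains -1, those rows may be dropped along with the garbage. The sketch below compares it with pd.to_numeric(..., errors='coerce') on a small made-up series; the sample values are assumptions for illustration, not taken from zoom11.csv.

```python
import pandas as pd

# Hypothetical sample standing in for the 'y' column read as strings
y = pd.Series(["-1", "0", "12", "abc"])

# str.isnumeric() is True only if every character is numeric,
# so the minus sign makes "-1" fail the check
mask_isnumeric = y.str.isnumeric()

# to_numeric with errors='coerce' parses signed values and
# turns anything it cannot parse into NaN
mask_parsed = pd.to_numeric(y, errors="coerce").notna()

print(mask_isnumeric.tolist())
print(mask_parsed.tolist())
```

If negative values matter, filtering on the second mask is probably the safer choice.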

542

asked Nov 16 '17 15:11

JaakL

1 Answer

To identify NaN values, use boolean indexing:

print(df[df['x'].isnull()])

Then, to get rid of all non-numeric values, use to_numeric with the parameter errors='coerce', which replaces non-numeric values with NaN:

df['x'] = pd.to_numeric(df['x'], errors='coerce')
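
Because the coercion overwrites the column, the original offending text is lost. If the goal is to see what was actually in the file, one option is to read the column as strings first and keep the coerced result separate. This is a minimal sketch that uses a small in-memory CSV as a stand-in for zoom11.csv; the sample contents are made up.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for zoom11.csv
csv_text = "x,y\n1,10\n,20\nabc,30\n-4,40\n"

# Read every column as a string so the raw text is preserved
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Coerce into a separate Series instead of overwriting df['x']
converted = pd.to_numeric(df["x"], errors="coerce")

# Rows where conversion gave NaN: the raw value was missing or non-numeric
bad_rows = df[converted.isna()]
print(bad_rows)
```

With a one-line header and no multi-line quoted fields, the index of a bad row plus 2 should correspond to its line number in the file.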

To remove all rows with NaN in column x, use dropna:

df = df.dropna(subset=['x'])

Finally, convert the values to int:

df['x'] = df['x'].astype(int)
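
If dropping rows is not desirable, a possible alternative (not part of the steps above) is pandas' nullable integer dtype "Int64" (capital I), which can hold missing values alongside integers. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical sample with one garbage value and one missing value
df = pd.DataFrame({"x": ["1", "abc", None, "-4"]})

# Coerce to numbers (garbage becomes NaN), then cast to the nullable
# integer dtype, which stores missing entries as <NA> instead of failing
df["x"] = pd.to_numeric(df["x"], errors="coerce").astype("Int64")

print(df["x"])
```

Plain astype(int) would still raise on this column, because NumPy's integer dtype has no representation for missing values.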

149

answered Sep 20 '22 02:09

jezrael
