
Pandas: cannot safely convert passed user dtype of int32 for float64

I am stumped by a problem loading my data into a pandas DataFrame with read_table(). The call fails with two errors:

TypeError: Cannot cast array from dtype('float64') to dtype('int32') according to the rule 'safe'

ValueError: cannot safely convert passed user dtype of int32 for float64 dtyped data in column 2

test.py:

import numpy as np
import os
import pandas as pd

# put test.csv in same folder as script
mydir = os.path.dirname(os.path.abspath(__file__))
csv_path = os.path.join(mydir, "test.csv")

df = pd.read_table(csv_path, sep=' ',
                   comment='#',
                   header=None,
                   skip_blank_lines=True,
                   names=["A", "B", "C", "D", "E", "F", "G"],
                   dtype={"A": np.int32,
                          "B": np.int32,
                          "C": np.float64,
                          "D": np.float64,
                          "E": np.float64,
                          "F": np.float64,
                          "G": np.int32})

test.csv:

2270433 3 21322.889 11924.667 5228.753 1.0 -1
2270432 3 21322.297 11924.667 5228.605 1.0 2270433

asked Sep 16 '25 10:09 by crypdick


2 Answers

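The problem was that I was using a single space as the delimiter and the csv had a trailing space on every line. With sep=' ', that trailing space counts as one more delimiter, so each row is parsed as eight fields (the last one empty) instead of seven. With only seven names supplied, pandas appears to take the first field as the index, so the remaining values shift one column to the left and the floats from the third field land under "B", which is declared as int32. Removing the trailing spaces solved the issue.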
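The extra field can be made visible by reading the sample rows without names or dtype and looking at what pandas finds. This is a minimal sketch, assuming each row of test.csv ends with exactly one trailing space (the whitespace is not visible in the question); the variable names raw and probe are illustrative.

```python
import io

import pandas as pd

# The two sample rows, each with an assumed single trailing space.
raw = (
    "2270433 3 21322.889 11924.667 5228.753 1.0 -1 \n"
    "2270432 3 21322.297 11924.667 5228.605 1.0 2270433 \n"
)

# No names and no dtype: let pandas report the fields it actually sees.
probe = pd.read_csv(io.StringIO(raw), sep=" ", header=None)

print(probe.shape)            # (2, 8): eight columns, not seven
print(probe[7].isna().all())  # True: the extra last column is empty
```

An all-NaN last column in such a probe is a strong hint that the lines end with a delimiter.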

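To trim the trailing spaces and tabs from every line of every .csv file under the current directory (including subdirectories), I ran this command:

find . -name "*.csv" | xargs sed -i 's/[ \t]*$//'

Note that sed -i rewrites the files in place, and the command as written assumes GNU sed and file names without spaces.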
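If modifying the files is not an option, the same trimming can be done in memory before parsing. The following is a sketch under that assumption, not the approach used above: read_trimmed is a hypothetical helper, and the sample text again assumes one trailing space per row.

```python
import io

import numpy as np
import pandas as pd

COLUMNS = ["A", "B", "C", "D", "E", "F", "G"]
DTYPES = {"A": np.int32, "B": np.int32,
          "C": np.float64, "D": np.float64,
          "E": np.float64, "F": np.float64,
          "G": np.int32}


def read_trimmed(text: str) -> pd.DataFrame:
    """Strip trailing spaces/tabs from each line, then parse with sep=' '."""
    cleaned = "\n".join(line.rstrip(" \t") for line in text.splitlines())
    return pd.read_csv(io.StringIO(cleaned), sep=" ", comment="#",
                       header=None, skip_blank_lines=True,
                       names=COLUMNS, dtype=DTYPES)


# Sample rows with an assumed trailing space on each line.
raw = (
    "2270433 3 21322.889 11924.667 5228.753 1.0 -1 \n"
    "2270432 3 21322.297 11924.667 5228.605 1.0 2270433 \n"
)
df = read_trimmed(raw)
print(df.dtypes)
```

For a file on disk, the text would come from something like open(csv_path).read() before being passed to the helper.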

answered Sep 19 '25 05:09 by crypdick


Column 2 contains values of a different type than the dtype you declared for it, e.g. floats where int was specified.

I changed that column's dtype from int to float and the error went away.
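If the column is read as float but integers are still wanted afterwards, the cast can be applied only when it loses nothing. This is a rough sketch of that idea; to_int32_if_whole is a hypothetical helper, not a pandas function, and the sample values are taken from the question's data.

```python
import numpy as np
import pandas as pd


def to_int32_if_whole(col: pd.Series) -> pd.Series:
    """Cast a float column to int32 only if no value would be altered."""
    limits = np.iinfo(np.int32)
    has_missing = col.isna().any()
    has_fraction = ((col.dropna() % 1) != 0).any()
    in_range = col.dropna().between(limits.min, limits.max).all()
    if has_missing or has_fraction or not in_range:
        return col  # leave the column as float64
    return col.astype(np.int32)


whole = pd.Series([3.0, 3.0])
fractional = pd.Series([21322.889, 21322.297])

print(to_int32_if_whole(whole).dtype)       # int32
print(to_int32_if_whole(fractional).dtype)  # float64
```

A column that stays float64 here genuinely holds non-integer or missing values, which is worth checking against the file's layout before simply widening the dtype.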

answered Sep 19 '25 07:09 by jlwu