Dask read_csv-- Mismatched dtypes found in `pd.read_csv`/`pd.read_table`

I'm trying to use Dask to read a CSV file, and it gives me the error below. The thing is, I want my ARTICLE_ID column to be object (string). Can anyone help me read the data successfully?

The traceback is below:

ValueError: Mismatched dtypes found in `pd.read_csv`/`pd.read_table`.

+------------+--------+----------+
| Column     | Found  | Expected |
+------------+--------+----------+
| ARTICLE_ID | object | int64    |
+------------+--------+----------+

The following columns also raised exceptions on conversion:

ARTICLE_ID:
ValueError("invalid literal for int() with base 10: ' July 2007 and 31 March 2008. Diagnostic practices of the medical practitioners for establishing the diagnosis of different types of EPTB were studied. Results: For the diagnosi\\\\'",)

Usually this is due to dask's dtype inference failing, and
*may* be fixed by specifying dtypes manually by adding:

dtype={'ARTICLE_ID': 'object'}

to the call to `read_csv`/`read_table`.

asked Sep 24 '18 20:09
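The error message blames dtype inference. Dask guesses each column's dtype from a sample at the start of the file, so a column that looks numeric early on can still contain text further down. The toy model below is plain Python, not Dask's actual implementation; the file contents, the 200-character sample, and the `infer_type` rule are all hypothetical. It shows how a guess made from the sample can disagree with the full column and then fail with the same kind of `int()` error:

```python
import csv
import io


def infer_type(values):
    """Toy inference rule: 'int64' if every value parses as an int, else 'object'."""
    try:
        for value in values:
            int(value)
    except ValueError:
        return "object"
    return "int64"


def column(text):
    """Return the values of the single CSV column in `text`, skipping the header."""
    return [row[0] for row in csv.reader(io.StringIO(text))][1:]


# Hypothetical file: 1000 integer IDs, then one free-text value at the end.
lines = ["ARTICLE_ID"] + [str(i) for i in range(1000)] + ['"July 2007 and 31 March 2008"']
data = "\n".join(lines) + "\n"

# Inspect only the first 200 characters, trimmed back to a complete line,
# the way a size-limited sample only sees the start of the file.
sample = data[:200].rsplit("\n", 1)[0]

sample_type = infer_type(column(sample))  # the sample holds only integers
full_type = infer_type(column(data))      # the full column also holds text

# Converting the whole column to the type guessed from the sample fails
# the same way as in the traceback.
error = None
try:
    [int(value) for value in column(data)]
except ValueError as exc:
    error = exc
```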

2 Answers

The message is suggesting that you change your call from

df = dd.read_csv('mylocation.csv', ...)

to

df = dd.read_csv('mylocation.csv', ..., dtype={'ARTICLE_ID': 'object'})

where you should replace the file location and any other arguments with the ones you were using before. If this still doesn't work, please update your question.

answered Nov 07 '22 18:11

You can use the `sample` parameter of `read_csv` and set it to an integer giving the number of bytes to use when determining dtypes. For example, I had to set it to 25000000 (about 25 MB) to correctly infer the types of my data, which has shape (171907, 161).

df = dd.read_csv("game_logs.csv", sample=25000000)

https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_csv

answered Nov 07 '22 16:11

gench

Tags: python, dataframe, dask

Asked by Coffey Liu; answered by mdurant and gench.
