Pandas changing values when inferring dtypes

Question

I came across the following problem:

I have a file in JSON Lines format:

{"id": 1, "uuid": "1344800117571260417"}
{"id": 2, "uuid": "1344900117571260918"}

If I try to read it with Pandas like this:

import pandas as pd

df = pd.read_json('file.jsonl', orient='records', lines=True)

I get the following DataFrame:

   id                 uuid
0   1  1344800117571260416
1   2  1344900117571260928

However, the uuid values differ from the ones in the file. I suspected an overflow, but I am not sure: the type pandas infers for that column is int64, and np.iinfo(np.int64).max is 9223372036854775807, which is far larger than any value in the uuid column.

An immediate workaround is to disable dtype inference with pd.read_json(..., dtype=False), but I am curious about this unexpected behavior. Does anyone know why it happens?
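For reference, here is a minimal sketch of that workaround. It assumes the two records from the question are held in an in-memory buffer (io.StringIO) instead of file.jsonl, and the variable names are only illustrative. With dtype=False the uuid column should keep the original strings:

```python
import io

import pandas as pd

# Same two records as in the question, held in memory instead of file.jsonl.
data = (
    '{"id": 1, "uuid": "1344800117571260417"}\n'
    '{"id": 2, "uuid": "1344900117571260918"}\n'
)

# dtype=False turns off dtype inference, so the uuid strings are not converted.
df = pd.read_json(io.StringIO(data), orient="records", lines=True, dtype=False)

uuids = df["uuid"].tolist()
print(uuids)
```

The trade-off is that the column holds strings, so any numeric handling of uuid has to be done explicitly afterwards.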

BTW, I am using pandas version 1.0.1 and Python version 3.7.6.

Giovanni Rescia · Accepted Answer

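As pointed out in the comments, pandas does int(float(x)), which is the cause of the bug. This is not an overflow: a 64-bit float cannot represent every integer of this size exactly, so each value is rounded to the nearest one it can represent. I filed a ticket to report the bug, you can check it out here.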
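The effect can be reproduced without pandas. The sketch below is plain Python and only illustrates the int(float(x)) round trip on the two values from the question; it is not pandas' actual code path, and the variable names are mine:

```python
# Values taken from the file in the question.
originals = [1344800117571260417, 1344900117571260918]

# Round trip each value through a 64-bit float, as int(float(x)) does.
round_tripped = [int(float(x)) for x in originals]

for before, after in zip(originals, round_tripped):
    status = "unchanged" if before == after else "changed"
    print(before, "->", after, status)

# A float64 has a 53-bit significand, so above 2**53 not every integer
# can be represented exactly. Both values are well past that limit.
above_exact_range = all(x > 2 ** 53 for x in originals)
print(above_exact_range)
```

Both values lie between 2**60 and 2**61, where consecutive floats are 256 apart, so each one snaps to the nearest multiple of 256. That matches the 1344800117571260416 and 1344900117571260928 shown in the DataFrame above.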

Tags:

python

pandas
