I came across the following problem:
I have this file, which is structured as a JSON Lines file:
{"id": 1, "uuid": "1344800117571260417"}
{"id": 2, "uuid": "1344900117571260918"}
If I try to read it with pandas like this:
import pandas as pd
df = pd.read_json('file.jsonl', orient='records', lines=True)
I get the following DataFrame:
   id                 uuid
0   1  1344800117571260416
1   2  1344900117571260928
But the uuid values differ from the ones in the file. I suspected some kind of overflow, but I am not sure: the dtype pandas infers for that column is int64, and np.iinfo(np.int64).max is 9223372036854775807, which is far higher than any value in the uuid column.
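A quick check of the overflow hypothesis, as a minimal sketch that assumes only NumPy and reuses the two uuid values from the sample file: both values fit in int64 and survive a round trip through np.int64 unchanged, so int64 overflow alone cannot explain the altered digits.

```python
import numpy as np

# The two uuid values from the sample file above.
uuids = [1344800117571260417, 1344900117571260918]

# Largest value an int64 can hold: 9223372036854775807.
int64_max = np.iinfo(np.int64).max

for value in uuids:
    # Each value is well inside the int64 range, so no overflow is possible.
    fits = value <= int64_max
    # Converting to np.int64 and back preserves every digit.
    exact = int(np.int64(value)) == value
    print(value, fits, exact)
```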
An immediate workaround is to disable type inference with pd.read_json(..., dtype=False), but I am curious about this unexpected behavior. Does anyone know why this is happening?
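A minimal sketch of two workarounds, assuming a reasonably recent pandas and using an in-memory copy of the sample data (io.StringIO stands in for file.jsonl; the name RAW is hypothetical). The first passes dtype=False as described above; the second skips read_json's type inference entirely by parsing each line with the standard json module. Since the uuids are quoted in the file, both approaches keep them as exact strings, which is arguably the right representation for an identifier anyway.

```python
import io
import json

import pandas as pd

# Hypothetical in-memory stand-in for file.jsonl, holding the same two records.
RAW = (
    '{"id": 1, "uuid": "1344800117571260417"}\n'
    '{"id": 2, "uuid": "1344900117571260918"}\n'
)

# Workaround 1: disable dtype inference, so the quoted uuids stay as strings.
df_no_infer = pd.read_json(io.StringIO(RAW), orient="records", lines=True, dtype=False)

# Workaround 2: parse each line with the json module and build the DataFrame
# directly, so read_json's type inference never runs.
records = [json.loads(line) for line in io.StringIO(RAW) if line.strip()]
df_json = pd.DataFrame(records)

print(df_no_infer["uuid"].tolist())
print(df_json["uuid"].tolist())
```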
BTW, I am using pandas version 1.0.1 and Python version 3.7.6.
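As posted in the comments, pandas does int(float(x)), which is the cause of the bug. A float64 has a 53-bit significand, so it can represent every integer exactly only up to 2**53 = 9007199254740992; larger values such as these uuids are rounded to the nearest representable float before being cast back to int. I filed a ticket to report the bug; you can check it out here.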
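The effect of int(float(x)) can be reproduced in plain Python, without pandas, as a minimal sketch (the helper name float_round_trip is hypothetical). A Python float is an IEEE 754 double; in the range of these uuids (between 2**60 and 2**61) consecutive doubles are 256 apart, so each uuid snaps to the nearest multiple of 256.

```python
def float_round_trip(text: str) -> int:
    """Mimic int(float(x)): parse the string as a double, then cast back to int."""
    return int(float(text))


# Above 2**53, not every integer has an exact double representation.
print(2**53)
print(float(2**53 + 1) == float(2**53))

for value in ["1344800117571260417", "1344900117571260918"]:
    converted = float_round_trip(value)
    # Between 2**60 and 2**61 doubles are spaced 256 apart.
    print(value, "->", converted, converted % 256 == 0)
```

The converted values match the uuid column printed in the question, which is consistent with the int(float(x)) explanation.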