I came across the following problem.
I have a file in JSON Lines format:
{"id": 1, "uuid": "1344800117571260417"}
{"id": 2, "uuid": "1344900117571260918"}
If I try to read it with pandas like this:
import pandas as pd
df = pd.read_json('file.jsonl', orient='records', lines=True)
I get the following DataFrame:
   id                 uuid
0   1  1344800117571260416
1   2  1344900117571260928
But the uuid values are different from those in the file. I suspect some kind of overflow, but I am not sure: the type pandas infers for that column is int64, and np.iinfo(np.int64).max is 9223372036854775807, which is far higher than the values in the uuid column.
An immediate workaround is to disable type inference with pd.read_json(..., dtype=False), but I am curious about this unexpected behavior. Does anyone know why this is happening?
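If avoiding pandas' type inference altogether is acceptable, the file can also be parsed with the standard library, which leaves the quoted uuid values as strings. This is only a minimal sketch, not what pandas does internally: the helper name read_jsonl_exact is hypothetical, and an in-memory buffer stands in for file.jsonl.

```python
import io
import json


def read_jsonl_exact(handle):
    """Parse one JSON object per non-empty line (hypothetical helper).

    json.loads keeps quoted values as str and unquoted integers as
    arbitrary-precision Python ints, so no digits are lost.
    """
    return [json.loads(line) for line in handle if line.strip()]


# In-memory stand-in for 'file.jsonl' from the question.
sample = io.StringIO(
    '{"id": 1, "uuid": "1344800117571260417"}\n'
    '{"id": 2, "uuid": "1344900117571260918"}\n'
)

records = read_jsonl_exact(sample)
print(records[0]["uuid"])  # the original string, digits intact
```

The resulting list of dicts should be usable with pd.DataFrame(records), where the uuid column would stay as strings rather than being converted to numbers.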
BTW, I am using pandas version 1.0.1 and Python version 3.7.6.
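As posted in the comments, pandas does int(float(x)), which is the cause of the bug. It is a loss of precision rather than an overflow: a 64-bit float has a 53-bit significand, so it cannot represent every integer above 2**53, and values of this size get rounded to the nearest representable float before being converted back to int. I filed a ticket to report the bug; you can check it out here.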
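The effect of an int(float(x)) round trip can be reproduced in plain Python, without pandas. The sketch below only illustrates the float conversion itself, and the variable names are made up for the example.

```python
import math
import sys

original = "1344800117571260417"  # first uuid from the file

as_float = float(original)        # nearest 64-bit float
round_tripped = int(as_float)     # back to an exact Python int

# The float has a 53-bit significand; frexp gives the binary exponent,
# from which the gap between neighbouring floats at this magnitude follows.
_, exponent = math.frexp(as_float)
spacing = 2 ** (exponent - sys.float_info.mant_dig)

print(round_tripped)              # 1344800117571260416
print(spacing)                    # 256
print(round_tripped % spacing)    # 0
```

At this magnitude, neighbouring floats are 256 apart, so the last digits of the uuid are rounded to the nearest multiple of 256, which matches the values shown in the DataFrame.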