I am relatively new to pandas, and while trying to define dtypes to read a large file I get the following error: NameError: name 'int64' is not defined.
I made sure that pandas and numpy are installed and up to date, but from what I understand this is a Python error rather than a pandas one. I've gone through a few tutorials where nobody had this problem. The code below produces the error:
import pandas as pd
import numpy as np
data = pd.read_csv("file.csv", encoding="utf-16le", dtype={
    "time": int64,
    "created_date_sk": int64,
    "eventType": object,
    "itemId": int64,
    "fieldId": int64,
    "userId": int64
})
data.head()
Full trace:
Traceback (most recent call last):
  File "manipulate.py", line 5, in <module>
    "time": int64,
NameError: name 'int64' is not defined
I would expect the int64 type to be recognized, but it only seems to be able to read the int type. The object type seems to work.
The Python "NameError: function is not defined" occurs when we try to call a function that is not declared or before it is declared. To solve the error, make sure you haven't misspelled the function's name and call it after it has been declared.
What is a NameError in Python? A NameError occurs when you try to use a variable, function, or module name that doesn't exist in the current scope. A common cause is using a variable or function name that has not been defined yet.
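As a minimal sketch of the behaviour described above, the snippet below refers to int64 without importing or assigning it first and catches the resulting exception. The variable names dtype and message are only illustrative.

```python
# Referencing a name that was never bound raises NameError at runtime.
try:
    dtype = int64  # nothing has imported or assigned the name int64
except NameError as exc:
    message = str(exc)

print(message)
```

The message matches the one in the question, which suggests the problem is the missing name rather than anything specific to read_csv.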
The interpreter does not recognize int64 because it is not a Python built-in (unlike int and object); it is defined in numpy, so it has to be accessed through that module.
Change your code to this (it complains about no file.csv in my filesystem, but this is normal):
import pandas as pd
import numpy as np
data = pd.read_csv("file.csv", encoding="utf-16le", dtype={
    "time": np.int64,
    "created_date_sk": np.int64,
    "eventType": object,
    "itemId": np.int64,
    "fieldId": np.int64,
    "userId": np.int64
})
data.head()
Or better yet, import it at the beginning:
from numpy import int64
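With that import in place, the original dtype dictionary should work unchanged. The sketch below assumes a small in-memory CSV (csv_text, a hypothetical stand-in for file.csv with only three of the columns) so it can run without the real file.

```python
import io

import pandas as pd
from numpy import int64

# Hypothetical sample data standing in for file.csv
csv_text = "time,eventType,itemId\n1,click,10\n2,view,20\n"

data = pd.read_csv(io.StringIO(csv_text), dtype={
    "time": int64,        # bare name works because of the import above
    "eventType": object,
    "itemId": int64,
})
print(data.dtypes)
```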
You are getting this error because int64 is not defined in the local Python namespace, so using it as a bare name in the dictionary raises a NameError. There are a couple of things you can do to fix this.
Option 1: Use strings
The simplest option is to write your datatypes as strings. Simply change int64 to "int64" inside your dtype dictionary.
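A minimal sketch of the string form, assuming a small in-memory CSV (csv_text is hypothetical sample data, not the asker's file):

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv
csv_text = "time,userId\n1,100\n2,200\n"

# dtype names given as strings need no numpy import in this script
data = pd.read_csv(io.StringIO(csv_text), dtype={
    "time": "int64",
    "userId": "int64",
})
print(data.dtypes)
```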
Option 2: Use numpy
Change int64 to np.int64. (Note that this requires importing the numpy package as np.)
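As far as I can tell the two options are interchangeable. The sketch below checks that the string and the numpy type resolve to the same dtype; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# The string "int64" and np.int64 describe the same 64-bit integer dtype.
from_string = np.dtype("int64")
from_numpy = np.dtype(np.int64)
same_dtype = from_string == from_numpy

# pandas accepts either spelling when a dtype is requested.
series_a = pd.Series([1, 2, 3], dtype="int64")
series_b = pd.Series([1, 2, 3], dtype=np.int64)
print(same_dtype, series_a.dtype == series_b.dtype)
```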
I prefer Option 2.