I have Python code that specifies a schema and then creates an empty DataFrame. This code used to work fine in earlier versions of pandas and NumPy, but with the latest versions it fails.
Here is the code:
import pandas as pd
import numpy as np
schema = {'timestamp': np.datetime64, 'instrument_token': int, 'last_price': float, 'volume': int}
data = pd.DataFrame(columns=schema.keys()).astype(schema)
It throws the following error:
TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass e.g. 'datetime64[ns]' instead.
I would appreciate it if you could help resolve this. Thanks and regards.
In case anyone is coming here with this error in PySpark (pyspark 3.4.1, pandas 2.1.0): the schema that was causing this error had a StructField with DataType TimestampType, and the error appeared when I was trying to convert a DataFrame with toPandas().
I followed this troubleshooting article: https://docs.tecton.ai/docs/beta/tips-and-tricks/troubleshooting/conversion-from-pyspark-dataframe-to-pandas-dataframe-with-pandas-2-0
and enabled Arrow Conversion on my test spark session.
# spark: SparkSession
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
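The same setting can also be applied when building the session; a minimal sketch of a local test session with the Arrow option set up front (the local[1] master is just an assumption for a single-machine test):

```python
from pyspark.sql import SparkSession

# Local test session with Arrow-based pandas conversion enabled at build
# time -- equivalent to calling spark.conf.set(...) after creation.
spark = (
    SparkSession.builder
    .master("local[1]")
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)
```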
You have to specify the unit for internal storage ('s', 'ms', 'us', or 'ns'):
import pandas as pd
import numpy as np
# HERE --v
schema = {'timestamp': 'datetime64[ns]', 'instrument_token': int, 'last_price': float, 'volume': int}
data = pd.DataFrame(columns=schema.keys()).astype(schema)
Output:
>>> data.dtypes
timestamp datetime64[ns]
instrument_token int64
last_price float64
volume int64
dtype: object
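Since pandas 2.0, resolutions other than nanoseconds are also accepted, so any unit-qualified dtype string works; a small sketch comparing two of them:

```python
import pandas as pd

# Empty frame typed with nanosecond resolution (the pre-2.0 default).
ns_frame = pd.DataFrame(columns=['timestamp']).astype({'timestamp': 'datetime64[ns]'})

# pandas 2.x also supports coarser resolutions such as milliseconds.
ms_frame = pd.DataFrame(columns=['timestamp']).astype({'timestamp': 'datetime64[ms]'})

print(ns_frame.dtypes['timestamp'])  # datetime64[ns]
print(ms_frame.dtypes['timestamp'])  # datetime64[ms]
```

Only the bare, unit-less 'datetime64' (or np.datetime64) is rejected by astype in pandas 2.x.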