
PySpark DateType() during creation of the DataFrame

Here is my source code, written in Python in a Databricks notebook:

from pyspark.sql.types import StructType, StructField, DateType

data = [('2021-01-01', '2021-01-02')]
schema1 = StructType([
    StructField("date1", DateType(), True),
    StructField("date2", DateType(), True)])
spark.createDataFrame(data, schema1).show()

However, I got an error when running it.

Does anyone have an idea what is wrong?

mytabi asked Jan 20 '26 06:01


1 Answer

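The schema declares both columns as DateType, but the data contains plain Python strings. createDataFrame does not convert strings to dates for you, so the schema check fails.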

I see two solutions:

  1. Use date type data
import datetime
from pyspark.sql.types import StructType, StructField, DateType

# Parse the strings into datetime.date objects, which DateType accepts
data = [(
    datetime.datetime.strptime('2021-01-01', "%Y-%m-%d").date(),
    datetime.datetime.strptime('2021-01-02', "%Y-%m-%d").date()
)]

schema1 = StructType([
    StructField("date1", DateType(), True),
    StructField("date2", DateType(), True)])

df = spark.createDataFrame(data, schema1)

df.show()

# output:
+----------+----------+
|     date1|     date2|
+----------+----------+
|2021-01-01|2021-01-02|
+----------+----------+
  2. Don't use a schema at first; convert the columns to date type afterwards
from pyspark.sql import functions as F

data = [('2021-01-01','2021-01-02')] 
df = spark.createDataFrame(data)
df = df.select(*(F.to_date(c) for c in df.columns))

df.show()

# output:
+-----------+-----------+
|to_date(_1)|to_date(_2)|
+-----------+-----------+
| 2021-01-01| 2021-01-02|
+-----------+-----------+
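
For the first solution, if the input arrives as a list of string tuples, the parsing step can be factored into a small helper. This is only a minimal sketch, assuming every value is an ISO 'YYYY-MM-DD' string; parse_iso_dates is a hypothetical name used for illustration, not part of PySpark.

```python
import datetime


def parse_iso_dates(rows):
    """Convert every ISO 'YYYY-MM-DD' string in each row to a datetime.date.

    Hypothetical helper for illustration; it is not a PySpark function.
    """
    return [
        tuple(datetime.date.fromisoformat(value) for value in row)
        for row in rows
    ]


data = [('2021-01-01', '2021-01-02')]
date_rows = parse_iso_dates(data)
print(date_rows)
```

The resulting date_rows holds datetime.date objects, so it should be usable in place of data in spark.createDataFrame(date_rows, schema1) with the DateType schema from the first solution.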
Pav3k answered Jan 23 '26 21:01