To pass schema to a json file we do this:
from pyspark.sql.types import (StructField, StringType, StructType, IntegerType)
data_schema = [StructField('age', IntegerType(), True), StructField('name', StringType(), True)]
final_struc = StructType(fields = data_schema)
df =spark.read.json('people.json', schema=final_struc)
The above code works as expected. However now, I have data in table which I display by:
df = sqlContext.sql("SELECT * FROM people_json")
But if I try to pass a new schema to it by using following command it does not work.
df2 = spark.sql("SELECT * FROM people_json", schema=final_struc)
It gives the following error:
sql() got an unexpected keyword argument 'schema'
NOTE: I am using Databrics Community Edition
To create a PySpark DataFrame from an existing RDD, we will first create an RDD using the . parallelize() method and then convert it into a PySpark DataFrame using the . createDatFrame() method of SparkSession.
Using DataFrame.Select the columns from the original DataFrame and copy it to create a new DataFrame using copy() function. Yields below output. Alternatively, You can also use DataFrame. filter() method to create a copy and create a new DataFrame by selecting specific columns.
We can create a DataFrame programmatically using the following three steps. Create an RDD of Rows from an Original RDD. Create the schema represented by a StructType matching the structure of Rows in the RDD created in Step 1. Apply the schema to the RDD of Rows via createDataFrame method provided by SQLContext.
You cannot apply a new schema to already created dataframe. However, you can change the schema of each column by casting to another datatype as below.
df.withColumn("column_name", $"column_name".cast("new_datatype"))
If you need to apply a new schema, you need to convert to RDD and create a new dataframe again as below
df = sqlContext.sql("SELECT * FROM people_json")
val newDF = spark.createDataFrame(df.rdd, schema=schema)
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With