Basically, I would like to do a simple delete using SQL statements, but when I execute the SQL script it throws the following error:
pyspark.sql.utils.ParseException: u"\nmissing 'FROM' at 'a'(line 2, pos 23)\n\n== SQL ==\n\n DELETE a.* FROM adsquare a \n-----------------------^^^\n"
This is the script that I'm using:
from pyspark.sql import SparkSession

sq = SparkSession.builder.config("spark.rpc.message.maxSize", "1536").config("spark.sql.shuffle.partitions", str(shuffle_value)).getOrCreate()
adsquare = sq.read.csv(f, schema=adsquareSchemaDevice, sep=";", header=True)
adsquare_grid = adsquareJoined.select("userid", "latitude", "longitude").repartition(1000).cache()
adsquare_grid.createOrReplaceTempView("adsquare")
sql = """
DELETE a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 > 1 """
sq.sql(sql)
Note: The codepoint table is created during the execution.
Is there any other way I can delete the rows with the above conditions?
DataFrames in Apache Spark are immutable, so you cannot modify them in place. To "delete" rows from a DataFrame, filter out the rows you do not want and save the result as a new DataFrame.
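For example, a minimal sketch of the filter approach (assuming dis2 is a column already present on adsquare_grid; adjust the names to your actual DataFrame):

# Rows with dis2 > 1 are simply excluded; the original DataFrame is untouched.
adsquare_kept = adsquare_grid.filter(adsquare_grid.dis2 <= 1)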
You cannot delete rows from a DataFrame, but you can create a new DataFrame that excludes the unwanted records.
sql = """
Select a.* FROM adsquare a
INNER JOIN codepoint c ON a.grid_id = c.grid_explode
WHERE dis2 <= 1 """
sq.sql(sql)
This way you create a new DataFrame. Note that I reversed the condition to dis2 <= 1, which keeps exactly the rows the DELETE would have retained.
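If downstream SQL should see the filtered data under the original name, you can re-register the view (using the adsquare_filtered variable from the snippet above):

adsquare_filtered.createOrReplaceTempView("adsquare")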
You cannot delete rows from a DataFrame because Hadoop storage follows the WORM (write once, read many) model. Instead, filter out the records you would have deleted; the SQL statement's result gives you a new DataFrame.
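The same join-based delete can also be emulated with the DataFrame API using a left anti join. This is a sketch under assumptions not stated in the question: a codepoint DataFrame exists for the table created during execution, dis2 lives on the codepoint side, and the adsquare DataFrame carries the grid_id column referenced in the SQL:

from pyspark.sql import functions as F

# Rows of codepoint that would trigger the DELETE.
to_delete = codepoint.filter(F.col("dis2") > 1)
# left_anti keeps only adsquare rows with NO match in to_delete,
# i.e. exactly the rows the DELETE statement would have left behind.
adsquare_kept = adsquare_grid.join(
    to_delete, adsquare_grid.grid_id == to_delete.grid_explode, how="left_anti")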