 

How to perform update in Apache Spark SQL

I have to update a JavaSchemaRDD with some new values, subject to some WHERE conditions.

This is the SQL query which I want to convert into Spark SQL:

UPDATE t1
  SET t1.column1 = '0', t1.column2 = 1, t1.column3 = 1    
  FROM TABLE1 t1
  INNER JOIN TABLE2 t2 ON t1.id_column = t2.id_column     
  WHERE (t2.column1 = 'A') AND (t2.column2 > 0)   
asked Nov 10 '22 by Shekar Patel

1 Answer

Yes, I found the solution myself. I achieved this using Spark core only; I did not use Spark SQL for it. I have two RDDs (they can also be thought of as tables or datasets), t1 and t2. Looking at the query in my question, I am updating t1 based on one join condition and two WHERE conditions, which means I need three columns (id_column, column1 and column2) from t2. So I collected those columns into three separate collections. Then I iterate over the first RDD, t1, and during the iteration I apply the three conditions (one join and two WHERE conditions) using Java "if" statements. Based on the result of the "if" conditions, the values of the first RDD get updated.
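For illustration, here is a minimal Java sketch of that approach. The row classes T1Row and T2Row and their field names are hypothetical stand-ins for the real schema, and it assumes the two WHERE conditions can be evaluated on t2 before the join, so that only the matching id_column values need to be collected. Since RDDs are immutable, the "update" produces a new RDD rather than modifying t1 in place.

import org.apache.spark.api.java.JavaRDD;

import java.io.Serializable;
import java.util.HashSet;
import java.util.Set;

public class UpdateWithRdd {

    // Hypothetical row class for TABLE1
    public static class T1Row implements Serializable {
        public String idColumn;
        public String column1;
        public int column2;
        public int column3;

        public T1Row(String idColumn, String column1, int column2, int column3) {
            this.idColumn = idColumn;
            this.column1 = column1;
            this.column2 = column2;
            this.column3 = column3;
        }
    }

    // Hypothetical row class for TABLE2
    public static class T2Row implements Serializable {
        public String idColumn;
        public String column1;
        public int column2;

        public T2Row(String idColumn, String column1, int column2) {
            this.idColumn = idColumn;
            this.column1 = column1;
            this.column2 = column2;
        }
    }

    public static JavaRDD<T1Row> update(JavaRDD<T1Row> t1, JavaRDD<T2Row> t2) {
        // Collect the ids of the t2 rows that satisfy the WHERE conditions
        // (t2.column1 = 'A' AND t2.column2 > 0) into a local collection.
        Set<String> matchingIds = new HashSet<>(
                t2.filter(row -> "A".equals(row.column1) && row.column2 > 0)
                  .map(row -> row.idColumn)
                  .collect());

        // Iterate over t1; when the join condition (matching id_column) holds,
        // apply the SET clause, otherwise keep the row unchanged.
        return t1.map(row -> {
            if (matchingIds.contains(row.idColumn)) {
                return new T1Row(row.idColumn, "0", 1, 1);
            }
            return row;
        });
    }
}

Collecting the matching ids like this is only reasonable when that set fits comfortably in memory; for larger tables, joining the two RDDs on id_column would be the safer route.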

answered Nov 15 '22 by Shekar Patel