Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pyspark - saveAsTable - How to Insert new data to existing table?

How to Insert new data to existing table???

I'm trying to insert new data to existing table using pyspark.

This is my program

from pyspark import SparkContext
from pyspark.sql import SQLContext, DataFrameWriter

sc = SparkContext("local[*]", "SPARK-SQL")
sqlContext = SQLContext(sc)

df = sqlContext.read.json("people.json")
df.registerTempTable("people")

# Show old data
result = sqlContext.sql("SELECT * from people")
result.show()

# Create new data
new_data = [{"name": "Phan", "age": 22}]
df_new_data = sqlContext.createDataFrame(new_data)
# Save data to table 'people'
df_new_data.write.mode("append").saveAsTable("people")

# Show new data
result = sqlContext.sql("SELECT * from people")
result.show()

After I run it. Data in table "people" can not be changed.

Old data
+---+--------+
|age|    name|
+---+--------+
| 30| Michael|
| 30|    Andy|
| 19|  Justin|
| 21|PhanHien|
+---+--------+
New data
+---+--------+                                                                  
|age|    name|
+---+--------+
| 30| Michael|
| 30|    Andy|
| 19|  Justin|
| 21|PhanHien|
+---+--------+

Please help me to change data in table!!! Thank you!

like image 291
phan hien Avatar asked Nov 15 '25 11:11

phan hien


2 Answers

I tried to saveAsTable with a table name does not exist.

df_new_data.write.mode("append").saveAsTable("people1")

# Show new data
result = sqlContext.sql("SELECT * from people1")
result.show()

It worked. And I can see new data in table "people1"

+---+----+
|age|name|
+---+----+
|22 |Phan|
+---+----+
like image 63
phan hien Avatar answered Nov 18 '25 20:11

phan hien


>>> df_new_data.write.mode("append").saveAsTable("people")

The above code writes people table in default database in hive.

So if you want to see the data from hive table you need to create HiveContext then view results from hive table instead of temporary table.

>>> hc=HiveContext(sc)
>>> hc.sql("select * from default.people").show(100,False)

UPDATE:

Append new data to temporary table:

>>> df1=df
>>> df2=df.unionAll(df1)
>>> df2.registerTempTable("people")
>>> sqlContext.sql("select * from people").show(100,False)
like image 23
notNull Avatar answered Nov 18 '25 19:11

notNull



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!