How to refresh a table and do it concurrently?


Question

I'm using Spark Streaming 2.1. I'd like to periodically refresh some cached tables (loaded through a Spark-provided DataSource such as parquet or MySQL, or through a user-defined data source).

How do I refresh the table?

Suppose I have some table loaded by

spark.read.format("").load().createTempView("my_table")

and it is also cached by

spark.sql("cache table my_table")

Is the following code enough to refresh the table, so that it is automatically cached again the next time it is loaded?

spark.sql("refresh table my_table")

Or do I have to do it manually with:

spark.table("my_table").unpersist()
spark.read.format("").load().createOrReplaceTempView("my_table")
spark.sql("cache table my_table")

Is it safe to refresh the table concurrently?

By concurrent I mean using a ScheduledThreadPoolExecutor to do the refresh work on a thread other than the main thread.

What will happen if Spark is using the cached table when I call refresh on it?
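
For reference, the background refresh described above could be sketched roughly as follows. This is only an illustration: `PeriodicRefresher` and its `refresh` parameter are hypothetical names, and the refresh action is passed in as a plain function so the scheduling part can be shown without a Spark session. The sketch only guarantees that refreshes do not overlap each other; it says nothing about what Spark does with queries that read the cached table while a refresh runs.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.AtomicLong
import scala.util.control.NonFatal

// Hypothetical helper: runs `refresh` periodically on a single background thread.
final class PeriodicRefresher(refresh: () => Unit) {
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  // Simple counters so callers can observe what happened.
  val successes = new AtomicLong(0)
  val failures = new AtomicLong(0)

  def start(initialDelayMs: Long, delayMs: Long): Unit = {
    val task = new Runnable {
      override def run(): Unit =
        try {
          refresh()
          successes.incrementAndGet()
          ()
        } catch {
          // An exception escaping run() would suppress all later executions,
          // so non-fatal errors are counted and swallowed here.
          case NonFatal(_) =>
            failures.incrementAndGet()
            ()
        }
    }
    // Fixed delay on one thread: the next refresh starts only after the previous one ends.
    scheduler.scheduleWithFixedDelay(task, initialDelayMs, delayMs, TimeUnit.MILLISECONDS)
    ()
  }

  def stop(): Unit = {
    scheduler.shutdown()
    scheduler.awaitTermination(10, TimeUnit.SECONDS)
    ()
  }
}
```

With Spark, the function passed in might be something like `() => spark.catalog.refreshTable("my_table")` or the manual unpersist/reload/cache sequence from the question.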

Ganesh · Accepted Answer

Spark 2.2.0 introduced a feature for refreshing the metadata of a table that has been updated by Hive or some other external tool.

You can do this with the Catalog API:

spark.catalog.refreshTable("my_table")

This call updates the metadata for that table so that Spark's view of it stays consistent with the underlying data.
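
A small end-to-end sketch of that call, under stated assumptions: the local master, the temporary parquet directory, and the `RefreshTableSketch` object are made up for illustration (the question leaves the format and path blank). As I read the scaladoc for `Catalog.refreshTable`, it invalidates the cached data and metadata of the table, and if the table is cached it drops the cached version and caches the new one lazily; behaviour may differ between Spark versions, so treat this as something to try rather than a guarantee.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

object RefreshTableSketch {
  // Returns the row count seen through the view after the refresh.
  def run(): Long = {
    val spark = SparkSession.builder()
      .master("local[1]") // assumption: local run, for illustration only
      .appName("refresh-table-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical data location standing in for the real source.
    val path = Files.createTempDirectory("refresh-sketch").resolve("data").toString
    Seq(1, 2, 3).toDF("id").write.parquet(path)

    // Register and cache the table as in the question.
    spark.read.parquet(path).createOrReplaceTempView("my_table")
    spark.sql("cache table my_table")

    // Simulate an external writer adding files behind Spark's back.
    Seq(4, 5, 6).toDF("id").write.mode(SaveMode.Append).parquet(path)

    // Ask Spark to invalidate what it has cached for this table.
    spark.catalog.refreshTable("my_table")

    val rows = spark.table("my_table").count()
    spark.stop()
    rows
  }
}
```

Comparing the count returned here with the count before the refresh is a quick way to check, on your own Spark version, whether the refreshed view picks up the appended files.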

Tags: apache-spark, apache-spark-sql, spark-streaming

宇宙人
