I'm trying to persist a temp view so I can query it again via SQL:

val df = spark.sqlContext.read.option("header", true).csv("xxx.csv")
df.createOrReplaceTempView("xxx")

Then I persist/cache it with one of:

df.cache() // or
spark.sqlContext.cacheTable("xxx") // or
df.persist(MEMORY_AND_DISK) // or
spark.sql("CACHE TABLE xxx")

Then I move the underlying xxx.csv and run:

spark.sql("select * from xxx")

Upon which, I find that only CACHE TABLE xxx stores a copy. What am I doing wrong? How can I persist (e.g. DISK_ONLY) a queryable view/table?
When you persist a dataset, each node stores its partitions of the data in memory and reuses them in other actions on that dataset. Spark's persisted data on nodes is fault-tolerant: if any partition of a Dataset is lost, it is automatically recomputed using the original transformations that created it.
// Storing data in memory:
val dataframePersist = dataframe.persist(StorageLevel.MEMORY_ONLY)
dataframePersist.show(false)

The persist() function stores the data in memory. unpersist() marks the DataFrame or Dataset as non-persistent and removes all of its blocks from memory and disk.
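To illustrate the persist/unpersist lifecycle end to end, here is a hedged sketch; the SparkSession setup and the sample data are assumptions added for a self-contained example, not from the original:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("persist-demo")
  .master("local[*]") // assumption: local mode for the sketch
  .getOrCreate()

// Placeholder data standing in for the CSV from the question.
val dataframe = spark.range(100).toDF("id")

// Mark for caching at MEMORY_AND_DISK; nothing is stored yet (persist is lazy).
val dataframePersist = dataframe.persist(StorageLevel.MEMORY_AND_DISK)

// The first action actually materializes the cached blocks.
dataframePersist.count()

// When done, free the memory and disk blocks.
dataframePersist.unpersist()
```

Note that persist() only *marks* the data for caching; an action such as count() or show() is what fills the cache.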
First cache it, as df.cache, then register it as df.createOrReplaceTempView("dfTEMP"). Now every time you query dfTEMP, such as val df1 = spark.sql("select * from dfTEMP"), you will read it from memory (the first action on df1 will actually cache it). Don't worry about persistence for now: if df does not fit into memory, it will spill the rest to disk.
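Putting the answer together, here is a minimal end-to-end sketch (the session setup is an assumption; the file name follows the question). This also likely explains the observed behavior: the SQL statement CACHE TABLE is eager, while cache()/persist() are lazy and only materialize on the first action, so you must run an action before the underlying file is moved:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().master("local[*]").getOrCreate()

val df = spark.read.option("header", true).csv("xxx.csv")
df.persist(StorageLevel.DISK_ONLY) // or df.cache() / MEMORY_AND_DISK
df.createOrReplaceTempView("dfTEMP")

// Force materialization with an action BEFORE the underlying file moves;
// until then the cache is only marked, not filled.
spark.sql("select count(*) from dfTEMP").show()

// Subsequent queries read from the persisted blocks, not from xxx.csv.
spark.sql("select * from dfTEMP").show()
```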