Spark JoinWithCassandraTable on TimeStamp partition key STUCK

Tags: mysql, scala, cassandra, apache-spark, datastax-enterprise

I'm trying to read a small part of a huge Cassandra (C*) table by using:

    val snapshotsFiltered = sc.parallelize(startDate to endDate).map(TableKey(_)).joinWithCassandraTable("listener","snapshots_tspark")

    println("Done Join")
    //*******
    //get only the snapshots and create rdd temp table
    val jsons = snapshotsFiltered.map(_._2.getString("snapshot"))
    val jsonSchemaRDD = sqlContext.jsonRDD(jsons)
    jsonSchemaRDD.registerTempTable("snapshots_json")

With:

    // Primary key is (created, imei, when): created = partition key; imei, when = clustering keys
    case class TableKey(created: Long)

And the Cassandra table schema is:

    CREATE TABLE listener.snapshots_tspark (
        created timestamp,
        imei text,
        when timestamp,
        snapshot text,
        PRIMARY KEY (created, imei, when)
    ) WITH CLUSTERING ORDER BY (imei ASC, when ASC)
        AND bloom_filter_fp_chance = 0.01
        AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
        AND comment = ''
        AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
        AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
        AND dclocal_read_repair_chance = 0.1
        AND default_time_to_live = 0
        AND gc_grace_seconds = 864000
        AND max_index_interval = 2048
        AND memtable_flush_period_in_ms = 0
        AND min_index_interval = 128
        AND read_repair_chance = 0.0
        AND speculative_retry = '99.0PERCENTILE';

The problem is that the process freezes after printing "Done Join", with no errors shown in the Spark master UI. The console only shows:

    [Stage 0:>                                                  (0 + 2) / 2]

Won't the join work with a timestamp as the partition key? Why does it freeze?

asked Oct 25 '15 12:10

Reshef

1 Answer

The cause was this call:

    sc.parallelize(startDate to endDate)

startDate and endDate were Longs generated from Dates in the format:

    ("yyyy-MM-dd HH:mm:ss")

That made Spark build a huge collection of keys (100,000+ objects) to join with the C* table. The job wasn't actually stuck: C* was working hard to perform the join and return the data.

Finally, I replaced the range with a short, explicit list of keys:

    case class TableKey(created_dh: String)
    val data = Array("2015-10-29 12:00:00", "2015-10-29 13:00:00", "2015-10-29 14:00:00", "2015-10-29 15:00:00")
    val snapshotsFiltered = sc.parallelize(data, 2).map(TableKey(_)).joinWithCassandraTable("listener","snapshots_tnew")

And it works now.
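To make the difference in key counts concrete, here is a minimal sketch in plain Scala (no Spark or Cassandra needed). It compares how many keys an inclusive Long range produces with how many an hourly list produces over the same window. The object name KeyCount and its helpers are hypothetical, and the sketch assumes the Longs are epoch seconds in UTC; with epoch milliseconds the range would be 1,000 times larger.

```scala
import java.time.{LocalDateTime, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical helper, for illustration only.
object KeyCount {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Parse a timestamp string to epoch seconds (UTC is an assumption).
  def toEpochSeconds(s: String): Long =
    LocalDateTime.parse(s, fmt).toEpochSecond(ZoneOffset.UTC)

  // Number of elements in the inclusive range `start to end` (step 1).
  // Each element becomes one key, i.e. one partition lookup in the join.
  def rangeKeyCount(start: Long, end: Long): Long =
    if (end < start) 0L else end - start + 1

  // One key per whole hour between start and end, inclusive.
  def hourlyKeys(start: String, end: String): List[String] = {
    val first = LocalDateTime.parse(start, fmt)
    val last  = LocalDateTime.parse(end, fmt)
    Iterator.iterate(first)(_.plusHours(1))
      .takeWhile(t => !t.isAfter(last))
      .map(_.format(fmt))
      .toList
  }
}
```

Under these assumptions, the window from "2015-10-29 12:00:00" to "2015-10-29 15:00:00" yields 10,801 keys as a range of seconds but only 4 keys as an hourly list, which is why the second version of the join returns quickly.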

answered Sep 22 '22 16:09

Reshef
