
Constructing a graph from streaming data using Spark Streaming

I am new to Spark. I need to construct a co-occurrence graph from streaming data such as Twitter tweets: the words in a tweet become nodes, and an edge is added between two words when they appear in the same tweet. Can Spark Streaming be used to build a live co-occurrence graph of tweets? Is Spark Streaming meant for this use case? I am not sure whether it can be done with Spark Streaming. If not, what are the alternatives?

Naren asked Jun 04 '15 04:06



1 Answer

The co-occurrence frequencies can be viewed as a graph or as an adjacency matrix, but either way this is a large, sparse histogram (a frequency count) over the product space of your word list. Most likely you want to detect correlation over a moving window, so you should design a sketch data structure that tracks unusual increases or decreases in the rate of occurrence in the stream, e.g. a counting Bloom filter or a count-min sketch applied to every word pair. See http://twitter.github.io/algebird/#com.twitter.algebird.CMSCounting
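To make the word-pair idea concrete, here is a minimal sketch of a count-min sketch keyed on word pairs. It is plain Python with no Spark or Algebird dependency, and it is not Algebird's API: the class `CountMinSketch`, the helpers `pair_key` and `word_pairs`, the `width`/`depth` defaults, and the whitespace tokenizer are all assumptions made for illustration.

```python
import hashlib
from itertools import combinations


class CountMinSketch:
    """Fixed-size approximate counter (illustrative, not Algebird's API).

    Estimates can overcount because of hash collisions, but never undercount.
    """

    def __init__(self, width=2048, depth=4):
        # width and depth are assumed defaults; tune them for your error budget.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, key):
        # One column per row, taken from a hash of the key salted by the row index.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._columns(key):
            self.table[row][col] += count

    def estimate(self, key):
        # The smallest counter across rows is the least-inflated estimate.
        return min(self.table[row][col] for row, col in self._columns(key))


def pair_key(a, b):
    """Order-independent key for an edge between two words."""
    first, second = sorted((a.lower(), b.lower()))
    return f"{first}|{second}"


def word_pairs(tweet):
    """Yield one key per unordered pair of distinct words in a tweet."""
    # Naive whitespace tokenizer, assumed here only to keep the example short.
    words = sorted(set(tweet.lower().split()))
    for a, b in combinations(words, 2):
        yield pair_key(a, b)
```

For each incoming tweet you would call `sketch.add(key)` for every key from `word_pairs(tweet)`, and `sketch.estimate(pair_key("spark", "graph"))` then gives an approximate edge weight. Two sketches built with the same width, depth, and hashing can be combined by adding their tables element-wise, so one possible approach to a moving window is to keep one sketch per batch and sum only the batches still inside the window.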

jayprich answered Oct 06 '22 01:10
