I am new to Spark. I need to construct a co-occurrence graph from streaming data such as Twitter tweets: the words in a tweet become nodes, and if two words appear in the same tweet we add an edge between them. Can we use Spark Streaming to construct a live Twitter co-occurrence graph? Is Spark Streaming meant for this use case? I am not sure whether it can be done with Spark Streaming. If not, what are the alternatives?
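A minimal sketch of the graph construction described in the question. It is written in plain Python rather than Spark (an assumption, to keep it self-contained), and the function names are hypothetical. Each tweet contributes one edge per unordered pair of distinct words, and a running `Counter` plays the role of the sparse, weighted adjacency matrix. Per micro-batch, a Spark Streaming job would have roughly the same shape (a `flatMap` over tweets to emit pairs, then a `reduceByKey` to count them), but that mapping is only indicative.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations
from typing import Iterable


def tweet_edges(tweet: str) -> list[tuple[str, str]]:
    """Return the undirected edges (word pairs) contributed by one tweet.

    Tokenisation is a naive lowercase whitespace split (an assumption);
    a real pipeline would also strip punctuation, URLs and stop words.
    """
    words = sorted(set(tweet.lower().split()))
    # Sorted input makes every pair canonical: ("a", "b"), never ("b", "a").
    return list(combinations(words, 2))


def update_graph(graph: Counter, batch: Iterable[str]) -> Counter:
    """Fold one micro-batch of tweets into the running edge weights."""
    for tweet in batch:
        graph.update(tweet_edges(tweet))
    return graph


graph: Counter = Counter()
update_graph(graph, ["spark streaming rocks", "spark graph"])
update_graph(graph, ["Spark streaming graph"])
print(graph[("spark", "streaming")])  # 2
```

Kept exactly like this, the table can grow with the square of the vocabulary, since every distinct word pair seen in the stream gets its own entry.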
The co-occurrence frequencies can be viewed as a graph or an adjacency matrix, but in effect this is a large, sparse histogram (frequency count) over the product space of your word list, i.e. one counter per word pair. Most likely you want to detect correlations over a moving window, so you should design a sketch data structure that tracks unusual increases or decreases in each pair's rate of occurrence in the stream, e.g. a counting Bloom filter or a count-min sketch applied to every word pair. See http://twitter.github.io/algebird/#com.twitter.algebird.CMSCounting
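To illustrate the count-min idea, here is a small hand-rolled sketch in plain Python. It is not Algebird's `CMS` API (Algebird is a Scala library), just the same data structure written from scratch. The width and depth, the `"a|b"` pair encoding, and the `rising_pairs` helper with its growth `factor` are all hypothetical choices for illustration. One sketch is built per time window, and pairs whose estimated count jumps between consecutive windows are flagged.

```python
from __future__ import annotations

import hashlib
from itertools import combinations


class CountMinSketch:
    """Fixed-size approximate counter: estimates never fall below the true count."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, key: str):
        # One hash per row: blake2b salted with the row index acts as a
        # family of (approximately) independent hash functions.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"), digest_size=8, salt=row.to_bytes(16, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row, col in self._cells(key):
            self.table[row][col] += count

    def estimate(self, key: str) -> int:
        # Collisions only ever inflate a cell, so the smallest cell is the best bound.
        return min(self.table[row][col] for row, col in self._cells(key))


def pair_keys(tweet: str) -> set[str]:
    """Canonical "a|b" keys for every word pair in a tweet (naive whitespace tokens)."""
    words = sorted(set(tweet.lower().split()))
    return {f"{a}|{b}" for a, b in combinations(words, 2)}


def window_sketch(tweets) -> CountMinSketch:
    """Build one sketch for one time window (e.g. one micro-batch)."""
    sketch = CountMinSketch()
    for tweet in tweets:
        for key in pair_keys(tweet):
            sketch.add(key)
    return sketch


def rising_pairs(prev, curr, candidates, factor: float = 2.0) -> list[str]:
    """Candidate pairs whose estimated count grew by at least `factor`.

    A count-min sketch cannot list its keys, so the caller supplies the
    candidates, e.g. the pairs seen in the current window.
    """
    return sorted(
        k for k in candidates
        if curr.estimate(k) >= factor * max(prev.estimate(k), 1)
    )


previous = ["spark streaming", "spark graph"]
current = ["spark streaming"] * 5 + ["spark graph"]
prev_cms, curr_cms = window_sketch(previous), window_sketch(current)
candidates = set().union(*(pair_keys(t) for t in current))
print(rising_pairs(prev_cms, curr_cms, candidates))
```

Memory is fixed at `width * depth` counters no matter how many distinct pairs the stream contains, at the cost of over-estimates when pairs collide in every row. In the usual analysis, a width of about e/ε and a depth of about ln(1/δ) keep each estimate within ε·N of the true count (N being the total number of insertions) with probability at least 1 − δ.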