Could you please help me figure out what happens while initializing a Spark RDD?
There is an official example here:
val capitals = spark.read.parquet("capitals.parquet").select("name", "country")
val luceneRDD = LuceneRDD(capitals)
val result = luceneRDD.termQuery("name", "ottawa", 10)
But I'm not familiar with Scala and have trouble reading the source code. Could you please answer the following questions:
1. What does LuceneRDD do with capitals.parquet? How can I index each row of each column (all values)?
2. What happens inside luceneRDD?

(Disclaimer: I am the author of LuceneRDD.)
Take a look at the slides that I have prepared:
https://www.slideshare.net/zouzias/lucenerdd-for-geospatial-search-and-entity-linkage
In a nutshell, LuceneRDD instantiates an inverted index on each Spark executor and collects and aggregates the search results from the executors to the Spark driver. The main motivation behind LuceneRDD is to extend Spark natively with full-text search, geospatial search and entity linkage, without requiring an external dependency on a SolrCloud or Elasticsearch cluster.
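To make the flow concrete, here is a minimal end-to-end sketch of the example above (the parquet path and search term are taken from the question; the last line assumes the query response can be consumed like an ordinary RDD, which may differ slightly between versions):

import org.apache.spark.sql.SparkSession
import org.zouzias.spark.lucenerdd.LuceneRDD
import org.zouzias.spark.lucenerdd._

val spark = SparkSession.builder().appName("lucenerdd-demo").getOrCreate()

// Each row of the DataFrame becomes a Lucene document in the index
// of the partition that holds it.
val capitals = spark.read.parquet("capitals.parquet").select("name", "country")

// Builds one inverted index per partition, living on the executors.
val luceneRDD = LuceneRDD(capitals)

// The term query runs against every partition's index; the top-10 hits
// are aggregated on the driver.
val results = luceneRDD.termQuery("name", "ottawa", 10)
results.take(10).foreach(println)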
To answer your questions: LuceneRDD(capitals) indexes every column you selected, for every row, so each Spark partition ends up with its own Lucene index over its rows. You can control how many such indices are created by repartitioning the DataFrame before wrapping it:

LuceneRDD(capitals.repartition(numPartitions = 10))
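And if you want all columns indexed rather than just name and country, a minimal sketch (assuming the same parquet file) is simply to drop the select, since LuceneRDD indexes whatever columns the DataFrame carries:

// No select: every column of every row gets indexed.
val allColumns = spark.read.parquet("capitals.parquet")
val indexedAll = LuceneRDD(allColumns.repartition(numPartitions = 10))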