How would I go about building a DataStoreInputReader that is based on a query, instead of reading every single entity of that kind? The rationale is to minimize datastore reads (since the query is indexed to a subset of entities) and to cut processing time.
First, is this a good idea? Would a query-backed custom DataStoreInputReader actually save time and processing, or would the query itself cancel out MapReduce parallelism or add other overhead?
Second, how do I do it? I have been reading *input_readers.py*, and it's not clear how to subclass the AbstractDataStoreInputReader to do this. Perhaps someone can explain the process for implementing something like this, since the code doesn't make it obvious and the documentation is outdated or nonexistent.
Brownie points for anyone who can point to working code (GitHub or elsewhere) that shows a custom DataStoreInputReader implementation.
This would go a long way toward making AppEngine MapReduce more accessible and developer-friendly ;-)
DatastoreInputReader supports filters now! See http://code.google.com/p/appengine-mapreduce/source/browse/trunk/python/src/mapreduce/input_readers.py
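For what it's worth, here is a rough sketch of what the reader parameters might look like with a filter. It assumes the `filters` parameter takes a list of `(property_name, operator, value)` tuples and that the reader options sit under an `input_reader` key, which is how I read the linked *input_readers.py*; check your version of the library before relying on it. The helper `build_mapper_params`, the kind `models.Article`, the property `published` and the handler `main.process_article` are all made up for illustration.

```python
def build_mapper_params(entity_kind, filters):
    """Build a mapper parameters dict for a filtered DatastoreInputReader.

    Hypothetical helper, not part of the mapreduce library. It only
    assembles a plain dict, so it runs without App Engine installed.
    """
    normalized = []
    for f in filters:
        # Assumption: each filter is (property_name, operator, value).
        if len(f) != 3:
            raise ValueError("expected (property, operator, value): %r" % (f,))
        prop, op, value = f
        normalized.append((prop, op, value))
    return {
        "input_reader": {
            "entity_kind": entity_kind,  # e.g. "models.Article" (made up)
            "filters": normalized,
        }
    }


params = build_mapper_params("models.Article", [("published", "=", True)])

# Inside an App Engine app, the dict would then be handed to the library,
# roughly like this (untested sketch, handler name is made up):
#
#   from mapreduce import control
#   control.start_map(
#       "Process published articles",
#       "main.process_article",
#       "mapreduce.input_readers.DatastoreInputReader",
#       params,
#   )
```

If this works the way the source suggests, there should be no need to subclass anything just to restrict the job to a subset of entities.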