Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to query datastore from dataflow/beam in python

Looks like google has released support for querying the datastore from dataflow/beam in python. I'm trying to get it to run locally but I'm running into some issues:

import apache_beam as beam
from apache_beam.io.datastore.v1.datastoreio import ReadFromDatastore
from gcloud import datastore

client = datastore.Client('my-project')
query = client.query(kind='Document')

options = get_options()
p = beam.Pipeline(options=options)

entities = p | 'read' >> ReadFromDatastore(project='my-project', query=query)
entities | 'write' >> beam.io.Write(beam.io.TextFileSink('gs://output.txt'))

p.run()

This is giving me a

AttributeError: 'Query' object has no attribute 'HasField' [while running 'read/Split Query']

I'm guessing that I'm passing in the wrong query object (there are 3-4 pip packages that you can import datastore from) but I can't figure out which one I'm supposed to pass in. In the tests they are passing in protobuf. Is that what I have to use? Can anyone show a simple example query using protobuf if that's what I have to do?

like image 298
Bovard Avatar asked Jun 01 '26 21:06

Bovard


1 Answers

The wordcount example uses protobufs for the query.

Looks like you need something like:

from google.datastore.v1 import query_pb2
...
query = query_pb2.Query()
query.kind.add().name = 'Document'
like image 68
jkff Avatar answered Jun 03 '26 13:06

jkff



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!