
Error saving a string longer than 1500 bytes in Datastore from Dataflow api

Dataflow job is throwing this error message when I try to save a very long string: The value of property "myProperty" is longer than 1500 bytes., code=INVALID_ARGUMENT.

This error occurs when following Google's DatastoreWordCount sample and saving a string longer than 1500 bytes.

I know that when using the Datastore API directly, I am able to save strings longer than 1500 bytes by storing the property as com.google.appengine.api.datastore.Text. However, there is nothing in the DatastoreWordCount sample or in the DatastoreHelper class documentation indicating that the Text type is supported.

Is there a way to save such long strings using that API so that they can be read back as com.google.appengine.api.datastore.Text?
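For reference, Datastore's 1500-byte limit applies to the UTF-8 encoding of the string, not its character count, so multi-byte text hits the limit sooner than its length suggests. A minimal sketch (class and method names are mine) for checking a value before writing it:

```java
import java.nio.charset.StandardCharsets;

public class ByteLimitCheck {
    // Datastore's limit on indexed string properties, in UTF-8 bytes.
    static final int INDEXED_STRING_LIMIT_BYTES = 1500;

    // Returns true if the string's UTF-8 encoding exceeds the indexed-property limit.
    static boolean exceedsIndexedLimit(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length > INDEXED_STRING_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        String ascii = "a".repeat(1500); // 1 byte per char -> exactly 1500 bytes, fits
        String multi = "é".repeat(1500); // 2 bytes per char in UTF-8 -> 3000 bytes, too long
        System.out.println(exceedsIndexedLimit(ascii)); // false
        System.out.println(exceedsIndexedLimit(multi)); // true
    }
}
```

A check like this can flag oversized values before the pipeline submits them, instead of failing mid-job with INVALID_ARGUMENT.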

The full error message is as follows:

java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: java.lang.RuntimeException: com.google.cloud.dataflow.sdk.util.UserCodeException: com.google.datastore.v1.client.DatastoreException: The value of property "dalekTestExecutions" is longer than 1500 bytes., code=INVALID_ARGUMENT
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:162)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:288)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnContext.outputWindowedValue(DoFnRunnerBase.java:284)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase$DoFnProcessContext$1.outputWindowedValue(DoFnRunnerBase.java:508)
    at com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowsAndCombineDoFn.closeWindow(GroupAlsoByWindowsAndCombineDoFn.java:205)
    at com.google.cloud.dataflow.sdk.util.GroupAlsoByWindowsAndCombineDoFn.processElement(GroupAlsoByWindowsAndCombineDoFn.java:192)
    at com.google.cloud.dataflow.sdk.util.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:49)
    at com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.processElement(DoFnRunnerBase.java:139)
    at com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:190)
    at com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.processElement(ForwardingParDoFn.java:42)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerLoggingParDoFn.processElement(DataflowWorkerLoggingParDoFn.java:47)
    at com.google.cloud.dataflow.sdk.util.common.worker.ParDoOperation.process(ParDoOperation.java:55)
    at com.google.cloud.dataflow.sdk.util.common.worker.OutputReceiver.process(OutputReceiver.java:52)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:224)
    at com.google.cloud.dataflow.sdk.util.common.worker.ReadOperation.start(ReadOperation.java:185)
    at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:72)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:287)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:223)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:193)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:173)
    at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:160)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
asked May 04 '26 by tanji
2 Answers

You can save a string longer than 1500 bytes by excluding the value from indexing:

Value longString = Value.newBuilder()
    .setStringValue(...)
    .setExcludeFromIndexes(true)
    .build();

If you need compatibility with App Engine's com.google.appengine.api.datastore.Text type, you would also want to set the meaning to 15:

Value longString = Value.newBuilder()
    .setStringValue(...)
    .setExcludeFromIndexes(true)
    .setMeaning(15)
    .build();
answered May 06 '26 by Ed Davisson

Datastore creates an index for each property by default, which is why indexed properties are subject to the 1500-byte limit. If you need to store larger data, such as a big JSON document, you can specify that the property should not be indexed, like this:

Entity newEntity =
    Entity.newBuilder(key)
        .set("time", Timestamp.parseTimestamp("1970-01-01T00:00:00Z"))
        .set("message", StringValue.newBuilder(JSON).setExcludeFromIndexes(true).build())
        .build();

This way you can save values larger than the 1500-byte limit that applies to indexed properties.

answered May 06 '26 by Vishal Garg

