I have a CSV file with 2 columns and 20,000 rows that I would like to import into Google Cloud Datastore. I'm new to Google Cloud and NoSQL databases. I have tried using Dataflow, but it requires a JavaScript UDF function name. Does anyone have an example of this? I will be querying this data once it's in Datastore. Any advice or guidance on how to create this would be appreciated.
Using Apache Beam, you can read a CSV file using the TextIO class; see the TextIO documentation.
// Read the file line by line; each element of the resulting PCollection is one CSV row.
Pipeline p = Pipeline.create();
p.apply(TextIO.read().from("gs://path/to/file.csv"))
Next, apply a transform that parses each row of the CSV file and returns an Entity object. How you construct the Entity depends on how you want to store each row; the Cloud Datastore documentation has an example of how to create an Entity object.
.apply(ParDo.of(new DoFn<String, Entity>() {
  @ProcessElement
  public void processElement(ProcessContext c) {
    // Naive split on commas; use a CSV parser if fields can contain commas or quotes.
    String[] columns = c.element().split(",");
    Entity.Builder entity = Entity.newBuilder();
    // Kind "MyKind" and property "value" are placeholders; makeKey and
    // makeValue are static helpers from DatastoreHelper.
    entity.setKey(makeKey("MyKind", columns[0]).build());
    entity.putProperties("value", makeValue(columns[1]).build());
    c.output(entity.build());
  }
}))
Lastly, write the Entity objects to Cloud Datastore; see the DatastoreIO documentation.
.apply(DatastoreIO.v1().write().withProjectId(projectId));
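Putting the pieces together, here is a minimal end-to-end sketch, assuming the Beam Java SDK with the google-cloud-platform IO module on the classpath. The bucket path, project ID, kind name "MyKind", and property name "value" are placeholders to replace with your own:

import static com.google.datastore.v1.client.DatastoreHelper.makeKey;
import static com.google.datastore.v1.client.DatastoreHelper.makeValue;

import com.google.datastore.v1.Entity;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.datastore.DatastoreIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class CsvToDatastore {
  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    String projectId = "my-project-id"; // placeholder

    Pipeline p = Pipeline.create(options);
    p.apply(TextIO.read().from("gs://path/to/file.csv"))
        .apply(ParDo.of(new DoFn<String, Entity>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            // Two-column CSV: the first column becomes the key name,
            // the second becomes a single string property.
            String[] columns = c.element().split(",");
            Entity.Builder entity = Entity.newBuilder();
            entity.setKey(makeKey("MyKind", columns[0]).build());
            entity.putProperties("value", makeValue(columns[1]).build());
            c.output(entity.build());
          }
        }))
        .apply(DatastoreIO.v1().write().withProjectId(projectId));

    p.run().waitUntilFinish();
  }
}

To run this on Dataflow rather than locally, pass the usual Beam flags, e.g. --runner=DataflowRunner --project=your-project --tempLocation=gs://your-bucket/tmp.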