 

Import CSV into google cloud datastore

I have a CSV file with 2 columns and 20,000 rows that I would like to import into Google Cloud Datastore. I'm new to Google Cloud and NoSQL databases. I have tried using Dataflow, but the template requires a JavaScript UDF function name. Does anyone have an example of this? I will be querying this data once it's in Datastore. Any advice or guidance would be appreciated.

IamSule asked Mar 07 '23

1 Answer

Using Apache Beam, you can read the CSV file line by line with the TextIO class; each element of the resulting PCollection is one row of the file. See the TextIO documentation.

Pipeline p = Pipeline.create();

p.apply(TextIO.read().from("gs://path/to/file.csv"));

Next, apply a transform that parses each row of the CSV file and outputs an Entity object. Construct the Entity according to how you want to store each row; the Cloud Datastore documentation on entities, properties, and keys has an example of how to create one.

.apply(ParDo.of(new DoFn<String, Entity>() {
    @ProcessElement
    public void processElement(ProcessContext c) {
        String row = c.element();
        // Assumes two comma-separated columns with no quoted fields;
        // "MyKind" is a placeholder kind name.
        String[] columns = row.split(",", 2);
        // makeKey/makeValue are static imports from
        // com.google.datastore.v1.client.DatastoreHelper
        Entity entity = Entity.newBuilder()
            .setKey(makeKey("MyKind", columns[0]))
            .putProperties("value", makeValue(columns[1]).build())
            .build();
        c.output(entity);
    }
}));
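The row-parsing step inside the DoFn can be sketched as plain Java, independent of Beam. This is a minimal sketch assuming exactly two comma-separated columns with no quoted fields or embedded commas; for real-world CSV data, use a proper CSV parser such as Apache Commons CSV.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class CsvRow {
    // Splits one CSV line into its two columns (key and value).
    // split(",", 2) caps the result at two parts, so a comma in the
    // second column would be kept; trim() drops surrounding whitespace.
    public static Map.Entry<String, String> parse(String row) {
        String[] cols = row.split(",", 2);
        return new SimpleEntry<>(cols[0].trim(), cols[1].trim());
    }
}
```

The returned key/value pair maps naturally onto the Entity's key name and a single property, matching the two-column layout described in the question.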

Lastly, write the Entity objects to Cloud Datastore. See the DatastoreIO documentation.

.apply(DatastoreIO.v1().write().withProjectId(projectId));
Andrew Nguonly answered Mar 17 '23