Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to design app engine datastore and text search modelling

We have a Java application running on google app engine. Having a kind called Contact. following is the sample schema

Contact
{
  long id
  String firstName
  String lastName
  ...
}

The above is the existig model,for supporting few requirements we are storing this object both in datastore and text search

Now we want to integrate contacts with their page views data.

Each contact can have thousands of page views records or even millions for some contacts

Following is the sample page visit object [Note : We don't have this object as of now, this is just give information about page visit]

PageVisit
{

  long id
  String url
  String refUrl
  int  country
  String city
  ....
}

We have a requirement , which needs a query on contact core properties and his page visited data

for ex :

select * from Contact where firstName = 'abc' and url = 'cccccc.com';
select * from Contact where firstName = 'abc' or url = 'cccccc.com';

To write this kind of queries we need both contact core properties and their page visited need to available in Contact object itself but contact can have huge number page views. So this will cross entity maximum size limit

So how to design contact model in this kind of situation both in datastore and text search.

Thanks

like image 423
Rams Avatar asked Nov 08 '22 13:11

Rams


1 Answers

Cloud Datastore doesn't support joins, so you will need to handle this in some manner from the client code.

2 possible ways to handle this are:

Denormalize the Contact you need to search into PageVisit:

PageVisit
{

  long id
  String firstName // Denormalized from Contact
  String url
  String refUrl
  int  country
  String city
  ....
}

This requires you to create a composite index:

- kind: PageVisit
  ancestor: no
  properties:
  - name: firstName
  - name: url

Or run multiple queries

select id from Contact where firstName = 'abc'

select * from PageVisit where contactId={id} and url = 'cccccc.com';
select * from PageVisit where contactId={id} or url = 'cccccc.com';

This requires you to create a composite index:

- kind: PageVisit
  ancestor: no
  properties:
  - name: contactId
  - name: url

Final aside: Depending on how large your site is, it might be worth looking into Cloud Bigtable for the PageView data. It's a better solution for high write OLAP-style workloads.

like image 190
Dan McGrath Avatar answered Nov 15 '22 08:11

Dan McGrath