 

Google App Engine datastore tag cloud with Python

We have some unstructured textual data in our App Engine datastore. I wanted to create a one-off tag cloud of one property on a subset of the datastore objects. After looking around, I can't see any framework that would let me do this without writing it myself.

The way I had in mind was:

  • Write a map function (as in MapReduce) that goes over every object of the particular type in the datastore,
  • Split the text string into words
  • For each word increment a counter
  • Use the final counts to generate the tag cloud with some third party software (offline - any suggestions here welcome)
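The splitting and counting steps of this plan can be sketched locally in plain Python. This is a minimal sketch, not App Engine's MapReduce library: it assumes each entity's text property is already available as a string (the hypothetical `texts` list stands in for those values), and `map_words`, `reduce_counts` and the tokenizing regex are illustrative choices.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Illustrative tokenizer: runs of lowercase letters, digits and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def map_words(text: str) -> Iterator[str]:
    """Map step: split one entity's text property into lowercase words."""
    yield from WORD_RE.findall(text.lower())


def reduce_counts(texts: Iterable[str]) -> Counter:
    """Reduce step: sum the per-word counts across all entities."""
    counts = Counter()
    for text in texts:
        counts.update(map_words(text))
    return counts


if __name__ == "__main__":
    # Hypothetical stand-in for the property values of the datastore entities.
    texts = ["The cat sat on the mat", "The dog sat too"]
    print(reduce_counts(texts).most_common(2))  # [('the', 3), ('sat', 2)]
```

For a one-off cloud you will probably also want to drop common stop words such as "the" and "on" before rendering.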

As I've never done this before, I was wondering, first, whether there is a framework around that does this for me (please), or, if not, whether I am approaching it the right way. Please feel free to point out gaping holes in the plan.

asked Mar 07 '11 12:03 by probably at the beach



1 Answer

Feed TagCloud and PyTagCloud are two possibilities.

  • Feed TagCloud Generator Gadget for Google App Engine might fit your needs. Unfortunately, it's undocumented. Fortunately, it's rather simple, though I'm not sure how well suited it is to your use case.

    It operates on a feed and appears to be fairly flexible, so if you have a feed of your site, it might not be too much trouble to integrate, though all processing will be done online.

  • PyTagCloud is also worth a look. You'll be able to do the processing offline, and it generates rather handsome clouds.

    All you'll have to do to get this working is export your datastore; the splitting and counting will be done for you, as PyTagCloud can operate on text files. Following the instructions in the App Engine docs about Uploading and Downloading Data will show you how to export the datastore to your local machine. You'll want to write an "Exporter Class" and have PyTagCloud operate on its output.
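Once the export is on your machine, turning it into a plain-text corpus takes only a few lines. A minimal sketch, assuming your exporter writes a CSV file with a header row and that the exported property lives in a column named `text` (a hypothetical name; use whatever your Exporter Class emits). `extract_column` is illustrative, not part of any App Engine tool.

```python
import csv
import io
from typing import TextIO


def extract_column(csv_file: TextIO, column: str = "text") -> str:
    """Join one column of an exported CSV into a single block of text.

    Assumes a header row; `column` is a hypothetical property name.
    Rows where the column is missing or empty are skipped.
    """
    reader = csv.DictReader(csv_file)
    return "\n".join(row[column] for row in reader if row.get(column))


if __name__ == "__main__":
    # Inline stand-in for the file your exporter produced.
    exported = io.StringIO('key,text\n1,"first post, about clouds"\n2,second post\n')
    print(extract_column(exported))
```

Write the returned string to a text file and point PyTagCloud at it; its README shows word counting with `get_tag_counts` and rendering with `make_tags` and `create_tag_image`, but check the project's documentation for the exact parameters.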


If you decide to roll your own, you probably want to skip the online processing and use the offline Uploading and Downloading Data method above, unless you want a dynamically updated cloud. Iterating over your entire datastore and doing the counts online is the most annoying and expensive part of the task, and it only makes sense if you want or need a dynamic tag cloud. As above, I'd recommend writing an "Exporter Class" and operating on its output locally.
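If you roll your own, the final step (turning word counts into a cloud) mostly comes down to mapping counts onto font sizes. A minimal sketch, assuming linear scaling and plain HTML output; `font_sizes`, `render_html` and the 12 to 48 px range are illustrative choices, not from any library.

```python
import html
from typing import Dict


def font_sizes(counts: Dict[str, int], min_px: int = 12, max_px: int = 48) -> Dict[str, int]:
    """Linearly map each word's count onto a font size in pixels."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    spread = hi - lo or 1  # avoid division by zero; equal counts all get min_px
    return {
        word: round(min_px + (count - lo) * (max_px - min_px) / spread)
        for word, count in counts.items()
    }


def render_html(counts: Dict[str, int]) -> str:
    """Render a simple tag cloud as HTML spans, sorted alphabetically."""
    sizes = font_sizes(counts)
    spans = (
        f'<span style="font-size:{sizes[w]}px">{html.escape(w)}</span>'
        for w in sorted(counts)
    )
    return " ".join(spans)


if __name__ == "__main__":
    # Sizes come out as cloud=12px, datastore=24px, python=48px.
    print(render_html({"python": 10, "datastore": 4, "cloud": 1}))
```

When a few words dominate the counts, scaling by the logarithm of each count instead of the raw count usually gives a more even-looking cloud.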

answered Sep 25 '22 16:09 by Ezra
