
How exactly do Google App Engine Logs Work?

Where does Google store the logs when you make a logging call? Logging calls seem to be pretty fast, so it doesn't seem like they are written to the datastore.

How reliable are the logs? If a logging call succeeds, is the entry pretty much guaranteed to show up in the logs?

How much log history is kept?

The reason I'm interested in this is that I'm making a question and answer website, and I want to keep track of views of each question by each unique logged-in user and display the view count on the question page. So if 10 different users visit the question page 100 times, it still only counts as 10 unique views.

I have an offsite computer that does background processing for my app. I'm planning to have this offsite computer download the logs about every 30 minutes and calculate the view count for each question from the logs. By doing this, I don't have to create a datastore entity for each different question each user views.

What do you guys think? Does anyone see any problems with this?

EDIT: I guess my main concern is the reliability of the logs.

Kyle asked Mar 10 '10 18:03


1 Answer

This isn't an answer to your question - rather, it's a response to the problem you are trying to solve.

If you're familiar with Bloom Filters, you can combine one with Memcached's incr (or a sharded datastore counter) to create a solution that is "good enough". Keep one Bloom Filter per question and use it to test whether a value (in this case, a User id) is already in the set; if it is not, increment your counter and add the value to the filter. One of the properties of Bloom Filters is that adding a value and checking for inclusion are both constant time operations, no matter how many values the filter already holds. Spacewise, it'll take a bit of space to store each filter, but this already seems to be an order of magnitude less complex than writing code to periodically grep the logs for uniques. Python implementations of Bloom Filters exist.
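To make the idea concrete, here is a minimal sketch written for this discussion, not taken from any particular library. The names `BloomFilter`, `record_view`, `filters` and `view_counts` are all hypothetical, and the plain dicts stand in for wherever you would really keep the filter bits and the counter (Memcached's incr or a sharded datastore counter); on App Engine you would have to persist the filter as well.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter backed by a bytearray (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def __contains__(self, value):
        # True if every bit for this value is set (may be a false positive).
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(value))

    def add(self, value):
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)


# Hypothetical in-memory stand-ins: one filter and one counter per question.
# In a real app the counter would be memcache incr or a sharded counter.
filters = {}
view_counts = {}


def record_view(question_id, user_id):
    """Count the view only if this user has not been seen for the question."""
    bloom = filters.setdefault(question_id, BloomFilter())
    if user_id not in bloom:
        bloom.add(user_id)
        view_counts[question_id] = view_counts.get(question_id, 0) + 1
    return view_counts.get(question_id, 0)


if __name__ == "__main__":
    # 10 users each load the same question page 10 times.
    for _ in range(10):
        for n in range(10):
            record_view("question-1", f"user-{n}")
    print(view_counts["question-1"])
```

Repeat views by the same user never change the count, because a Bloom Filter never forgets a value it has been given. The count can only come out too low, never too high.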

Nothing is free, however, which is why I said "good enough". With Bloom Filters, there is always a chance of a false positive. That is, depending on the size of the filter per question, there is a small chance you will check whether a user ID has already been counted and get a "YES IT HAS" when that is the first time the User has viewed the question, so that view never gets counted. You can calculate the size you need for an acceptable false positive rate, but a lower rate costs more space.
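For the sizing calculation, the standard formulas for a Bloom Filter holding n values with target false positive rate p are m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. A small sketch, where the function name `bloom_parameters` and the example numbers are just assumptions for illustration:

```python
import math


def bloom_parameters(expected_items, false_positive_rate):
    """Return (num_bits, num_hashes) from the standard Bloom filter formulas."""
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    num_bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    # k = (m / n) * ln 2, rounded to the nearest whole hash count (at least 1)
    num_hashes = max(1, round(num_bits / expected_items * math.log(2)))
    return num_bits, num_hashes


if __name__ == "__main__":
    # Hypothetical example: up to 1000 unique viewers, 1% false positives.
    print(bloom_parameters(1000, 0.01))
```

With those example numbers the filter needs 9586 bits (roughly 1.2 KB per question) and 7 hash functions. Tightening the rate to 0.1% pushes that to about 1.8 KB, which is the space tradeoff mentioned above.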

Ikai Lan answered Nov 20 '22 14:11