Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using BigQuery for logs analysis

Im trying to do logs analysis with BigQuery. Specifically, I have an appengine app and a javascript client that will be sending log data to BigQuery. In bigquery, I'll store the full log text in one column but also extract important fields into other columns. I then want to be able to do adhoc queries over those columns.

Two questions:

1) Is BigQuery particularly good or particularly bad at this use case? 2) How do I setup revolving logs? I.e. I want to only store the last N logs or the last X GB of log data. I see delete is not supported.

like image 313
aloo Avatar asked May 01 '12 23:05

aloo


People also ask

Can BigQuery be used for ETL?

BigQuery is a serverless, scalable cloud-based data warehouse provided by Google Cloud Platform. It is a fully managed warehouse that allows users to perform ETL on the data with the help of SQL queries.

What is BigQuery not good for?

However, despite its unique advantages and powerful features, BigQuery is not a silver bullet. It is not recommended to use it on data that changes too often and, due to its storage location bound to Google's own services and processing limitations it's best not to use it as a primary data storage.

What is BigQuery best for?

BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.


3 Answers

  1. Just so you know, there is an excellent demo of moving App Engine Log data to BigQuery via App Engine MapReduce called log2bq (http://code.google.com/p/log2bq/)

  2. Re: "use case" - Stack Overflow is not a good place for judgements about best or worst, but BigQuery is used internally at Google to analyse really really big log data.

  3. I don't see the advantage of storing full log text in a single column. If you decide that you must set up revolving "logs," you could ingest daily log dumps by creating separate BigQuery tables, perhaps one per day, and then delete the tables when they become old. See https://developers.google.com/bigquery/docs/reference/v2/tables/delete for more information on the Table.delete method.

like image 186
Michael Manoochehri Avatar answered Nov 15 '22 08:11

Michael Manoochehri


After implementing this - we decided to open source the framework we built for it. You can see the details of the framework here: http://blog.streak.com/2012/07/export-your-google-app-engine-logs-to.html

like image 39
aloo Avatar answered Nov 15 '22 09:11

aloo


If you want your Google App Engine (Google Cloud) project's logs to be in BigQuery, Google has added this functionality built in to the new Cloud Logging system. It is a beta feature known as "Logs Export" https://cloud.google.com/logging/docs/install/logs_export

They summarize it as:

Export your Google Compute Engine logs and your Google App Engine logs to a Google Cloud Storage bucket, a Google BigQuery dataset, a Google Cloud Pub/Sub topic, or any combination of the three.

We use the "Stream App Engine Logs to BigQuery" feature in our Python GAE projects. This sends our app's logs directly to BigQuery as they are occurring to provide near real-time log records in a BigQuery dataset.

There is also a page describing how to use the exported logs. https://cloud.google.com/logging/docs/export/using_exported_logs

When we want to query logs exported to BigQuery over multiple days (e.g. the last week), you can use a SQL query with a FROM clause like this:

FROM
    (TABLE_DATE_RANGE(my_bq_dataset.myapplog_,
      DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY'), CURRENT_TIMESTAMP()))
like image 35
Jesse Webb Avatar answered Nov 15 '22 08:11

Jesse Webb