
How to load compressed files into BigQuery

I want to upload the wikipedia pageviews from http://dumps.wikimedia.org/other/pagecounts-raw/ into BigQuery, what's the fastest way?

Asked Jun 10 '14 by Felipe Hoffa


1 Answer

This is a classic demo I do to show how easy it is to load data into BigQuery.

To get an hour of pageviews from Wikipedia, just wget the file:

wget http://dumps.wikimedia.org/other/pagecounts-raw/2014/2014-06/pagecounts-20140602-180000.gz
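If you're curious what the raw file looks like (and why the load flags below are set the way they are), here is a minimal Python sketch that prints the first few rows of the file downloaded above:

import gzip

# Each row has four space-separated fields and no quoting:
# language, title, requests, content_size
with gzip.open("pagecounts-20140602-180000.gz", "rt", encoding="utf-8", errors="replace") as f:
    for _ in range(5):
        print(f.readline().rstrip())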

Then load it into BigQuery (it should take around 5 minutes or less):

bq load -F" " --quote "" fh-bigquery:wikipedia.pagecounts_20140602_18 pagecounts-20140602-180000.gz language,title,requests:integer,content_size:integer

Note that this file weighs around 100 MB compressed (gz), and you don't need to decompress files this size to load them into BigQuery. It contains around 8 million rows of hourly pageview counts.

  • -F" ": This file is separated by spaces, not commas.
  • --quote "": This file doesn't use quotes for strings.
  • fh-bigquery:wikipedia.pagecounts_20140602_18: The destination table, in project:dataset.table form.
  • pagecounts-20140602-180000.gz: The name of the gz file downloaded from Wikimedia.
  • language,title,requests:integer,content_size:integer: The columns. By default every column is loaded as a string, except those explicitly declared as another type (integers in this case).

(Requires the bq command-line tool to be installed. A rough equivalent using the BigQuery Python client is sketched below.)
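If you'd rather stay in code than use the bq CLI, roughly the same load can be expressed with the google-cloud-bigquery Python client. This is only a sketch: the project id is a placeholder, and it assumes the client accepts the gzipped file the same way the bq tool does.

from google.cloud import bigquery

client = bigquery.Client(project="your-project")  # placeholder project id

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter=" ",   # same as -F" "
    quote_character="",    # same as --quote ""
    schema=[
        bigquery.SchemaField("language", "STRING"),
        bigquery.SchemaField("title", "STRING"),
        bigquery.SchemaField("requests", "INTEGER"),
        bigquery.SchemaField("content_size", "INTEGER"),
    ],
)

# Upload the gzipped file as-is; no need to decompress it first.
with open("pagecounts-20140602-180000.gz", "rb") as f:
    job = client.load_table_from_file(
        f, "your-project.wikipedia.pagecounts_20140602_18", job_config=job_config
    )
job.result()  # wait for the load job to finish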

I made this dataset public; you can find this table at https://bigquery.cloud.google.com/table/fh-bigquery:wikipedia.pagecounts_20140602_18.

I also left all the August 2013 pageviews at https://bigquery.cloud.google.com/table/fh-bigquery:wikipedia.wikipedia_views_201308 (53 billion pageviews if you SELECT SUM(requests) FROM [fh-bigquery:wikipedia.wikipedia_views_201308]).
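To run that sum yourself, here is a minimal sketch with the same Python client, again with a placeholder project id; it uses legacy SQL to match the [project:dataset.table] syntax above:

from google.cloud import bigquery

client = bigquery.Client(project="your-project")  # placeholder project id

# Legacy SQL, to match the bracketed table reference used above.
job_config = bigquery.QueryJobConfig(use_legacy_sql=True)
query = "SELECT SUM(requests) FROM [fh-bigquery:wikipedia.wikipedia_views_201308]"

for row in client.query(query, job_config=job_config).result():
    print(row[0])  # total pageviews in the table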

Answered Oct 02 '22 by Felipe Hoffa