Tutorial of Using Google Client Libraries for BigQuery Data Streaming in PHP

Google BigQuery is a good alternative for a big data warehouse. According to Google, it is designed for analytics workloads rather than as a transactional data store. If you are still unsure which platform to use for your data warehousing, it is worth comparing the options before committing.

Once you have decided on BigQuery, in most cases you will use its API to automate your data loading. This tutorial gives you a quick start with the BigQuery API in PHP. By following the steps below, you will be able to create datasets and tables and stream data into BigQuery.

Step 1. Create a project

Create a new project in the Google Cloud console and take note of your project ID, which will be referenced when using the API.


 

Step 2. Authentication

2.1 Create a service account. You can refer to my previous post and go through steps 1-2.
2.2 Download the Google Client Libraries. Follow the instructions here.
2.3 Authenticate your application using the following code.

require_once 'vendor/autoload.php'; //change this to the path to your vendor/autoload.php
$path_to_key = '/secret.json'; //change this to the path to your service account key file

//set up the BigQuery service object
$client = new Google_Client();
putenv('GOOGLE_APPLICATION_CREDENTIALS=' . $path_to_key); //must be set before the next line
$client->useApplicationDefaultCredentials();
$client->addScope(Google_Service_Bigquery::BIGQUERY);
$bigquery = new Google_Service_Bigquery($client);

 

Step 3. Create a dataset

//define your project ID and dataset name
$project_id = 'YOUR_PROJECT_ID';
$dataset_id = 'YOUR_DATASET_NAME';
//create a dataset reference object
$dataset_ref = new Google_Service_Bigquery_DatasetReference();
$dataset_ref->setProjectId($project_id);
$dataset_ref->setDatasetId($dataset_id);

//create a dataset object
$dataset = new Google_Service_Bigquery_Dataset();
$dataset->setDatasetReference($dataset_ref);

//insert the dataset object into the project
$bigquery->datasets->insert($project_id, $dataset);

 

Step 4. Create a table

4.1 Define a schema for the table

//change the name and type to your column name and data type
$schema = array(
                array('name' => 'id',
                      'type' => 'integer',
                      'mode' => 'required'),
                array('name' => 'name',
                      'type' => 'string',
                      'mode' => 'required')
               );

//create the table schema object
$table_schema = new Google_Service_Bigquery_TableSchema();
$table_schema->setFields($schema);

4.2 Create the table

//define your table name
$table_id = 'YOUR_TABLE_NAME';

//create a table reference object
$table_reference = new Google_Service_Bigquery_TableReference();
$table_reference->setProjectId($project_id);
$table_reference->setDatasetId($dataset_id);
$table_reference->setTableId($table_id);

// create a new table object
$table = new Google_Service_Bigquery_Table();
$table->setTableReference($table_reference);
$table->setSchema($table_schema);

// insert the table into the dataset
$bigquery->tables->insert($project_id, $dataset_id, $table);

 

Step 5. Insert data into the table

Here I’m using the streaming method. Another way to import data is a load job from a JSON or CSV file, but streaming is a better fit when data arrives continuously, and you can query rows almost immediately instead of waiting for a whole dataset to finish loading.

//set the data following the schema we created in step 4
$data = array(
              array('id' => 1,
                    'name' => 'a name'),
              array('id' => 2,
                    'name' => 'another name'));

//convert each data array into a BigQuery row object
$rows = array();
foreach ($data as $v) {
    $row = new Google_Service_Bigquery_TableDataInsertAllRequestRows();
    //json_encode/json_decode turns the associative array into a stdClass object
    $row->setJson(json_decode(json_encode($v)));
    $rows[] = $row;
}
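As an aside, BigQuery supports best-effort deduplication of streamed rows: if you attach an insertId to each row, BigQuery tries to drop duplicates that arrive within a short window, which is useful when retrying failed requests. A sketch of the same loop with a deduplication ID attached (the `'row-'` prefix is an arbitrary choice here):

```php
//same loop as above, but with a per-row deduplication ID attached
$rows = array();
foreach ($data as $v) {
    $row = new Google_Service_Bigquery_TableDataInsertAllRequestRows();
    $row->setJson(json_decode(json_encode($v)));
    //any stable, unique string works; here we derive one from the row's id
    $row->setInsertId('row-' . $v['id']);
    $rows[] = $row;
}
```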

//set up the request
$request = new Google_Service_Bigquery_TableDataInsertAllRequest();
$request->setKind('bigquery#tableDataInsertAllRequest');
$request->setRows($rows);

//call the API
$bigquery->tabledata->insertAll($project_id, $dataset_id, $table_id, $request);
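The insertAll call above discards its return value, but it is worth capturing: a successful HTTP response can still carry per-row errors. A minimal check, assuming the variables from the steps above, might look like this:

```php
//capture the response from the same insertAll call
$response = $bigquery->tabledata->insertAll($project_id, $dataset_id, $table_id, $request);

//a 200 response can still report that individual rows were rejected
if ($response->getInsertErrors()) {
    foreach ($response->getInsertErrors() as $insert_error) {
        //index of the failing row within the request, plus the error details
        printf("Row %d failed:\n", $insert_error->getIndex());
        foreach ($insert_error->getErrors() as $error) {
            printf("  %s: %s\n", $error->getReason(), $error->getMessage());
        }
    }
}
```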

 

Now, take a look at the web console and check whether the dataset and table you created exist and whether the data has been streamed into the table. Sometimes you will not see data in the “Preview” tab even though your streaming job succeeded. That is because it takes some time for BigQuery to persist streamed rows to storage, but the data is available to queries right away. So just query the table with ‘SELECT * FROM [dataset.table]’ and check that the data is there.
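You can also run that check from PHP instead of the web console. The jobs.query endpoint runs a synchronous query and returns the rows; a minimal sketch using the same `$bigquery` service object (legacy SQL, matching the bracketed table syntax):

```php
//run a synchronous query against the table we just streamed into
$query_request = new Google_Service_Bigquery_QueryRequest();
$query_request->setQuery("SELECT * FROM [{$dataset_id}.{$table_id}]");

$query_response = $bigquery->jobs->query($project_id, $query_request);

//each row is a list of cells in the same order as the schema
foreach ($query_response->getRows() as $row) {
    $cells = $row->getF();
    printf("id: %s, name: %s\n", $cells[0]->getV(), $cells[1]->getV());
}
```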

I hope this tutorial helps you get familiar with the BigQuery API in PHP. Thanks for reading!

Reference:

  1. https://cloud.google.com/bigquery/streaming-data-into-bigquery
  2. https://github.com/google/google-api-php-client
  3. https://github.com/rawswift/bigquery-php-cli

2 thoughts on “Tutorial of Using Google Client Libraries for BigQuery Data Streaming in PHP”

  1. Evan June 12, 2017 / 7:58 am

    Excellent tutorial, thank you. I would like to know where you do run this code on gcloud? app engine or in the terminal? Thanks


    • Xiaoxin Lufei July 15, 2017 / 1:00 am

      We ran it daily on the server, which is on AWS, with a cron event set up. I think it will work on gcloud too.

