Configuring Google Cloud for BigQuery (Part I)

Mulianaraul
5 min read · Mar 4, 2021

Running commands from the terminal!

As data engineers, I think we should get more comfortable with the command line, whether that is PowerShell on Windows or a terminal on macOS or Linux. This article explains how to work with Google Cloud features from the command line, and hopefully it will make the command line feel more familiar along the way. Data engineers today are also expected to know their way around a cloud service provider. In this series we will explore Google Cloud Platform from the command line: creating a bucket, loading its data into BigQuery, and partitioning our dataset. Some of you may ask, “Why should we partition our dataset? Is it a must?” I will cover the advantages of partitioning later.

Before we go deeper into Google Cloud, let’s take a look at the roadmap for this article, so you can picture what we will do next.

It may well be easier to use the web UI of Google Cloud Platform (GCP), but the purpose of this article is to help you become more familiar with interacting with GCP from the command line. First, we will set up the connection from our local machine to GCP. Keep in mind that the setup on Windows is slightly different from macOS/Linux.

Configuring GCP on Your Local Machine

Log in to or create a Google account at cloud.google.com (you can use your Gmail account), then download the Google Cloud SDK package.

In this case, I use version 330.0.0. The commands below are for macOS/Linux.

Go to the Downloads directory and move the tar file into your home directory. Then go to the home directory and check that the tar file is there.

cd ~/Downloads
mv google-cloud-sdk-330.0.0-darwin-x86_64.tar.gz ~/
cd ~/
ls

Unpack the tar file. When it finishes, check that the google-cloud-sdk folder exists, then remove the tar file.

tar xopf google-cloud-sdk-330.0.0-darwin-x86_64.tar.gz
ls
rm google-cloud-sdk-330.0.0-darwin-x86_64.tar.gz
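The archive name used above encodes the SDK version, the operating system, and the CPU architecture. As a minimal sketch, the name can be built from those three parts so the same commands work for another version. `sdk_archive_name` is a hypothetical helper, not part of the SDK, and the `linux` pattern is an assumption modeled on the `darwin` file name shown in this article.

```shell
# Build the Cloud SDK archive name from a version, OS name, and CPU architecture.
# Hypothetical helper: only the darwin-x86_64 name comes from this article;
# the linux variant is assumed to follow the same pattern.
sdk_archive_name() {
  version="$1"
  os="${2:-$(uname -s)}"     # defaults to the current OS, e.g. Darwin or Linux
  arch="${3:-$(uname -m)}"   # defaults to the current architecture, e.g. x86_64

  case "$os" in
    Darwin) platform="darwin" ;;
    Linux)  platform="linux" ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac

  echo "google-cloud-sdk-${version}-${platform}-${arch}.tar.gz"
}

# Reproduce the file name used in this article.
archive="$(sdk_archive_name 330.0.0 Darwin x86_64)"
echo "$archive"
```

With the name in a variable, the unpack step becomes `tar xopf "$archive"` followed by `rm "$archive"`.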

Next, run the installer, which can add gcloud to your PATH.

cd google-cloud-sdk
./install.sh

After you run the commands above, you will see:

Modify profile to update your $PATH and enable shell command completion?
Do you want to continue (Y/n)?
# Type Y and hit enter/return

Then you will see another message in the terminal:

The Google Cloud SDK installer will now prompt you to update an rc 
file to bring the Google Cloud CLIs into your environment.
Enter a path to an rc file to update, or leave blank to use
[/Users/cfe/.zshrc]
# Hit enter/return to accept the default (this is what I recommend)
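The rc file change only takes effect in a new shell session, so it is worth confirming that gcloud is actually reachable before moving on. The sketch below uses the standard `command -v` lookup; `check_on_path` is a hypothetical helper name chosen for this example.

```shell
# Report whether a command can be found on the current PATH.
# Hypothetical helper; it only wraps the standard `command -v` lookup.
check_on_path() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at: $(command -v "$1")"
  else
    echo "$1 not found on PATH; open a new terminal or reload your rc file" >&2
    return 1
  fi
}

# Informational check; '|| true' keeps a script going if gcloud is missing.
check_on_path gcloud || true
```

If the check fails right after installation, opening a new terminal window (or running `source ~/.zshrc` when that is the rc file you accepted) usually picks up the updated PATH.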

Windows

Open PowerShell. The commands are very similar to the macOS/Linux ones. The commands below first go to the Downloads directory, then move the zip file to the home directory, and finally go to the home directory.

cd ~/Downloads
mv google-cloud-sdk-290.0.0-windows-x86_64.zip ~/
cd ~/

The difference between macOS/Linux and Windows is the unpacking command:

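# -DestinationPath . unpacks into the current directory, so google-cloud-sdk ends up directly under the home directory
Expand-Archive google-cloud-sdk-290.0.0-windows-x86_64.zip -DestinationPath .
rm google-cloud-sdk-290.0.0-windows-x86_64.zip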

We use Expand-Archive to unpack the download. This might take a while. When it is done, remove the zip file. Then you can run the installer from the google-cloud-sdk directory with the commands below.

cd google-cloud-sdk
.\install.bat

You will see this message:

Welcome to the Google Cloud SDK!

To help improve the quality of this product, we collect anonymized usage data
and anonymized stacktraces when crashes are encountered; additional information
is available at <https://cloud.google.com/sdk/usage-statistics>. This data is
handled in accordance with our privacy policy
<https://policies.google.com/privacy>. You may choose to opt in this
collection now (by choosing 'Y' at the below prompt), or at any time in the
future by running the following command:

    gcloud config set disable_usage_reporting false

Do you want to help improve the Google Cloud SDK (y/N)?
# Type y and hit enter/return
# After that, you will also see this message
Update %PATH% to include Cloud SDK binaries? (Y/n)? .
Please enter 'y' or 'n':
# Type y and hit Enter to continue. You want this, since it makes running gcloud as easy as typing gcloud anywhere in PowerShell.

After you finish the installation, don’t forget to update gcloud:

gcloud components update
# Windows users: you might have to right-click PowerShell and choose Run as Administrator to use this command

How do we create a Google Cloud Platform account?

  1. Go to cloud.google.com
  2. Log in with your existing Google account

You can also log in from the terminal:

gcloud auth login

That command opens your default web browser, where you log in to Google and accept that the Google Cloud SDK wants to access your Google account. Let’s create our first Google Cloud project.

gcloud projects create <Your project_id>

In this case, I use bigdata-etl-3 as my project_id. After running the command above, you will see:

(base) macbookpro@MacBooks-MacBook-Pro ~ % gcloud projects create bigdata-etl-3
Create in progress for [https://cloudresourcemanager.googleapis.com/v1/projects/bigdata-etl-3].
Waiting for [operations/cp.5517587373076550245] to finish...done.
Enabling service [cloudapis.googleapis.com] on project [bigdata-etl-3]...
Operation "operations/acf.p2-548231402698-2bfe6df1-a16c-48dc-9b8f-5ed35a16f504" finished successfully.

The message above tells us that we have successfully created a project named bigdata-etl-3.

project bigdata-etl-3 was successfully created
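Project creation fails if the ID does not follow Google Cloud's naming rules: 6 to 30 characters, only lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. A rough local check, sketched below, can catch a bad ID before calling gcloud. `is_valid_project_id` is a hypothetical helper; it only checks the format, not whether the ID is already taken or contains restricted words.

```shell
# Return success if the argument has the format of a valid project ID:
# 6-30 characters, lowercase letters, digits, and hyphens only,
# starting with a letter and not ending with a hyphen.
# Hypothetical helper; format check only, uniqueness is not verified.
is_valid_project_id() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Try the ID from this article alongside two IDs that break the rules.
for id in bigdata-etl-3 BigData-ETL etl-; do
  if is_valid_project_id "$id"; then
    echo "$id: ok"
  else
    echo "$id: invalid"
  fi
done
```

Only bigdata-etl-3 passes here: the second ID contains uppercase letters, and the third is too short and ends with a hyphen.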

The last configuration step is to set this project as the default for gcloud:

gcloud config set project bigdata-etl-3
# after running the command you will see this message
Updated property [core/project].

Let’s check the configuration with:

gcloud info
# you will get this message
Google Cloud SDK [329.0.0]
Platform: [Mac OS X, x86_64] uname_result(system='Darwin', node='MacBooks-MacBook-Pro.local', release='20.3.0', version='Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64', machine='x86_64', processor='i386')
Locale: (None, 'UTF-8')
Python Version: [3.8.3 (default, Jul  2 2020, 11:26:31)  [Clang 10.0.0 ]]
Python Location: [/opt/anaconda3/bin/python3]
Site Packages: [Disabled]
Installation Root: [/Users/macbookpro/google-cloud-sdk]
Installed Components:
  gsutil: [4.59]
  core: [2021.02.19]
  bq: [2.0.65]
System PATH: [/Users/macbookpro/google-cloud-sdk/bin:/opt/anaconda3/bin:/opt/anaconda3/condabin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/share/dotnet:~/.dotnet/tools:/Library/Frameworks/Mono.framework/Versions/Current/Commands]
Python PATH: [/Users/macbookpro/google-cloud-sdk/lib/third_party:/Users/macbookpro/google-cloud-sdk/lib:/opt/anaconda3/lib/python38.zip:/opt/anaconda3/lib/python3.8:/opt/anaconda3/lib/python3.8/lib-dynload]
Cloud SDK on PATH: [True]
Kubectl on PATH: [False]
Installation Properties: [/Users/macbookpro/google-cloud-sdk/properties]
User Config Directory: [/Users/macbookpro/.config/gcloud]
Active Configuration Name: [default]
Active Configuration Path: [/Users/macbookpro/.config/gcloud/configurations/config_default]
Account: [your_email]
Project: [bigdata-etl-3]
Current Properties:
  [core]
    account: [your_email]
    disable_usage_reporting: [False]
    project: [bigdata-etl-3]
Logs Directory: [/Users/macbookpro/.config/gcloud/logs]
Last Log File: [/Users/macbookpro/.config/gcloud/logs/2021.03.04/21.47.07.003684.log]
git: [git version 2.30.0]
ssh: [OpenSSH_8.1p1, LibreSSL 2.7.3]

From this output we can see that gcloud is now configured for the bigdata-etl-3 project under our registered email account.
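The gcloud info output is long, and only the Account and Project lines matter for this check. As a small sketch, a sed filter can pull a single field out of that output. `info_field` is a hypothetical helper, and the sample text is copied from the output in this article so the snippet runs without the SDK installed.

```shell
# Print the value inside the brackets of the first "Key: [value]" line on stdin.
# Hypothetical helper for picking one field out of `gcloud info` output.
info_field() {
  sed -n "s/^$1: \[\(.*\)\]\$/\1/p" | head -n 1
}

# Sample lines copied from the output in this article. With the SDK installed,
# the real output could be piped in instead: gcloud info | info_field Project
sample='Account: [your_email]
Project: [bigdata-etl-3]'

printf '%s\n' "$sample" | info_field Project
```

If you only need the project, `gcloud config get-value project` prints the active project directly.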

If you want to switch to another project, run the same command with a different project ID. For example, to switch to the bigdata-etl-2 project, run:

gcloud config set project bigdata-etl-2
# then, to check whether you are in the bigdata-etl-2 project, just run
gcloud info

In Part II, we will explore Google Cloud Storage by creating a bucket and loading a local file into it for further processing!
