Configuring Google Cloud for BigQuery (Part I)
Running commands from the terminal!
As data engineers, I think we should learn more about using the command line, whether in Windows PowerShell, macOS or Linux. This article explains how to work with Google Cloud features from the command line, and hopefully it will make us more comfortable with the command line along the way. A data engineer today should also be familiar with cloud service providers. In this series we will explore Google Cloud Platform through the command line: creating a bucket, loading its data into BigQuery, and partitioning our dataset. Some of us will ask, “Why should we partition our dataset? Is it a must?” I will explain the advantage of partitioning later.
Before we go deeper into Google Cloud, let’s look at the roadmap for this article below, so you can picture what we will do next.
It may be easier to use the Google Cloud Platform (GCP) UI, but the purpose of this article is to help you become more familiar with interacting with GCP from the command line. First, we are going to configure our local machine to work with GCP. Keep in mind that the configuration on Windows is slightly different from macOS / Linux.
Configuring GCP on Your Local Machine
Log in or create a Google account on cloud.google.com using your Gmail account, then download the Google Cloud SDK package.
In this case, I use version 330.0.0 on macOS. Try the commands below.
Go to the Downloads directory and move the tar file into your home directory. Then go to the home directory and check that the tar file is there.
cd ~/Downloads
mv google-cloud-sdk-330.0.0-darwin-x86_64.tar.gz ~/
cd ~/
ls
Unpack the tar file. After unpacking, check that the new folder exists and then remove the tar file.
tar xopf google-cloud-sdk-330.0.0-darwin-x86_64.tar.gz
ls
rm google-cloud-sdk-330.0.0-darwin-x86_64.tar.gz
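The file name in the commands above follows a version, OS and architecture pattern. As a rough sketch, we could build the name from variables and only unpack when the archive really exists. The helper names `sdk_archive_name` and `unpack_sdk` are hypothetical (they are not part of the SDK), and the naming pattern is an assumption based on the file used in this article.

```shell
#!/bin/sh
# Hypothetical helper: build the archive name used in this article.
# Assumes the pattern google-cloud-sdk-<version>-<os>-<arch>.tar.gz.
sdk_archive_name() {
  version="$1"; os="$2"; arch="$3"
  printf 'google-cloud-sdk-%s-%s-%s.tar.gz\n' "$version" "$os" "$arch"
}

# Hypothetical helper: unpack the archive in the current directory,
# then remove it only if unpacking succeeded.
unpack_sdk() {
  archive="$1"
  if [ ! -f "$archive" ]; then
    echo "Archive not found: $archive" >&2
    return 1
  fi
  tar xopf "$archive" && rm "$archive"
}

# Example: print the name for the version used in this article.
sdk_archive_name 330.0.0 darwin x86_64
```

From the home directory, `unpack_sdk "$(sdk_archive_name 330.0.0 darwin x86_64)"` would then do the same work as the tar and rm commands above, but it stops with a message if the file is missing.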
Then install gcloud and add it to your PATH.
cd google-cloud-sdk
./install.sh
After you run the above command, you will see
Modify profile to update your $PATH and enable shell command
completion?
Do you want to continue (Y/n)?
# Type Y and hit enter/return
Then, you will see another message in the terminal
The Google Cloud SDK installer will now prompt you to update an rc
file to bring the Google Cloud CLIs into your environment.
Enter a path to an rc file to update, or leave blank to use
[/Users/cfe/.zshrc]
# Hit enter/return to accept the default (this is what I recommend)
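Because the installer edits an rc file, the new PATH only takes effect in a new terminal session or after reloading that file. Here is a minimal sketch to confirm that the CLIs are reachable. The helper name `check_cli` is hypothetical, and the list of tools is taken from the components (gsutil and bq) that appear in the `gcloud info` output later in this article.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is reachable through PATH.
check_cli() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at: $(command -v "$1")"
  else
    echo "$1 not found on PATH; open a new terminal or reload your rc file" >&2
    return 1
  fi
}

# Check the CLIs that come with the SDK; keep going if one is missing.
for tool in gcloud gsutil bq; do
  check_cli "$tool" || true
done
```

If gcloud is reported as missing right after installation, opening a new terminal window is usually the first thing to try.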
Windows
Open PowerShell. The commands are quite similar to macOS/Linux. The commands below first go to the Downloads directory, then move the zip file to the home directory, and finally go to the home directory.
cd ~/Downloads
mv google-cloud-sdk-290.0.0-windows-x86_64.zip ~/
cd ~/
The difference between macOS/Linux and Windows is the unpacking command:
Expand-Archive google-cloud-sdk-290.0.0-windows-x86_64.zip -DestinationPath .
rm google-cloud-sdk-290.0.0-windows-x86_64.zip
We are using Expand-Archive to unpack the download. By default it extracts into a new folder named after the zip file, so we pass -DestinationPath . to get the google-cloud-sdk directory directly in the home directory. This might take a while. When it is done, remove the zip file. Then you can install from the google-cloud-sdk directory. Follow the commands below.
cd google-cloud-sdk
.\install.bat
You will see this message
Welcome to the Google Cloud SDK!
To help improve the quality of this product, we collect anonymized usage data
and anonymized stacktraces when crashes are encountered; additional information
is available at <https://cloud.google.com/sdk/usage-statistics>. This data is
handled in accordance with our privacy policy
<https://policies.google.com/privacy>. You may choose to opt in this
collection now (by choosing 'Y' at the below prompt), or at any time in the
future by running the following command:
gcloud config set disable_usage_reporting false
Do you want to help improve the Google Cloud SDK (y/N)?
# Type y and hit enter/return
After that, you will also see this message
Update %PATH% to include Cloud SDK binaries? (Y/n)?
Please enter 'y' or 'n':
# Type y and hit Enter to continue. You want this, since it makes running gcloud as easy as typing gcloud anywhere in PowerShell.
After you finish the installation process, don’t forget to update gcloud with
gcloud components update
# Windows users: you might have to right click on PowerShell and Run as Administrator to use this command
How do we create a Google Cloud Platform account?
- Go to cloud.google.com
- Log in using your current Google account
You can also log in through the terminal
gcloud auth login
That command opens your default web browser so you can log in to Google and accept that Google Cloud SDK wants to access your Google Account. Let’s make our first Google Cloud project.
gcloud projects create <your_project_id>
In this case, I use bigdata-etl-3 as my project_id. After running the command above, you will see
(base) macbookpro@MacBooks-MacBook-Pro ~ % gcloud projects create bigdata-etl-3
Create in progress for [https://cloudresourcemanager.googleapis.com/v1/projects/bigdata-etl-3].
Waiting for [operations/cp.5517587373076550245] to finish...done.
Enabling service [cloudapis.googleapis.com] on project [bigdata-etl-3]...
Operation "operations/acf.p2-548231402698-2bfe6df1-a16c-48dc-9b8f-5ed35a16f504" finished successfully.
The message above tells us that we have successfully created a project named bigdata-etl-3.
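Project creation fails if the project_id has the wrong format, so it can help to check the ID locally first. As far as I know, a project ID must be 6 to 30 characters long, use only lowercase letters, digits and hyphens, start with a letter, and not end with a hyphen. The sketch below encodes that rule as a regular expression. The helper name `is_valid_project_id` is hypothetical, and the check cannot tell whether the ID is already taken by someone else.

```shell
#!/bin/sh
# Hypothetical helper: check the format of a project ID before calling gcloud.
# Assumed rule: 6-30 chars, lowercase letters/digits/hyphens,
# starts with a letter, does not end with a hyphen.
is_valid_project_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Try the ID from this article and two IDs that break the rule.
for id in bigdata-etl-3 BigData_ETL etl-; do
  if is_valid_project_id "$id"; then
    echo "$id: format looks valid"
  else
    echo "$id: invalid format"
  fi
done
```

Only the first ID in the loop passes the check. The second uses uppercase letters and an underscore, and the third is too short and ends with a hyphen.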
The last configuration step is to set this project as the active project
gcloud config set project bigdata-etl-3
# after running the command you will see this message
Updated property [core/project].
Let’s check the configuration with
gcloud info
# you will get this message
Google Cloud SDK [329.0.0]
Platform: [Mac OS X, x86_64] uname_result(system='Darwin', node='MacBooks-MacBook-Pro.local', release='20.3.0', version='Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64', machine='x86_64', processor='i386')
Locale: (None, 'UTF-8')
Python Version: [3.8.3 (default, Jul 2 2020, 11:26:31) [Clang 10.0.0 ]]
Python Location: [/opt/anaconda3/bin/python3]
Site Packages: [Disabled]
Installation Root: [/Users/macbookpro/google-cloud-sdk]
Installed Components:
  gsutil: [4.59]
  core: [2021.02.19]
  bq: [2.0.65]
System PATH: [/Users/macbookpro/google-cloud-sdk/bin:/opt/anaconda3/bin:/opt/anaconda3/condabin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/share/dotnet:~/.dotnet/tools:/Library/Frameworks/Mono.framework/Versions/Current/Commands]
Python PATH: [/Users/macbookpro/google-cloud-sdk/lib/third_party:/Users/macbookpro/google-cloud-sdk/lib:/opt/anaconda3/lib/python38.zip:/opt/anaconda3/lib/python3.8:/opt/anaconda3/lib/python3.8/lib-dynload]
Cloud SDK on PATH: [True]
Kubectl on PATH: [False]
Installation Properties: [/Users/macbookpro/google-cloud-sdk/properties]
User Config Directory: [/Users/macbookpro/.config/gcloud]
Active Configuration Name: [default]
Active Configuration Path: [/Users/macbookpro/.config/gcloud/configurations/config_default]
Account: [your_email]
Project: [bigdata-etl-3]
Current Properties:
  [core]
    account: [your_email]
    disable_usage_reporting: [False]
    project: [bigdata-etl-3]
Logs Directory: [/Users/macbookpro/.config/gcloud/logs]
Last Log File: [/Users/macbookpro/.config/gcloud/logs/2021.03.04/21.47.07.003684.log]
git: [git version 2.30.0]
ssh: [OpenSSH_8.1p1, LibreSSL 2.7.3]
From that message we can see that the active configuration now points to the bigdata-etl-3 project and uses our registered email account.
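The `gcloud info` output is long, and often we only care about the active project. Here is a small sketch that pulls out just that value. The helper name `active_project_from_info` is hypothetical, and it assumes the output contains a line of the form `Project: [<project_id>]` at the start of a line, as in the output in this article. The sample text is copied from that output so the sketch runs even without gcloud installed.

```shell
#!/bin/sh
# Hypothetical helper: read `gcloud info` text on stdin and print the active project.
# Assumes a line of the form "Project: [<project_id>]" starting at column 0.
active_project_from_info() {
  sed -n 's/^Project: \[\(.*\)]$/\1/p'
}

# Sample input copied from the output in this article.
sample='Account: [your_email]
Project: [bigdata-etl-3]'
printf '%s\n' "$sample" | active_project_from_info
```

With the SDK installed, the same helper can be used as `gcloud info | active_project_from_info`. Running `gcloud config get-value project` should print the same property directly, which is the simpler choice when you do not need the rest of the output.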
If you want to switch your configuration to another project, just run the same command with a different project_id. For example, to switch to the bigdata-etl-2 project, run these commands
gcloud config set project bigdata-etl-2
# then, to check whether you are in the bigdata-etl-2 project, just run
gcloud info
In Part II, we will explore Google Cloud Storage by creating a bucket and loading a local file into it for further processing!