Let’s automate our daily ETL job!

“Why should we write a script to automate our daily ETL?”

As we know, a Data Engineer should deliver the right data, in the right shape, to business users. A product from a Data Engineer could be a table that is accessible for the next processing step. The challenging part of being a Data Engineer is providing the right data to business users based on their needs.

The next questions are: how do we create an ETL for large-scale data, and how do we manage its performance every day?

I have asked several people who are users of a Data Engineer’s work, such as…


Let’s Code!

In this case, I tried to develop an API endpoint for an NLP model. The goal of this article is to implement an endpoint called “feedback”. How should the “/feedback” endpoint work?

“feedback” endpoint solution

The “/feedback” endpoint should receive a JSON input that looks something like this:

{
"text": "im glad ur doing well",
"sentiment": "positive"
}

So, what tasks should the “/feedback” endpoint perform?

  • It checks whether the text exists in either positive.txt or negative.txt (simple string matching). If it exists, we ignore the request, i.e. do nothing and return something like the response below (a sketch of the whole endpoint follows it):
{
"data": {…


Playing commands through Terminal!

Exploring Google Cloud Storage

After configuring a particular project in your Google Cloud Platform account, we are first going to explore Google Cloud Storage. We can’t load our data directly into BigQuery before we upload the file to Cloud Storage. So, we have to create a bucket first, then load the file into that bucket by following the commands below.

Buckets

The file system on your local computer stores data in files and organizes files using directories. Cloud Storage stores data in objects and collects objects inside Buckets. …
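
The article itself does this step with terminal commands; as an illustrative alternative, here is a minimal sketch using the google-cloud-storage Python client. The bucket and file names below are placeholders.

# Minimal sketch: create a bucket and upload a local file into it,
# using the google-cloud-storage client. Bucket and file names are placeholders.
from google.cloud import storage

client = storage.Client()  # uses the project you configured with gcloud

# Create a bucket (bucket names must be globally unique).
bucket = client.create_bucket("my-etl-bucket-example")

# Upload a local CSV file as an object inside the bucket.
blob = bucket.blob("raw/daily_sales.csv")
blob.upload_from_filename("daily_sales.csv")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")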


Playing commands through Terminal!

As a data engineer, I think we should learn more about using the command line, whether in Windows PowerShell, macOS, or Linux. This article will explain how we can work with Google Cloud features using the command line; hopefully, this way, we will become familiar with it. As we know, a data engineer today should also be familiar with a cloud service provider. In this case, we will explore Google Cloud Platform through the command line: creating a bucket, loading it into BigQuery, and also trying to partition our dataset. Some of us will…
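
The article walks through these steps on the command line; as a rough companion sketch, here is the same flow with the google-cloud-bigquery Python client: load a CSV from the bucket into a date-partitioned BigQuery table. The project, dataset, table, and column names are placeholders.

# Rough sketch: load a CSV from Cloud Storage into a date-partitioned
# BigQuery table. Project, dataset, table, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.sales_dataset.daily_sales"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    # Partition the table by a date column so daily queries scan less data.
    time_partitioning=bigquery.TimePartitioning(field="transaction_date"),
)

load_job = client.load_table_from_uri(
    "gs://my-etl-bucket-example/raw/daily_sales.csv", table_id, job_config=job_config
)
load_job.result()  # wait for the load job to finish
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")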


The essentials of Data Modelling!

Data vs Model

If we talk about data modelling, then we are talking about two words, namely model and data. Data is a collection of facts, while a model is a form of representation. Then, what is data modelling? Data modelling is a representation of the data structures required by a database. I would say that an effective and efficient database depends on the data modelling process.

Imagine that you are going to build your house; what will you do? Will you just buy the materials and start building? I don’t think so. Yup, you should…


Let’s automate it!

https://www.customtruck.com/blog/how-are-oil-gas-and-refined-products-transported/

“Why do we need pipelines when processing data?”

For those who still don’t know why we need pipelines, or who are still confused about data pipelines: after reading several articles, I would say that a data pipeline is a ‘set of actions’ that extracts data from various sources, then transforms it and loads it into a particular place. So, is it automated processing? Yup! It is something like taking columns from a database, merging them, transforming them as we want, and loading them back into the database system.
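
As a toy illustration of that idea (not taken from the article), here is a minimal pandas and sqlite3 sketch: extract a few columns from a database, transform them, and load the result back. The table and column names are made up.

# Toy extract-transform-load sketch with pandas and sqlite3.
# Table and column names are made up for illustration.
import sqlite3
import pandas as pd

conn = sqlite3.connect("shop.db")

# Extract: pull the columns we need from the source table.
orders = pd.read_sql_query("SELECT order_id, quantity, unit_price FROM orders", conn)

# Transform: derive a new column from the existing ones.
orders["total_amount"] = orders["quantity"] * orders["unit_price"]

# Load: write the transformed data back into the database.
orders.to_sql("orders_enriched", conn, if_exists="replace", index=False)
conn.close()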

See this figure below for better understanding!


Everything starts with a basic concept!

When using any programming language, we must learn the most basic things first. Then, what are the basic things that are important to learn in Python?

  • Data types in Python and how to treat them
  • Conditional statements
  • While and for loops in Python

Let’s start with data types! Python has several very interesting data types to know, including integers, strings, booleans, lists, dictionaries, tuples, and sets. For each data type, you may already know some details of its meaning and use, so in this article I only provide examples of the…
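
For a quick feel for those types, here are a few one-line examples:

# Quick examples of the built-in data types mentioned above.
age = 27                               # integer
name = "Raul"                          # string
is_engineer = True                     # boolean
scores = [80, 95, 70]                  # list: ordered and mutable
profile = {"name": name, "age": age}   # dictionary: key-value pairs
point = (3, 4)                         # tuple: ordered and immutable
tags = {"python", "sql", "python"}     # set: duplicates are dropped

print(type(scores), len(tags))         # <class 'list'> 2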


From zero to hero in SQL Part 2!

After learning the basics of queries in the previous article, in this article we will learn a little more about queries. As I said in the previous article, it is possible for us to create reports that aggregate data, sort data, and join data between tables.

GROUP BY: used to group rows by certain columns so that aggregate calculations can be performed

ORDER BY: used to sort data from smallest to largest, from A to Z, or vice versa.

JOIN: used to combine two or more tables

SubQuery: a Query…
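
To see those keywords working together, here is a small sqlite3 sketch on two made-up tables that joins, groups, and orders in a single query.

# Small sketch tying JOIN, GROUP BY, and ORDER BY together on made-up tables.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Andi'), (2, 'Budi');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 25.0), (3, 2, 40.0);
""")

# JOIN the two tables, GROUP BY customer, aggregate, then ORDER BY the total.
cur.execute("""
SELECT c.name, SUM(o.amount) AS total_spent
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.name
ORDER BY total_spent DESC;
""")
for name, total in cur.fetchall():
    print(name, total)
conn.close()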


From zero to hero in SQL Part 1!

As I said in the previous article, SQL is the standard language for accessing a database system. Sometimes we want to aggregate data to find a summary of insightful information in it. In fact, we often just want to know the distribution of the data, or we simply want to display some of the most recent records.

Before we go deeper into how to access data in a database system, we will create a local database system as a sandbox. …
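
The sandbox setup itself is truncated in this preview; one lightweight way to get a local sandbox is SQLite, which ships with Python. This is only an illustration, and the original article may well use a different database engine.

# One lightweight way to get a local sandbox: SQLite ships with Python.
# The original article may use a different database; names here are made up.
import sqlite3

conn = sqlite3.connect("sandbox.db")   # creates the file if it does not exist
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
cur.execute("INSERT INTO employees (name, salary) VALUES (?, ?)", ("Citra", 5000.0))
conn.commit()

# A first look at the data we just stored.
for row in cur.execute("SELECT * FROM employees"):
    print(row)
conn.close()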


Database Concepts, Basic and Intermediate Queries

What is a database? A database stores large amounts of data that can be accessed by many users at the same time, usually in a standardized and consistent format; a database also makes it fast and easy to manipulate data (add new data, delete, edit, and so on).

As explained in the picture above, all data can be accessed by authorized parties in the company. The picture above is only a brief explanation. However, many types of database applications are used to store company data.

