Training

Introduction to Data Engineering on Google Cloud

Mode:

Online, In person

At Matza Education, each training course is designed to offer practical and relevant knowledge, connecting theory and application in real-life scenarios. Our aim is to prepare professionals for the challenges of the market, strengthening technical and strategic skills in different areas of technology and management.

By taking part in one of our programs, you will have access to up-to-date content, experienced instructors and a results-oriented methodology. Regardless of the format, in person or online, we aim to create a dynamic, accessible and high-impact learning experience.

More than just a course, each training is an opportunity for professional and personal development, helping you to gain certifications, expand your skills and stand out in an increasingly competitive market.

Important: you must confirm the e-mail you received after registering to validate your participation.

Target audience:

  • Data engineers
  • Database administrators
  • System administrators

In this course, you will learn about data engineering in Google Cloud, the roles and responsibilities of data engineers and how they relate to the services provided by Google Cloud. You will also learn about ways to deal with data engineering challenges.

In this course, participants will learn the following skills:

  • Understand the role of a data engineer.
  • Understand the data engineering tasks and main components used in Google Cloud.
  • Understand how to create and deploy data pipelines following various patterns in Google Cloud.
  • Identify and use various automation techniques in Google Cloud.

To get the most out of this course, participants should meet the following prerequisites:

  • Prior fundamental-level experience with Google Cloud, using Cloud Shell and accessing products in the Google Cloud console.
  • Basic proficiency in a common query language such as SQL.
  • Experience with data modeling and ETL activities (extract, transform, load).
  • Experience in developing applications using a common programming language, such as Python.
1 day - 8 class hours - Live Online

Module 01: Data Engineering Tasks and Components

  • The role of a data engineer
  • Data sources versus data destinations
  • Data formats
  • Google Cloud storage solution options
  • Metadata management options in Google Cloud
  • Sharing data sets using Analytics Hub

Objectives:

  • Explain the role of a data engineer.
  • Understand the differences between a data source and a data destination.
  • Explain the different types of data formats.
  • Explain the storage solution options in Google Cloud.
  • Learn about the metadata management options in Google Cloud.
  • Understand how to share data sets easily using Analytics Hub.
  • Understand how to load data into BigQuery using the Google Cloud console or the bq command-line tool.
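As a taste of the command-line loading covered above, this is a minimal sketch using the bq CLI; the dataset, table, and bucket names are hypothetical placeholders:

```shell
# Hypothetical example: load a CSV file from Cloud Storage into BigQuery
# with the bq command-line tool. All names below are placeholders.
bq mk --dataset example_dataset

bq load --source_format=CSV --autodetect \
  example_dataset.example_table \
  gs://example-bucket/exports/events.csv
```

The `--autodetect` flag asks BigQuery to infer the table schema from the file; in production pipelines an explicit schema is usually preferred.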

Module 02: Data Replication and Migration

  • Replication and migration architecture
  • The gcloud command line tool
  • Moving data sets
  • Datastream

Objectives:

  • Explain the basic architecture of Google Cloud data replication and migration.
  • Understand the options and use cases of the gcloud command line tool.
  • Explain the functionality and use cases of the Storage Transfer Service.
  • Explain the functionality and use cases of the Transfer Appliance.
  • Understand the features and implementation of Datastream.

Module 03: The Extract and Load Data Pipeline Pattern

  • Extraction and loading architecture
  • The bq command line tool
  • BigQuery Data Transfer Service
  • BigLake

Objectives:

  • Explain the basic extraction and loading architecture diagram.
  • Understand the options of the bq command line tool.
  • Explain the functionality and use cases of the BigQuery Data Transfer Service.
  • Explain the functionality and use cases of BigLake as a no-extract-and-load pattern, querying data in place.

Module 04: The Extract, Load and Transform Data Pipeline Pattern

  • Extract, load and transform (ELT) architecture
  • SQL scripting and scheduling with BigQuery
  • Dataform

Objectives:

  • Explain the basic architecture diagram of extraction, loading and transformation.
  • Understand a common ELT pipeline in Google Cloud.
  • Learn about BigQuery's SQL scripting and scheduling features.
  • Explain the functionality and use cases of Dataform.
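The defining feature of ELT is that the transformation runs as SQL inside the warehouse after the raw data has been loaded. A minimal sketch of that flow, using Python's built-in sqlite3 as a local stand-in for BigQuery (table names and data are made up):

```python
import sqlite3

# sqlite3 plays the role of the warehouse here; in the course this is BigQuery.
conn = sqlite3.connect(":memory:")

# Extract + Load: raw records land in the warehouse untransformed.
conn.execute("CREATE TABLE raw_events (user_id TEXT, amount_cents INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [("u1", 1250, "ok"), ("u2", 300, "failed"), ("u1", 499, "ok")],
)

# Transform: SQL runs inside the warehouse, producing a clean reporting table.
conn.execute("""
    CREATE TABLE revenue_by_user AS
    SELECT user_id, SUM(amount_cents) / 100.0 AS revenue
    FROM raw_events
    WHERE status = 'ok'
    GROUP BY user_id
""")

for row in conn.execute("SELECT user_id, revenue FROM revenue_by_user ORDER BY user_id"):
    print(row)
# prints ('u1', 17.49)
```

In Google Cloud, the transform step would be a scheduled BigQuery SQL script or a Dataform workflow rather than an ad-hoc statement.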

Module 05: The Extract, Transform and Load Data Pipeline Pattern

  • Extract, transform and load (ETL) architecture
  • Google Cloud GUI tools for ETL data pipelines
  • Batch data processing using Dataproc
  • Streaming data processing options
  • Bigtable and data pipelines

Objectives:

  • Explain the basic architecture diagram of extraction, transformation and loading.
  • Learn about the GUI tools in Google Cloud used for ETL data pipelines.
  • Explain batch data processing using Dataproc.
  • Learn how to use Dataproc Serverless for Spark for ETL.
  • Explain the options for processing streaming data.
  • Explain the role that Bigtable plays in data pipelines.
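In contrast to ELT, the ETL pattern transforms records in application code (the role Dataflow or Dataproc plays in the course) before loading the cleaned result into the destination. A self-contained sketch with made-up data, again using sqlite3 as a stand-in destination:

```python
import csv
import io
import sqlite3

# Hypothetical raw source data; in a real pipeline this would come from files,
# a message queue, or a database.
RAW_CSV = "user_id,amount_cents,status\nu1,1250,ok\nu2,300,failed\nu1,499,ok\n"

def extract(text):
    """Extract: parse raw CSV records from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: drop failed records and convert cents to currency units."""
    return [
        (r["user_id"], int(r["amount_cents"]) / 100.0)
        for r in records
        if r["status"] == "ok"
    ]

def load(rows):
    """Load: write the already-clean rows into the destination table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (user_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    return conn

conn = load(transform(extract(RAW_CSV)))
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM payments").fetchone())
# prints (2, 17.49)
```

The same three-stage shape scales up to Dataflow for streaming data or Dataproc Serverless for Spark for batch ETL.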

Module 06: Automation Techniques

  • Patterns and automation options for pipelines
  • Cloud Scheduler and Workflows
  • Cloud Composer
  • Cloud Run Functions
  • Eventarc

Objectives:

  • Explain the patterns and automation options available for pipelines.
  • Learn about Cloud Scheduler and Workflows.
  • Learn about Cloud Composer.
  • Learn about Cloud Run functions.
  • Explain the functionality and use cases of Eventarc for automation.
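The scheduled-automation option above can be sketched as a Cloud Scheduler job that triggers a pipeline over HTTP; the job name, region, schedule, and URI below are all hypothetical:

```shell
# Hypothetical example: run a pipeline's HTTP endpoint at the top of every hour.
# Job name, location, and URI are placeholders.
gcloud scheduler jobs create http hourly-etl-trigger \
  --location=us-central1 \
  --schedule="0 * * * *" \
  --http-method=POST \
  --uri="https://example-pipeline-endpoint.a.run.app/run"
```

For event-driven rather than time-driven automation, the course pairs Eventarc with Cloud Run functions so pipelines react to events such as new files arriving in Cloud Storage.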