Training

Data Driven Transformation with Google Cloud

Mode:

Online or in person

At Matza Education, each training course is designed to offer practical and relevant knowledge, connecting theory and application in real-life scenarios. Our aim is to prepare professionals for the challenges of the market, strengthening technical and strategic skills in different areas of technology and management.

By taking part in one of our programs, you will have access to up-to-date content, experienced instructors and a results-oriented methodology. Whether delivered in person or online, we aim to create a dynamic, accessible and high-impact learning experience.

More than just a course, each training program is an opportunity for professional and personal development, helping you gain certifications, expand your skills and stand out in an increasingly competitive market.

Important: you must confirm the e-mail you received after registering to validate your participation.

What you will learn:

  • Extract, load, transform, clean and validate data
  • Design pipelines and architectures for data processing
  • Create and maintain machine learning and statistical models
  • Query datasets, visualize query results and create reports
  • Design and build data processing systems on Google Cloud Platform
  • Process batch and streaming data by implementing autoscaling data pipelines in Cloud Dataflow
  • Derive business insights from extremely large datasets using Google BigQuery
  • Train, evaluate and predict with machine learning models using TensorFlow and Cloud ML
  • Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
  • Provide instant insights from streaming data

Prerequisites:

  • Completion of the Google Cloud Fundamentals: Big Data & Machine Learning course, or equivalent experience
  • Basic proficiency in a common query language such as SQL
  • Experience with data modeling and extract, transform, load (ETL) activities
  • Experience developing applications in a common programming language such as Python
  • Familiarity with machine learning and/or statistics

4 days - 32 class hours - In person or online

  • Module 1: Overview of Google Cloud Dataproc
    • Cluster creation and management
    • Using custom machine types and preemptible worker nodes
    • Scaling and deleting clusters
    • Lab: How to create Hadoop clusters with Google Cloud Dataproc
  • Module 2: Running Dataproc jobs
    • Running Pig and Hive jobs
    • Separation of storage and compute
    • Lab: How to run Hadoop and Spark jobs with Dataproc
    • Lab: Submitting and monitoring jobs
  • Module 3: Integrating Dataproc with Google Cloud Platform
    • Customizing clusters with initialization actions
    • BigQuery support
    • Lab: How to take advantage of Google Cloud Platform services
  • Module 4: Solution for unstructured data with Google's Machine Learning APIs
    • Google Machine Learning APIs
    • Common ML use cases
    • Invoking ML APIs
    • Lab: How to add Machine Learning features to Big Data analysis
  • Module 5: Serverless data analysis with BigQuery
    • What is BigQuery
    • Queries and functions
    • Lab: How to write queries in BigQuery
    • Loading data into BigQuery
    • Exporting data from BigQuery
    • Lab: How to load and export data
    • Nested and repeated fields
    • Querying several tables
    • Lab: Complex queries
    • Performance and pricing
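The nested and repeated fields covered in this module can be pictured with a small pure-Python sketch of what BigQuery's UNNEST does to a repeated field. The sample records and column names below are invented for illustration:

```python
# Illustrative only: how BigQuery flattens a repeated field with UNNEST,
# modeled in pure Python on hypothetical order records.
orders = [
    {"order_id": 1, "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]},
    {"order_id": 2, "items": [{"sku": "A", "qty": 5}]},
]

# Equivalent in spirit to:
#   SELECT order_id, item.sku, item.qty
#   FROM orders, UNNEST(items) AS item
flat = [
    {"order_id": o["order_id"], "sku": it["sku"], "qty": it["qty"]}
    for o in orders
    for it in o["items"]
]

for row in flat:
    print(row)
```

Each parent row is repeated once per element of its repeated field, which is why nested schemas can replace many-table joins in BigQuery.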
  • Module 6: Autoscaling, serverless data pipelines with Dataflow
    • The Beam programming model
    • Data pipelines in Beam Python
    • Data pipelines in Beam Java
    • Lab: How to write a Dataflow pipeline
    • Scalable Big Data processing with Beam
    • Lab: MapReduce in Dataflow
    • Incorporating additional data
    • Lab: Secondary inputs
    • Streaming data processing
    • GCP reference architecture
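The MapReduce lab in this module follows the classic map, shuffle, reduce pattern. As a rough sketch (pure Python, no Beam dependency, with made-up input lines), the same word-count logic looks like this:

```python
from collections import defaultdict

# Map -> shuffle -> reduce, the pattern the Dataflow MapReduce lab
# implements with Beam transforms; here in plain Python for illustration.
lines = ["the cat sat", "the cat ran"]

# Map: emit (word, 1) pairs.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: sum counts per word.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

In Beam, the map step corresponds to a ParDo/Map transform and the shuffle-plus-reduce to a grouping and combining transform, with the runner handling distribution and scaling.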
  • Module 7: Getting started with Machine Learning
    • What is machine learning (ML)
    • Effective ML: concepts, types
    • ML data sets: generalization
    • Lab: Explore and create ML datasets
  • Module 8: Creating ML models with TensorFlow
    • Getting started with TensorFlow
    • Lab: How to use tf.learn
    • TensorFlow graphs and loops + lab
    • Lab: How to use low-level TensorFlow + early stopping
    • Monitoring ML training
    • Lab: Charts and graphs of TensorFlow training
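The early-stopping idea exercised in this module's labs is framework-independent: stop training once validation loss has not improved for a fixed number of epochs. A minimal sketch, with invented loss values and a hypothetical `patience` setting:

```python
# Early stopping sketch: halt when validation loss fails to improve for
# `patience` consecutive epochs. Loss values are made up for illustration.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.51, 0.52, 0.53]

patience = 2
best = float("inf")
epochs_without_improvement = 0
stopped_at = None

for epoch, loss in enumerate(val_losses):
    if loss < best:
        best = loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            stopped_at = epoch
            break

print(best, stopped_at)
```

Here training stops two epochs after the minimum, keeping the best model seen so far rather than the last one trained.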
  • Module 9: Scaling ML models with CloudML
    • Why use Cloud ML?
    • Packaging a TensorFlow model
    • Complete training
    • Lab: Running an ML model locally and in the cloud
  • Module 10: Feature engineering
    • Creating good features
    • Input transformation
    • Synthetic features
    • Pre-processing with Cloud ML
    • Lab: Feature engineering
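Two of the techniques in this module, one-hot encoding a categorical input and building a feature cross (a synthetic feature combining two categoricals), can be sketched in a few lines of pure Python. The column names and values below are hypothetical:

```python
# Illustrative feature-engineering sketch: one-hot encoding plus a
# feature cross, on invented sample rows.
rows = [
    {"city": "SP", "hour": "morning"},
    {"city": "RJ", "hour": "evening"},
]

# Build the vocabulary for the categorical column.
cities = sorted({r["city"] for r in rows})

def one_hot(value, vocab):
    return [1 if value == v else 0 for v in vocab]

for r in rows:
    r["city_onehot"] = one_hot(r["city"], cities)
    # Feature cross: combine two categoricals into one synthetic feature.
    r["city_x_hour"] = f'{r["city"]}_{r["hour"]}'

print(rows[0]["city_onehot"], rows[0]["city_x_hour"])
```

In Cloud ML this kind of pre-processing runs as part of the training pipeline; the sketch only shows the transformations themselves.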
  • Module 11: Architecture of streaming analytics pipelines
    • Streaming data processing: challenges
    • Processing variable data volumes
    • Processing out-of-order/late data
    • Lab: How to create streaming pipelines
  • Module 12: Ingesting variable volumes
    • What is Cloud Pub/Sub?
    • How it works: topics and subscriptions
    • Lab: Simulator
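The topic-and-subscription model covered here can be illustrated with a toy in-memory version: a topic fans each published message out to every subscription, and each subscription is consumed independently. The class and names below are hypothetical, not the Pub/Sub client API:

```python
from collections import deque

# Toy model of Pub/Sub semantics for illustration only.
class Topic:
    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = deque()

    def publish(self, message):
        # Fan out: every subscription gets its own copy of the message.
        for queue in self.subscriptions.values():
            queue.append(message)

topic = Topic()
topic.subscribe("dashboard")
topic.subscribe("archive")
topic.publish("sensor-reading-1")

# Pulling from one subscription does not affect the other.
msg = topic.subscriptions["dashboard"].popleft()
print(msg, len(topic.subscriptions["archive"]))
```

This decoupling is what lets Pub/Sub absorb variable ingest volumes: publishers never wait for subscribers, and each consumer drains its subscription at its own pace.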
  • Module 13: Implementing streaming channels
    • Streaming processing challenges
    • Handling late data: watermarks, triggers, accumulation
    • Lab: Streaming data processing pipeline for real-time traffic data
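The watermark idea in this module, that a window only closes once the system believes no more data for it will arrive, can be sketched with a simple simulation. The timestamps, window size and watermark heuristic below are all invented for illustration:

```python
# Watermark-based windowing sketch: events arrive out of order and a
# window closes only once the watermark passes its end.
WINDOW = 10  # seconds

# (event_time, value) pairs, arriving out of order.
events = [(1, "a"), (12, "b"), (4, "c"), (15, "d")]

windows = {}
watermark = 0
closed = []

for event_time, value in events:
    # Assign the event to its window by window start time.
    start = (event_time // WINDOW) * WINDOW
    windows.setdefault(start, []).append(value)
    # Toy heuristic watermark: trail 5 s behind the max event time seen.
    watermark = max(watermark, event_time - 5)
    # Close any window whose end is now behind the watermark.
    for s in sorted(windows):
        if s + WINDOW <= watermark:
            closed.append((s, windows.pop(s)))

print(closed, windows)
```

Note that the late event at t=4 still lands in the first window because the watermark had not yet passed that window's end; triggers and accumulation modes in Beam control what happens when data arrives even later than that.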
  • Module 14: Dashboards and streaming analysis
    • Streaming analytics: from data to decisions
    • Querying streaming data with BigQuery
    • What is Google Data Studio?
    • Lab: Create a real-time dashboard to visualize processed data
  • Module 15: High capacity and low latency with Bigtable
    • What is Cloud Bigtable?
    • Creating Bigtable schemas
    • Processing data in Bigtable
    • Lab: How to stream into Bigtable
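A core idea behind the Bigtable schema design covered in this last module is that rows are stored sorted by row key, so the key should encode the query pattern. One common pattern is entity ID plus a reversed timestamp, so the most recent readings sort first. A small sketch with invented sensor data:

```python
# Bigtable row-key design sketch: entity + reversed timestamp so newer
# events get lexicographically smaller (earlier-sorting) keys.
MAX_TS = 10**10  # hypothetical upper bound on the epoch timestamp

def row_key(sensor_id, timestamp):
    # Zero-pad so string ordering matches numeric ordering.
    return f"{sensor_id}#{MAX_TS - timestamp:010d}"

# Lexicographic sort puts the newest reading (ts=300) first.
keys = sorted(row_key("sensor42", ts) for ts in [100, 300, 200])
print(keys)
```

With keys shaped like this, "latest N readings for a sensor" becomes a cheap prefix scan from the start of that sensor's key range, which is what gives Bigtable its low-latency reads at high throughput.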