AI Certification Exam Prep — Beginner
A domain-mapped plan to pass GCP-PDE with confident, real-world decisions.
This beginner-friendly exam-prep course is a structured, domain-mapped blueprint for the Google Cloud Professional Data Engineer certification (exam code GCP-PDE). You’ll learn how Google expects you to think: making sound engineering trade-offs, selecting the right managed services, and operating data workloads reliably at scale. The course emphasizes practical decision-making around BigQuery, Dataflow, and end-to-end analytics and ML workflows.
If you’re new to certification exams, Chapter 1 walks you through the exam experience—registration, question styles, pacing, and how to study efficiently—so you can avoid common pitfalls and focus on what moves the score.
The curriculum is organized as a 6-chapter “book” that maps directly to Google’s official GCP-PDE domains: designing data processing systems, ingesting and processing data, storing data, preparing and using data for analysis and ML, and maintaining and automating data workloads.
Chapters 2–5 go deep into the skills tested across these domains, using the same kinds of scenarios you’ll see on the real exam: ambiguous requirements, competing priorities (cost vs. latency vs. reliability), and constraints like security, governance, and operational readiness.
This course is designed to build confidence from the ground up. You’ll start with how to interpret objectives and case-based prompts, then progress into architecture patterns and service selection for common data engineering outcomes (batch ETL, streaming pipelines, lake/warehouse designs, and ML-enabled analytics).
Each content chapter includes exam-style practice milestones to reinforce the objective behind each decision. Chapter 6 finishes with a full mock exam split into two timed parts, followed by weak-spot analysis and a final review checklist—so you can close gaps quickly and walk in with a plan.
Enroll and begin with the exam orientation chapter, then follow the study plan through the domain chapters and mock exam. If you want to start right away, use the free registration option; to compare other certification tracks, you can also browse the full course catalog.
By the end, you’ll be able to design, build, and operate Google Cloud data workloads in a way that aligns to what the GCP-PDE exam rewards: correct architectures, defensible trade-offs, and production-ready execution.
Google Cloud Certified Professional Data Engineer Instructor
Priya Nair is a Google Cloud certified Professional Data Engineer who has coached learners through exam-focused architecture and data pipeline design. She specializes in BigQuery, Dataflow, and operationalizing ML and analytics workflows on Google Cloud with production-grade reliability and cost control.
This chapter is your launchpad: how the Google Professional Data Engineer (GCP-PDE) exam is structured, how to book it without surprises, how scoring and retakes really work in practice, and how to build a 4-week plan that blends reading, hands-on labs, and exam-style practice. Treat this like a project plan—because the exam rewards engineers who can translate business requirements into secure, reliable, cost-aware data systems on Google Cloud.
The PDE exam is not a tool trivia test. It measures judgment: choosing the right ingestion pattern (batch vs streaming), the right storage and schema strategy (warehouse vs lake vs operational analytics), governance and security defaults, and operations (monitoring, incident response, cost control). Throughout this chapter, you’ll see how each topic connects back to the course outcomes: design, ingest/process, store, analyze/ML, and operate.
Exam Tip: In every question, first identify the primary constraint (latency, cost, compliance, reliability, simplicity, or time-to-market). The correct answer is usually the one that satisfies the constraint with the fewest moving parts—while aligning with Google Cloud “managed service” best practices.
Practice note for Understand the GCP-PDE exam format and question styles: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Registration, scheduling, and test-day rules (online/on-site): document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Scoring, results, and retake strategy: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Build a 4-week study plan with labs, notes, and practice cadence: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
The GCP-PDE exam evaluates end-to-end data engineering on Google Cloud, with an emphasis on architectural trade-offs. Your mental model should map every question to a lifecycle: ingest → process → store → serve → govern → operate. This aligns directly to the course outcomes: designing systems to business/technical requirements, building batch and streaming pipelines, selecting storage/schema, preparing data for analysis and ML, and maintaining workloads for reliability, security, and cost.
Expect a mix of direct multiple-choice and scenario/case-based items. Scenario questions often include business context (e.g., “regulatory requirements,” “global users,” “near real-time dashboards,” “data scientists need feature tables”) and technical context (data volume, velocity, schema evolution, SLAs). The exam favors native, managed services: BigQuery for analytics, Dataflow for pipeline execution, Pub/Sub for streaming ingestion, Cloud Storage for durable landing zones, and Dataproc only when Spark/Hadoop compatibility is explicitly required.
Common trap: Over-engineering. Many distractors add unnecessary components (e.g., Dataproc + Kafka + custom orchestration) when Pub/Sub + Dataflow + BigQuery meets the requirement. Another trap is ignoring governance—if the scenario mentions PII, audits, or data residency, answers that omit IAM boundaries, CMEK, VPC Service Controls, or policy controls are often wrong even if the pipeline “works.”
Registration is a reliability exercise: remove avoidable test-day risk. You’ll schedule through Google’s testing provider (online proctored or test center). Build in time for system checks, ID matching, and policy compliance. The exam experience can be derailed by simple issues like a mismatch between your legal name and your account profile, unsupported OS/browser, or an unsuitable testing environment.
For online proctoring, you will typically complete a check-in: photos of your ID, your face, and your testing area. Your desk must be clear, and you may be asked to show the room via webcam. For test centers, arrive early; lockers are used for personal items and rules are strictly enforced.
Exam Tip: Use the exact name on your government-issued ID when registering. If you have multiple accounts (personal/work), pick one and standardize early—last-minute changes create avoidable rescheduling delays.
Common trap: Treating the online exam like an open-book lab. Any attempt to consult documentation, a second device, or even reading aloud may violate rules. Plan your comfort breaks and hydration strategy before check-in.
Scheduling strategy: pick a date that completes a full revision loop (content → labs → practice) and leaves a buffer week for remediation. If you routinely work nights, do not schedule an early-morning slot—cognitive performance matters more than calendar convenience.
Google does not typically disclose a detailed numeric breakdown by domain in your results, so your goal is broad competence, not perfect recall in one area. Performance expectations are aligned to a practicing data engineer: you should be able to choose architectures, reason about trade-offs, and troubleshoot operational issues. Passing requires consistent decision quality across domains—especially on scenario questions that blend multiple skills (e.g., streaming ingestion plus governance plus cost controls).
Interpret scoring as “best answer” under constraints. Many options can be technically feasible; the exam rewards the choice that is most appropriate given requirements. This is why practice should focus on justification, not memorization.
Exam Tip: When two answers both solve the problem, choose the one that is more managed, more secure by default, and requires less operational overhead—unless the scenario explicitly demands control (custom networking, bespoke runtime, strict data locality).
Retake strategy should be data-driven. After results, write a short “incident report” like you would at work: what domains felt weak, which service comparisons caused confusion, and where you ran out of time. Your remediation plan should include (1) one focused reading pass, (2) two hands-on labs, and (3) a new set of scenario questions, then repeat.
Case-based questions are the signature of the PDE exam. They read like short design reviews: a company context, a current state, pain points, and constraints. Your job is to recommend the next step or best architecture change. The key skill is extracting requirements from narrative—often a single phrase (“near real time,” “least operational overhead,” “must support late-arriving events,” “auditability required”) determines the correct service and configuration.
Use a consistent approach: summarize the requirement in one sentence, identify the primary constraint (latency, cost, compliance, reliability, or simplicity), eliminate options that violate that constraint, and pick the remaining answer with the fewest moving parts.
Common trap: Answering with a product instead of a design. For example, “use BigQuery” is incomplete if the scenario demands cost control and query performance; you should think “partition by event_date, cluster by customer_id, enforce dataset IAM, and use scheduled queries or Dataform/Composer for orchestration.” The exam often tests whether you know the configuration lever that makes the design succeed.
Exam Tip: Watch for “migration” language. If the prompt says “minimize refactoring,” prioritize lift-and-shift-friendly options (Dataproc for Spark jobs, BigQuery external tables temporarily) while still steering toward a managed end state.
Finally, beware of distractors that sound modern but don’t match the requirement. Vertex AI may appear as an attractive option, but if the question is about SQL-first analytics and simple models, BigQuery ML may be more appropriate. Conversely, if the scenario emphasizes MLOps pipelines and model monitoring, answers that stay purely in BigQuery can be insufficient.
Your study plan must include labs because PDE questions frequently hinge on “gotchas” you only learn by doing: streaming window behavior, BigQuery partition pruning, IAM permission boundaries, and operational metrics. Labs convert abstract service descriptions into instinctive decision-making, which is what the exam rewards.
Prioritize three lab tracks that map to the most tested patterns: streaming ingestion and processing with Pub/Sub and Dataflow, BigQuery performance and cost control (partitioning, clustering, and materialized views), and governance and operations (IAM boundaries, monitoring, and failure handling).
Exam Tip: In labs, force failure modes. Break IAM on purpose (remove a role), introduce malformed events, and observe retries/Dead Letter patterns. The exam often asks what to do when pipelines fail, not just how to build them.
Common trap: Treating Dataflow as “just Apache Beam.” The exam expects you to leverage managed features: autoscaling, monitoring, templates, flex templates, regional placement, and integration patterns. Similarly, for BigQuery, it’s not enough to know SQL—you must know performance levers (partitioning/clustering, materialized views) and governance levers (CMEK, DLP, policy tags).
Cadence recommendation: two focused labs per week, each ending with a one-page “lab note” that captures what you configured, why, and what you’d change for cost or reliability. Those notes become your revision asset in week 4.
A 4-week plan works when it is iterative: learn → apply → test → remediate. Your goal is to build a decision framework, not a pile of facts. Structure your weeks around the exam’s real skill: selecting the right architecture under constraints.
4-week plan (recommended cadence): spend weeks 1–2 working through the domain content with two focused labs per week, week 3 on exam-style practice sets with a post-mortem for every miss, and week 4 on the full mock exam, targeted remediation labs, and a final revision pass over your lab notes.
Exam Tip: Use a revision loop: after each practice session, categorize misses as (1) misunderstood requirement, (2) wrong service choice, (3) right service but wrong configuration, or (4) time pressure. Category (3) is the most common for PDE and the easiest to fix with a short lab.
Time management is a skill you can train. Practice reading prompts quickly, extracting constraints, and eliminating wrong answers. A reliable technique is to verbalize (silently) a one-sentence requirement summary—“near real-time analytics with PII, minimal ops”—then check each option against that sentence.
Common trap: Consuming endless resources without closing the loop. Limit your sources to a small set (official docs, a lab platform, and one practice engine), and spend more time on post-mortems than on new material. That is how you convert study time into exam points.
1. You are beginning a 4-week preparation plan for the Google Professional Data Engineer exam. You have limited time and want to maximize score impact by practicing the same kind of judgment the exam measures. Which approach BEST aligns with the exam’s intent?
2. A candidate is deciding how to approach each exam question on test day. They often get stuck comparing multiple technically viable architectures. Based on recommended exam strategy, what should they do FIRST when reading each question?
3. A company requires a single attempt to be completed without surprises on test day. The candidate is choosing between online proctoring and a test center. Which preparation step is MOST likely to prevent a failure unrelated to technical knowledge?
4. After taking the PDE exam, a candidate receives a 'fail' result and wants to retake quickly. They ask how to improve efficiently rather than repeating the same study approach. What is the BEST retake strategy aligned with how the exam measures skills?
5. You are designing a 4-week study plan for a busy engineer preparing for the PDE exam. The engineer has 6–8 hours per week and tends to forget material without reinforcement. Which cadence BEST matches a realistic certification-prep plan described in the chapter?
Domain 1 of the Professional Data Engineer exam repeatedly tests whether you can convert ambiguous business needs into concrete, supportable architectures on Google Cloud. The exam is less interested in whether you can name every service feature and more interested in whether you can choose the right pattern (batch, streaming, or hybrid), enforce governance, and justify trade-offs around reliability, security, and cost.
This chapter aligns to four recurring exam tasks: (1) translate business goals into data architecture decisions, (2) select GCP services for batch/streaming/hybrid designs, (3) design for security, compliance, and data governance, and (4) evaluate trade-offs in architecture scenarios. Expect questions that provide constraints (latency, regulatory boundaries, existing tools, operational maturity) and ask for the “best” solution—not just a possible one.
Exam Tip: When multiple answers could work, the correct choice typically best satisfies stated constraints with the fewest moving parts and the lowest operational burden, while still meeting security and reliability requirements.
Practice note for Translate business goals into data architecture decisions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Select GCP services for batch, streaming, and hybrid designs: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Design for security, compliance, and data governance: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Exam-style practice set: architecture and trade-off scenarios: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
The first step in nearly every Domain 1 scenario is requirements analysis. The exam expects you to distinguish business goals (what the company values) from technical constraints (what the system must obey). Business goals commonly include faster insights, personalization, regulatory compliance, and cost control. Technical constraints often include latency SLOs, data volume/velocity, schema variability, regional residency, RPO/RTO targets, and integration with existing systems.
A practical approach is to translate narrative requirements into a decision table: ingestion pattern (batch vs streaming), storage system of record, processing engine, serving layer, governance controls, and operational model. For example, “near real-time fraud detection” implies streaming ingestion and low-latency processing; “monthly financial reporting with auditability” implies batch pipelines, immutable raw storage, and strong lineage controls.
Common trap: Overfitting the solution to a favorite tool. If the prompt emphasizes “minimal ops,” a Dataproc cluster—even if technically feasible—is usually a wrong direction compared to Dataflow/BigQuery managed approaches.
Exam Tip: Look for “must,” “cannot,” and “existing investment” phrases. “Must remain in EU,” “cannot expose data to public internet,” or “already uses Spark jobs” are constraint anchors that should drive the architecture choice.
The exam commonly frames architectures as data lake, data warehouse, or lakehouse, but expects you to map these to Google Cloud implementations. A data lake on GCP typically uses Cloud Storage as the durable, low-cost system of record, with raw/bronze data stored immutably and curated/silver data produced by batch or streaming pipelines. A data warehouse pattern centers on BigQuery as the primary analytical store, emphasizing governed schemas, strong SQL analytics, and performance at scale.
A lakehouse blends both: Cloud Storage retains raw files while BigQuery provides warehouse-grade analytics over both loaded and externally referenced data (for example, BigQuery external tables over Cloud Storage), sometimes with an incremental curation layer. The test often rewards solutions that separate concerns: raw landing (immutable), transformation/curation (repeatable), and consumption (modeled, governed datasets).
Common trap: Assuming a lake automatically implies “no governance.” On the exam, a lake still needs cataloging, access controls, and lifecycle policies; otherwise, the architecture is incomplete for compliance-heavy prompts.
Exam Tip: If the scenario stresses “single source of truth for analytics,” “business metrics,” or “BI tools,” BigQuery-centric warehouse/lakehouse answers typically fit better than Storage-only lake answers.
This domain tests whether you can choose the right managed service for ingestion, processing, and storage—especially in batch, streaming, and hybrid designs. A common exam pattern is to describe an end-to-end pipeline and ask which set of services best meets latency, scalability, and operational requirements.
Pub/Sub is the default for scalable streaming ingestion and decoupling producers/consumers. Pair it with Dataflow (Apache Beam) for event-time processing, windowing, enrichment, and exactly-once-like semantics in many sink patterns. Cloud Storage is the landing zone for raw files and long-term retention, and it integrates well with both batch processing and replay strategies.
BigQuery is the analytics workhorse: managed storage + compute separation, SQL, partitioning/clustering, and native integrations (BI Engine, Dataform, Data Catalog integration patterns). For transformation-heavy SQL pipelines, BigQuery is often the simplest “fewest moving parts” option.
Dataproc is appropriate when you need Spark/Hadoop ecosystem compatibility, existing job portability, or specialized libraries—at the cost of more operational considerations (cluster sizing, job orchestration, dependency management). In exam scenarios, Dataproc often becomes correct when the prompt explicitly mentions Spark, HDFS/Hive, or migration of existing on-prem Hadoop workloads.
Common trap: Picking Dataproc “because it’s flexible” when the requirement is “serverless/managed” and the tasks are straightforward ETL/ELT. Another trap is ignoring the need for a raw replayable store; streaming-only pipelines without a durable landing zone can fail auditability and backfill requirements.
Exam Tip: When you see event-time, late data, or windowed aggregations, Dataflow streaming is usually the intended processing engine. When you see ad hoc analytics, star schemas, and BI, BigQuery is usually the intended serving layer.
Reliability is a frequent differentiator between “works in a demo” and “passes an exam scenario.” Domain 1 expects you to design pipelines that handle duplicates, partial failures, spikes, and regional outages. Two key ideas: make operations idempotent (safe to repeat) and design for bounded failure (retries that don’t amplify the problem).
Idempotency means reprocessing the same message/file does not corrupt results. In streaming, you often achieve this via stable event IDs and upsert/merge patterns in sinks, or by writing to partitioned tables and running deterministic aggregations. In batch, it can mean writing outputs to new partitions and swapping pointers (or using atomic load/replace patterns) rather than overwriting in-place.
Retries should be exponential backoff with jitter where possible, and you should separate transient errors (retry) from permanent errors (dead-letter handling or quarantine bucket). Backpressure matters when ingestion outpaces processing: Pub/Sub buffering plus autoscaling Dataflow workers is a common pattern, but you still need to consider downstream sink limits (BigQuery load/streaming quotas, API quotas).
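To make the retry idea concrete, here is a minimal Python sketch of exponential backoff with jitter around a sink write; the sink_write callable and the TransientError classification are illustrative assumptions, not part of any exam scenario.

import random
import time

class TransientError(Exception):
    """Retryable failure, e.g. a temporary quota or availability issue (illustrative)."""

def write_with_backoff(sink_write, record, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Retry transient failures with exponential backoff plus jitter.

    Permanent failures are re-raised so the caller can dead-letter the record.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return sink_write(record)
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted; caller quarantines the record instead of looping forever
            # Exponential backoff capped at max_delay, with full jitter to avoid retry storms.
            delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
            time.sleep(random.uniform(0, delay))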
Disaster recovery (DR) is framed through RPO/RTO. Multi-region storage choices, cross-region replication strategy, and the ability to replay from Cloud Storage are often the practical DR mechanisms for data pipelines. For analytics, BigQuery dataset location decisions and backup/export strategies may be part of the story depending on compliance constraints.
Common trap: Treating “at-least-once delivery” as “exactly-once results.” Pub/Sub can deliver duplicates; the design must tolerate them. Another trap is proposing a global active-active pattern when the prompt only asks for modest RPO/RTO—overengineering can be scored as poor fit.
Exam Tip: If the prompt mentions “must not double count” or “financial accuracy,” explicitly think idempotent writes, de-duplication keys, and replay strategies.
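As one hedged illustration of idempotent writes, the sketch below uses the BigQuery Python client to MERGE a staging table into a reporting table keyed by a stable event ID, so replays and duplicate deliveries do not double count; the project, dataset, table, and column names are assumptions.

from google.cloud import bigquery

client = bigquery.Client()

# Upsert by stable event ID: reprocessing the same batch leaves the result unchanged.
merge_sql = """
MERGE `my_project.analytics.orders` AS target
USING `my_project.staging.orders_batch` AS source
ON target.event_id = source.event_id
WHEN MATCHED THEN
  UPDATE SET amount = source.amount, updated_at = source.updated_at
WHEN NOT MATCHED THEN
  INSERT (event_id, customer_id, amount, event_date, updated_at)
  VALUES (source.event_id, source.customer_id, source.amount, source.event_date, source.updated_at)
"""

client.query(merge_sql).result()  # blocks until the MERGE completes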
Security and governance are not separate from architecture; they are architecture. The exam expects you to embed controls into service selection and data flows. Start with IAM: least privilege roles at the project, dataset, and bucket level; separate service accounts for pipelines; and avoid broad primitive roles. For BigQuery, consider dataset-level permissions and authorized views to restrict sensitive columns while enabling analytics.
VPC Service Controls (VPC-SC) is a frequent “enterprise boundary” answer when the prompt mentions exfiltration risk, regulated data, or restricting API access to only corporate networks. It’s often paired with Private Google Access and controlled perimeters to reduce data movement risk across projects and services.
Customer-managed encryption keys (CMEK) appear when compliance demands key ownership and rotation control. On the exam, CMEK is typically the right add-on when the prompt explicitly requires customer-controlled keys, separation of duties, or centralized key management with Cloud KMS.
Data Loss Prevention (DLP) is relevant for discovery, classification, tokenization/masking, and detection of sensitive data (PII/PHI). Architecturally, DLP can be integrated into ingestion (scan before load), governance workflows (classification tags), and data sharing patterns (masking before publishing).
Common trap: Suggesting network controls (VPC firewall rules) as the primary method to prevent managed-service data exfiltration. For many Google APIs, VPC-SC is the correct conceptual control in exam scenarios, not just firewalling.
Exam Tip: If the scenario includes “PII,” “HIPAA,” “PCI,” or “data exfiltration,” look for answers that combine least-privilege IAM with perimeter controls (VPC-SC) and key management requirements (CMEK) rather than only one control.
The exam often disguises cost questions as architecture questions: “optimize,” “reduce spend,” “meet SLAs,” or “handle spikes.” You must connect workload shape to pricing levers. In BigQuery, think in terms of query cost (bytes processed), performance controls (partitioning/clustering/materialized views), and compute model (on-demand vs reservations/slots). Reservations can stabilize spend and performance for predictable workloads; on-demand can be simpler for spiky or low-volume workloads.
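A quick way to reason about the bytes-processed pricing lever is a dry run; the sketch below (table name and filter are assumptions) uses the BigQuery Python client to estimate scanned bytes before paying for the query.

from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT customer_id, SUM(amount) AS total
FROM `my_project.analytics.orders`
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)  -- partition filter limits the scan
GROUP BY customer_id
"""

# Dry run: validates the query and reports bytes processed without running it.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=job_config)
print(f"Estimated bytes processed: {job.total_bytes_processed:,}")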
For processing, Dataflow autoscaling reduces manual capacity management, but you should still consider worker type, streaming engine choices, and the cost of always-on streaming jobs. Dataproc can be cost-effective for bursty batch if you use ephemeral clusters and preemptible/spot VMs where appropriate, but it adds operational overhead and can become expensive if clusters are left running.
Storage choices are a classic trade-off area. Cloud Storage classes (Standard, Nearline, Coldline, Archive) should match access frequency and retention policies. Lifecycle rules are a cost-control feature the exam expects you to apply for long-lived raw data. In BigQuery, long-term storage pricing and partition expiration policies can also be relevant when retention is mandated but query access is rare.
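The lifecycle-rule idea can be sketched with the Cloud Storage Python client; the bucket name and age thresholds below are assumptions chosen only to illustrate matching storage class to access frequency and retention.

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-raw-landing-zone")  # hypothetical bucket

# Move aging raw objects to colder classes, then expire them after the retention period.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_set_storage_class_rule("COLDLINE", age=90)
bucket.add_lifecycle_delete_rule(age=365)
bucket.patch()  # persist the lifecycle configuration on the bucket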
Common trap: Treating “faster” as always “more expensive is fine.” Many exam prompts require meeting an SLA while minimizing cost. Another trap is ignoring partitioning/clustering, leading to large scan costs that violate “optimize cost” requirements.
Exam Tip: If the scenario mentions predictable daily reporting, reservations/slots and scheduled transforms are often appropriate. If it mentions unpredictable ad hoc exploration, emphasize partitioning/clustering and on-demand simplicity—unless the prompt explicitly requires cost predictability.
1. A retailer wants to reduce cart abandonment by triggering personalized offers within 2 seconds of a user event. Events arrive from web and mobile clients at variable volume. The solution must be managed (minimal ops) and support exactly-once processing semantics for downstream analytics. Which architecture best meets the requirements on Google Cloud?
2. A finance company must keep all customer PII in a specific region and ensure only a small compliance team can decrypt it. Data engineers still need to run aggregations over the dataset in BigQuery. Which design best satisfies compliance and least-privilege requirements?
3. A media company currently runs a nightly batch ETL that produces a curated dataset for reporting. Leadership now wants near-real-time dashboards (under 1 minute) while keeping the existing nightly batch reconciliation for accuracy. Which approach is the best fit?
4. Your organization has multiple teams publishing datasets to a central analytics project. You must ensure consistent data classification (PII vs non-PII), prevent unauthorized access at the column level, and provide an auditable governance model with minimal custom code. Which solution best meets these goals?
5. A startup needs a cost-effective batch pipeline to process 10 TB of CSV logs daily. Processing can take up to 6 hours, and the team wants minimal cluster management. The output should be queryable with standard SQL. Which design is most appropriate?
Domain 2 is where the Professional Data Engineer exam stops being “which service does what?” and starts testing whether you can design reliable, cost-aware, and correct data movement and transformation systems. Expect scenario questions that mix business constraints (SLA, freshness, compliance, cost) with technical constraints (ordering, deduplication, schema drift, backfills, late data). The exam frequently evaluates whether you can choose the right ingestion pattern (files vs events vs CDC), the right processing mode (batch vs streaming), and the right operational posture (replayability, monitoring, error isolation).
This chapter connects the major ingestion and processing tools—Storage Transfer Service, Datastream, Pub/Sub, Dataflow/Beam, Dataproc/Spark, and BigQuery—into decision frameworks you can apply under exam time pressure. The exam also tests your understanding of streaming semantics (windows/triggers/watermarks), operational controls (dead-letter queues, retries, backfills), and performance tuning (shuffle, autoscaling, fusion, and pipeline metrics). When in doubt, anchor your answer to two questions: “What guarantees does the workload require?” and “Where is the state maintained?”
Exam Tip: Most wrong answers are “plausible” because they move data. Pick the option that meets the stated guarantees (freshness, ordering, dedupe, schema management, and replay) with the fewest moving parts and the clearest operational story.
Practice note for Implement ingestion patterns for files, events, and CDC: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Build streaming pipelines with Pub/Sub and Dataflow: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Build batch pipelines with Dataflow, Dataproc, and BigQuery: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Exam-style practice set: pipeline correctness and performance: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
The exam expects you to map ingestion patterns to source characteristics: files (finite, often large), events (unbounded, near-real-time), and CDC (ordered changes from databases). For file ingestion into Cloud Storage, Storage Transfer Service is a common “correct” choice because it is managed, supports scheduled transfers, and handles retries for large object sets. Scenarios mentioning “move from S3 to GCS,” “recurring nightly transfers,” or “minimal operational overhead” usually point to Storage Transfer rather than custom VMs or ad-hoc scripts.
For CDC, Datastream is the primary managed service on GCP. Use it when the prompt mentions “replicate database changes,” “low-latency updates,” “transaction logs,” or “keep BigQuery in sync.” Datastream typically lands into Cloud Storage and/or BigQuery via downstream processing. A trap is choosing Pub/Sub for CDC directly: Pub/Sub is an event bus, not a log-based CDC extractor. If you already have CDC events produced by a tool, Pub/Sub can transport them; otherwise, Datastream is the intended source-side capture.
For event ingestion, Pub/Sub is the default for decoupled, scalable ingestion with fan-out and backpressure handling. It shines when producers and consumers evolve independently, and when you need multiple subscribers (e.g., one pipeline to BigQuery, another to alerting). Managed connectors (e.g., Dataflow templates/connectors, BigQuery Data Transfer Service, Datastream) appear in exam scenarios as the “simplify operations” option. If the question emphasizes time-to-market and managed ops, lean toward native connectors/templates over custom code.
Exam Tip: If the scenario mentions “once per day/hour,” “files,” and “large backfills,” think transfer services and batch pipelines. If it mentions “near real-time,” “unbounded,” “late events,” or “event-time,” think Pub/Sub + streaming Dataflow. If it mentions “transaction logs” or “database replication,” think Datastream.
Dataflow is Google’s managed runner for Apache Beam, and the exam frequently probes your knowledge of Beam concepts because they determine correctness in streaming. Transforms (ParDo, GroupByKey/Combine, Flatten, CoGroupByKey) define your computation graph. The key mental model: when you group or aggregate, you introduce state and often a shuffle—this affects both cost and latency.
Streaming correctness hinges on event time vs processing time. Beam windows assign elements to finite buckets (fixed/tumbling, sliding, session) so you can aggregate an unbounded stream. Triggers decide when to emit results for a window (e.g., after watermark, after processing-time delay, or repeatedly). Watermarks are the system’s estimate of event-time progress; late data is anything arriving after the watermark has passed the window end (subject to allowed lateness). On the exam, scenarios with “out-of-order events,” “mobile telemetry,” or “late-arriving transactions” are testing whether you choose event-time windowing with appropriate allowed lateness and triggering, rather than naive processing-time aggregations.
Common trap: selecting “exactly once” language without understanding what is actually guaranteed. Dataflow offers strong processing guarantees in many sinks (notably BigQuery Storage Write API), but duplicates can still appear if you don’t design idempotency keys for your domain or if your sink lacks transactional semantics. Another trap is ignoring window accumulation mode: discarding vs accumulating panes changes whether downstream sees incremental updates or final results.
Exam Tip: When a prompt says “must be correct even with late events,” look for event-time windows + allowed lateness + a trigger strategy. When it says “low latency dashboards,” look for early triggers and be ready to accept updated results.
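To ground the windowing vocabulary, here is a hedged Apache Beam (Python) sketch of one-minute event-time windows with early firings and 30 minutes of allowed lateness; the PCollection shape and trigger choice are assumptions showing one reasonable configuration, not the only correct one.

import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.transforms.trigger import AfterWatermark, AfterProcessingTime, AccumulationMode

def count_per_campaign(events):
    """events: PCollection of (campaign_id, 1) pairs with event timestamps already attached."""
    return (
        events
        | "WindowPerMinute" >> beam.WindowInto(
            window.FixedWindows(60),  # 1-minute event-time windows
            trigger=AfterWatermark(
                early=AfterProcessingTime(60)  # emit speculative results every minute
            ),
            accumulation_mode=AccumulationMode.ACCUMULATING,  # later panes refine earlier ones
            allowed_lateness=30 * 60,  # accept events up to 30 minutes late
        )
        | "CountPerCampaign" >> beam.CombinePerKey(sum)
    )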
Batch pipelines are still heavily tested because many enterprises run scheduled ETL/ELT with backfills, regulatory reprocessing, and large joins. Dataflow can run Beam in batch mode and is often the “managed” answer when you need minimal cluster ops and a single model for batch and streaming. Dataflow templates (including Flex Templates) matter for standardization: they package pipeline logic for repeatable execution, parameterization, and safer promotion across environments. If the scenario mentions “data engineers should run this daily with different parameters,” “CI/CD,” or “avoid redeploying code,” templates are a strong signal.
Dataproc (managed Hadoop/Spark) is typically chosen when the prompt requires Spark-specific libraries, existing Spark code, HDFS/Hive compatibility, or tight control over cluster configuration. The trade-off is operational overhead (cluster lifecycle, dependency management, tuning executors) versus flexibility. BigQuery also appears as a batch engine: if transformations are mostly SQL (joins, aggregations) and data is already in BigQuery, an ELT approach using scheduled queries or SQL pipelines can be simplest and fastest. The exam often rewards pushing work to BigQuery when it reduces data movement.
Traps: choosing Dataproc “because it’s faster” without a stated Spark dependency; or choosing Dataflow for workloads that are primarily ad-hoc SQL in BigQuery. Another common misread is ignoring data locality: massive datasets sitting in BigQuery are usually best processed in BigQuery, not exported to GCS just to run Spark.
Exam Tip: If the question highlights “existing Spark jobs,” “Scala/PySpark,” or “need Hive metastore,” Dataproc is likely. If it highlights “managed pipeline,” “same logic for streaming later,” or “template-based operations,” Dataflow is likely.
The exam expects you to treat data quality as part of the pipeline design, not an afterthought. Ingest and process questions often embed quality requirements like “reject malformed records,” “quarantine bad rows,” “enforce referential integrity,” or “detect anomalies.” A correct design typically separates: (1) validation (schema/type/range checks), (2) standardization (normalizing timestamps, IDs), and (3) enrichment (lookups, joins). In Dataflow, validation is commonly implemented with side outputs (tagged outputs) so good records proceed while bad records are routed to quarantine storage for analysis.
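A minimal Beam (Python) sketch of the tagged-output pattern follows; the parse logic and output names are assumptions meant only to show valid records flowing onward while bad records are routed to a quarantine sink instead of being dropped.

import json
import apache_beam as beam
from apache_beam import pvalue

class ValidateRecord(beam.DoFn):
    BAD = "bad"

    def process(self, raw_message):
        try:
            record = json.loads(raw_message)
            if "event_id" not in record or "event_time" not in record:
                raise ValueError("missing required field")
            yield record  # main output: valid records continue down the pipeline
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            # Quarantine with the original payload and the error, so it can be fixed and replayed.
            yield pvalue.TaggedOutput(self.BAD, {"raw": raw_message, "error": str(err)})

def split_good_and_bad(messages):
    results = messages | "Validate" >> beam.ParDo(ValidateRecord()).with_outputs(
        ValidateRecord.BAD, main="good"
    )
    return results.good, results[ValidateRecord.BAD]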
Schema evolution is a frequent real-world pain point and a subtle exam discriminator. For event streams, a schema registry pattern (often using Avro/Protobuf/JSON schema stored centrally) helps producers and consumers evolve safely. In BigQuery, schema relaxation (NULLABLE additions) is easier than breaking changes (type changes, required fields). When processing CDC, you also need to consider DDL changes: adding columns should be handled without breaking downstream transforms; dropping/renaming columns often requires versioning.
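Schema relaxation can be sketched with the BigQuery Python client: appending a new NULLABLE field is an additive, non-breaking change (the table and field names here are assumptions).

from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my_project.analytics.orders")  # hypothetical table

# Additive evolution: append a NULLABLE column; existing rows simply read NULL for it.
new_schema = list(table.schema)
new_schema.append(bigquery.SchemaField("loyalty_tier", "STRING", mode="NULLABLE"))
table.schema = new_schema
client.update_table(table, ["schema"])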
Watch for traps around “auto-detect schema” in production pipelines. Autodetect can be acceptable for exploratory loads, but for governed pipelines the exam tends to favor explicit schemas and controlled evolution. Another trap is ignoring partitioning/clustering compatibility when new fields appear—adding a partitioning column later can force a table redesign.
Exam Tip: When you see “must not lose data,” the right answer often includes quarantining invalid records (not dropping) and storing enough metadata to replay after fixes (offsets, message IDs, file names).
Operational resilience is heavily tested in Domain 2: how your pipeline behaves when data is malformed, sinks are unavailable, or processing code changes. Pub/Sub plus Dataflow commonly uses a dead-letter queue (DLQ) topic for messages that fail parsing/validation after retries. This is different from transient failures (e.g., temporary BigQuery outage), which should be handled with retry policies and backoff rather than immediately DLQ’ing.
Replay strategy differs by ingestion type. For Pub/Sub, replay typically means re-consuming from a retained subscription (or using seek with snapshots where applicable) and ensuring your pipeline can handle duplicates. For files in GCS, replay means re-running a batch job over a known input prefix and writing outputs idempotently (e.g., overwrite partition for a date). For CDC, replay/backfill may be supported by the CDC tool (e.g., Datastream backfill) but downstream must still be idempotent and ordered where required.
The exam often uses “exactly-once” as a trap. In distributed systems, end-to-end exactly-once usually requires idempotent writes or transactional sinks. Many designs are effectively at-least-once with deduplication using a unique key (event ID) and a time-bounded state store. In BigQuery streaming writes, duplicates can still occur across retries unless you use mechanisms designed for dedupe (e.g., insertId in legacy streaming or appropriate semantics with Storage Write API) and your data model can reconcile duplicates.
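One common reconciliation pattern is query-time deduplication keyed on the event ID; the sketch below (all names are assumptions) defines a view that keeps the latest row per event_id so duplicate streaming inserts do not double count.

from google.cloud import bigquery

client = bigquery.Client()

dedup_view_sql = """
CREATE OR REPLACE VIEW `my_project.analytics.orders_deduped` AS
SELECT * EXCEPT(row_num)
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY ingest_time DESC) AS row_num
  FROM `my_project.analytics.orders_raw`
)
WHERE row_num = 1
"""

client.query(dedup_view_sql).result()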
Exam Tip: If the prompt says “no data loss” and “must continue processing,” the best answer usually includes: DLQ for bad records, durable storage for raw inputs, and a replay plan that won’t double-count.
Performance questions on the PDE exam are rarely about micro-optimizations; they focus on identifying bottlenecks (shuffle, hot keys, slow sinks) and choosing the right knobs (autoscaling, batching, reshuffle, windowing strategy). In Dataflow, shuffle-heavy stages appear when grouping/aggregating/joining. Hot keys (skew) can dominate runtime; mitigation patterns include key salting, combiners, or redesigning to pre-aggregate.
Fusion is Dataflow’s optimization that combines compatible transforms to reduce overhead. While usually beneficial, it can create memory pressure or reduce parallelism in certain patterns; adding a Reshuffle (or using a shuffle boundary) can increase parallelism and stabilize throughput. Autoscaling helps handle variable load, but it doesn’t fix fundamental bottlenecks like a sink quota limit or a single-threaded DoFn. For BigQuery sinks, consider batch loads (for batch pipelines) or Storage Write API (for higher-throughput streaming) and watch quotas/partitioning.
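The skew and fusion knobs can be sketched in Beam (Python): a combiner pre-aggregates per bundle so hot keys ship less data through the shuffle, and a Reshuffle inserts a shuffle boundary to restore parallelism; the fragment below uses placeholder names and shows one way to apply these levers.

import apache_beam as beam

def aggregate_with_skew_controls(events):
    """events: PCollection of (key, value) pairs, e.g. (campaign_id, revenue)."""
    return (
        events
        # CombinePerKey performs partial aggregation before the shuffle, so a hot key
        # contributes a few partial sums instead of every raw element.
        | "SumPerKey" >> beam.CombinePerKey(sum)
        # Reshuffle breaks fusion with downstream steps and redistributes work across workers.
        | "BreakFusion" >> beam.Reshuffle()
    )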
Pipeline metrics and monitoring are exam-relevant because they indicate whether you can diagnose issues. Key signals: system lag (watermark lag), throughput, backlogged bytes in Pub/Sub, worker CPU/memory, and per-step latency. A frequent trap is scaling workers when the real constraint is downstream (e.g., BigQuery quota) or upstream (Pub/Sub publish rate). The correct answer references observing metrics first, then tuning the right stage.
Exam Tip: If the scenario says “streaming pipeline falling behind,” look for watermark lag and Pub/Sub backlog, then identify whether the bottleneck is a shuffle (aggregation/join) or the sink. “Add more workers” is only correct when the bottleneck is parallelizable and not quota-limited.
1. A retailer needs to ingest daily partner files (CSV) from an external SFTP server into BigQuery. Files arrive once per day, may be re-sent if corrupted, and the solution must be low-ops and replayable. Which approach best meets the requirements?
2. A company streams user click events to Pub/Sub. They need per-minute counts by campaign with the following requirements: (1) allow late events up to 30 minutes, (2) produce early results every minute, (3) ensure the final result is correct after the lateness period. Which Dataflow/Beam configuration best matches these requirements?
3. A fintech wants near-real-time replication from an on-premises PostgreSQL database into BigQuery for analytics. Requirements: capture inserts/updates/deletes, handle schema changes with minimal manual work, and keep operational overhead low. Which architecture is most appropriate?
4. A team has a daily 3 TB ETL job that reads from Cloud Storage, performs heavy joins and aggregations, and writes curated tables to BigQuery. They want to minimize pipeline time and cost while keeping operations simple. Which option is the best fit?
5. A streaming Dataflow pipeline reads from Pub/Sub and writes to BigQuery. The team observes occasional duplicate rows in BigQuery during worker restarts and transient sink errors. They need to reduce duplicates while maintaining throughput. What should they do?
Domain 3 of the Google Professional Data Engineer exam is where architecture meets day-2 reality: you can ingest data perfectly, but if you store it in the wrong system (or with the wrong schema, retention, or controls), you will fail the business requirements—and likely the exam scenario as well. The exam expects you to choose storage based on access patterns (OLTP vs analytics), latency, scale, consistency, governance, and operational constraints, then to model and secure that data with BigQuery-first design principles.
This chapter maps directly to the “Store the data” responsibilities: selecting the right storage service, modeling in BigQuery for performance and governance, implementing lifecycle/retention and access patterns, and applying security and durability controls. The most common exam trap is treating storage choices as purely “feature matching.” Instead, the exam wants you to recognize intent: what queries will run, how frequently, by whom, with what SLAs and compliance requirements. If a prompt mentions ad-hoc analytics, cost-per-query, partition pruning, or BI tools, you should think BigQuery modeling and optimization. If it mentions millisecond reads/writes at massive scale, you should pivot to Bigtable or Spanner, and possibly land raw data in Cloud Storage for audit and reprocessing.
Exam Tip: In scenario questions, underline the non-functional requirements (latency, throughput, consistency, retention, cost). Those details usually eliminate 2–3 options immediately.
Practice note for Choose the right storage system for analytics and operational needs: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Model data for BigQuery performance and governance: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Implement lifecycle, retention, and access patterns: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Exam-style practice set: storage and schema decisions: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
The exam frequently tests whether you can separate analytical storage from operational storage. BigQuery is Google’s serverless analytical data warehouse: columnar storage, massively parallel execution, and a pricing model tied to storage and query processing. Cloud Storage (GCS) is object storage for files/blobs and is the default landing zone for raw data, archives, and reprocessing pipelines. Bigtable is a wide-column NoSQL database optimized for very high throughput and low-latency key/value access patterns (time-series, IoT, ad tech). Spanner is globally distributed relational OLTP with strong consistency and horizontal scalability. Cloud SQL is managed MySQL/PostgreSQL/SQL Server for traditional relational workloads that fit within single-region scaling and familiar engine constraints.
To choose correctly, focus on access pattern and concurrency. If users need complex joins, aggregations, and ad-hoc BI at scale, BigQuery is the “happy path.” If you need cheap, durable storage for many file formats (Avro/Parquet/ORC/CSV/JSON) and to decouple compute from storage, GCS is ideal. If the prompt mentions “millisecond latency,” “high QPS,” “single-row lookups,” or “time-series keyed by device + timestamp,” Bigtable is usually correct. If it mentions global transactions, relational constraints, or multi-region write availability with SQL semantics, Spanner is the answer. If it mentions lift-and-shift from an existing relational engine, smaller scale OLTP, or compatibility with MySQL/Postgres features, Cloud SQL is likely.
Exam Tip: Bigtable is not a “cheap BigQuery” and BigQuery is not an OLTP database. The exam penalizes mixing these: if you see transactional updates and referential integrity needs, don’t pick BigQuery just because it’s SQL.
Common trap: choosing Cloud Storage alone for analytics because it’s “cheap.” GCS is storage, not an analytics engine. The exam expects you to pair it with compute (BigQuery external tables, Dataproc/Spark, Dataflow) depending on query style and governance.
BigQuery modeling shows up in many PDE questions because it impacts performance, cost, and governance. The key building blocks are datasets (administrative containers), tables (managed storage), views (logical query definitions), and materialized views (precomputed, incrementally maintained results for specific query patterns). On the exam, datasets are often the unit for access control and data organization (e.g., separate raw, curated, and sandbox datasets). This is also where you’ll see location constraints: datasets are regional or multi-regional; you cannot query across locations without special handling, so “data residency” requirements are a strong signal.
Tables should be modeled with query patterns in mind. For event data, a common best practice is a “fact table” with repeated/record fields (nested schema) instead of excessive normalization. BigQuery supports nested and repeated fields efficiently and often reduces join cost. Views help enforce consistent business logic (e.g., masking, computed fields) without duplicating storage. However, views do not store results; they can increase query cost if used heavily in BI dashboards. Materialized views can accelerate repeated aggregations (e.g., daily rollups) while reducing compute, but they have limitations: the query must be compatible with incremental refresh rules, and not every transformation qualifies.
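To make the nested-schema idea concrete, here is a minimal sketch, assuming the google-cloud-bigquery Python client and hypothetical project, dataset, and column names, of an event fact table that keeps line items as a repeated RECORD instead of normalizing them into a separate joined table:

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical order-events fact table with a nested, repeated "items" field.
schema = [
    bigquery.SchemaField("order_id", "STRING"),
    bigquery.SchemaField("event_date", "DATE"),
    bigquery.SchemaField(
        "items", "RECORD", mode="REPEATED",
        fields=[
            bigquery.SchemaField("sku", "STRING"),
            bigquery.SchemaField("quantity", "INTEGER"),
            bigquery.SchemaField("unit_price", "NUMERIC"),
        ],
    ),
]

table = bigquery.Table("my-project.curated.order_events", schema=schema)
client.create_table(table)  # one wide fact table instead of order + line-item tables

At query time, UNNEST(items) expands the repeated field, which typically replaces the join you would otherwise need against a separate line-items table.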
Exam Tip: If the scenario mentions “many users running the same dashboard queries,” look for materialized views or pre-aggregated tables. If it mentions “single source of truth logic” and “avoid duplication,” look for standard views—but remember they don’t inherently reduce query cost.
Common modeling traps tested on PDE: (1) creating too many small tables (operational habit) instead of partitioned/clustered large tables for analytics; (2) using views as a security boundary incorrectly (views can help, but use authorized views and policy controls properly); (3) ignoring dataset-level organization—mixing raw PII with curated analytics in the same dataset complicates permissions and auditing. Correct answers usually propose a layered approach: raw landing (immutable), curated (validated), and serving (optimized/masked), each with dataset-level controls.
Partitioning and clustering are among the highest-yield topics in Domain 3 because they directly affect query performance and cost. Partitioning splits a table into segments, typically by ingestion time or by a DATE/TIMESTAMP column. Clustering organizes data within partitions based on up to four columns, improving pruning for filters and improving performance for grouped access patterns.
On the exam, the “tell” for partitioning is any mention of time-based queries (“last 7 days,” “monthly reports,” “daily pipeline”), retention requirements, or rapidly growing fact tables. A correct design uses partitioning to reduce scanned bytes and to enable partition-level expiration. Clustering is indicated when queries filter or group by certain high-cardinality columns (e.g., customer_id, region, device_id) or when you need faster selective queries within large partitions.
Exam Tip: Partitioning reduces data scanned when queries include a partition filter. If the prompt implies analysts often forget filters, consider enforcing partition filters (require_partition_filter) to prevent accidental full table scans—this is a classic cost-control answer choice.
How to identify the best exam answer: match the table’s dominant filter to partition key (often event_date), then choose clustering keys that appear in common WHERE predicates and JOIN keys. If the scenario mentions both time filtering and customer-level drilldowns, the best practice is usually “partition by date, cluster by customer_id (and possibly another dimension like region).” Also watch out for location: partitioning and clustering optimizations don’t fix cross-region dataset constraints.
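As a concrete illustration of the "partition by date, cluster by customer_id" pattern, here is a hedged sketch using the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

table = bigquery.Table(
    "my-project.analytics.clickstream_events",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("region", "STRING"),
        bigquery.SchemaField("payload", "STRING"),
    ],
)

# Partition by the dominant time filter and expire old partitions automatically.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
    expiration_ms=90 * 24 * 3600 * 1000,  # hypothetical 90-day retention
)
# Cluster by the columns that dominate WHERE predicates and drilldowns.
table.clustering_fields = ["customer_id", "region"]
# Block accidental full-table scans from analysts who forget the date filter.
table.require_partition_filter = True

client.create_table(table)

Setting require_partition_filter here is the programmatic form of the cost control described in the Exam Tip above.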
The exam expects you to align ingestion method with freshness, cost, and correctness requirements. BigQuery supports batch load jobs (from GCS), streaming inserts, and external tables (query data in GCS without loading). Batch load jobs are the default for cost efficiency and consistency: they are suited for hourly/daily pipelines and large files (Avro/Parquet/ORC are preferred for schema and performance). Streaming inserts provide low-latency availability but come with tradeoffs: higher cost, quotas, and operational considerations (exactly-once semantics require careful design outside BigQuery). External tables are useful for exploratory analytics, data lake patterns, and when you want to avoid duplicating storage—but query performance and governance controls can differ compared to managed tables.
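Here is a minimal sketch of the batch-load default, assuming the google-cloud-bigquery Python client and hypothetical bucket and table names; it loads Parquet files from a GCS landing path into a managed table:

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,     # columnar format with embedded schema
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://raw-landing/events/dt=2024-06-01/*.parquet",  # hypothetical landing path
    "my-project.curated.events",
    job_config=job_config,
)
load_job.result()  # block until the load completes; batch loads use the free shared slot pool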
Exam Tip: If the scenario says “near real-time dashboards” or “seconds-level latency,” streaming (or Dataflow to BigQuery) is plausible. If it says “cost sensitive” and “daily reports,” prefer batch loads from GCS. If it says “keep data in the lake” and “query in place,” consider external tables.
The exam also tests format choice. Columnar formats (Parquet/ORC) generally reduce storage and improve scan efficiency; Avro is strong for row-based write patterns and schema evolution; CSV is the most error-prone (schema inference problems, escaping issues) and is often a trap option when reliability matters. For streaming, scenarios often pair Pub/Sub → Dataflow → BigQuery, where Dataflow handles windowing, deduplication, and schema normalization before writes.
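The streaming pattern above can be sketched with the Apache Beam Python SDK, which Dataflow runs; the topic, table, and parsing logic below are hypothetical placeholders, not a production pipeline:

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(message: bytes) -> dict:
    # Hypothetical: decode a JSON event published by devices.
    return json.loads(message.decode("utf-8"))


options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Parse" >> beam.Map(parse_event)
        | "Window" >> beam.WindowInto(beam.window.FixedWindows(60))  # 1-minute windows
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,  # table created elsewhere
        )
    )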
Common trap: choosing external tables for heavy BI workloads because it “avoids loading time.” In practice, managed tables usually provide better performance, partitioning/clustering features, and more predictable cost. External tables are best when governance or operational constraints require data to remain in GCS, or for low-frequency queries and staging.
Governance is increasingly emphasized on the PDE exam: you must show you can make data discoverable, trustworthy, and compliant. Conceptually, think “metadata + lineage + policy.” In Google Cloud, Data Catalog concepts (and its modern equivalents in Dataplex/BigQuery metadata experiences) revolve around technical metadata (schemas, locations), business metadata (glossary-like tags), and searchable discovery. Lineage signals come from how data moves through pipelines (e.g., Dataflow jobs, BigQuery transformations, scheduled queries), and the exam may describe a need to trace “where did this field come from?” or “who changed this dataset?”
Policy controls show up as answers involving tagging and classification, least-privilege access, and separating duties between raw and curated zones. A common scenario: sensitive columns (email, SSN) must be tagged, access must be restricted, and analysts should only see masked outputs. The best solutions combine metadata classification (tags), consistent dataset structure (raw/curated/serving), and enforcement mechanisms (IAM + policy-based controls in BigQuery).
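As one hedged example of enforcement, the google-cloud-bigquery Python client lets you attach an existing policy tag (created in a Data Catalog/Dataplex taxonomy) to a sensitive column; the taxonomy resource name, project, and table below are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical policy tag representing the "PII" classification in an existing taxonomy.
pii_tag = bigquery.PolicyTagList(
    names=["projects/my-project/locations/us/taxonomies/123/policyTags/456"]
)

schema = [
    bigquery.SchemaField("customer_id", "STRING"),
    bigquery.SchemaField("email", "STRING", policy_tags=pii_tag),  # column-level security
    bigquery.SchemaField("order_total", "NUMERIC"),
]

table = bigquery.Table("my-project.curated.orders", schema=schema)
client.create_table(table)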
Exam Tip: If a prompt mentions “discoverability,” “data owners,” “business definitions,” or “search,” it’s a metadata/catalog problem—not a storage engine problem. Don’t answer with “create another table” when the requirement is governance.
Common traps: assuming governance is only documentation, or only IAM. The exam expects layered governance: define and classify, track movement (lineage), and enforce. Another trap is ignoring location and domain boundaries; governance becomes simpler when datasets align to domains (finance, marketing) with clear ownership and access policies.
Security and durability decisions often differentiate “acceptable” from “best” answers on the PDE exam. Start with IAM: grant least privilege at the right level (project, dataset, table, view) and prefer predefined roles when possible. For BigQuery, the exam may test your understanding that dataset-level permissions control who can read/write objects, while finer-grained controls include column-level security (policy tags) and row-level security (row access policies). These are designed for cases where multiple user groups query the same table but should see different subsets of data.
Exam Tip: If you see “analysts should query the same table but only see their region/customer segment,” row-level security is the signal. If you see “hide/mask PII columns,” column-level security via policy tags is the signal. If you see “publish a safe subset,” consider authorized views in addition to RLS/CLS.
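A minimal sketch of the row-level security signal, assuming hypothetical group and table names, looks like this in BigQuery DDL issued through the Python client:

from google.cloud import bigquery

client = bigquery.Client()

# Analysts in the EU group query the shared table but only see EU rows.
client.query(
    """
    CREATE OR REPLACE ROW ACCESS POLICY eu_only
    ON `my-project.curated.orders`
    GRANT TO ('group:eu-analysts@example.com')
    FILTER USING (region = 'EU')
    """
).result()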
Durability and recovery vary by service. Cloud Storage offers high durability and lifecycle policies; it's your go-to for immutable raw archives and replay. BigQuery provides time travel and table snapshots for recovery within retention windows, and supports backup-like patterns via snapshots and table copies for longer retention or change control. For Cloud SQL, backups and point-in-time recovery are first-class and frequently tested; for Spanner, backups and multi-region configurations address high availability and disaster recovery; for Bigtable, backups and replication address operational continuity.
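For the Cloud Storage side, a hedged sketch using the google-cloud-storage Python client (the bucket name and retention period are hypothetical) shows how retention and lifecycle rules combine for immutable raw archives:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("raw-transactions-archive")  # hypothetical bucket

# Keep raw files immutable for 7 years; objects cannot be deleted or overwritten
# until they are older than the retention period.
bucket.retention_period = 7 * 365 * 24 * 60 * 60  # seconds

# Clean up automatically once retention has lapsed (age is in days).
bucket.add_lifecycle_delete_rule(age=7 * 365 + 30)

bucket.patch()

# Optional and irreversible: lock the policy so even administrators cannot shorten it.
# bucket.lock_retention_policy()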
Common trap: treating “backup” as identical across services. The exam expects service-specific mechanisms and an understanding of RPO/RTO. Another trap: granting broad roles (Owner/Editor) to solve access quickly; correct answers usually mention least privilege and separation (e.g., service accounts for pipelines, read-only roles for analysts, restricted access to raw datasets).
1. A media company needs to ingest millions of time-series events per second from IoT devices and serve sub-10 ms reads for the most recent device state. They also want to run ad-hoc analytics across all historical events. Which storage design best meets these requirements with minimal operational overhead?
2. A retailer has a 20 TB BigQuery table of clickstream events queried primarily by dashboards filtering on event_date and customer_id. Queries are slowing and costs are rising due to large scans. You need to improve performance and reduce bytes scanned without changing dashboard logic. What should you do?
3. A financial services company stores raw transaction files in a Cloud Storage bucket and curated datasets in BigQuery. Regulations require: (1) raw files retained for 7 years, (2) raw files must not be modified or deleted during retention, (3) access restricted to a small audit group. What is the best approach?
4. You are designing storage for a global inventory system that requires strongly consistent reads after writes and horizontal scale across regions. The system must support relational transactions and high availability with minimal application changes. Which Google Cloud service should you choose?
5. A company wants to share a subset of sensitive columns from a BigQuery dataset with analysts in another department. The analysts should be able to query only masked values for PII columns while still seeing non-PII fields. The solution must minimize data duplication and be centrally governed. What should you implement?
Domains 4 and 5 of the Google Professional Data Engineer exam test whether you can turn data into trustworthy analysis outcomes and then run those workloads reliably in production. The exam is not asking for “can you write SQL” or “can you start a DAG”—it’s testing if you can choose the right BigQuery patterns for BI performance, operationalize ML with the right level of governance, and automate/observe pipelines so they meet reliability and cost requirements.
This chapter connects three threads you must be able to reason about under exam pressure: (1) analytics enablement (curated layers, semantic models, KPI definitions), (2) performant BigQuery querying and BigQuery ML workflows, and (3) operations: orchestration, monitoring, incident response, and automation. Expect scenario questions where multiple solutions are technically possible, but only one aligns with business intent, security controls, and cost/latency constraints.
Exam Tip: In Domains 4–5, the “best” answer is usually the one that balances performance, governance, and operability. If an option improves speed but creates unmanaged duplication, unclear metric definitions, or brittle manual steps, it’s rarely the correct choice.
Practice note for Enable analytics and BI with performant BigQuery querying patterns: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Operationalize ML workflows using BigQuery ML and pipeline integrations: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Automate orchestration, monitoring, and incident response for data workloads: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Exam-style practice set: analytics, ML, and operations scenarios: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
On the exam, “prepare and use data for analysis” often means you must distinguish raw ingestion from analytics-ready datasets. A common pattern is layered data: a raw/landing layer (immutable, minimal transforms), a curated layer (cleaned, conformed, deduplicated), and a presentation/serving layer (business-friendly tables or views). BigQuery is frequently the home for curated and serving layers because it supports governed access, performant SQL, and integration with BI tools.
Semantic modeling is the bridge between technical tables and business questions. You may see prompts about inconsistent metrics across teams (e.g., “active user,” “net revenue,” “churn”). The correct fix is rarely “create more dashboards”; it’s to define KPIs centrally (metric definitions, grain, filters, time windows) and publish them through a consistent semantic layer—often via standardized views, authorized views, or BI semantic models (e.g., Looker/LookML) backed by BigQuery.
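A minimal sketch of publishing one governed KPI definition, assuming hypothetical dataset, table, and metric logic, is simply a standardized view in the serving layer:

from google.cloud import bigquery

client = bigquery.Client()

# Single definition of "net revenue" that every team queries instead of re-deriving it.
client.query(
    """
    CREATE OR REPLACE VIEW `my-project.serving.monthly_net_revenue` AS
    SELECT
      DATE_TRUNC(order_date, MONTH) AS revenue_month,
      SUM(gross_amount - discounts - refunds) AS net_revenue
    FROM `my-project.curated.orders`
    GROUP BY revenue_month
    """
).result()

Teams then query the view (or a BI semantic model built on it) rather than copying the fact table and re-deriving the metric in ad-hoc SQL.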
Exam Tip: When the scenario mentions “multiple sources disagree,” “definitions vary,” or “executives see different numbers,” choose solutions that enforce metric contracts (semantic layer + curated datasets) rather than ad hoc fixes like copying tables per team.
Common trap: Confusing “data mart per team” with “semantic layer.” Duplicating fact tables for each group increases cost and creates drift. Prefer shared curated facts with governed, business-oriented views and row/column security where required.
BigQuery performance questions usually present a slow dashboard, an expensive query, or concurrency issues. You’re expected to recognize the levers: partitioning/clustering, join order and join types, materialized views, approximate aggregations, and resource management via reservations/slots. For BI workloads, the goal is predictable latency at controlled cost.
Join strategy is frequently tested. Large-to-large joins without filters are expensive; push down filters early, select only needed columns, and join on well-distributed keys. If one table is small, BigQuery can broadcast it to the workers scanning the large table, so structure the query to reduce the fact table first and join the small dimension afterward. When denormalization is acceptable for BI, precompute wide tables in the serving layer to avoid repeated joins in dashboards.
Exam Tip: If the prompt mentions “scans too much data,” pick partitioning/clustering/materialized views over “buy more slots.” Slots help concurrency and runtime, but they don’t fix a query that scans unnecessary bytes.
Common trap: Assuming indexing works like OLTP databases. BigQuery doesn’t use traditional indexes; the exam expects you to rely on partitioning/clustering, table design, and query rewrite rather than “add an index.” Another trap is using SELECT * in production BI queries—this increases scanned bytes and breaks performance tuning.
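To tie these levers together, here is a hedged sketch (hypothetical project, dataset, and column names) that precomputes a repeated dashboard aggregation as a materialized view and then queries it with an explicit date filter and named columns instead of SELECT *:

from google.cloud import bigquery

client = bigquery.Client()

# Precompute the rollup that many dashboards repeat; BigQuery maintains it incrementally.
client.query(
    """
    CREATE MATERIALIZED VIEW IF NOT EXISTS `my-project.serving.daily_sales_by_region` AS
    SELECT event_date, region, SUM(amount) AS total_sales
    FROM `my-project.analytics.sales_events`
    GROUP BY event_date, region
    """
).result()

# Dashboard query: named columns plus a selective date filter keep scanned bytes low.
dashboard_sql = """
    SELECT event_date, region, total_sales
    FROM `my-project.serving.daily_sales_by_region`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
"""
rows = client.query(dashboard_sql).result()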
The exam expects you to know when BigQuery ML (BQML) is a good fit: fast iteration on structured data already in BigQuery, simpler operational overhead, and SQL-native training/prediction. Operationalizing ML here means more than creating a model; it includes reproducible feature logic, evaluation, and integration into pipelines (batch scoring, scheduled retraining, or triggering downstream actions).
BQML workflows typically include: CREATE MODEL for training, ML.EVALUATE for metrics, ML.PREDICT for inference, and ML.EXPLAIN_PREDICT for interpretability. You should also recognize feature considerations: leakage (using future information), correct time-based splits, handling categorical variables, and ensuring training/serving parity by defining features in views or stable SQL transformations.
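A hedged BQML sketch, with hypothetical table, label, and feature names, shows the train-then-evaluate flow with a time-based split to avoid leakage; ML.PREDICT follows the same pattern against the trained model:

from google.cloud import bigquery

client = bigquery.Client()

# Train on the older window only (time-based split).
client.query(
    """
    CREATE OR REPLACE MODEL `my-project.ml.churn_model`
    OPTIONS (model_type = 'LOGISTIC_REG', input_label_cols = ['churned']) AS
    SELECT churned, tenure_days, monthly_spend, support_tickets
    FROM `my-project.curated.customer_features`
    WHERE snapshot_date < '2024-01-01'
    """
).result()

# Evaluate on the later, held-out window to simulate real prediction conditions.
client.query(
    """
    SELECT *
    FROM ML.EVALUATE(
      MODEL `my-project.ml.churn_model`,
      (SELECT churned, tenure_days, monthly_spend, support_tickets
       FROM `my-project.curated.customer_features`
       WHERE snapshot_date >= '2024-01-01'))
    """
).result()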
Exam Tip: If the scenario highlights “data already in BigQuery,” “SQL-skilled team,” and “need quick baseline,” BQML is often the best answer. If it emphasizes custom training code, complex feature engineering, or online low-latency inference, look toward Vertex AI instead (even if not explicitly named, the constraints guide you).
Common trap: Ignoring leakage and evaluation design. Many wrong answers propose random splits for time-series-like data (e.g., churn, demand) where you must split by time to simulate real prediction conditions. Another trap is retraining without monitoring drift—operationalization includes schedules and quality gates, not one-off training.
Domain 5 scenarios often ask you to pick an orchestration tool and scheduling pattern that reduces manual operations and improves reliability. Cloud Composer (managed Apache Airflow) is best when you need rich DAG dependencies, retries, backfills, and many operators across GCP services. Workflows is best for lightweight service orchestration and API-to-API coordination with clear state handling, especially when you don’t need a full Airflow environment.
Scheduling patterns show up in subtle ways. Time-based schedules (cron) are straightforward for daily aggregates, but event-driven triggers (Pub/Sub, Cloud Storage notifications) are better for near-real-time or irregular arrivals. For streaming + batch hybrids, the exam may expect a micro-batch pattern where Dataflow writes to BigQuery continuously while a scheduled job builds serving-layer aggregates.
Exam Tip: When you see “complex dependencies,” “backfills,” “many steps across services,” default to Composer/Airflow. When you see “call a few services in sequence,” “API orchestration,” or “state machine,” Workflows is often the cleanest answer.
Common trap: Treating orchestration as transformation. Airflow should coordinate work, not be the compute engine. If an option embeds heavy transformations inside Airflow workers instead of using BigQuery/Dataflow/Spark, it’s typically not the best-practice choice.
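A minimal Composer/Airflow sketch of that division of labor, assuming hypothetical project and procedure names, keeps the DAG as a coordinator and pushes the transformation into BigQuery:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

default_args = {"retries": 2, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="nightly_serving_layer",
    schedule_interval="0 3 * * *",   # 03:00 daily, well ahead of a 6 a.m. deadline
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args=default_args,
) as dag:
    # The worker only submits the job; BigQuery does the heavy transformation.
    build_rollup = BigQueryInsertJobOperator(
        task_id="build_daily_rollup",
        configuration={
            "query": {
                "query": "CALL `my-project.curated.build_daily_rollup`()",  # hypothetical procedure
                "useLegacySql": False,
            }
        },
    )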
The exam increasingly emphasizes operational maturity: you must detect failures quickly, measure pipeline health, and respond with minimal toil. On Google Cloud, observability usually means Cloud Logging for logs, Cloud Monitoring for metrics and alerting, and Error Reporting/Trace where applicable. For data systems, the key is defining SLIs (what you measure) and SLOs (the targets) aligned to the business, not just infrastructure uptime.
Common SLIs for pipelines include: freshness/latency (data available by X time), completeness (row counts within expected bounds), correctness (quality rule pass rate), and cost (bytes processed per day). For streaming, backlog/lag and end-to-end event-time latency are crucial. For BigQuery, monitor job failures, slot utilization (if using reservations), and query bytes processed to catch runaway costs.
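One way to make the freshness SLI concrete is a small check like the following hedged sketch (the table name and threshold are hypothetical); in production the result would feed a Cloud Monitoring metric or alerting policy rather than a print statement:

from google.cloud import bigquery

client = bigquery.Client()

# How many minutes behind is the serving table relative to now?
row = list(
    client.query(
        """
        SELECT TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(ingest_time), MINUTE) AS staleness_minutes
        FROM `my-project.serving.daily_sales`
        """
    ).result()
)[0]

FRESHNESS_SLO_MINUTES = 90  # hypothetical target agreed with the business

if row.staleness_minutes is None or row.staleness_minutes > FRESHNESS_SLO_MINUTES:
    # Placeholder for publishing a metric or paging on-call.
    print(f"Freshness SLO violated: {row.staleness_minutes} minutes behind")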
Exam Tip: If a question asks how to reduce MTTR (mean time to recovery), choose answers that add targeted alerts + runbooks and expose pipeline state (freshness, completeness), not just “enable logging.” Logging alone doesn’t guarantee you’ll notice the problem.
Common trap: Alerting on every transient error. Retries are normal in distributed systems; alert when the retry budget is exhausted or when business-facing SLIs are violated.
Automation is the “production readiness” multiplier tested in Domain 5. Expect scenarios where manual deployments cause outages, permissions drift, or inconsistent environments. The exam-preferred posture is Infrastructure as Code (IaC) for repeatable environments (projects, IAM, networks, datasets), CI/CD for pipeline code (Dataflow templates, Composer DAGs, SQL transformations), and policy-based governance for security and compliance.
CI/CD principles for data: unit test transformation logic where possible, validate schemas and contracts, run integration tests on representative partitions, and promote artifacts through environments. For BigQuery, this can mean version-controlled SQL, automated deployment of views/routines, and checks that prevent breaking changes to downstream tables. Use service accounts with least privilege and separate roles by environment.
Cost control is heavily testable because it ties directly to BigQuery usage. Controls include budgets and alerts, per-project/dataset organization, and using reservations/editions strategically. On the query side, enforce partition filters, avoid SELECT *, use materialized views for repeated aggregates, and consider table expiration for transient datasets. Governance includes Data Catalog/Dataplex-style metadata and classification, policy tags for column-level security, and authorized views to share curated datasets without exposing raw PII.
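Two of those query-side controls can be sketched with the google-cloud-bigquery Python client; the byte cap, dataset name, and expiration window below are hypothetical illustrations, not recommended values:

from google.cloud import bigquery

client = bigquery.Client()

# Guardrail 1: fail any query that would bill more than 10 GiB.
job_config = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)
client.query(
    """
    SELECT event_date, region, SUM(amount) AS total_sales
    FROM `my-project.analytics.sales_events`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY event_date, region
    """,
    job_config=job_config,
).result()

# Guardrail 2: scratch tables in the sandbox dataset expire automatically.
sandbox = client.get_dataset("my-project.sandbox")
sandbox.default_table_expiration_ms = 14 * 24 * 3600 * 1000  # 14 days
client.update_dataset(sandbox, ["default_table_expiration_ms"])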
Exam Tip: If the scenario mentions audits, PII, or “only analysts should see masked fields,” pick column-level security with policy tags and authorized views rather than copying/redacting data into new tables.
Common trap: Solving reliability with manual runbooks alone. The exam prefers automated rollback, automated validation gates (quality checks), and policy guardrails (IAM/IaC) to prevent incidents rather than merely responding to them.
1. A retail company uses BigQuery as the source for Looker dashboards. Analysts frequently run interactive queries over a 5 TB fact table joined to multiple dimensions, filtered by date. Dashboard latency has increased and on-demand query costs are rising. The company wants to improve BI performance without duplicating data across many derived tables or introducing manual refresh steps. What should you recommend?
2. A data team is building a revenue KPI consumed by multiple business units. They have repeated incidents where teams compute the KPI differently in ad-hoc SQL, causing inconsistent numbers across reports. They want a governed, reusable approach in BigQuery that keeps a single definition while still enabling fast analysis. What is the best solution?
3. A company wants to operationalize a churn model using BigQuery ML. Data scientists need to train weekly, evaluate metrics, and write predictions to a table used by downstream applications. The solution must be automated, auditable, and easy to monitor. Which approach best meets these requirements?
4. A nightly data pipeline loads data into BigQuery and then runs transformations that must complete before 6 a.m. The pipeline occasionally fails due to upstream delays, and engineers currently discover failures from business users. You need to improve reliability with minimal operational overhead and ensure actionable alerts. What should you implement?
5. A media company has a BigQuery table of clickstream events used for both batch analytics and near-real-time reporting. Analysts frequently query 'last 7 days' and group by user_id and campaign_id. Query costs are high and performance varies significantly. You want to reduce cost and improve consistent performance while keeping the data in one table. What is the best BigQuery table design change?
This chapter converts everything you’ve studied into exam-day performance. The Google Professional Data Engineer (PDE) exam rewards candidates who can choose the best design under constraints (latency, cost, governance, reliability), not those who can recite product definitions. Your final preparation should therefore look like the exam: mixed domains, ambiguous tradeoffs, and case-style prompts that require you to infer unstated requirements from context.
You will run two timed mock blocks (Part 1 and Part 2), then do a structured Weak Spot Analysis that maps errors to official objectives and to recurring reasoning mistakes. Finally, you'll distill a short "cram sheet" of decision rules and service limits to reduce cognitive load, and close with an Exam Day Checklist focused on pacing and elimination techniques.
Exam Tip: Your goal is not a high mock score once—it’s a stable process: timeboxing, consistent reasoning, and fast recovery from uncertainty. The best candidates know when to “park” a question and protect time for easier points.
Practice note for Mock Exam Part 1: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Mock Exam Part 2: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Weak Spot Analysis: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Practice note for Exam Day Checklist: document your objective, define a measurable success check, and run a small experiment before scaling. Capture what changed, why it changed, and what you would test next. This discipline improves reliability and makes your learning transferable to future projects.
Run your mock exam in a realistic environment: single sitting, no tabs, no notes, and a hard stop when time expires. The PDE exam is designed to test decision-making under time pressure across ingestion, storage, processing, governance, and operations. Treat your mock as an operational drill: you are practicing the mechanics of reading, prioritizing, eliminating, and committing.
Timeboxing strategy should be explicit. First pass: answer everything you can confidently within a short “budget” per item, and mark anything that requires deeper tradeoff analysis or rereading. Second pass: revisit marked items with remaining time. Third pass (if any): sanity-check only the highest-impact flags, not the entire exam.
Exam Tip: If two options both “work,” the exam usually prefers the one with lower operational overhead, clearer responsibility boundaries, and native integration (for example, managed pipelines, IAM-first governance, and serverless analytics when appropriate).
Common trap: spending too long proving a complex architecture when the question is asking for a single control-plane feature (for example, partitioning strategy, IAM condition, Dataflow windowing choice, or BigQuery reservation/cost control). Train yourself to identify the question’s real axis: latency? schema evolution? governance? cost predictability? reliability?
Mock Exam Part 1 should feel like a consulting engagement: multi-paragraph scenarios with business goals, existing stack details, and compliance requirements. This is where the exam tests your ability to design end-to-end systems aligned to outcomes: ingest → process → store → serve → govern. Expect prompts that require you to select storage and schema strategy (BigQuery vs Cloud Storage vs Bigtable vs Spanner), choose batch vs streaming patterns (Dataflow, Dataproc, Pub/Sub), and enforce governance (IAM, VPC-SC, DLP, CMEK).
In case-based items, extract requirements into a quick mental list: (1) freshness/latency target, (2) data volume and growth, (3) query patterns (point lookups vs scans vs aggregates), (4) compliance (PII, residency, encryption), (5) operations model (SRE maturity, on-call tolerance), and (6) cost constraints (reserved capacity, storage tiering, egress).
Exam Tip: When the scenario emphasizes analytics and SQL with massive scans, default to BigQuery with partitioning/clustering and controlled access patterns. When it emphasizes low-latency key-based access at scale, think Bigtable or Spanner; if it emphasizes object retention and cheap storage, think Cloud Storage with lifecycle policies.
Common traps in Part 1: over-engineering beyond what the prompt actually requires, solving performance at the expense of stated compliance or residency constraints, and picking architectures whose operational overhead the team described in the scenario cannot absorb.
How to identify the best answer: favor managed services, minimize moving parts, and ensure the design explicitly addresses constraints mentioned in the prompt. If a choice solves performance but violates compliance or increases operational risk, it is rarely the best.
Mock Exam Part 2 shifts toward troubleshooting and reliability: pipeline lag, backlogs, data quality regressions, access failures, and unexpected cost spikes. The exam expects you to diagnose the most probable cause and choose the remediation with the highest impact and lowest risk. These items often hide the answer in operational signals: watermark delay, Pub/Sub subscription backlog, Dataflow worker autoscaling limits, BigQuery slot contention, or IAM denied logs.
Use a consistent debug flow aligned to objectives: observe → isolate → remediate → prevent recurrence. “Observe” means identifying the right monitoring surface (Cloud Monitoring metrics, Dataflow job graphs, BigQuery job timeline, Cloud Logging). “Isolate” means differentiating ingestion issues (Pub/Sub, transfer jobs), processing issues (Dataflow windowing, shuffle, hot keys), storage issues (partition pruning, streaming inserts), and governance issues (IAM, KMS, VPC-SC).
Exam Tip: In troubleshooting questions, the best answer is often the one that changes the least while restoring SLOs (for example, tuning Dataflow autoscaling and worker types, adding BigQuery partition filters, or changing write patterns) rather than “migrate everything” proposals.
Common traps: proposing a re-architecture or migration when a targeted configuration change would restore the SLO, and treating the symptom (backlog, slow queries, denied access) without first isolating whether the bottleneck is ingestion, processing, storage, or governance.
Train yourself to match symptoms to likely causes. Backlog + normal publish rate suggests subscriber/processor bottleneck. High BigQuery latency + many concurrent jobs suggests slot contention and the need for reservations or workload management. Access denied + perimeter rules suggests VPC-SC or IAM conditions misconfiguration.
Your Weak Spot Analysis is where your score actually improves. Do not only mark answers right/wrong. For every missed or guessed item, map it to (a) the exam objective, (b) the concept category (architecture, ingestion, storage, processing, governance, operations), and (c) the error pattern you exhibited.
Use a repeatable review template: What requirement did I miss? What constraint did I overweight? What GCP service feature was decisive? What “best answer” rule did the exam want? The goal is to create a small set of corrections that apply to many future questions.
Exam Tip: Track “near misses” (correct answer for wrong reasons). These are dangerous because they don’t feel like weaknesses, yet they fail under slightly different constraints on the real exam.
Typical error patterns for PDE candidates: missing the primary constraint stated in the prompt, overweighting a secondary requirement, blurring service decision boundaries (for example, BigQuery vs. Bigtable, Composer vs. Workflows), and choosing designs that are technically valid but violate governance or cost limits.
After categorizing errors, pick the top 2–3 patterns and create a targeted drill: reread a service’s decision boundary, write a one-paragraph rule, and practice applying it. Your final week should be “narrow and deep,” not “wide and shallow.”
This cram sheet is not a catalog; it’s a set of decision rules you can recall under pressure. Focus on “when to choose what,” plus the operational and governance features the exam repeatedly tests.
Exam Tip: If an option adds a bespoke cluster, custom sharding, or manual scaling, ask whether the prompt actually requires it. The PDE exam frequently rewards “managed-first” unless the scenario explicitly demands specialized control.
Common decision-rule trap: choosing a technically impressive architecture that ignores what the business asked for (for example, real-time dashboards) or what compliance forbids (for example, unrestricted dataset sharing). Keep re-centering on requirements.
On exam day, treat the test like an incident response exercise: calm, structured, and time-aware. Confirm your environment early (ID, testing location rules or online proctor requirements, stable network if remote, quiet room). Enter the exam with a pacing plan and a mental checklist of elimination techniques.
Start with rapid requirement extraction: underline (mentally) the nouns that define constraints—freshness, volume, compliance, regions, SLOs, and operations. Then eliminate answers that violate explicit constraints first (for example, missing encryption requirements, failing residency needs, or proposing public endpoints when private networking is implied). Next, eliminate answers that create unnecessary operational burden. Finally, pick between remaining candidates using “best answer” heuristics: managed services, clear scaling model, and native governance.
Exam Tip: When stuck between two plausible answers, ask: “Which one is more directly aligned to the stated success metric?” If the metric is latency, choose the lowest-latency architecture; if it is cost predictability, choose reservations and partition discipline; if it is security, choose perimeter controls and least privilege.
Finish with a final pass only on marked items and only if time permits. Do not second-guess correct answers without a concrete reason tied to requirements. Your objective is consistent, defensible decision-making—the same skill the role demands in production.
1. You are building a streaming analytics pipeline for IoT telemetry. Requirements: near-real-time dashboards (p95 < 5 seconds), ability to replay data for backfills, and minimal operational overhead. Which design best meets these requirements on Google Cloud?
2. A healthcare company stores PHI in BigQuery and must ensure (1) no analyst can export raw PHI, (2) analysts can query approved, de-identified views, and (3) governance is enforceable at scale. What is the best approach?
3. Your team is running a timed mock exam and notices you consistently lose time on multi-paragraph case questions, often leaving easier questions unanswered. Which strategy best aligns with exam-day pacing and elimination techniques for the PDE exam?
4. After completing a mock exam, you find most incorrect answers came from choosing solutions that were performant but violated governance requirements (e.g., broad IAM, unencrypted exports, missing retention controls). What is the best next step in a Weak Spot Analysis process?
5. On exam day, you encounter a scenario asking you to design a data platform with strict SLAs and disaster recovery requirements across regions. You are unsure which option best balances cost and reliability. What is the best action to maximize your score?