Free Snowflake DEA-C02 Practice Test & Real Exam Questions

  • Exam Code/Number: DEA-C02
  • Exam Name/Title: SnowPro Advanced: Data Engineer (DEA-C02)
  • Certification Provider: Snowflake
  • Corresponding Certification: SnowPro Advanced
  • Exam Questions: 354
  • Updated On: Jul 03, 2026
Consider a scenario where you need to transform data in a Snowflake table using complex custom transformation logic that is best implemented in Java. You decide to use a Snowpark Java UDF. You've packaged your Java code into a JAR file and uploaded it to a named internal stage. Which of the following steps are necessary and correctly ordered to deploy and use this Java UDF within Snowflake?
Correct Answer: C
A financial services company is implementing Snowflake. They have a table 'CUSTOMER_DATA' containing sensitive information like 'CREDIT_CARD_NUMBER', 'SSN', and 'ADDRESS'. They need to ensure that: 1) Analysts can only see the last four digits of the 'CREDIT_CARD_NUMBER'. 2) Data scientists require full access to the 'ADDRESS' but should not see the 'SSN'. 3) A dedicated compliance role should be able to view all data in its original format for auditing purposes. Which of the following is the MOST efficient and secure approach to implement this using Snowflake's data masking and RBAC?
Correct Answer: A
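The first requirement in the question above (analysts see only the last four digits, compliance sees the original value) can be illustrated with a small plain-Python sketch. This is not Snowflake SQL and does not reproduce any of the answer options; the function name and the role names 'ANALYST' and 'COMPLIANCE' are assumptions chosen for illustration.

```python
def mask_card_number(card_number: str, role: str) -> str:
    """Return the card number as the given (hypothetical) role should see it."""
    # Strip common separators so the last four digits are easy to find.
    digits = card_number.replace(" ", "").replace("-", "")
    if role == "COMPLIANCE":
        # The compliance role sees the value in its original format.
        return card_number
    if role == "ANALYST":
        # Analysts see only the last four digits.
        return "*" * (len(digits) - 4) + digits[-4:]
    # Every other role sees a fully masked value.
    return "*" * len(digits)


print(mask_card_number("4111 1111 1111 1234", "ANALYST"))  # ************1234
```

In Snowflake itself this kind of per-role branching would normally live in a masking policy attached to the column, so the rule is defined once rather than repeated in every query.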
You have implemented external tokenization for a sensitive data column in Snowflake using a UDF that calls an external API. After some time, you discover that the external tokenization service is experiencing intermittent outages, causing queries using the tokenized column to fail. What is the BEST approach to mitigate this issue and maintain data availability while minimizing the risk of exposing the raw data?
Correct Answer: C
You are tasked with sharing a subset of a customer table ('CUSTOMER_DATA') residing in your organization's Snowflake account with a partner organization. You need to mask personally identifiable information (PII) while providing near real-time updates. You decide to use a secure view. Which of the following SQL statements is the MOST efficient and secure way to accomplish this, assuming the partner only needs 'customer_id', 'masked_email', 'city', and 'state'? The email should be masked using SHA256.
Correct Answer: E
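As a rough illustration of what the partner should receive in the question above, the plain-Python sketch below projects only the four requested columns and replaces the email with its SHA-256 hex digest. It is not the secure view itself; the function names and the sample row are hypothetical.

```python
import hashlib


def masked_email(email: str) -> str:
    """Return the SHA-256 hex digest of the email (64 hex characters)."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def partner_view(rows):
    """Project only the columns the partner needs, with the email hashed."""
    return [
        {
            "customer_id": row["customer_id"],
            "masked_email": masked_email(row["email"]),
            "city": row["city"],
            "state": row["state"],
        }
        for row in rows
    ]


# Hypothetical sample row; any extra columns are dropped by the projection.
sample = [{"customer_id": 1, "email": "a@example.com",
           "city": "Austin", "state": "TX", "ssn": "000-00-0000"}]
print(partner_view(sample)[0].keys())
```

The raw email and any other columns never appear in the output, which mirrors the question's requirement that the partner sees only the listed fields.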
A data engineering team is building a real-time dashboard in Snowflake to monitor website traffic. The dashboard relies on a complex query that joins several large tables. The query execution time is consistently exceeding the acceptable threshold, impacting dashboard responsiveness. Historical data is stored in a separate table and rarely changes. You suspect caching is not being utilized effectively. Which of the following actions would BEST improve the performance of this dashboard and leverage Snowflake's caching features?
Correct Answer: C
A healthcare provider stores patient data in Snowflake, including 'PATIENT_ID', 'NAME', 'MEDICAL_HISTORY', and 'INSURANCE_ID'. They need to comply with HIPAA regulations. As a data engineer, you need to ensure that PHI (Protected Health Information) is masked appropriately based on user roles. Which of the following steps are NECESSARY to achieve this using Snowflake's data masking features and RBAC? (Select all that apply)
Correct Answer: B,C,D
A data engineer notices that a daily ETL job loading data into a Snowflake table 'TRANSACTIONS' is consistently taking longer than expected. The table is append-only and partitioned by 'TRANSACTION_DATE'. The engineer observes high 'Remote Spill' during the load process and suspects that micro-partition pruning isn't working effectively. Which of the following approaches would BEST address the performance issue, assuming you have already considered increasing warehouse size?
Correct Answer: B,C
You have a Snowflake table 'raw_data' with columns 'id', 'timestamp', and 'payload'. A stream is defined on this table. A data pipeline reads changes from the stream and applies transformations before loading the data into a target table. However, the pipeline needs to handle cases where updates to the same 'id' occur multiple times within a short period, and only the latest version of the 'payload' should be processed. How can you achieve this idempotent processing of stream data to ensure only the latest payload is applied to the target table, avoiding duplicates and inconsistencies, using Snowflake streams?
Correct Answer: C
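The "latest payload per 'id'" requirement in the question above can be sketched in plain Python: first reduce the batch of changes to one row per 'id' (the one with the greatest 'timestamp'), then upsert that row into the target. This is only a model of the logic, not Snowflake code and not a reproduction of any answer option; the function names and sample rows are hypothetical.

```python
def latest_per_id(changes):
    """Keep only the most recent change for each id, judged by timestamp."""
    latest = {}
    for row in changes:
        current = latest.get(row["id"])
        if current is None or row["timestamp"] > current["timestamp"]:
            latest[row["id"]] = row
    return latest


def apply_to_target(target, changes):
    """Upsert the latest payload per id into the target (a dict keyed by id)."""
    for row_id, row in latest_per_id(changes).items():
        target[row_id] = row["payload"]
    return target


# Hypothetical batch: id 1 is updated twice within a short period.
batch = [
    {"id": 1, "timestamp": 10, "payload": "old"},
    {"id": 2, "timestamp": 11, "payload": "only"},
    {"id": 1, "timestamp": 12, "payload": "new"},
]
target = apply_to_target({}, batch)
print(target)  # {1: 'new', 2: 'only'}
```

Applying the same batch a second time leaves the target unchanged, which is the idempotence the question asks for. In SQL the same shape is usually a window-function deduplication followed by an upsert keyed on 'id'.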
You have a table named 'EMPLOYEES' with a retention period of 1 day. You accidentally deleted several important rows from this table, and you need to recover the data. You know the deletion occurred 25 hours ago. What actions should be taken to attempt to recover the deleted data, and what outcome can you expect? Assume you are working in an Enterprise Edition Snowflake account.
Correct Answer: E
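The arithmetic behind the question above is worth making explicit: a 1-day retention period covers 24 hours, and the deletion happened 25 hours ago, so the pre-delete state falls outside the Time Travel window. The plain-Python sketch below only checks that comparison; the function name and the example timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def within_time_travel(deleted_at, now, retention_days):
    """True if the deletion is still inside the Time Travel retention window."""
    return now - deleted_at <= timedelta(days=retention_days)


now = datetime(2026, 7, 3, 12, 0)        # arbitrary example timestamp
deleted_at = now - timedelta(hours=25)   # the deletion in the question

print(within_time_travel(deleted_at, now, retention_days=1))  # False
```

Once the retention period has passed, historical data for a permanent table moves into Fail-safe, which users cannot query directly and which only Snowflake Support can attempt to recover from.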
A data pipeline ingests clickstream data from various sources into a raw Snowflake table ('RAW_CLICKS'). A transformation job then processes this data and loads it into a more structured 'CLICK_EVENTS' table, performing filtering, cleaning, and data enrichment. The data engineering team notices significant performance bottlenecks during this transformation process, leading to data freshness issues.
The team wants to optimize this process, considering the following:
Correct Answer: B,D,E
A financial institution is using Snowflake to store transaction data for millions of customers. The data is stored in a table named 'TRANSACTIONS' with columns such as 'TRANSACTION_ID', 'CUSTOMER_ID', 'TRANSACTION_DATE', 'TRANSACTION_AMOUNT', and 'MERCHANT_CATEGORY'. Analysts are running complex analytical queries that often involve filtering transactions by 'TRANSACTION_DATE', 'MERCHANT_CATEGORY', and 'TRANSACTION_AMOUNT' ranges. These queries are experiencing performance bottlenecks. The data team wants to leverage the Query Acceleration Service to improve performance without significantly altering the existing query patterns. Which of the following actions or combination of actions would be MOST beneficial, considering the constraints and the nature of the queries? (Select TWO)
Correct Answer: D,E
You are configuring a Snowflake Data Clean Room for two healthcare providers, 'ProviderA' and 'ProviderB', to analyze patient overlap without revealing Personally Identifiable Information (PII). Both providers have patient data in their respective Snowflake accounts, including a 'PATIENT_ID' column that uniquely identifies each patient. You need to create a secure join that allows the providers to determine the number of shared patients while protecting the raw 'PATIENT_ID' values. Which of the following approaches is the most secure and efficient way to achieve this using Snowflake features? Select TWO options.
Correct Answer: C,E
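One general way to count shared patients without exchanging raw identifiers is to compare hashed values instead of the identifiers themselves. The plain-Python sketch below assumes both providers agree on a secret salt and hash each 'PATIENT_ID' with SHA-256 before comparing. It illustrates the idea only; it is not a Data Clean Room configuration and is not taken from the answer options. The function names, the salt, and the sample IDs are hypothetical.

```python
import hashlib


def hash_id(patient_id: str, shared_salt: str) -> str:
    """Hash a patient ID with a salt both parties agreed on in advance."""
    return hashlib.sha256((shared_salt + patient_id).encode("utf-8")).hexdigest()


def shared_patient_count(ids_a, ids_b, shared_salt):
    """Count the overlap by intersecting hashed IDs, never the raw values."""
    hashed_a = {hash_id(i, shared_salt) for i in ids_a}
    hashed_b = {hash_id(i, shared_salt) for i in ids_b}
    return len(hashed_a & hashed_b)


# Hypothetical sample data for the two providers.
provider_a = ["P001", "P002", "P003"]
provider_b = ["P002", "P003", "P004"]
print(shared_patient_count(provider_a, provider_b, "example-salt"))  # 2
```

Only the aggregate count leaves the comparison, which matches the question's goal of determining the number of shared patients while keeping the raw values private.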
A large e-commerce company stores clickstream data in an AWS S3 bucket. The data is partitioned by date and consists of Parquet files. They need to analyze this data in Snowflake without physically moving it into Snowflake's internal storage. However, the data frequently changes, and they need to ensure queries reflect the latest updates to the files without significant latency. Which of the following approaches would be MOST suitable, considering cost, performance, and data freshness?
Correct Answer: C
You have a Snowflake Stream named 'PRODUCT_CHANGES' created on a table 'PRODUCTS'. A downstream task attempts to consume records from the stream, but occasionally fails with a 'Table PRODUCTS has been altered' error. The 'PRODUCTS' table undergoes DDL changes (e.g., adding/dropping columns) infrequently, but these changes are necessary for evolving business requirements. How can you design a more resilient data pipeline that minimizes disruptions caused by DDL changes to the 'PRODUCTS' table while still leveraging the 'PRODUCT_CHANGES' stream?
Correct Answer: E