Free Snowflake DEA-C02 Practice Test & Real Exam Questions
A data engineer is facing performance issues with a complex analytical query in Snowflake. The query joins several large tables and uses multiple window functions. The query profile shows that a significant amount of time is associated with remote spilling: intermediate data from one of the query's operators did not fit in the warehouse's memory or local disk and was spilled to remote storage. What are the possible root causes of remote spilling, and what steps can be taken to mitigate it? Select two options.
Correct Answer: B,E
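The background for this question is that Snowflake spills in two levels. Data first spills to the warehouse's local disk, and only when that is exhausted does it spill to remote storage, which is much slower. The usual levers are to reduce the data each operator must hold (filter or aggregate earlier, avoid row-exploding joins) or to run on a larger warehouse, which has more memory and local disk. The sketch below triages per-query statistics under the assumption that they arrive as rows shaped like the BYTES_SPILLED_TO_LOCAL_STORAGE / BYTES_SPILLED_TO_REMOTE_STORAGE columns of SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY. The `QueryStats` record and `triage_spill` helper are hypothetical, and the code does not reproduce any of the answer options.

```python
from dataclasses import dataclass

# Snowflake warehouse sizes, smallest to largest.
WAREHOUSE_SIZES = [
    "X-Small", "Small", "Medium", "Large", "X-Large",
    "2X-Large", "3X-Large", "4X-Large", "5X-Large", "6X-Large",
]


@dataclass
class QueryStats:
    # Hypothetical record; the byte counters are meant to mirror the
    # BYTES_SPILLED_TO_* columns of ACCOUNT_USAGE.QUERY_HISTORY.
    query_id: str
    warehouse_size: str
    bytes_spilled_to_local_storage: int
    bytes_spilled_to_remote_storage: int


def triage_spill(stats: QueryStats) -> tuple[str, str]:
    """Classify the spill level and suggest a first mitigation step."""
    if stats.bytes_spilled_to_remote_storage > 0:
        idx = WAREHOUSE_SIZES.index(stats.warehouse_size)
        bigger = WAREHOUSE_SIZES[min(idx + 1, len(WAREHOUSE_SIZES) - 1)]
        return ("remote", f"reduce rows per operator (filter/aggregate earlier) or try {bigger}")
    if stats.bytes_spilled_to_local_storage > 0:
        return ("local", "monitor; local spill is cheaper than remote spill")
    return ("none", "no spilling detected")
```

Suggesting the next warehouse size is only a first step; if the query multiplies rows (for example through a many-to-many join feeding the window functions), reducing that volume usually matters more than adding memory.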
You are designing a data pipeline that applies a complex scoring algorithm to customer data in Snowflake. The algorithm involves multiple steps, including feature engineering, model loading, and prediction. You want to encapsulate this logic in a reusable component and apply it efficiently to incoming data streams. Which of the following approaches is the most suitable and scalable way to implement this scoring logic as a UDF or UDTF, given real-time processing and low-latency requirements?
Correct Answer: C
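A Snowpark Python UDTF handler is a class whose `process` method is called for each input row and yields output rows as tuples. The sketch below shows the general shape, assuming a hypothetical logistic-regression "model" with hard-coded weights standing in for a real staged model file. The point it illustrates is that the model is prepared once when the handler is constructed rather than once per row; registration with Snowflake is not shown, and the class, feature names, and weights are all hypothetical.

```python
import math


class ScoreHandler:
    """Hypothetical UDTF-style handler: model prepared once, rows scored in process()."""

    def __init__(self):
        # Hypothetical model: logistic-regression weights. A real handler might
        # load these from a staged file; here they are hard-coded.
        self.weights = {"recency_days": -0.05, "order_count": 0.3}
        self.bias = -1.0

    def _features(self, recency_days, order_count):
        # Hypothetical feature-engineering step.
        return {
            "recency_days": float(recency_days),
            "order_count": math.log1p(order_count),
        }

    def process(self, customer_id, recency_days, order_count):
        # Called per input row; yields one (customer_id, score) output row.
        feats = self._features(recency_days, order_count)
        z = self.bias + sum(self.weights[k] * v for k, v in feats.items())
        yield (customer_id, 1.0 / (1.0 + math.exp(-z)))
```

Because `process` is a generator, the same handler could emit zero, one, or several rows per input, which is what distinguishes a UDTF from a scalar UDF.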
You are tasked with creating a Snowpark Python UDF that calculates the exponential moving average (EMA) of a time series stored in a Snowflake table named 'SALES_DATA'. The table has columns 'TIMESTAMP' (TIMESTAMP_NTZ) and 'SALES' (NUMBER), and the EMA should be calculated separately for each product, identified by the 'PRODUCT_ID' column. You want to optimize the calculation by using a pandas DataFrame within the UDF and leveraging vectorized operations. Which of the following code snippets is the MOST efficient and correct way to achieve this? Assume 'alpha' is a predefined float variable representing the smoothing factor.


Correct Answer: B
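The EMA recursion is EMA_0 = x_0 and EMA_t = alpha * x_t + (1 - alpha) * EMA_(t-1), applied per product in timestamp order. The stdlib sketch below spells that recursion out so the expected values can be checked by hand; it assumes input rows of the form (product_id, timestamp, sales) and is not a reconstruction of any answer option. In pandas, the same recursion is what `Series.ewm(alpha=alpha, adjust=False).mean()` computes, applied per group after sorting by timestamp.

```python
from collections import defaultdict
from datetime import datetime


def ema_by_product(rows, alpha):
    """rows: iterable of (product_id, timestamp, sales). Returns (product_id, timestamp, ema)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    grouped = defaultdict(list)
    for product_id, ts, sales in rows:
        grouped[product_id].append((ts, float(sales)))
    out = []
    for product_id in sorted(grouped):
        ema = None
        # The recursion is order-dependent, so sort each product's rows by timestamp.
        for ts, sales in sorted(grouped[product_id]):
            ema = sales if ema is None else alpha * sales + (1 - alpha) * ema
            out.append((product_id, ts, ema))
    return out
```

Sorting inside each group matters: an EMA computed over rows in arbitrary order is still a number, but not the moving average of the time series.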
Consider a scenario where you have a large dataset of sensor readings stored in a Snowflake table called 'SENSOR_DATA'. You need to build an external function that performs complex calculations on these readings using a custom Python library hosted on AWS Lambda. The calculation requires significant computational resources, and you want to optimize data transfer between Snowflake and the Lambda function. The following SQL is provided:
CREATE OR REPLACE EXTERNAL FUNCTION ARRAY) RETURNS ARRAY VOLATILE MAX_BATCH_ROWS = 2000 RETURNS NULL ON NULL INPUT API_INTEGRATION = aws_lambda_integration AS 'arn:aws:lambda:us-east-1:123456789012:function:sensorProcessor';
Assuming the underlying Lambda function is correctly configured and functional, which of the following options would further optimize performance and reduce data transfer costs?
Correct Answer: B,C,D
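Snowflake sends external-function input in batches as a JSON body of the form {"data": [[row_number, arg1, ...], ...]} and expects a response body {"data": [[row_number, result], ...]} with the same row numbers, so each request carries many rows (up to MAX_BATCH_ROWS). The handler below is a minimal sketch of the Lambda side, assuming an API Gateway Lambda proxy integration (request body delivered as a JSON string in `event["body"]`). `process_readings` is a hypothetical stand-in for the custom library call and simply returns [min, max, mean].

```python
import json


def process_readings(readings):
    # Hypothetical stand-in for the custom sensor library.
    if not readings:
        return None
    values = [float(x) for x in readings]
    return [min(values), max(values), sum(values) / len(values)]


def handler(event, context):
    # Assumes API Gateway proxy integration: the batch arrives as a JSON string.
    payload = json.loads(event["body"])
    results = []
    for row in payload["data"]:
        row_number, readings = row[0], row[1]
        # Echo the row number so Snowflake can match results to input rows.
        results.append([row_number, process_readings(readings)])
    return {"statusCode": 200, "body": json.dumps({"data": results})}
```

Processing the whole batch in one invocation is what makes larger batches pay off: the per-request overhead is paid once per batch instead of once per row.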
Data is being ingested into a Snowflake table 'PRODUCT_REVIEWS' from an external system, and a stream 'PRODUCT_REVIEWS_STREAM' is defined on this table to capture changes. Due to a bug in the ingestion process, incorrect data was loaded for a specific period, and you need to correct it. Which of the following SQL statements, when executed against 'PRODUCT_REVIEWS_STREAM', will return the number of rows that were inserted, updated, and deleted during that period?
Correct Answer: A
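A standard stream exposes METADATA$ACTION ('INSERT' or 'DELETE') and METADATA$ISUPDATE for each change row, and an UPDATE appears as a DELETE/INSERT pair with METADATA$ISUPDATE = TRUE. The sketch below shows the counting logic that follows from that, assuming the stream rows have already been fetched as (action, is_update) pairs; it counts each update once by counting only the INSERT half of the pair. The function name is hypothetical and this is not a reconstruction of option A.

```python
from collections import Counter


def summarize_stream(rows):
    """rows: iterable of (metadata_action, metadata_isupdate) pairs."""
    counts = Counter()
    for action, is_update in rows:
        if is_update:
            # Each update is a DELETE + INSERT pair; count only the INSERT half.
            if action == "INSERT":
                counts["updated"] += 1
        elif action == "INSERT":
            counts["inserted"] += 1
        elif action == "DELETE":
            counts["deleted"] += 1
    return {key: counts[key] for key in ("inserted", "updated", "deleted")}
```

In SQL the same classification is typically written as conditional counts over METADATA$ACTION and METADATA$ISUPDATE.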
You have a Snowflake table, 'CUSTOMER_ORDERS', with columns such as 'CUSTOMER_ID', 'ORDER_DATE', 'ORDER_AMOUNT', and 'REGION'. A BI dashboard relies on a query that aggregates data from this table, but the query is consistently slow. The query frequently filters by 'ORDER_DATE' and groups by 'REGION'. Based on the following 'EXPLAIN' output, which combination of techniques would improve performance the most?


Correct Answer: A,D
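Filtering on 'ORDER_DATE' is cheap only if Snowflake can prune micro-partitions: it keeps min/max metadata per micro-partition and skips any partition whose range cannot satisfy the predicate. The simplified model below, with a hypothetical `MicroPartition` record and synthetic partition ranges, shows why clustering on the filter column matters: when each partition covers a narrow date range, a one-month filter touches one partition instead of all of them. It models pruning only and says nothing about the missing EXPLAIN output.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MicroPartition:
    # Simplified stand-in for per-partition min/max metadata on ORDER_DATE.
    min_date: date
    max_date: date


def partitions_to_scan(parts, start: date, end: date):
    """Keep only partitions whose [min, max] range overlaps [start, end]."""
    return [p for p in parts if p.max_date >= start and p.min_date <= end]


# Well clustered: each partition holds roughly one month of orders.
clustered = [MicroPartition(date(2024, m, 1), date(2024, m, 28)) for m in range(1, 13)]
# Poorly clustered: every partition spans the whole year.
unclustered = [MicroPartition(date(2024, 1, 1), date(2024, 12, 31)) for _ in range(12)]

march = (date(2024, 3, 1), date(2024, 3, 31))
print(len(partitions_to_scan(clustered, *march)))    # 1
print(len(partitions_to_scan(unclustered, *march)))  # 12
```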
You have a table named 'ORDERS' with a column 'ORDER_DETAILS' that contains JSON data. You want to extract a specific nested value ('customer_id') from this JSON data using a SQL UDF. The JSON structure varies, and the 'customer_id' field is sometimes missing. You need to create a UDF that handles missing fields gracefully and returns NULL if 'customer_id' is not found. You also need a performant, highly scalable solution. Which of the following SQL UDF definitions is most appropriate?
Correct Answer: E
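In Snowflake SQL, following a path into a VARIANT that does not exist yields NULL rather than an error, and TRY_PARSE_JSON returns NULL for invalid JSON, which is why a plain SQL expression can handle missing fields without procedural code. The Python sketch below mirrors that "missing means NULL/None" behavior so it can be tested in isolation. The nesting path ("customer", "customer_id") is a hypothetical assumption, since the question does not give the JSON layout.

```python
import json
from typing import Any, Optional


def get_path(doc: Any, path: tuple) -> Optional[Any]:
    """Walk nested objects; return None as soon as a key is missing."""
    node = doc
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def extract_customer_id(order_details: Optional[str], path=("customer", "customer_id")):
    # Hypothetical default path; invalid or NULL input also yields None.
    try:
        doc = json.loads(order_details)
    except (TypeError, json.JSONDecodeError):
        return None
    return get_path(doc, path)
```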
You have created a JavaScript UDF named 'calculate_discount' in Snowflake that takes two arguments: 'product_price' (NUMBER) and 'discount_percentage' (NUMBER). The UDF calculates the discounted price using the formula 'product_price * (1 - discount_percentage / 100)'. However, calling the UDF with certain inputs, specifically very large or very small numbers, produces unexpected results because of JavaScript's number precision limitations. Which of the following strategies can you implement to mitigate this issue and ensure accurate calculations within your JavaScript UDF?
Correct Answer: D
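A JavaScript number is an IEEE 754 double, which is the same representation as a Python float, so Python can reproduce the precision problem exactly. The sketch below contrasts the question's formula in double arithmetic with the same formula in decimal arithmetic on inputs passed as strings. It illustrates the failure mode only; it is not JavaScript UDF code and not a reconstruction of option D.

```python
from decimal import Decimal


def discounted_price_float(price: float, pct: float) -> float:
    # Same IEEE 754 double arithmetic a JavaScript number would use.
    return price * (1 - pct / 100)


def discounted_price_decimal(price: str, pct: str) -> Decimal:
    # Inputs passed as strings so no binary rounding happens on the way in.
    return Decimal(price) * (1 - Decimal(pct) / 100)


big = "9007199254740993"  # 2**53 + 1, not representable as a double
print(discounted_price_float(float(big), 0.0))  # 9007199254740992.0
print(discounted_price_decimal(big, "0"))       # 9007199254740993
```

Even with a 0% discount the double result is off by one, because the input itself was rounded before any arithmetic happened; that is why passing values as strings (or doing the arithmetic outside JavaScript) is the usual remedy.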
You are designing a data loading process for a high-volume streaming data source. The data arrives as Avro files in an AWS S3 bucket. You need to load this data into a Snowflake table with minimal latency and operational overhead. Which of the following combinations of Snowflake features and configurations would be MOST suitable for this scenario? (Select TWO)
Correct Answer: D,E
A financial institution needs to mask sensitive customer data (PII) in a 'CUSTOMER' table. The table contains columns such as 'CUSTOMER_ID', 'FIRST_NAME', 'LAST_NAME', 'CREDIT_CARD', and 'ADDRESS'. The data must be masked differently for different roles: the 'ANALYST' role should see obfuscated values for names and addresses, while the 'SUPPORT' role should see the last four digits of the credit card and a hashed version of the address. 'CUSTOMER_ID' should never be masked. Assume a central masking policy called 'PII_MASKING_POLICY' already exists. Which of the following is the MOST efficient and secure way to achieve this?
Correct Answer: C
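In Snowflake, this kind of rule lives in masking-policy bodies (typically CASE expressions on CURRENT_ROLE()) attached to columns with ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY. The Python sketch below only models the role-to-output mapping the question describes, so the rules can be checked on a sample row. Several details are assumptions not stated in the question: the obfuscation format (first character kept), fully masking any column a role's rule does not mention, fully masking everything for unlisted roles, and SHA-256 as the hash.

```python
import hashlib

PII_COLUMNS = ("FIRST_NAME", "LAST_NAME", "CREDIT_CARD", "ADDRESS")


def obfuscate(value: str) -> str:
    # Hypothetical obfuscation: keep the first character, star out the rest.
    return value[:1] + "*" * (len(value) - 1)


def mask_row(row: dict, role: str) -> dict:
    # CUSTOMER_ID is never masked; every PII column defaults to fully masked.
    out = {"CUSTOMER_ID": row["CUSTOMER_ID"]}
    out.update({col: "****" for col in PII_COLUMNS})
    if role == "ANALYST":
        for col in ("FIRST_NAME", "LAST_NAME", "ADDRESS"):
            out[col] = obfuscate(row[col])
    elif role == "SUPPORT":
        card = row["CREDIT_CARD"]
        out["CREDIT_CARD"] = "*" * (len(card) - 4) + card[-4:]
        out["ADDRESS"] = hashlib.sha256(row["ADDRESS"].encode("utf-8")).hexdigest()
    return out
```

Defaulting to "fully masked" means a new or misspelled role never sees raw PII by accident, which is the safer failure mode.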
You are building a data pipeline in Snowflake that uses an external function to perform sentiment analysis on customer reviews stored in a table named 'CUSTOMER_REVIEWS'. The external function 'sentiment_analyzer' is hosted on AWS Lambda and requires an API key for authentication. You want to ensure that the API key is passed securely to the Lambda function and to prevent unauthorized access. Which of the following approaches is the MOST secure and recommended way to manage the API key?
Correct Answer: C
A data engineering team is responsible for an ELT pipeline that loads data into Snowflake. The pipeline has two distinct stages: a high-volume, low-complexity transformation stage that runs SQL on raw data, and a low-volume, high-complexity transformation stage that uses Python UDFs calling an external service for data enrichment. The team is experiencing significant queueing during peak hours, particularly in the high-volume stage. You need to optimize the warehouse configuration to minimize queueing. Which combination of actions would be MOST effective?
Correct Answer: A
You're managing a Snowflake data warehouse and need to create a development environment for testing a complex stored procedure that updates a critical table, 'SALES_DATA'. The procedure is located in the 'PRODUCTION' database, and you want to ensure minimal impact on the production environment during development. You decide to use cloning and Time Travel. Which of the following strategies is the MOST efficient and safest approach, minimizing downtime and resource consumption in production?
Correct Answer: C
You have a Snowflake table called 'ORDERS' with a column named 'ORDER_DETAILS' that contains JSON data. You need to create a UDF that extracts a specific value from the JSON and returns it. The JSON structure is consistent: {"customer": {"name": "John Doe", "email": "[email protected]"}, "items": [{"item_id": 1, "quantity": 2}, {"item_id": 2, "quantity": … You want to extract the customer's email address. Which of the following UDF definitions is the most efficient and correct?


Correct Answer: E
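For a consistent structure like this, the SQL form is usually a single path expression such as order_details:customer.email::STRING on a VARIANT column (with PARSE_JSON first if the column is stored as text). The Python sketch below performs the same lookup so it can be run and tested locally; the sample document and its email address are hypothetical, and the function is not a reconstruction of option E.

```python
import json


def extract_email(order_details: str):
    """Return customer.email from an ORDER_DETAILS JSON string, or None."""
    doc = json.loads(order_details)
    customer = doc.get("customer") if isinstance(doc, dict) else None
    if not isinstance(customer, dict):
        return None
    return customer.get("email")


# Hypothetical sample document following the structure in the question.
sample = json.dumps({
    "customer": {"name": "John Doe", "email": "john.doe@example.com"},
    "items": [{"item_id": 1, "quantity": 2}],
})
print(extract_email(sample))  # john.doe@example.com
```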
