Free Snowflake DEA-C02 Practice Test & Real Exam Questions
A data engineer is tasked with processing a large dataset of customer orders using Snowpark Python. The dataset contains a column stored as a string in 'YYYY-MM-DD HH:MI:SS' format. They need to create a new DataFrame with only the orders placed in the month of January 2023. Which of the following code snippets achieves this most efficiently, considering potential data volume and query performance?
Correct Answer: A
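The filtering logic the question asks for can be sketched in plain Python, as an illustration only: parse the string once and keep rows in the half-open interval from 2023-01-01 (inclusive) to 2023-02-01 (exclusive). The column name `order_ts` and the sample rows are assumptions, since the original does not name the column. In Snowpark the same range test would normally be written as a column expression (for example with `to_timestamp` and `DataFrame.filter`) so that it runs inside Snowflake rather than on the client.

```python
from datetime import datetime

# Hypothetical sample rows; the column name "order_ts" is an assumption.
orders = [
    {"order_id": 1, "order_ts": "2022-12-31 23:59:59"},
    {"order_id": 2, "order_ts": "2023-01-01 00:00:00"},
    {"order_id": 3, "order_ts": "2023-01-31 23:59:59"},
    {"order_id": 4, "order_ts": "2023-02-01 00:00:00"},
]

START = datetime(2023, 1, 1)  # inclusive lower bound
END = datetime(2023, 2, 1)    # exclusive upper bound


def placed_in_january_2023(row):
    """Return True when the row's timestamp falls in January 2023."""
    ts = datetime.strptime(row["order_ts"], "%Y-%m-%d %H:%M:%S")
    return START <= ts < END


january_orders = [row for row in orders if placed_in_january_2023(row)]
print([row["order_id"] for row in january_orders])
```

A range comparison on the converted timestamp avoids computing a separate year and month for every row, which is usually friendlier to partition pruning than function-based filters.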
You are tasked with creating a development environment from a production database in Snowflake. The production database is named 'PROD_DB' and contains several schemas, including 'CUSTOMER_DATA' and 'PRODUCT_DATA'. You want to create a clone of the 'PROD_DB' database named 'DEV_DB', but you only need the 'CUSTOMER_DATA' schema for development purposes, and all data in the 'email' column of the 'CUSTOMER' table should be masked with a custom UDF, 'MASK_EMAIL'. The 'email' column is VARCHAR. Which of the following sequences of SQL statements would achieve this in Snowflake? Note: The UDF 'MASK_EMAIL' already exists in the account.
Correct Answer: A
A data team is using Snowflake to analyze sensor data from thousands of IoT devices. The data is ingested into a table named 'SENSOR_READINGS', which contains columns such as 'DEVICE_ID', 'TIMESTAMP', 'TEMPERATURE', 'PRESSURE', and 'LOCATION' (a GEOGRAPHY object). Analysts frequently run queries that calculate the average temperature and pressure for devices within a specific geographic area over a given time period. These queries are slow, especially when they span multiple months of data. Which of the following approaches, when combined, will BEST optimize the performance of these queries using the query acceleration service?
Correct Answer: B
You are implementing a data pipeline in Snowpark that reads data from an external stage (e.g., AWS S3) and performs complex transformations, including joins with large Snowflake tables. You notice that the pipeline's performance is significantly slower than expected, despite having sufficient warehouse resources. Which of the following actions would MOST likely improve the performance of the Snowpark data pipeline?
Correct Answer: C,D,E
You need to implement both a row access policy and a dynamic data masking policy on the 'EMPLOYEE' table in Snowflake. The requirements are as follows: 1. Employees should only be able to see their own record in the 'EMPLOYEE' table. 2. The 'SALARY' column should be masked for all employees except those with the 'HR_ADMIN' role. Unmasked values are required for compliance reasons, so they must remain available to the 'HR_ADMIN' role. Given the following table structure: CREATE TABLE EMPLOYEE ( EMPLOYEE_ID INT, EMPLOYEE_NAME STRING, SALARY NUMBER, EMAIL STRING ); Which of the following sets of steps correctly implements the row access policy and the dynamic data masking policy?


Correct Answer: C
You need to load data from a stream of CSV files into a Snowflake table. The CSV files are delivered to an AWS S3 bucket and contain header rows. The files occasionally include records where a text field contains a delimiter character (comma) within the text itself, but these fields are properly enclosed within double quotes. You want to create a file format object that correctly handles the data, including quoted delimiters, and skips the header row. Which of the following file format options are required to achieve this? (Choose two)
Correct Answer: A,E
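The two parsing behaviors described in the question, skipping the header row and treating a comma inside double quotes as data, can be illustrated with Python's standard `csv` module. This is only an analogy under assumed sample data, not the Snowflake implementation. In a Snowflake CSV file format, the corresponding options are `SKIP_HEADER` and `FIELD_OPTIONALLY_ENCLOSED_BY`.

```python
import csv
import io

# Hypothetical file content: a header row plus one field with an embedded comma.
raw = 'id,name,comment\n1,Alice,"Hello, world"\n2,Bob,plain text\n'

reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"')
next(reader)          # skip the header row (analogous to SKIP_HEADER = 1)
rows = list(reader)   # quoted commas stay inside their field

for row in rows:
    print(row)
```

Without the quote character configured, the first data row would split into four fields instead of three, which is the failure mode the question is guarding against.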
You are designing a data ingestion process that involves loading data from an external stage. The data is partitioned into multiple files based on date. The stage is configured to point to the root directory of the partitioned data. You want to efficiently load only the data for a specific date (e.g., '2023-01-15') using the 'COPY' command. Assume your stage name is 'my_stage', your table is 'my_table', your date column is named 'event_date', and the files in the stage are named in the format 'data_YYYY-MM-DD.csv'. Which of the following options allows you to selectively load the data for the specific date? (Select ALL that apply)


Correct Answer: B,C,E
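Selecting files by name is the core of this question: `COPY INTO` accepts either an explicit `FILES` list or a `PATTERN` regular expression. The sketch below uses Python's `re` module to show how such a regular expression narrows a file listing to one date. The file names, including the underscore in `data_YYYY-MM-DD.csv` and the `archive/` entry, are assumptions made for illustration.

```python
import re

# Hypothetical stage listing; names assume the form data_YYYY-MM-DD.csv.
staged_files = [
    "data_2023-01-14.csv",
    "data_2023-01-15.csv",
    "data_2023-01-16.csv",
    "archive/data_2023-01-15.csv.bak",
]

target_date = "2023-01-15"

# Match any path that ends in data_<target_date>.csv; the dot is escaped
# so that it only matches a literal ".".
pattern = re.compile(r".*data_" + re.escape(target_date) + r"\.csv")

selected = [name for name in staged_files if pattern.fullmatch(name)]
print(selected)
```

Requiring the whole name to match keeps near misses such as the `.bak` file out of the load set.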
You have configured a Kafka Connector to load JSON data into a Snowflake table named 'ORDERS'. The JSON data contains nested structures. However, Snowflake is only receiving the top-level fields, and the nested fields are being ignored. Which configuration option within the Kafka Connector needs to be adjusted to correctly flatten and load the nested JSON data into Snowflake?
Correct Answer: E
You have a Snowflake table named 'ORDERS' clustered on 'ORDER_DATE'. After a significant data load, you want to evaluate the effectiveness of the clustering. Which of the following SQL queries, using Snowflake system functions, will provide insights into the clustering depth and overlap of micro-partitions in the 'ORDERS' table, specifically helping you identify whether re-clustering is necessary? Assume that the table


Correct Answer: E
You are tasked with optimizing a continuous data pipeline that loads data from an external stage into a Snowflake table using streams.
The pipeline is experiencing significant latency during peak hours. The stream is defined on a very large table with frequent updates and deletes. Which of the following strategies would be MOST effective in reducing the latency of the data pipeline, considering stream performance and cost implications?
Correct Answer: E
A financial services company is using Snowflake Streams on a table 'TRANSACTIONS' to capture changes for auditing purposes. The 'TRANSACTIONS' table contains sensitive data, and the auditing team requires the stream to only capture changes to specific columns: 'ACCOUNT_ID', 'TRANSACTION_DATE', and 'TRANSACTION_AMOUNT'. Which of the following approaches is the MOST efficient and secure way to achieve this requirement, ensuring minimal performance impact and data exposure?
Correct Answer: A
A data engineer needs to optimize the performance of a series of complex transformations performed using Snowflake stored procedures. These procedures involve multiple table joins, aggregations, and data filtering operations. The current execution time is unacceptably long. Which of the following optimization strategies are most likely to provide the greatest performance improvements, considering both code-level optimizations and Snowflake's architecture? Select all that apply.
Correct Answer: A,C,E
You're working on a data transformation pipeline in Snowflake. You need to create a SQL UDF called that accepts the following parameters: 'price' (NUMBER) (NUMBER) (BOOLEAN) The function should calculate the final price after applying the discount. If the customer , an additional 5% discount should be applied on top of the initial discount. Choose all the valid SQL UDF definitions that accurately implement the requirements. Only one Discount Percentage needs to apply to the base price.


Correct Answer: A,B,E
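The pricing rule can be sketched in Python under one reading of the question: the extra 5 percentage points are added to the base discount, and the single combined percentage is then applied once to the price. The function name `final_price` and the parameter names `discount_pct` and `extra_discount_flag` are hypothetical, because the original omits the UDF name and two of the parameter names.

```python
def final_price(price, discount_pct, extra_discount_flag):
    """Apply one combined discount percentage to the base price.

    Assumption: when the flag is set, 5 percentage points are added to
    discount_pct before the discount is applied.
    """
    total_pct = discount_pct + (5 if extra_discount_flag else 0)
    return price * (100 - total_pct) / 100


print(final_price(200, 10, False))
print(final_price(200, 10, True))
```

If the intended rule were instead to apply the 5% to the already discounted price, the second call would compound the two discounts and yield a different result, so the reading should be confirmed against the answer options.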
You have a directory table 'my_directory_table' pointing to a stage containing CSV files with headers. You need to query the directory table to find all files modified in the last 24 hours and load those CSV files into a target table using COPY INTO. Assume the target table exists and has an appropriate schema. Which of the following SQL statements, or sets of statements, will accomplish this efficiently? Note: Consider efficient file loading.
Correct Answer: C
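The selection step, finding files modified in the last 24 hours, can be sketched in plain Python. A Snowflake directory table exposes `RELATIVE_PATH` and `LAST_MODIFIED` columns; the sample rows and the fixed `now` value below are assumptions used to keep the example reproducible.

```python
from datetime import datetime, timedelta, timezone

# Fixed reference time (an assumption) so the example is deterministic.
now = datetime(2023, 1, 16, 12, 0, tzinfo=timezone.utc)

# Hypothetical directory-table rows.
files = [
    {"relative_path": "a.csv",
     "last_modified": datetime(2023, 1, 16, 9, 0, tzinfo=timezone.utc)},
    {"relative_path": "b.csv",  # 25 hours old, outside the window
     "last_modified": datetime(2023, 1, 15, 11, 0, tzinfo=timezone.utc)},
    {"relative_path": "c.csv",  # 23 hours old, inside the window
     "last_modified": datetime(2023, 1, 15, 13, 0, tzinfo=timezone.utc)},
]

cutoff = now - timedelta(hours=24)
recent = [f["relative_path"] for f in files if f["last_modified"] >= cutoff]
print(recent)
```

The resulting list of paths is what a subsequent load would be restricted to, so that older files are not scanned again.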
