Free Snowflake DEA-C02 Practice Test & Real Exam Questions, Page 1

Question #1

A data engineer is tasked with processing a large dataset of customer orders using Snowpark Python. The dataset contains a column stored as a string in 'YYYY-MM-DD HH:MI:SS' format. They need to create a new DataFrame with only the orders placed in the month of January 2023. Which of the following code snippets achieves this most efficiently, considering potential data volume and query performance?

A.

B.

C.

D.

E.

Correct Answer: A
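
The answer options for this question are not preserved, so the following is not the exam's snippet. It is a minimal plain-Python sketch (not Snowpark) of one common way to select a calendar month: parse the string once and compare it against a half-open range whose upper bound is the first instant of the next month. The column name `order_ts` and the sample rows are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical sample rows; the column name "order_ts" is an assumption.
orders = [
    {"order_id": 1, "order_ts": "2022-12-31 23:59:59"},
    {"order_id": 2, "order_ts": "2023-01-01 00:00:00"},
    {"order_id": 3, "order_ts": "2023-01-31 23:59:59"},
    {"order_id": 4, "order_ts": "2023-02-01 00:00:00"},
]

START = datetime(2023, 1, 1)  # inclusive lower bound
END = datetime(2023, 2, 1)    # exclusive upper bound


def in_january_2023(ts_text: str) -> bool:
    """Parse the timestamp string and test it against [START, END)."""
    ts = datetime.strptime(ts_text, "%Y-%m-%d %H:%M:%S")
    return START <= ts < END


january_ids = [row["order_id"] for row in orders if in_january_2023(row["order_ts"])]
print(january_ids)
```

The half-open range avoids having to spell out the last second of the month, and the same comparison shape carries over to a DataFrame filter on a converted timestamp column.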

Question #2

You are tasked with creating a development environment from a production database in Snowflake. The production database is named 'PROD_DB' and contains several schemas, including 'CUSTOMER_DATA' and 'PRODUCT_DATA'. You want to create a clone of the 'PROD_DB' database named 'DEV_DB', but you only need the 'CUSTOMER_DATA' schema for development purposes, and all data in the 'email' column of the 'CUSTOMER' table must be masked with the custom UDF 'MASK_EMAIL'. The 'email' column is VARCHAR. Which of the following sequences of SQL statements would achieve this in Snowflake? Note: The UDF 'MASK_EMAIL' already exists in the account.

A.

B.

C.

D.

E.

Correct Answer: A

Question #3

A data team is using Snowflake to analyze sensor data from thousands of IoT devices. The data is ingested into a table named 'SENSOR_READINGS', which contains columns like 'DEVICE_ID', 'TIMESTAMP', 'TEMPERATURE', 'PRESSURE', and 'LOCATION' (a GEOGRAPHY object). Analysts frequently run queries that calculate the average temperature and pressure for devices within a specific geographic area over a given time period. These queries are slow, especially when querying data from multiple months. Which of the following approaches, when combined, will BEST optimize the performance of these queries using the query acceleration service?

A. Enable Automatic Clustering on 'DEVICE_ID', then enable query acceleration on the virtual warehouse.

B. Cluster the table by 'LOCATION' and 'TIMESTAMP', enable search optimization on the 'LOCATION' column, and then enable query acceleration.

C. Create a materialized view that pre-calculates the average temperature and pressure by device and location. Then enable query acceleration on the virtual warehouse.

D. Partition the 'SENSOR_READINGS' table by 'TIMESTAMP' (e.g., daily partitions). Enable search optimization on the 'LOCATION' column and enable query acceleration.

E. Enable search optimization on the 'TEMPERATURE' and 'PRESSURE' columns and enable query acceleration.

Correct Answer: B

Question #4

You are implementing a data pipeline in Snowpark that reads data from an external stage (e.g., AWS S3) and performs complex transformations, including joins with large Snowflake tables. You notice that the pipeline's performance is significantly slower than expected, despite having sufficient warehouse resources. Which of the following actions would MOST likely improve the performance of the Snowpark data pipeline?

A. Increase the warehouse size to the largest available option (e.g., X-Large or larger).

B. Reduce the number of partitions in the DataFrame representing the data from the external stage using 'df.repartition(1)'.

C. Ensure that the external stage is properly configured with appropriate data formats (e.g., Parquet) and partitioning schemes that align with the join keys.

D. Persist the DataFrame representing the data from the external stage using 'df.cache()' before performing the joins.

E. Optimize the SQL joins within the Snowpark DataFrame operations by using broadcast joins when appropriate and ensuring correct join key data types.

Correct Answer: C,D,E

Question #5

You need to implement both a row access policy and a dynamic data masking policy on the 'EMPLOYEE' table in Snowflake. The requirements are as follows:

1. Employees should only be able to see their own record in the 'EMPLOYEE' table.
2. The 'SALARY' column should be masked for all employees except those with the 'HR_ADMIN' role. Unmasked values are required for compliance reasons, so they must remain available to the 'HR_ADMIN' role.

Given the following table structure:

CREATE TABLE EMPLOYEE ( EMPLOYEE_ID INT, EMPLOYEE_NAME STRING, SALARY NUMBER, EMAIL STRING );

Which of the following sets of steps correctly implement the row access policy and dynamic data masking policy?

A. Option C

B. Option E

C. Option B

D. Option D

E. Option A

Correct Answer: C

Question #6

You need to load data from a stream of CSV files into a Snowflake table. The CSV files are delivered to an AWS S3 bucket and contain header rows. The files occasionally include records where a text field contains a delimiter character (comma) within the text itself, but these fields are properly enclosed within double quotes. You want to create a file format object that correctly handles the data, including quoted delimiters, and skips the header row. Which of the following file format options are required to achieve this? (Choose two)

A. FIELD_OPTIONALLY_ENCLOSED_BY = '"'

B. FIELD_DELIMITER = ','

C. ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE

D. FILE_FORMAT = (TYPE = CSV)

E. SKIP_HEADER = 1

Correct Answer: A,E
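
To see why the enclosing character and the header skip both matter, the behavior can be reproduced outside Snowflake. The sketch below uses Python's standard `csv` module on a made-up sample (the column names and values are assumptions); it is an analogy for the two selected options, not Snowflake's own loader.

```python
import csv
import io

# Made-up sample: one header row, and one field with a comma inside double quotes.
raw = 'id,name,comment\n1,Ada,"fast, reliable"\n2,Bob,plain\n'

# quotechar plays the role of the "optionally enclosed by" setting:
# a comma inside the quotes is treated as data, not as a delimiter.
reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar='"')

next(reader)  # skip the single header row, comparable to skipping 1 header line
rows = list(reader)
print(rows)
```

Without a quote character the first record would split into four fields instead of three, which is the kind of column-count mismatch the question describes.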

Question #7

You are designing a data ingestion process that involves loading data from an external stage. The data is partitioned into multiple files based on date. The stage is configured to point to the root directory of the partitioned data. You want to efficiently load only the data for a specific date (e.g., '2023-01-15') using the 'COPY' command. Assume your stage name is 'my_stage', your table is 'my_table', your date column is named 'event_date', and the files in the stage are named in the format 'data_YYYY-MM-DD.csv'. Which of the following options allows you to selectively load the data for the specific date? (Select ALL that apply)

A. Option C

B. Option E

C. Option B

D. Option D

E. Option A

Correct Answer: B,C,E
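
The option bodies are not preserved here, but selecting staged files by name usually comes down to either listing exact file names or matching a regular expression against each file path. The sketch below illustrates the regular-expression idea in plain Python. The file list is hypothetical, and the underscore in `data_2023-01-15.csv` is an assumption about the naming format.

```python
import re

# Hypothetical paths relative to the stage root.
staged_files = [
    "data_2023-01-14.csv",
    "data_2023-01-15.csv",
    "archive/data_2023-01-15.csv",
    "data_2023-01-15.csv.bak",
]

# The leading ".*" allows any directory prefix; the dot before "csv" is escaped
# so that it matches a literal dot only.
pattern = re.compile(r".*data_2023-01-15\.csv")

# fullmatch requires the whole path to match, so the ".bak" file is rejected.
selected = [path for path in staged_files if pattern.fullmatch(path)]
print(selected)
```

Note that a pattern with a leading wildcard also picks up the copy under `archive/`; when only one known file is wanted, naming it explicitly is the narrower choice.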

Question #8

You have configured a Kafka Connector to load JSON data into a Snowflake table named 'ORDERS'. The JSON data contains nested structures. However, Snowflake is only receiving the top-level fields, and the nested fields are being ignored. Which configuration option within the Kafka Connector needs to be adjusted to correctly flatten and load the nested JSON data into Snowflake?

A. Enable the 'snowflake.ingest.stage' property and set it to a Snowflake internal stage.

B. Configure the 'snowflake.data.field.name' property to specify the column in the Snowflake table where the entire JSON should be loaded as a VARIANT.

C. Use the 'transforms' configuration with the 'org.apache.kafka.connect.transforms.ExtractField$Value' transformation to extract specific fields.

D. Set the 'value.converter.schemas.enable' property to 'true'.

E. Apply the 'org.apache.kafka.connect.transforms.Flatten' transformation to the 'transforms' configuration.

Correct Answer: E
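
As a rough picture of what flattening does to a nested record, the sketch below joins the field names at each nesting level with a delimiter. It is a plain-Python illustration of the idea, not the Kafka Connect implementation; the `flatten` helper, the "." delimiter default, and the sample record are assumptions made for this example.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], delimiter: str = ".", prefix: str = "") -> Dict[str, Any]:
    """Return a single-level dict whose keys are the joined nested field names."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{delimiter}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined name as the prefix.
            flat.update(flatten(value, delimiter, name))
        else:
            # Scalars (and lists, which this sketch leaves untouched) are copied as-is.
            flat[name] = value
    return flat


# Hypothetical nested order record.
order = {"order_id": 42, "customer": {"id": 7, "address": {"city": "Oslo"}}}
flat_order = flatten(order)
print(flat_order)
```

After flattening, every value sits at the top level under a composite name such as `customer.address.city`, so a consumer that only reads top-level fields no longer drops it.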

Question #9

You have a Snowflake table named 'ORDERS' clustered on 'ORDER_DATE'. After a significant data load, you want to evaluate the effectiveness of the clustering. Which of the following SQL queries, using Snowflake system functions, will provide insights into the clustering depth and overlap of micro-partitions in the 'ORDERS' table, specifically helping you identify whether re-clustering is necessary? Assume that the table

A.

B.

C.

D.

E.

Correct Answer: E

Question #10

You are tasked with optimizing a continuous data pipeline that loads data from an external stage into a Snowflake table using streams. The pipeline is experiencing significant latency during peak hours. The stream is defined on a very large table with frequent updates and deletes. Which of the following strategies would be MOST effective in reducing the latency of the data pipeline, considering stream performance and cost implications?

A. Increase the size of the virtual warehouse used for loading data. This will provide more compute resources for processing the stream.

B. Reduce the RETENTION TIME of the stream. This will limit the amount of historical data tracked and improve performance.

C. Create multiple streams on the same base table, each filtering for specific types of changes (e.g., INSERT, UPDATE, DELETE).

D. Implement a materialized view on top of the stream to pre-aggregate the data.

E. Implement a more aggressive pruning strategy on the base table to reduce the amount of data that the stream needs to track.

Correct Answer: E

Question #11

A financial services company is using Snowflake Streams on a table 'TRANSACTIONS' to capture changes for auditing purposes. The 'TRANSACTIONS' table contains sensitive data, and the auditing team requires the stream to only capture changes to specific columns: 'ACCOUNT_ID', 'TRANSACTION_DATE', and 'TRANSACTION_AMOUNT'. Which of the following approaches is the MOST efficient and secure way to achieve this requirement, ensuring minimal performance impact and data exposure?

A. Create a View that selects only the 'ACCOUNT_ID', 'TRANSACTION_DATE', and 'TRANSACTION_AMOUNT' columns and create a Stream on the View.

B. Create a task that clones the 'TRANSACTIONS' table and a stream on that cloned table, limiting what changes are captured using a WHERE clause on the cloning command.

C. Create a Stream on the 'TRANSACTIONS' table. Periodically truncate the stream and reload all data from the 'TRANSACTIONS' table, applying a filter while loading.

D. Create a standard Stream on the 'TRANSACTIONS' table and then filter the results in downstream processing to only include the required columns.

E. Create a Stream on the 'TRANSACTIONS' table and use a masking policy on the stream's output to redact the unnecessary columns.

Correct Answer: A

Question #12

A data engineer needs to optimize the performance of a series of complex transformations performed using Snowflake stored procedures. These procedures involve multiple table joins, aggregations, and data filtering operations. The current execution time is unacceptably long. Which of the following optimization strategies are most likely to provide the greatest performance improvements, considering both code-level optimizations and Snowflake's architecture? Select all that apply.

A. Implement data caching within the stored procedures using temporary tables to store intermediate results, reducing the need to repeatedly query the same data.

B. Use external functions written in languages like Python or Java for computationally intensive tasks that are not efficiently handled by SQL, taking advantage of Snowflake's support for external functions.

C. Ensure the underlying tables are appropriately clustered and consider using materialized views for frequently accessed aggregations or pre-computed results to improve query performance.

D. Rewrite the stored procedures to use procedural logic (e.g., loops and cursors) more extensively, as this allows for fine-grained control over the execution flow and can improve performance for certain operations.

E. Refactor the stored procedures to leverage set-based operations (e.g., using SQL statements with joins and aggregations) whenever possible, allowing Snowflake to parallelize the execution across multiple nodes.

Correct Answer: A,C,E

Question #13

You're working on a data transformation pipeline in Snowflake. You need to create a SQL UDF called that accepts the following parameters: 'price' (NUMBER) (NUMBER) (BOOLEAN) The function should calculate the final price after applying the discount. If the customer , an additional 5% discount should be applied on top of the initial discount. Choose all the valid SQL UDF definitions that accurately implement the requirements. Only one Discount Percentage needs to apply to the base price.

A. Option C

B. Option E

C. Option B

D. Option D

E. Option A

Correct Answer: A,B,E

Question #14

You have a directory table 'my_directory_table' pointing to a stage containing CSV files with headers. You need to query the directory table to find all files modified in the last 24 hours and load those CSV files using COPY INTO into a target table. Assume the target table exists and has an appropriate schema. Which of the following SQL statements, or set of statements, will accomplish this efficiently? Note: Consider efficient file loading.

A.

B.

C.

D.

E.

Correct Answer: C
