Free Snowflake DEA-C02 Practice Test & Real Exam Questions, Page 4

Question #43

Consider a scenario where you need to transform data in a Snowflake table using complex custom transformation logic best implemented in Java. You decide to use a Snowpark Java UDF. You've packaged your Java code into a JAR file and uploaded it to an internal stage. Which of the following steps are necessary and correctly ordered to deploy and use this Java UDF within Snowflake?

A. Option C

B. Option E

C. Option B

D. Option D

E. Option A

Correct Answer: C

Question #44

A financial services company is implementing Snowflake. They have a table 'CUSTOMER_DATA' containing sensitive information such as 'CREDIT_CARD_NUMBER', 'SSN', and 'ADDRESS'. They need to ensure that: 1) Analysts can only see the last four digits of the 'CREDIT_CARD_NUMBER'. 2) Data scientists require full access to the 'ADDRESS' but should not see the 'SSN'. 3) A dedicated compliance role should be able to view all data in its original format for auditing purposes. Which of the following is the MOST efficient and secure approach to implement this using Snowflake's data masking and RBAC?

A. Create masking policies on the 'CREDIT_CARD_NUMBER', 'SSN', and 'ADDRESS' columns. Use conditional masking expressions based on the CURRENT_ROLE() function to determine what data to show to each role (analysts, data scientists, compliance).

B. Create dynamic data masking policies on each sensitive column in the 'CUSTOMER_DATA' table, associating these policies with specific roles using Snowflake's tag-based masking. Grant roles only the privileges needed to select the columns based on their requirements.

C. Replicate the CUSTOMER_DATA table three times, once for each user group (Analysts, Data Scientist and Compliance). Mask sensitive information by altering the data with the respective masking function.

D. Create separate views for analysts and data scientists, applying masking policies within the views, and grant access to these views based on their respective roles. Additionally, grant the compliance role direct access to the base table.

E. Use data encryption for the entire 'CUSTOMER_DATA' table and provide decryption keys to specific roles based on their access requirements. Provide the compliance role with the master key.

Correct Answer: A
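The role-conditional logic described in option A can be modelled outside Snowflake to check it against the three requirements. The sketch below is plain Python, not a Snowflake masking policy. The role names (ANALYST, DATA_SCIENTIST, COMPLIANCE), the mask formats, and the choice to show data scientists only the last four card digits are assumptions made for illustration.

```python
# Plain-Python model of role-conditional masking (not Snowflake SQL).
# Role names and mask formats are hypothetical.

def mask_credit_card(value: str, role: str) -> str:
    """Compliance sees the full number; every other role sees the last four digits."""
    if role == "COMPLIANCE":
        return value
    hidden = max(len(value) - 4, 0)
    return "*" * hidden + value[-4:]


def mask_ssn(value: str, role: str) -> str:
    """Only compliance sees the SSN; every other role gets a fixed mask."""
    return value if role == "COMPLIANCE" else "***-**-****"


def mask_address(value: str, role: str) -> str:
    """Data scientists and compliance see the address; other roles do not."""
    return value if role in ("DATA_SCIENTIST", "COMPLIANCE") else "*** MASKED ***"
```

Each function takes the column value and the caller's role, which mirrors how a conditional masking expression branches on the current role.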

Question #45

You have implemented external tokenization for a sensitive data column in Snowflake using a UDF that calls an external API. After some time, you discover that the external tokenization service is experiencing intermittent outages, causing queries using the tokenized column to fail. What is the BEST approach to mitigate this issue and maintain data availability while minimizing the risk of exposing the raw data?

A. Modify the tokenization UDF to cache tokenization mappings locally within the Snowflake environment. When the external service is unavailable, the UDF can use the cached values.

B. Implement a masking policy on the column that returns the raw data when the tokenization UDF is unavailable, detected by catching exceptions within the policy logic.

C. Implement a try-catch block within the UDF. In the catch block, return a pre-defined, non-sensitive default value instead of attempting to call the external tokenization service. You can't return the raw value.

D. Implement a try-catch block within the UDF. In the catch block, return a pre-defined static token value (same value always) instead of attempting to call the external tokenization service. You can't return the raw value.

E. Replicate the tokenized table to another Snowflake region and switch to the replica during outages of the primary region. The tokenization service is guaranteed to be available in at least one region.

Correct Answer: C
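The try-catch pattern in options C and D can be sketched as follows. This is a minimal Python illustration, not the UDF from the question: `call_tokenization_service` is a hypothetical stand-in for the external API, and the placeholder string is an assumed non-sensitive default.

```python
# Sketch of a tokenization wrapper that fails closed.
# `call_tokenization_service` stands in for the external API; it is hypothetical.
from typing import Callable

FALLBACK_VALUE = "UNAVAILABLE"  # assumed non-sensitive placeholder


def tokenize(raw_value: str, call_tokenization_service: Callable[[str], str]) -> str:
    """Return the token, or a fixed placeholder if the service call fails."""
    try:
        return call_tokenization_service(raw_value)
    except Exception:
        # Never fall back to raw_value: that would expose the sensitive data.
        return FALLBACK_VALUE
```

Queries keep running during an outage, and the raw value is never returned on the error path.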

Question #46

You are tasked with sharing a subset of a customer table ('CUSTOMER_DATA') residing in your organization's Snowflake account with a partner organization. You need to mask personally identifiable information (PII) while providing near real-time updates. You decide to use a secure view. Which of the following SQL statements is the MOST efficient and secure way to accomplish this, assuming the partner only needs 'customer_id', 'masked_email', 'city', and 'state'? The email should be masked using SHA256.

A. Option C

B. Option E

C. Option B

D. Option D

E. Option A

Correct Answer: E
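The SQL options for this question are missing from the page, so the projection the question asks for is sketched here in Python instead. It uses `hashlib` as a stand-in for the SHA256 masking; the input column name `email` and the helper names are assumptions.

```python
import hashlib


def masked_email(email: str) -> str:
    """Return the SHA-256 digest of the email as a 64-character hex string."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def partner_row(row: dict) -> dict:
    """Project a customer row onto the four columns the partner needs."""
    return {
        "customer_id": row["customer_id"],
        "masked_email": masked_email(row["email"]),
        "city": row["city"],
        "state": row["state"],
    }
```

Only the four requested columns leave the function, and the raw email is replaced by its digest.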

Question #47

A data engineering team is building a real-time dashboard in Snowflake to monitor website traffic. The dashboard relies on a complex query that joins several large tables. The query execution time is consistently exceeding the acceptable threshold, impacting dashboard responsiveness. Historical data is stored in a separate table and rarely changes. You suspect caching is not being utilized effectively. Which of the following actions would BEST improve the performance of this dashboard and leverage Snowflake's caching features?

A. Materialize the historical data into a separate table that utilizes clustering and indexing for faster query performance. Refresh this table periodically.

B. Increase the size of the virtual warehouse. A larger warehouse will have more resources to execute the query, and the results will be cached for a longer period.

C. Create a materialized view that pre-computes the results of the complex query. Snowflake will automatically refresh the materialized view when the underlying data changes.

D. Use 'RESULT_SCAN' to cache the query result in the user session for subsequent queries. This is especially effective for large datasets that don't change frequently.

E. Replace the complex query with a series of simpler queries. This will reduce the amount of data that needs to be processed at any one time.

Correct Answer: C

Question #48

A healthcare provider stores patient data in Snowflake, including 'PATIENT_ID', 'NAME', 'MEDICAL_HISTORY', and 'INSURANCE_ID'. They need to comply with HIPAA regulations. As a data engineer, you need to ensure that PHI (Protected Health Information) is masked appropriately based on user roles. Which of the following steps are NECESSARY to achieve this using Snowflake's data masking features and RBAC? (Select all that apply)

A. Enforce multi-factor authentication (MFA) for all users accessing the Snowflake environment to enhance security and prevent unauthorized access to sensitive data.

B. Identify the columns containing PHI and create appropriate masking policies for each column (e.g., masking 'NAME', 'MEDICAL_HISTORY', 'INSURANCE_ID').

C. Create custom roles representing different user groups within the organization (e.g., 'DOCTOR', 'NURSE', 'ADMIN') and grant them the necessary privileges to access the data, including 'SELECT' on the tables and views containing patient data.

D. Apply the created masking policies to the corresponding columns in the patient data tables, ensuring that the masking policies are designed to reveal only the necessary information based on the user's role (e.g., doctors see full medical history, nurses see limited medical history, admins see de-identified data).

E. Grant the 'OWNERSHIP' privilege on the 'PATIENT' table to the 'ACCOUNTADMIN' role, ensuring complete control and management of the data by the administrator.

Correct Answer: B,C,D

Question #49

A data engineer notices that a daily ETL job loading data into a Snowflake table 'TRANSACTIONS' is consistently taking longer than expected. The table is append-only and partitioned by 'TRANSACTION_DATE'. The engineer observes high 'Remote Spill' during the load process and suspects that micro-partition pruning isn't working effectively. Which of the following approaches would BEST address the performance issue, assuming you have already considered increasing warehouse size?

A. Re-create the 'TRANSACTIONS' table with a larger virtual warehouse and re-load the entire dataset.

B. Enable automatic clustering on the 'TRANSACTION_DATE' column of the 'TRANSACTIONS' table.

C. Examine the data load process to ensure the data is loaded in 'TRANSACTION_DATE' order. If not, sort the data by 'TRANSACTION_DATE' before loading.

D. Partition the data in the source system by 'TRANSACTION_DATE' and load data in parallel corresponding to each partition.

E. Implement data skipping by creating a masking policy on the 'TRANSACTION_DATE' column.

Correct Answer: B,C
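Why load order matters for pruning can be seen in a toy model. The sketch below treats each fixed-size run of rows as a partition with min/max metadata, which is a deliberate simplification of how micro-partitions behave; the partition size and the day numbers are made up for illustration.

```python
# Toy model of min/max pruning: each "partition" is a fixed-size run of rows
# in load order, and a filter must scan every partition whose [min, max]
# range could contain the requested value. Partition size and data are made up.

def partitions_scanned(load_order: list[int], target: int, size: int = 10) -> int:
    """Count partitions whose min/max range could contain `target`."""
    scanned = 0
    for start in range(0, len(load_order), size):
        chunk = load_order[start:start + size]
        if min(chunk) <= target <= max(chunk):
            scanned += 1
    return scanned


# 100 distinct day numbers (0..99) arriving in an interleaved order.
days = [(i * 37) % 100 for i in range(100)]
unsorted_scan = partitions_scanned(days, target=42)
sorted_scan = partitions_scanned(sorted(days), target=42)
```

In this model the sorted load lets a filter on one day skip all but one partition, while the interleaved load forces a scan of every partition because each one spans nearly the whole date range.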

Question #50

You have a Snowflake table 'raw_data' with columns 'id', 'timestamp', and 'payload'. A stream is defined on this table. A data pipeline reads changes from the stream and applies transformations before loading the data into a target table. However, the pipeline needs to handle cases where updates to the same 'id' occur multiple times within a short period, and only the latest version of the 'payload' should be processed. How can you achieve this idempotent processing of stream data to ensure only the latest payload is applied to the target table, avoiding duplicates and inconsistencies, using Snowflake streams?

A. Configure the stream with a unique key constraint on the 'id' column to prevent multiple updates for the same 'id' from being captured.

B. Create a materialized view on the stream, grouping by 'id' and selecting the maximum 'timestamp' and corresponding 'payload'. Then, consume the materialized view instead of the stream.

C. When processing data from the stream, use a MERGE statement with a staging table. Load all stream changes into the staging table, then merge from the staging table to the target table using 'timestamp' to identify the latest version.

D. Use a regular Snowflake task to periodically merge the stream data into the target table, overwriting any existing records with the same 'id'.

E. Before loading data into the target table, create a temporary table by grouping on 'id' and selecting the maximum 'timestamp' and corresponding 'payload' from the stream. Finally, load this data into the target table.

Correct Answer: C
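The "keep only the latest version per 'id', then merge" idea can be sketched in plain Python. This is an illustration of the logic rather than a Snowflake MERGE statement; the target table is modelled as a dictionary keyed by 'id', and the function names are hypothetical.

```python
def latest_per_id(changes: list[dict]) -> dict:
    """Keep only the change with the greatest timestamp for each id."""
    latest = {}
    for change in changes:
        current = latest.get(change["id"])
        if current is None or change["timestamp"] > current["timestamp"]:
            latest[change["id"]] = change
    return latest


def merge_into(target: dict, changes: list[dict]) -> None:
    """Upsert the deduplicated changes; re-running the same batch changes nothing."""
    for row_id, change in latest_per_id(changes).items():
        existing = target.get(row_id)
        if existing is None or change["timestamp"] >= existing["timestamp"]:
            target[row_id] = {
                "timestamp": change["timestamp"],
                "payload": change["payload"],
            }
```

Comparing timestamps in the merge step is what makes the operation idempotent: replaying a batch, or receiving an older change late, leaves the target unchanged.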

Question #51

You have a table named 'EMPLOYEES' with a retention period of 1 day. You accidentally deleted several important rows from this table, and you need to recover the data. You know the deletion occurred 25 hours ago. What actions should be taken to attempt to recover the deleted data, and what outcome can you expect? Assume you are working in an Enterprise edition Snowflake account.

A. Attempt to use Time Travel to query the table before the deletion and re-insert the deleted rows. Expect the recovery to be successful as long as the deletion occurred within the data retention period.

B. Attempt to clone the table using Time Travel to a point in time before the deletion, then extract the deleted rows. Expect the recovery to be successful as long as the deletion occurred within the data retention period.

C. Since this is an Enterprise edition Snowflake account, Time Travel and cloning will work with a 7-day retention period; hence, attempt to clone the table using Time Travel and recover the data successfully.

D. Attempt to use UNDROP TABLE command if the table was dropped. Expect the recovery to be successful as long as the deletion occurred within the data retention period.

E. Attempt to use Time Travel or cloning to recover the data. Expect the recovery to fail because the deletion occurred outside the 1-day data retention period.

Correct Answer: E
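The arithmetic behind this question is short enough to check directly. The sketch below is a simplified model, assuming the retention window is exactly 24 hours per retention day; the timestamps are made up.

```python
from datetime import datetime, timedelta


def within_retention(deleted_at: datetime, now: datetime, retention_days: int) -> bool:
    """True if the moment of deletion is still inside the retention window."""
    return now - deleted_at <= timedelta(days=retention_days)


now = datetime(2024, 1, 2, 13, 0)
deleted_at = now - timedelta(hours=25)
# 25 hours have passed, but a 1-day window covers only 24 hours.
recoverable = within_retention(deleted_at, now, retention_days=1)
```

With a 1-day retention period the deletion falls one hour outside the window, so `recoverable` is false in this model.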

Question #52

A data pipeline ingests clickstream data from various sources into a raw Snowflake table ('RAW_CLICKS'). A transformation job then processes this data and loads it into a more structured 'CLICK_EVENTS' table, performing filtering, cleaning, and data enrichment. The data engineering team notices significant performance bottlenecks during this transformation process, leading to data freshness issues.
The team wants to optimize this process, considering the following:

A. Replace the transformation job with a series of smaller, more specialized jobs, each running on a separate virtual warehouse optimized for the specific task, and orchestrate these jobs using a data pipeline tool.

B. Create a materialized view on top of 'RAW_CLICKS' that pre-computes the necessary transformations and aggregations, allowing the 'CLICK_EVENTS' table to be populated directly from the materialized view.

C. Use a larger virtual warehouse for the transformation job and partition the 'RAW_CLICKS' table on the ingestion timestamp to improve data pruning and reduce the amount of data processed during the transformation.

D. Implement a change data capture (CDC) mechanism on the source systems to only ingest changed data into 'RAW_CLICKS', reducing the overall data volume and the amount of data processed by the transformation job.

E. Optimize the transformation queries by identifying and rewriting inefficient SQL patterns, ensuring appropriate use of joins, filtering conditions, and data type conversions.

Correct Answer: B,D,E

Question #53

A financial institution is using Snowflake to store transaction data for millions of customers. The data is stored in a table named 'TRANSACTIONS' with columns such as 'TRANSACTION_ID', 'CUSTOMER_ID', 'TRANSACTION_DATE', 'TRANSACTION_AMOUNT', and 'MERCHANT_CATEGORY'. Analysts are running complex analytical queries that often involve filtering transactions by 'TRANSACTION_DATE', 'MERCHANT_CATEGORY', and 'TRANSACTION_AMOUNT' ranges. These queries are experiencing performance bottlenecks. The data team wants to leverage the query acceleration service to improve performance without significantly altering the existing query patterns. Which of the following actions or combination of actions would be MOST beneficial, considering the constraints and the nature of the queries? (Select TWO)

A. Create separate virtual warehouses dedicated to reporting queries and ad-hoc queries respectively. Enable query acceleration only for the warehouse running reporting queries.

B. Increase the size of the virtual warehouse used for running the queries and enable query acceleration on the warehouse without further modifications.

C. Create materialized views pre-aggregating the transaction data by 'MERCHANT_CATEGORY' and 'TRANSACTION_DATE', and enable query acceleration on the virtual warehouse.

D. Enable Search Optimization Service for the 'TRANSACTIONS' table, specifically targeting the 'MERCHANT_CATEGORY' column. Enable query acceleration on the virtual warehouse.

E. Enable Automatic Clustering on the 'TRANSACTIONS' table, ordering the keys as 'TRANSACTION_DATE', 'MERCHANT_CATEGORY', 'CUSTOMER_ID'. Then, enable query acceleration on the virtual warehouse.

Correct Answer: D,E

Question #54

You are configuring a Snowflake Data Clean Room for two healthcare providers, 'ProviderA' and 'ProviderB', to analyze patient overlap without revealing Personally Identifiable Information (PII). Both providers have patient data in their respective Snowflake accounts, including a 'PATIENT_ID' column that uniquely identifies each patient. You need to create a secure join that allows the providers to determine the number of shared patients while protecting the raw 'PATIENT_ID' values. Which of the following approaches is the most secure and efficient way to achieve this using Snowflake features? Select TWO options.

A. Share the raw 'PATIENT_ID' columns between ProviderA and ProviderB using secure data sharing, and then perform a JOIN operation in either ProviderA's or ProviderB's account.

B. Leverage Snowflake's differential privacy features to add noise to the patient ID data, share the modified dataset and perform a JOIN.

C. Create a hash of the 'PATIENT_ID' column in both ProviderA's and ProviderB's accounts using a consistent hashing algorithm (e.g., SHA256) and a secret salt known only to both providers. Share the hashed values through a secure view and perform a JOIN operation on the hashed values.

D. Utilize Snowflake's Secure Aggregate functions (e.g., APPROX_COUNT_DISTINCT) on the 'PATIENT_ID' column without sharing the underlying data. Each provider calculates the approximate distinct count of patient IDs, and the results are compared to estimate the overlap.

E. Implement tokenization of the 'PATIENT_ID' column in both ProviderA's and ProviderB's accounts. Share the tokenized values through a secure view and perform a JOIN operation on the tokens. Use a third party to deanonymize the tokens afterwards.

Correct Answer: C,E
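The salted-hash join in option C can be illustrated with a small Python sketch. It assumes both providers apply SHA-256 to the salt concatenated with the patient ID, which is one possible construction and not necessarily the one a clean room would use; the salt value and the IDs are made up.

```python
import hashlib


def hashed_ids(patient_ids: list[str], salt: str) -> set[str]:
    """Hash each ID with the shared salt so raw values are never exchanged."""
    return {
        hashlib.sha256((salt + pid).encode("utf-8")).hexdigest()
        for pid in patient_ids
    }


def overlap_count(ids_a: list[str], ids_b: list[str], salt: str) -> int:
    """Count IDs present on both sides by intersecting the hashed sets."""
    return len(hashed_ids(ids_a, salt) & hashed_ids(ids_b, salt))
```

Because both sides use the same algorithm and salt, equal patient IDs produce equal digests, so the join on hashed values yields the same overlap count as a join on the raw IDs.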

Question #55

A large e-commerce company stores clickstream data in an AWS S3 bucket. The data is partitioned by date and consists of Parquet files. They need to analyze this data in Snowflake without physically moving it into Snowflake's internal storage. However, the data frequently changes, and they need to ensure queries reflect the latest updates to the files without significant latency. Which of the following approaches would be MOST suitable, considering cost, performance, and data freshness?

A. Create a standard external table directly on the S3 bucket. Refresh the external table metadata using 'ALTER EXTERNAL TABLE ... REFRESH' on a daily schedule.

B. Create a standard external table with the 'AUTO_REFRESH' parameter set to 'TRUE'. This will automatically refresh the metadata whenever changes are detected in S3.

C. Create an Iceberg table backed by the S3 bucket. Snowflake will automatically manage the metadata and handle incremental updates efficiently.

D. Create an external table using a Snowflake-managed catalog. Configure a Snowpipe to automatically refresh the metadata as new files are added to the S3 bucket.

E. Create a series of views on top of the S3 bucket using the 'READ_PARQUET' function, updating view definitions whenever underlying files change.

Correct Answer: C

Question #56

You have a Snowflake Stream named 'PRODUCT_CHANGES' created on a table 'PRODUCTS'. A downstream task attempts to consume records from the stream, but occasionally fails with a 'Table PRODUCTS has been altered' error. The 'PRODUCTS' table undergoes DDL changes (e.g., adding/dropping columns) infrequently, but these changes are necessary for evolving business requirements. How can you design a more resilient data pipeline that minimizes disruptions caused by DDL changes to the 'PRODUCTS' table while still leveraging the 'PRODUCT_CHANGES' stream?

A. Use a materialized view instead of a standard view as the source for the stream. Materialized views are less susceptible to issues when the underlying base table changes.

B. Before executing any DDL changes on the 'PRODUCTS' table, drop and recreate the 'PRODUCT_CHANGES' stream. This ensures the stream definition is always in sync with the table structure.

C. Create a new Stream on the 'PRODUCTS' table after each DDL change. The downstream task should dynamically switch to consuming from the new stream when the old stream encounters an error.

D. Create a task that monitors the 'PRODUCTS' table for DDL changes using 'INFORMATION_SCHEMA.TABLES'. When a change is detected, pause the downstream task, execute the DDL change, and then resume the downstream task after a short delay.

E. Implement error handling in the downstream task to automatically retry consuming records from the 'PRODUCT_CHANGES' stream after a delay, assuming the DDL changes will be completed quickly.

Correct Answer: E
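The retry-after-a-delay handling in option E can be sketched as a small helper. This is a generic Python illustration, not Snowflake task configuration; the attempt count, the delay, and the injectable `sleep` parameter are assumptions added so the example is easy to test.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T], attempts: int = 3, delay_seconds: float = 5.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`, waiting and retrying on failure; re-raise on the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            sleep(delay_seconds)
    raise ValueError("attempts must be at least 1")
```

A consumer wrapped this way survives a short-lived error while a DDL change completes, and still surfaces the failure if the error persists across all attempts.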
