Free Snowflake DEA-C02 Practice Test & Real Exam Questions, Page 6

Question #71

A data engineer is facing performance issues with a complex analytical query in Snowflake. The query joins several large tables and uses multiple window functions. The query profile indicates that a significant amount of time is spent in the 'Remote Spill' stage. This means the data from one of the query stages is spilling to the remote disk. What are the possible root causes for 'Remote Spill' and what steps can be taken to mitigate this issue? Select two options.

A. The data being queried is stored in a non-Snowflake database, making it difficult to optimize the join.

B. The window functions are operating on large partitions of data, exceeding the available memory on the compute nodes. Try to reduce the partition size by pre-aggregating the data or using filtering before applying the window functions.

C. The query is using a non-optimal join strategy. Review the query profile and consider using join hints to force a different join order or algorithm.

D. The 'Remote Spill' indicates network latency issues between compute nodes. There is nothing the data engineer can do to fix this; it is an infrastructure issue.

E. The virtual warehouse is not appropriately sized for the volume of data and complexity of the query. Increasing the virtual warehouse size might provide sufficient memory to avoid spilling.

Correct Answer: B,E

Question #72

You are designing a data pipeline that requires applying a complex scoring algorithm to customer data in Snowflake. This algorithm involves multiple steps, including feature engineering, model loading, and prediction. You want to encapsulate this logic within a reusable component and apply it to incoming data streams efficiently. Which of the following approaches is most suitable and scalable for implementing this scoring logic as a UDF/UDTF, considering real-time data processing and low latency requirements?

A. A SQL UDF containing a series of nested CASE statements to implement the entire scoring algorithm.

B. A Java UDTF that leverages a custom Java library for feature engineering and model prediction, deployed as a JAR file to Snowflake's internal stage.

C. A Python UDTF using Snowpark, leveraging external libraries like 'torch' for accelerated calculations and GPU-based ML model inference.

D. A JavaScript UDF that uses basic JavaScript functions to perform the entire scoring algorithm without external dependencies.

E. A Python UDF that loads a pre-trained machine learning model (e.g., using scikit-learn) and performs predictions on the input data.

Correct Answer: C

Question #73

You are tasked with creating a Snowpark Python UDF that calculates the exponential moving average (EMA) of a time series dataset stored in a Snowflake table named 'SALES_DATA'. The table has columns 'TIMESTAMP' (TIMESTAMP_NTZ) and 'SALES' (NUMBER). The EMA should be calculated for each product, identified by the 'PRODUCT_ID' column. You want to optimize the calculation by using a Pandas DataFrame within the UDF and leveraging vectorized operations. Which of the following code snippets would be the MOST efficient and correct way to achieve this? Assume 'alpha' is a predefined float variable representing the smoothing factor.

A. Option C

B. Option E

C. Option B

D. Option D

E. Option A

Correct Answer: B
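
The code snippets behind the options are not reproduced in this text, so the following is only a rough illustration of the calculation the question describes, not one of the answer choices. It sketches the per-product EMA recurrence in plain Python, assuming the series is seeded with the first observation. The function name `ema_by_product` and the sample rows are hypothetical. In pandas, the same recurrence is what a grouped `ewm(alpha=alpha, adjust=False).mean()` computes in vectorized form.

```python
from collections import defaultdict


def ema_by_product(rows, alpha):
    """Compute an exponential moving average per product.

    rows: iterable of (product_id, timestamp, sales) tuples.
    Returns {product_id: [(timestamp, ema), ...]} ordered by timestamp.
    Assumption: the EMA is seeded with the first observation of each product.
    """
    grouped = defaultdict(list)
    for product_id, ts, sales in rows:
        grouped[product_id].append((ts, float(sales)))

    result = {}
    for product_id, series in grouped.items():
        series.sort(key=lambda point: point[0])  # EMA depends on time order
        ema = None
        smoothed = []
        for ts, value in series:
            # ema_t = alpha * x_t + (1 - alpha) * ema_{t-1}
            ema = value if ema is None else alpha * value + (1 - alpha) * ema
            smoothed.append((ts, ema))
        result[product_id] = smoothed
    return result


# Hypothetical sample data, deliberately out of order for product P1.
sample_rows = [
    ("P1", "2024-01-01", 10),
    ("P1", "2024-01-03", 30),
    ("P2", "2024-01-01", 5),
    ("P1", "2024-01-02", 20),
]
print(ema_by_product(sample_rows, alpha=0.5))
```

Sorting inside each group matters because the recurrence is order dependent; a vectorized implementation needs the same guarantee before smoothing.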

Question #74

Consider a scenario where you have a large dataset of sensor readings stored in a Snowflake table called 'SENSOR_DATA'. You need to build an external function to perform complex calculations on these readings using a custom Python library hosted on AWS Lambda. The calculation requires significant computational resources, and you want to optimize the data transfer between Snowflake and the Lambda function. The following SQL is provided: CREATE OR REPLACE EXTERNAL FUNCTION ARRAY) RETURNS ARRAY VOLATILE MAX_BATCH_ROWS = 2000 RETURNS NULL ON NULL INPUT API_INTEGRATION = aws_lambda_integration AS 'arn:aws:lambda:us-east-1:123456789012:function:sensorProcessor'; Which of the following options would further optimize the performance and reduce data transfer costs, assuming the underlying Lambda function is correctly configured and functional?

A. Rewrite the custom Python library in Java and create a Snowflake User-Defined Function (UDF) instead of using an external function.

B. Reduce the number of columns passed to the external function by performing pre-aggregation or filtering on the data within Snowflake before calling the function.

C. Compress the data before sending it to the external function and decompress it within the Lambda function. Update the Lambda function to compress the array of results before sending it back to Snowflake and use Snowflake's functions to decompress it.

D. Increase the 'MAX_BATCH_ROWS' parameter to the maximum allowed value to send larger batches of data to the external function. Ensure Lambda function memory is increased appropriately.

E. Convert the input data to a binary format (e.g., using 'TO_BINARY and FROM_BINARY' functions in Snowflake) before sending it to the Lambda function, and decode it in Lambda to reduce the size of the data being transmitted.

Correct Answer: B,C,D
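
To make the batching discussion more concrete, here is a minimal sketch of the Lambda side, assuming the function sits behind an API Gateway proxy integration. Snowflake sends each batch as a JSON body of the form `{"data": [[row_number, arg1, ...], ...]}` and expects `{"data": [[row_number, result], ...]}` back. The helper `process_readings` is a hypothetical stand-in for the custom library; it simply averages the array.

```python
import json


def process_readings(readings):
    """Hypothetical placeholder for the custom library: mean of the readings."""
    if not readings:
        return []
    return [sum(readings) / len(readings)]


def lambda_handler(event, context):
    """Handle one batch of rows sent by a Snowflake external function."""
    payload = json.loads(event["body"])  # the proxy integration delivers the body as a string
    results = []
    for row in payload["data"]:
        row_number, readings = row[0], row[1]
        # Each output row must carry the same row number as its input row.
        results.append([row_number, process_readings(readings)])
    return {"statusCode": 200, "body": json.dumps({"data": results})}


# Local invocation with a hand-built two-row batch.
sample_event = {"body": json.dumps({"data": [[0, [1, 2, 3]], [1, []]]})}
print(lambda_handler(sample_event, None))
```

Because every row in a batch is serialized into the request body, passing fewer or smaller arguments per row shrinks each request, while larger batches mean fewer round trips.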

Question #75

A Snowflake table 'PRODUCT_REVIEWS' is being loaded from an external system. You have a stream 'PRODUCT_REVIEWS_STREAM' defined on this table to capture changes. Due to a bug in the ingestion process, incorrect data was loaded for a specific period. You need to correct the data. Which of the following SQL statements, when executed against the 'PRODUCT_REVIEWS_STREAM', will return the number of rows that were inserted, updated, and deleted during that period?

A.

B.

C.

D.

E.

Correct Answer: A
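
The SQL options for this question are not reproduced in this text. As a hedged illustration of the counting logic only, the sketch below classifies rows read from a standard stream using the `METADATA$ACTION` and `METADATA$ISUPDATE` columns, where an update appears as a DELETE/INSERT pair with `METADATA$ISUPDATE` set to true. The function name `count_stream_changes` and the sample rows are hypothetical.

```python
def count_stream_changes(rows):
    """Count inserts, updates, and deletes from standard-stream rows.

    rows: iterable of dicts carrying the stream metadata columns.
    An update is recorded as a DELETE row plus an INSERT row, both with
    METADATA$ISUPDATE = True, so only the INSERT half is counted as an update.
    """
    counts = {"inserted": 0, "updated": 0, "deleted": 0}
    for row in rows:
        action = row["METADATA$ACTION"]
        is_update = row["METADATA$ISUPDATE"]
        if action == "INSERT" and is_update:
            counts["updated"] += 1
        elif action == "INSERT":
            counts["inserted"] += 1
        elif action == "DELETE" and not is_update:
            counts["deleted"] += 1
        # DELETE rows with is_update=True are the "before" image of an update.
    return counts


# Hypothetical stream contents: one insert, one update (two rows), one delete.
sample_rows = [
    {"METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": False},
    {"METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": True},
    {"METADATA$ACTION": "INSERT", "METADATA$ISUPDATE": True},
    {"METADATA$ACTION": "DELETE", "METADATA$ISUPDATE": False},
]
print(count_stream_changes(sample_rows))
```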

Question #76

You have a Snowflake table, 'CUSTOMER_ORDERS', with columns like 'CUSTOMER_ID', 'ORDER_DATE', 'ORDER_AMOUNT', and 'REGION'. A BI dashboard relies on a query that aggregates data from this table, but the query performance is consistently slow. The query frequently filters by 'ORDER_DATE' and groups by 'REGION'. Based on the following 'EXPLAIN' output, which combination of techniques should be considered to improve the performance the most?

A. Create a materialized view that pre-aggregates the data by 'ORDER_DATE' and 'REGION'.

B. Redesign the dashboard to minimize the data being displayed at once to the user.

C. Increase the virtual warehouse size to 'LARGE' or 'XLARGE'.

D. Cluster the 'CUSTOMER_ORDERS' table on 'ORDER_DATE' and 'REGION'.

E. Create an index on the 'ORDER_DATE' column.

Correct Answer: A,D

Question #77

You have a table named 'ORDERS' with a column 'ORDER_DETAILS' that contains JSON data. You want to extract a specific nested value ('customer_id') from this JSON data using a SQL UDF. The JSON structure varies, and sometimes the 'customer_id' field might be missing. You need to create a UDF that handles missing fields gracefully and returns NULL if 'customer_id' is not found. You also want a performant, highly scalable solution. Which of the following SQL UDF definitions is most appropriate?

A.

B.

C.

D.

E.

Correct Answer: E

Question #78

You have created a JavaScript UDF named 'calculate_discount' in Snowflake that takes two arguments: 'product_price' (NUMBER) and 'discount_percentage' (NUMBER). The UDF calculates the discounted price using the formula: 'product_price * (1 - discount_percentage / 100)'. However, when you call the UDF with certain input values, you encounter unexpected results, specifically with very large or very small numbers, due to JavaScript's number precision limitations. Which of the following strategies can you implement to mitigate this issue and ensure accurate calculations within your JavaScript UDF?

A. Convert the input numbers to strings within the JavaScript UDF before performing the calculation.

B. Cast input arguments and the result to 'FLOAT' within the UDF.

C. Use JavaScript's 'toFixed()' method to round the result to a fixed number of decimal places.

D. Utilize a JavaScript library specifically designed for handling arbitrary-precision arithmetic, such as 'Big.js' or 'Decimal.js', within the UDF.

E. Avoid very large or very small numbers and restrict inputs to a limited range of values.

Correct Answer: D
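
JavaScript numbers are IEEE 754 doubles, and so are Python floats, so the precision loss described in the question can be reproduced in Python. The sketch below is an analogy rather than UDF code: Python's standard `decimal` module plays the role that an arbitrary-precision library would play inside the JavaScript UDF. The function names and sample values are hypothetical.

```python
from decimal import Decimal


def calculate_discount_float(product_price, discount_percentage):
    """Discounted price using binary floating point (IEEE 754 doubles)."""
    return product_price * (1 - discount_percentage / 100)


def calculate_discount_exact(product_price, discount_percentage):
    """Discounted price using decimal arithmetic; inputs are passed as strings."""
    price = Decimal(product_price)
    pct = Decimal(discount_percentage)
    return price * (1 - pct / Decimal(100))


# A price with more significant digits than a double can hold.
big_price = "12345678901234567890.12"
print(calculate_discount_float(float(big_price), 10.0))
print(calculate_discount_exact(big_price, "10"))
```

Passing the inputs as strings is a deliberate choice: converting them to floats first would discard the extra digits before the decimal arithmetic ever ran.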

Question #79

You are designing a data loading process for a high-volume streaming data source. The data arrives as Avro files in an AWS S3 bucket. You need to load this data into a Snowflake table with minimal latency and operational overhead. Which of the following combinations of Snowflake features and configurations would be MOST suitable for this scenario? (Select TWO)

A. Configure an external table pointing to the S3 bucket and query the Avro files directly from Snowflake.

B. Use the 'COPY INTO' command with a scheduled task that runs every 5 minutes to load new files from the S3 bucket.

C. Create a custom Spark application that reads Avro files from S3, transforms the data, and then writes it to Snowflake using the Snowflake Spark connector.

D. Use a Kafka connector to stream data directly from the Kafka topic to Snowflake.

E. Implement Snowpipe with auto-ingest configured to listen for S3 event notifications whenever a new Avro file is added to the bucket.

Correct Answer: D,E

Question #80

A financial institution needs to mask sensitive customer data (PII) in a 'CUSTOMER' table. The table contains columns like 'CUSTOMER_ID', 'FIRST_NAME', 'LAST_NAME', 'CREDIT_CARD', and 'ADDRESS'. The data should be masked differently for different roles: 'ANALYST' role should see obfuscated values for names and addresses, while the 'SUPPORT' role should see the last four digits of the credit card and a hashed version of the address. The 'CUSTOMER_ID' should never be masked. Assume a central masking policy already exists called 'PII_MASKING_POLICY'. Which of the following statements is the MOST efficient and secure way to achieve this?

A. Create multiple masking policies, one for each role and sensitive column combination, each with the appropriate masking expression. Then, apply each masking policy individually to its respective column. Use the function to implement role-based masking within each policy.

B. Create external functions to handle the complex masking logic and call them from the masking policy.

C. Create multiple masking policies with different masking expressions and apply them directly to the columns based on the role using conditional expressions within the policies. Use 'CASE' statements within the masking policy to differentiate between roles.

D. Create a single masking policy with a complex stored procedure that checks the current role and applies different masking functions accordingly, then apply this policy to all sensitive columns.

E. Create a view for each role that applies masking functions to the columns. Grant SELECT access on those views to the relevant roles.

Correct Answer: C
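
The role-based branching that a 'CASE' expression performs inside a masking policy can be sketched in plain Python. This is an illustration of the logic only, not Snowflake's masking policy syntax. The function names are hypothetical, SHA-256 is an assumed choice of hash, and treating every unlisted role as fully masked is an assumption the question does not state.

```python
import hashlib


def mask_credit_card(value, role):
    """SUPPORT sees only the last four digits; every other role sees none."""
    if role == "SUPPORT":
        return "*" * max(len(value) - 4, 0) + value[-4:]
    return "*" * len(value)  # assumption: default to fully masked


def mask_address(value, role):
    """SUPPORT sees a hash, ANALYST and all other roles see an obfuscated value."""
    if role == "SUPPORT":
        # Assumed hash function; the question only says "hashed".
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    return "***MASKED***"  # assumption: same placeholder for ANALYST and others


# Hypothetical sample values.
for role in ("ANALYST", "SUPPORT"):
    print(role, mask_credit_card("4111111111111111", role), mask_address("1 Main St", role))
```

Keeping one small function per column mirrors the idea of one policy per sensitive column, with the role check living inside each policy.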

Question #81

You are building a data pipeline in Snowflake that uses an external function to perform sentiment analysis on customer reviews stored in a table named 'CUSTOMER_REVIEWS'. The external function 'sentiment_analyzer' is hosted on AWS Lambda and requires an API key for authentication. You want to ensure that the API key is securely passed to the Lambda function and prevent unauthorized access. Which of the following approaches represents the MOST secure and recommended method to manage the API key?

A. Embed the API key directly into the AWS Lambda function's environment variables, avoiding any transmission from Snowflake.

B. Pass the API key as a parameter to the external function each time it is called.

C. Create a Snowflake secret object to store the API key and reference it in the external function definition using the 'USING' clause and 'SYSTEM$GET_SECRET' function.

D. Store the API key in a Snowflake table with restricted access and retrieve it within the external function's logic.

E. Store the API key directly in the external function definition as a string literal within the 'AS' clause.

Correct Answer: C

Question #82

A data engineering team is responsible for an ELT pipeline that loads data into Snowflake. The pipeline has two distinct stages: a high-volume, low-complexity transformation stage using SQL on raw data, and a low-volume, high-complexity transformation stage using Python UDFs that leverage an external service for data enrichment. The team is experiencing significant queueing during peak hours, particularly impacting the high-volume stage. You need to optimize warehouse configuration to minimize queueing. Which combination of actions would be MOST effective?

A. Create two separate warehouses: a Large, multi-cluster warehouse configured for auto-scale for the high-volume, low-complexity transformations and a Small warehouse for the low-volume, high-complexity transformations.

B. Create a single, large (e.g., X-Large) warehouse and rely on Snowflake's automatic scaling to handle the workload.

C. Create two separate warehouses: a Medium warehouse for the high-volume, low-complexity transformations and an X-Small warehouse for the low-volume, high-complexity transformations.

D. Create a single, X-Small warehouse and rely on Snowflake's query acceleration service to handle the workload.

E. Create two separate warehouses: a Small warehouse configured for auto-suspend after 5 minutes for the high-volume, low-complexity transformations and a Large warehouse configured for auto-suspend after 60 minutes for the low-volume, high-complexity transformations.

Correct Answer: A

Question #83

You're managing a Snowflake data warehouse and need to create a development environment for testing a complex stored procedure that updates a critical table, 'SALES_DATA'. The procedure is located in the 'PRODUCTION' database and you want to ensure minimal impact to the production environment during development. You decide to use cloning and time travel. Which of the following strategies is the MOST efficient and safest approach to achieve this, minimizing downtime and resource consumption in production?

A. Clone the entire 'PRODUCTION' database into a new development database. This ensures developers have access to all necessary data and dependencies but consumes significant storage and may impact production performance during the cloning process.

B. Clone only the 'SALES_DATA' table into a development database. This minimizes storage consumption but requires developers to manually recreate or mock any dependencies the stored procedure has on other tables in the 'PRODUCTION' database.

C. Clone the schema in which 'SALES_DATA' is stored along with the stored procedure. Use time travel on the cloned schema to revert all objects in the schema to a point in time before the stored procedure was last run, then update the stored procedure to point to the cloned schema. This gives a consistent starting point for testing in development.

D. Create a snapshot of the 'SALES_DATA' table using Time Travel at a specific timestamp (e.g., 1 hour ago), then clone only the stored procedure, updating it to point to the Time Travel version of 'SALES_DATA' in the development environment. This provides a consistent dataset for testing while minimizing the impact on production and cloned data volumes.

E. Clone the 'PRODUCTION' database. Immediately after cloning, use Time Travel to revert the 'SALES_DATA' table in the development database to a state before the stored procedure was last run in production. Then clone the stored procedure itself. This gives a starting point without the procedure's impact.

Correct Answer: C

Question #84

You have a Snowflake table called 'ORDERS' with a column named 'ORDER_DETAILS' that contains JSON data. You need to create a UDF to extract a specific value from the JSON and return it. The JSON structure is consistent: '{'customer': {'name': 'John Doe', 'email': '[email protected]'}, 'items': [{'item_id': 1, 'quantity': 2}, {'item_id': 2, 'quantity': You want to extract the customer's email address. Which of the following UDF definitions is the most efficient and correct?

A. Option C

B. Option E

C. Option B

D. Option D

E. Option A

Correct Answer: E
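
The UDF definitions behind the options are not reproduced in this text. As a hedged illustration of the lookup itself, the sketch below walks the customer → email path in plain Python and returns None when the document or either key is missing. The function name `extract_customer_email` and the sample document are hypothetical. In Snowflake SQL, the corresponding path expression on a VARIANT column would be along the lines of `ORDER_DETAILS:customer.email::STRING`.

```python
import json


def extract_customer_email(order_details):
    """Return the customer's email from a JSON string, or None if unavailable."""
    try:
        doc = json.loads(order_details)
    except (TypeError, ValueError):
        return None  # input was None or not valid JSON
    if not isinstance(doc, dict):
        return None
    customer = doc.get("customer")
    if not isinstance(customer, dict):
        return None
    return customer.get("email")


# Hypothetical document following the structure described in the question.
sample = json.dumps({
    "customer": {"name": "Jane Roe", "email": "jane@example.com"},
    "items": [{"item_id": 1, "quantity": 2}],
})
print(extract_customer_email(sample))
print(extract_customer_email('{"items": []}'))
```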
