Free Databricks Associate-Developer-Apache-Spark Practice Test & Real Exam Questions
Which of the following describes how Spark achieves fault tolerance?
Correct Answer: B
Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?
Correct Answer: A
The code block displayed below contains an error. The code block is intended to write DataFrame transactionsDf to disk as a parquet file in location /FileStore/transactions_split, using column storeId as key for partitioning. Find the error.
Code block:
transactionsDf.write.format("parquet").partitionOn("storeId").save("/FileStore/transactions_split")
Correct Answer: A
Which of the following statements about Spark's configuration properties is incorrect?
Correct Answer: E
Which of the following code blocks returns a DataFrame with approximately 1,000 rows from the 10,000-row DataFrame itemsDf, without any duplicates, returning the same rows even if the code block is run twice?
Correct Answer: B
Which of the following describes Spark's way of managing memory?
Correct Answer: B
Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should only be listed once.
Sample of DataFrame itemsDf:
+------+--------------------+--------------------+-------------------+
|itemId|            itemName|          attributes|           supplier|
+------+--------------------+--------------------+-------------------+
|     1|Thick Coat for Wa...|[blue, winter, cozy]|Sports Company Inc.|
|     2|Elegant Outdoors ...|[red, summer, fre...|              YetiX|
|     3|   Outdoors Backpack|[green, summer, t...|Sports Company Inc.|
+------+--------------------+--------------------+-------------------+
Correct Answer: A
Which of the following code blocks creates a new one-column, two-row DataFrame dfDates with column date of type timestamp?
Correct Answer: C
The code block displayed below contains an error. The code block should use Python method find_most_freq_letter to find the letter present most in column itemName of DataFrame itemsDf and return it in a new column most_frequent_letter. Find the error.
Code block:
find_most_freq_letter_udf = udf(find_most_freq_letter)
itemsDf.withColumn("most_frequent_letter", find_most_freq_letter("itemName"))
Correct Answer: E
The code block displayed below contains an error. The code block should count the number of rows that have a predError of either 3 or 6. Find the error.
Code block:
transactionsDf.filter(col('predError').in([3, 6])).count()
Correct Answer: C
Which of the following describes the conversion of a computational query into an execution plan in Spark?
Correct Answer: E
Which of the following describes a shuffle?
Correct Answer: D
Which of the following statements about storage levels is incorrect?
Correct Answer: A
Which of the following code blocks returns only rows from DataFrame transactionsDf in which values in column productId are unique?
Correct Answer: C
Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?
Correct Answer: A
