AWS MLS-C01 Free Practice Questions — Page 1

Machine Learning - Specialty • 5 questions • Answers & explanations included

Question 1

A large mobile network operating company is building a machine learning model to predict customers who are likely to unsubscribe from the service. The company plans to offer an incentive for these customers as the cost of churn is far greater than the cost of the incentive. The model produces the following confusion matrix after evaluating on a test dataset of 100 customers:

[Figure: confusion matrix (mls_001_image_1.png)]

Based on the model evaluation results, why is this a viable model for production?

A. The model is 86% accurate and the cost incurred by the company as a result of false negatives is less than the false positives.
B. The precision of the model is 86%, which is less than the accuracy of the model.
C. The model is 86% accurate and the cost incurred by the company as a result of false positives is less than the false negatives.
D. The precision of the model is 86%, which is greater than the accuracy of the model.

Correct Answer: C. The model is 86% accurate and the cost incurred by the company as a result of false positives is less than the false negatives.

Why Option C is right: In this specific business context, the company states that the cost of churn is far greater than the cost of the incentive. A False Negative (Actual Churn "Yes", Predicted "No") means a customer leaves, which is the highest cost. A False Positive (Actual Churn "No", Predicted "Yes") means giving an incentive to a loyal customer. While this is a small loss, it is much better than losing the customer. Since the model captures most churners and maintains high overall accuracy, it is a viable business solution.
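The arithmetic behind this reasoning can be sketched in a few lines of Python. The confusion matrix image is not reproduced in this text, so the four counts below are illustrative assumptions chosen only to be consistent with 100 test customers and 86% accuracy; the two cost figures are likewise hypothetical and serve only to show how the false-positive and false-negative costs are compared.

```python
# Hypothetical confusion-matrix counts (NOT read from the missing image);
# chosen so that they sum to 100 customers and give 86% accuracy.
tp = 10  # churners correctly flagged
fn = 4   # churners the model missed
fp = 10  # loyal customers wrongly flagged
tn = 76  # loyal customers correctly left alone

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Hypothetical unit costs: losing a customer is assumed to cost far more
# than handing out one incentive, as the question states.
cost_incentive = 10.0
cost_churn = 100.0

cost_false_positives = fp * cost_incentive  # incentives given to loyal customers
cost_false_negatives = fn * cost_churn      # churners lost without an offer

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print(f"FP cost={cost_false_positives:.0f} FN cost={cost_false_negatives:.0f}")
```

With these assumed numbers, accuracy and precision are clearly different quantities, which is why the options that equate "86%" with precision do not hold up.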

Question 2

A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like based on the users' similarity to other users. What should the Specialist do to meet this objective?

A. Build a content-based filtering recommendation engine with Apache Spark ML on Amazon EMR
B. Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR.
C. Build a model-based filtering recommendation engine with Apache Spark ML on Amazon EMR
D. Build a combinative filtering recommendation engine with Apache Spark ML on Amazon EMR

Correct Answer: B. Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR.

Collaborative filtering is the correct approach for this use case because it predicts user preferences based on similarities between users, not on product content characteristics. The question explicitly states the objective is to "predict which products users would like based on the users' similarity to other users," which is the defining principle of collaborative filtering. Apache Spark ML on Amazon EMR is an appropriate implementation platform for this algorithm, as it provides distributed computing capabilities for large-scale data processing. Collaborative filtering works by identifying user-user or item-item similarities and leveraging the preferences of similar users to make recommendations.

Why correct:
Collaborative filtering explicitly uses user similarity to make predictions.
It aligns with the stated objective: "based on the users' similarity to other users".
Spark ML on EMR supports a distributed collaborative filtering implementation.

Why others are incorrect:
A (Content-based filtering): Uses product features/content, not user similarity.
C (Model-based filtering): A variant of collaborative filtering, but less direct for this use case; the question emphasizes similarity between users.
D (Combinative filtering): Not a standard ML term; likely a distractor.
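To make the user-similarity idea concrete, here is a minimal, single-machine sketch of user-based collaborative filtering in plain Python. It is not the Spark ML implementation the answer refers to; the user names, product names, and ratings are all hypothetical, and cosine similarity over co-rated items is just one common choice of similarity measure.

```python
from math import sqrt

# Hypothetical user -> {product: rating} data, invented for illustration.
ratings = {
    "alice": {"phone_case": 5, "charger": 4, "earbuds": 1},
    "bob": {"phone_case": 5, "charger": 5, "earbuds": 1,
            "power_bank": 5, "screen_guard": 2},
    "carol": {"phone_case": 1, "charger": 1, "earbuds": 5,
              "power_bank": 1, "screen_guard": 5},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, all_ratings):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(all_ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item in all_ratings[user]:
                continue  # only recommend unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {i: scores[i] / weights[i] for i in scores if weights[i] > 0}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)


print(recommend("alice", ratings))
```

In this toy data alice's ratings resemble bob's far more than carol's, so bob's favourite unseen item ranks first for her. At the scale described in the question, the same idea would be run as a distributed job rather than with nested Python loops.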

Question 3

A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operations using Amazon Athena and Amazon S3. The source systems send data in .CSV format in real time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3. Which solution takes the LEAST effort to implement?

A. Ingest .CSV data using Apache Kafka Streams on Amazon EC2 instances and use Kafka Connect S3 to serialize data as Parquet
B. Ingest .CSV data from Amazon Kinesis Data Streams and use Amazon Glue to convert data into Parquet.
C. Ingest .CSV data using Apache Spark Structured Streaming in an Amazon EMR cluster and use Apache Spark to convert data into Parquet.
D. Ingest .CSV data from Amazon Kinesis Data Streams and use Amazon Kinesis Data Firehose to convert data into Parquet.

Correct Answer: D. Ingest .CSV data from Amazon Kinesis Data Streams and use Amazon Kinesis Data Firehose to convert data into Parquet.

Amazon Kinesis Data Firehose is the LEAST effort solution because it is a fully managed, serverless service that handles data ingestion, transformation, and S3 delivery without requiring infrastructure management. Firehose offers record format conversion to Parquet as a built-in feature, using a schema defined in the AWS Glue Data Catalog, and delivers the result directly to S3. The built-in conversion expects JSON input, so CSV records first need a lightweight transformation to JSON, for example with a Firehose data-transformation Lambda function. Even with that step, this approach carries less operational overhead than alternatives that require setting up Kafka clusters, EMR clusters, or custom Glue jobs.

Why correct:
Fully managed, serverless service; no infrastructure to manage.
Built-in record format conversion to Parquet.
Automatic delivery to S3.
Minimal code and configuration required.

Why others are incorrect:
A (Kafka Streams + Kafka Connect): Requires setting up and managing Kafka infrastructure on EC2; significant operational overhead.
B (Kinesis Data Streams + Glue): Requires Glue job configuration and custom transformation logic; more effort than Firehose.
C (EMR + Spark Structured Streaming): Requires launching and managing an EMR cluster; highest operational complexity.
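One practical detail is worth sketching: Firehose's built-in Parquet conversion works on JSON records, so a CSV feed is typically passed through a small data-transformation Lambda function first. The handler below is a minimal sketch of that step using only the Python standard library. It follows the Firehose transformation contract (base64-encoded `data`, and a `recordId` and `result` returned per record); the column names in `COLUMNS` are hypothetical and would need to match the real feed and the Glue table schema.

```python
import base64
import csv
import io
import json

# Hypothetical column order of the incoming CSV records (an assumption).
COLUMNS = ["subscriber_id", "cell_id", "bytes_used"]


def lambda_handler(event, context):
    """Convert each CSV record in a Firehose batch into a JSON record."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8")
        row = next(csv.reader(io.StringIO(line)), None)

        # Flag malformed rows so Firehose can route them to its error output.
        if row is None or len(row) != len(COLUMNS):
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        # One JSON object per line; values stay strings in this sketch.
        payload = json.dumps(dict(zip(COLUMNS, row))) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

Type casting (for example turning `bytes_used` into an integer) is left out here; whether it belongs in the Lambda function or in the table schema is a design choice for the team.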

Question 4

A city wants to monitor its air quality to address the consequences of air pollution. A Machine Learning Specialist needs to forecast the air quality in parts per million of contaminants for the next 2 days in the city. As this is a prototype, only daily data from the last year is available. Which model is MOST likely to provide the best results in Amazon SageMaker?

A. Use the Amazon SageMaker k-Nearest-Neighbors (kNN) algorithm on the single time series consisting of the full year of data with a predictor_type of regressor.
B. Use Amazon SageMaker Random Cut Forest (RCF) on the single time series consisting of the full year of data.
C. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of regressor.
D. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of classifier.

Correct Answer: C. Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of regressor.

Since the objective is to predict a continuous value (parts per million of contaminants), this is a regression task. Of the listed options, the Linear Learner algorithm with predictor_type='regressor' is the most appropriate SageMaker built-in algorithm for forecasting a single time series when limited data is available. Option A (kNN) is typically less effective than Linear Learner for simple time-series forecasting, Option B (RCF) is for anomaly detection, and Option D (classifier) predicts discrete categories rather than continuous numbers.
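The regression framing can be illustrated without SageMaker at all. The sketch below fits an ordinary least-squares line to one year of daily readings and extrapolates two days ahead. It is not the Linear Learner API, the readings are synthetic values generated for this example, and a real prototype would more likely use lagged values or calendar features rather than the day index alone.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic daily readings for one year (an assumption, not real data):
# a baseline of 40 ppm that drifts upward by 0.01 ppm per day.
days = list(range(365))
ppm = [40.0 + 0.01 * d for d in days]

slope, intercept = fit_line(days, ppm)

# Forecast the next 2 days (day indices 365 and 366).
forecast = [intercept + slope * d for d in (365, 366)]
print(forecast)
```

The output is a continuous number per day, which is exactly what distinguishes the regressor setting from the classifier setting in option D.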

Question 5

A Data Engineer needs to build a model using a dataset containing customer credit card information. How can the Data Engineer ensure the data remains encrypted and the credit card information is secure?

A. Use a custom encryption algorithm to encrypt the data and store the data on an Amazon SageMaker instance in a VPC. Use the SageMaker DeepAR algorithm to randomize the credit card numbers.
B. Use an IAM policy to encrypt the data on the Amazon S3 bucket and Amazon Kinesis to automatically discard credit card numbers and insert fake credit card numbers.
C. Use an Amazon SageMaker launch configuration to encrypt the data once it is copied to the SageMaker instance in a VPC. Use the SageMaker principal component analysis (PCA) algorithm to reduce the length of the credit card numbers.
D. Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card numbers from the customer data with AWS Glue.

Correct Answer: D. Use AWS KMS to encrypt the data on Amazon S3 and Amazon SageMaker, and redact the credit card numbers from the customer data with AWS Glue.

Using AWS KMS to encrypt data on both S3 and SageMaker, combined with AWS Glue to redact credit card numbers, is the most secure and appropriate of the four options for handling sensitive payment data. AWS KMS provides managed encryption keys with audit trails and integrates with S3 and SageMaker for encryption at rest, which supports compliance requirements for credit card information (PCI-DSS). Encryption in transit is a separate concern handled by TLS rather than by KMS. An AWS Glue ETL job can identify and redact sensitive data patterns (such as credit card numbers) before the data reaches SageMaker. This approach provides defense-in-depth: encryption for confidentiality and data redaction for further protection. KMS keys are managed separately from the data, which helps prevent unauthorized access even if storage is compromised.

Why correct:
AWS KMS is the standard AWS service for managing keys that encrypt sensitive data at rest (S3, SageMaker).
Redaction removes the sensitive values entirely, adding defense-in-depth.
KMS key usage is logged, so key access can be audited.
Glue can identify sensitive patterns and redact them programmatically.
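The redaction step can be pictured as a pattern match over each text field. The sketch below uses only Python's `re` module and is not an AWS Glue API; in practice, similar logic would run inside a Glue ETL script or be replaced by Glue's own sensitive-data detection. The pattern is a deliberately simple assumption (13 to 19 digits, optionally separated by spaces or hyphens) and does not validate card numbers.

```python
import re

# Simplistic assumption: a card number is 13-19 digits, optionally
# separated by single spaces or hyphens. No checksum validation is done.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def redact_cards(text, mask="[REDACTED]"):
    """Replace anything that looks like a card number with a mask."""
    return CARD_PATTERN.sub(mask, text)


# 4111 1111 1111 1111 is a widely used test number, not a real card.
print(redact_cards("Card 4111 1111 1111 1111 exp 12/27"))
```

A pattern this loose can also mask other long digit strings, such as some account or tracking numbers, so a production job would usually add a checksum test or rely on a managed detector.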
