AWS MLS-C01 Free Practice Questions — Page 3

Question 1

A Machine Learning Specialist receives customer data for an online shopping website. The data includes demographics, past visits, and locality information. The Specialist must develop a machine learning approach to identify the customer shopping patterns, preferences, and trends to enhance the website for better service and smart recommendations. Which solution should the Specialist recommend?

Answer

Correct Answer: C. Collaborative filtering based on user interactions and correlations to identify patterns in the customer database.

Collaborative filtering is the correct approach for identifying customer shopping patterns, preferences, and trends from customer interactions and correlations. The dataset includes demographics, past visits, and locality information, which can be used to find similar customers and recommend products based on what those similar customers have purchased or viewed. Collaborative filtering leverages user-to-user or item-to-item similarities to identify patterns and make personalized recommendations. This is distinct from content-based approaches, which would analyze product features, and from unsupervised clustering methods, which would merely group customers without considering interaction patterns.

Why correct:
Collaborative filtering identifies patterns through user interactions and correlations.
It is explicitly designed for recommendation systems and preference identification.
It uses customer similarity to drive trend discovery.
It aligns with the stated objective: identify patterns, preferences, and trends for recommendations.

Why others are incorrect:
A (LDA): Latent Dirichlet Allocation is for topic modeling in text/document collections; it is not designed for customer preference prediction.
B (Neural network with 3+ layers): Overly complex for this use case; requires significant data and tuning; offers no specific advantage over collaborative filtering.
D (RCF): Random Cut Forest is for anomaly detection in time series data; it is not suitable for identifying customer preferences or patterns.
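The user-to-user similarity idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production recommender: the customers, products, and ratings are hypothetical, and cosine similarity is just one common choice of similarity measure.

```python
from math import sqrt

# Hypothetical interaction data: customer -> {product: rating or visit count}.
interactions = {
    "alice": {"tent": 5, "stove": 4, "boots": 1},
    "bob":   {"tent": 4, "stove": 5},
    "carol": {"boots": 5, "socks": 4},
}

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, data, top_n=2):
    """Rank items the user has not interacted with by similarity-weighted ratings."""
    scores = {}
    for other, ratings in data.items():
        if other == user:
            continue
        sim = cosine(data[user], ratings)
        if sim <= 0:
            continue  # ignore customers with no overlap
        for item, rating in ratings.items():
            if item not in data[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("bob", interactions))
```

Here "bob" overlaps only with "alice", so the only candidate is the item alice rated that bob has not seen. Real systems typically use matrix factorization or a managed service rather than pairwise loops, but the underlying signal is the same.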

Question 2

A Machine Learning Specialist is working with a large company to leverage machine learning within its products. The company wants to group its customers into categories based on which customers will and will not churn within the next 6 months. The company has labeled the data available to the Specialist. Which machine learning model type should the Specialist use to accomplish this task?

Answer

Correct Answer: B. Classification

Classification is the correct machine learning model type for this task because the company needs to predict a binary outcome: whether or not each customer will churn within a 6-month period. Classification algorithms predict discrete categories or classes, which is exactly what is needed here: assigning each customer to either the "will churn" or the "will not churn" class. The company has labeled historical data, meaning it knows which customers churned and which did not, making this a supervised classification problem. Common classification algorithms suitable for churn prediction include logistic regression, decision trees, random forests, and gradient boosting, all of which can be trained on SageMaker.

Why correct:
Churn prediction is a binary classification problem (will churn / will not churn).
The goal is predicting discrete categories/classes, not continuous values.
The data is labeled (supervised learning).
Classification is the standard approach for churn prediction.

Why others are incorrect:
A (Linear regression): Regression predicts continuous numerical values, not binary categories; inappropriate for churn prediction.
C (Clustering): Unsupervised learning; does not use the labeled data; identifies groups but does not predict churn status, because it has no labeled target variable.
D (Reinforcement learning): For sequential decision-making and optimization; not applicable to static churn prediction; overkill and the wrong paradigm.
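To make the supervised binary classification setup concrete, the sketch below trains a small logistic regression by stochastic gradient descent in plain Python. The two features (support tickets, months since last purchase), the six labeled rows, and the hyperparameters are all hypothetical; in practice a library or a SageMaker algorithm would replace the hand-written loop.

```python
from math import exp

# Hypothetical labeled data:
# (support_tickets, months_since_last_purchase) -> churned (1) or stayed (0).
X = [(0, 1), (1, 0), (1, 2), (4, 5), (5, 4), (6, 6)]
y = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights and bias with per-sample gradient steps."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss with respect to z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(x, w, b):
    """Return the discrete class: 1 (will churn) or 0 (will not churn)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

w, b = train(X, y)
print([predict(x, w, b) for x in X])
```

The key point is the output type: `predict` returns one of two class labels, whereas a regression model would return an unbounded number and a clustering model would ignore `y` entirely.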

Question 3

The displayed graph is from a forecasting model for testing a time series.

[Figure: mls_013_image_1.png]

Considering the graph only, which conclusion should a Machine Learning Specialist make about the behavior of the model?

Answer

Correct Answer: A. The model predicts both the trend and the seasonality well

Why Option A is correct:

Trend: The "Actuals" (dark blue line) show a clear upward trajectory over time. The "Forecast" (green line) follows the same upward slope, maintaining a consistent gap (bias) but accurately capturing the overall direction of the data.

Seasonality: The "Actuals" exhibit a repeating "sawtooth" pattern (periodic peaks and valleys). The "Forecast" line mirrors these oscillations almost perfectly in timing and frequency.

Even though the forecast underestimates the absolute values (a vertical shift), the behavior, which includes the trend and the seasonal patterns, is captured correctly.
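The distinction between a constant bias and a correctly captured shape can be checked numerically. The sketch below uses synthetic data (an assumed linear trend, an assumed 4-step seasonal pattern, and an assumed constant offset of 3 units), not the values from the figure: the forecast has a nonzero mean error yet correlates perfectly with the actuals.

```python
from statistics import mean

# Hypothetical series: upward trend plus a repeating 4-step seasonal pattern.
season = [0.0, 2.0, 4.0, 1.0]
actuals = [10.0 + 0.5 * t + season[t % 4] for t in range(24)]
# Hypothetical forecast: same shape, shifted down by a constant 3 units.
forecast = [a - 3.0 for a in actuals]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

bias = mean(f - a for f, a in zip(forecast, actuals))  # negative: underestimates
corr = pearson(actuals, forecast)                      # shape agreement
print(f"bias={bias:.2f}, correlation={corr:.3f}")
```

A large bias with near-perfect correlation matches the situation described above: the level is off, but trend and seasonality are tracked.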

Question 4

A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided.

[Figure: mls_014_image_1.png]

Based on this information, which model would have the HIGHEST accuracy?

Answer

Correct Answer: C. Support vector machine (SVM) with non-linear kernel

Why Option C is right: A Support Vector Machine (SVM) with a non-linear kernel (such as a Radial Basis Function or RBF kernel) is designed specifically to handle data that is not linearly separable. It projects the data into a higher-dimensional space where a decision boundary can be drawn to isolate the central cluster from the surrounding points.
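The "project into a higher-dimensional space" idea can be illustrated without an SVM library. The sketch below is not an RBF kernel or an SVM; it applies an explicit feature map (squared distance from the origin) to synthetic data in which one class sits in the centre and the other surrounds it. That layout, the radii, and the sample counts are assumptions chosen to mimic the description above.

```python
import math
import random

random.seed(0)

def ring_point(r_min, r_max):
    """Sample a 2D point whose distance from the origin lies in [r_min, r_max]."""
    r = random.uniform(r_min, r_max)
    theta = random.uniform(0.0, 2.0 * math.pi)
    return (r * math.cos(theta), r * math.sin(theta))

# Hypothetical layout: one class clustered in the centre, the other around it.
inner = [ring_point(0.0, 1.0) for _ in range(50)]  # label 0
outer = [ring_point(2.0, 3.0) for _ in range(50)]  # label 1

def lifted(p):
    """Extra feature x^2 + y^2: the squared distance from the origin."""
    return p[0] ** 2 + p[1] ** 2

# No straight line in (x, y) separates the classes, but in the lifted space
# a single threshold (a linear boundary) does.
threshold = (max(map(lifted, inner)) + min(map(lifted, outer))) / 2.0

def classify(p):
    return 1 if lifted(p) > threshold else 0

accuracy = (sum(classify(p) == 0 for p in inner) +
            sum(classify(p) == 1 for p in outer)) / 100.0
print(accuracy)
```

A kernel SVM achieves the same effect implicitly: the kernel function computes similarities as if the data had been mapped to a richer feature space, so the lifted features never need to be constructed by hand.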

Question 5

A Machine Learning Specialist at a company sensitive to security is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII). The dataset:
✑ Must be accessible from a VPC only.
✑ Must not traverse the public internet.
How can these requirements be satisfied?

Answer

Correct Answer: A. Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC.

Creating a VPC endpoint and applying a bucket access policy that restricts access to the given VPC endpoint and VPC ensures the dataset remains accessible only within the VPC without traversing the public internet. A VPC endpoint (for S3, typically a gateway endpoint) creates a private connection from the VPC to S3, eliminating the need for internet access. The bucket policy must allow access only from the specified VPC endpoint and deny all other sources. Together these enforce both requirements: the data is accessible only from within the VPC, and traffic never touches the public internet. The bucket policy is the control mechanism that enforces the access restriction at the S3 level.

Why correct:
The VPC endpoint creates a private connection to S3 (no public internet).
The bucket policy restricting access to the VPC endpoint provides access control.
Both requirements are satisfied: VPC-only access and no public internet traversal.
The bucket policy is the enforcement mechanism for the access restriction.
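A bucket policy of the kind described above might look like the following sketch, built as a Python dictionary and printed as JSON. The bucket name and endpoint ID are hypothetical placeholders. The `aws:SourceVpce` condition key is the standard way to match requests by VPC endpoint, but the exact statement should be checked against the account's needs, since a blanket deny like this also blocks console access from outside the VPC.

```python
import json

# Hypothetical identifiers; substitute real values.
BUCKET = "example-training-data"
VPC_ENDPOINT_ID = "vpce-0123456789abcdef0"

def vpce_only_policy(bucket, vpce_id):
    """Build a policy that denies any request not arriving via the given VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnlessFromVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",      # the bucket itself
                f"arn:aws:s3:::{bucket}/*",    # every object in it
            ],
            "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
        }],
    }

policy = vpce_only_policy(BUCKET, VPC_ENDPOINT_ID)
print(json.dumps(policy, indent=2))
```

The design choice here is an explicit deny with `StringNotEquals` rather than an allow: an explicit deny overrides any allow granted elsewhere, so requests from other sources are rejected regardless of IAM permissions.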
