AWS MLS-C01 Free Practice Questions — Page 2

Machine Learning - Specialty • 5 questions • Answers & explanations included

Question 6

A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a corporate VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However, the ML Specialist cannot find the Amazon SageMaker notebook instance's EBS volume or Amazon EC2 instance within the VPC. Why is the instance not visible in the VPC?

A. Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but they run outside of VPCs.
B. Amazon SageMaker notebook instances are based on the Amazon ECS service within customer accounts.
C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts.
D. Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service accounts.

Correct Answer: C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts.

Amazon SageMaker notebook instances are based on EC2 instances that run within AWS service accounts, not customer accounts. This is why the ML Specialist cannot see the notebook instance or its EBS volume in their own VPC: the underlying EC2 infrastructure is managed by AWS in a separate service account. While the notebook instance appears as a resource in the customer's SageMaker console and can access resources in the customer's VPC (via an attached ENI), the compute instance itself exists outside the customer's account. This architecture lets AWS manage the infrastructure, apply patches, and handle maintenance without exposing those details to the customer. Because the EBS volume is not visible in the customer's EC2 console, the Specialist cannot snapshot it directly; instead, important data should be copied from the notebook instance to durable storage such as Amazon S3.

Why correct:
- SageMaker notebook instances are managed EC2 instances running in AWS service accounts, not the customer account.
- This explains why the instance is not visible in the customer's VPC or EC2 console.
- Aligns with the AWS managed-service architecture pattern.

Why others are incorrect:
- A (runs outside of VPCs): Incorrect. SageMaker notebook instances do interact with customer VPCs through an attached ENI; the underlying EC2 instance simply lives in a service account.
- B (based on ECS in customer accounts): Incorrect. SageMaker uses EC2, not ECS, and the instances are not in customer accounts.
- D (ECS in service accounts): Incorrect. The service-account part is right, but SageMaker notebook instances run on EC2, not ECS.
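Since the EBS volume cannot be snapshotted from the customer account, the practical safeguard is to copy important files to S3 from within the notebook itself. The sketch below (hypothetical bucket name, prefix, and paths) builds the local-file-to-S3-key mapping using only the standard library, with the actual boto3 upload call left commented so the example runs anywhere; a throwaway temp directory stands in for the notebook's `/home/ec2-user/SageMaker` directory.

```python
import os
import tempfile

# Hypothetical backup target; substitute your own bucket and prefix.
BUCKET = "my-notebook-backups"
PREFIX = "notebook-instance-1/"

def plan_backup(root_dir, prefix=PREFIX):
    """Map every file under root_dir to the S3 key it would be copied to."""
    plan = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            rel = os.path.relpath(local_path, root_dir)
            plan.append((local_path, prefix + rel.replace(os.sep, "/")))
    return plan

# Demo with a temp directory standing in for the notebook's working directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data"))
    for rel in ("train.csv", os.path.join("data", "model.tar.gz")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("placeholder")
    plan = plan_backup(root)
    for local_path, key in sorted(plan, key=lambda p: p[1]):
        # boto3.client("s3").upload_file(local_path, BUCKET, key)  # real upload
        print(key)
```

The same pattern can be wired into a notebook lifecycle configuration so backups happen on a schedule rather than ad hoc.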

Question 7

A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker. The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant. Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test?

A. Review SageMaker logs that have been written to Amazon S3 by leveraging Amazon Athena and Amazon QuickSight to visualize logs as they are being produced.
B. Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker.
C. Build custom Amazon CloudWatch Logs and then leverage Amazon ES and Kibana to query and visualize the log data as it is generated by Amazon SageMaker.
D. Send Amazon CloudWatch Logs that were generated by Amazon SageMaker to Amazon ES and use Kibana to query and visualize the log data.

Correct Answer: B. Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker.

An Amazon CloudWatch dashboard is the most straightforward way to monitor latency, memory utilization, and CPU utilization during a SageMaker endpoint load test. CloudWatch natively collects these metrics from SageMaker endpoints and displays them in a unified dashboard without additional integration or tooling. The dashboard provides near-real-time visualization of all three requested metrics in a single pane of glass, making it ideal for load-testing analysis. SageMaker publishes these metrics automatically, so no extra instrumentation is required: simply create a dashboard and add the relevant metric widgets. This approach is simpler than the alternatives, which all require standing up additional infrastructure (Athena, Elasticsearch, custom logging).

Why correct:
- CloudWatch natively integrates with SageMaker endpoints.
- Metrics are collected automatically (no custom instrumentation needed).
- Real-time dashboard visualization.
- Includes all three requested metrics (latency, memory, CPU).
- Simplest setup: no additional infrastructure.
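As a concrete illustration, a dashboard like this can be created programmatically. The sketch below builds a CloudWatch dashboard body covering the three metrics; endpoint, variant, and region names are hypothetical, and the `put_dashboard` call is commented out so the snippet runs without AWS credentials. Note that latency (`ModelLatency`) lives in the `AWS/SageMaker` namespace, while instance-level CPU and memory are published under `/aws/sagemaker/Endpoints`.

```python
import json

# Hypothetical endpoint/variant/region; substitute your own.
ENDPOINT = "forecast-endpoint"
VARIANT = "AllTraffic"
REGION = "us-east-1"

def metric_widget(title, namespace, metric, x, y):
    """One dashboard widget tracking a single SageMaker endpoint metric."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 8, "height": 6,
        "properties": {
            "title": title,
            "region": REGION,
            "stat": "Average",
            "period": 60,
            "metrics": [[namespace, metric,
                         "EndpointName", ENDPOINT,
                         "VariantName", VARIANT]],
        },
    }

dashboard_body = json.dumps({
    "widgets": [
        # Invocation latency is published under AWS/SageMaker...
        metric_widget("Model latency", "AWS/SageMaker", "ModelLatency", 0, 0),
        # ...while instance utilization comes from /aws/sagemaker/Endpoints.
        metric_widget("CPU utilization", "/aws/sagemaker/Endpoints",
                      "CPUUtilization", 8, 0),
        metric_widget("Memory utilization", "/aws/sagemaker/Endpoints",
                      "MemoryUtilization", 16, 0),
    ]
})

# boto3.client("cloudwatch").put_dashboard(
#     DashboardName="load-test", DashboardBody=dashboard_body)
```

During the load test, watching all three widgets side by side makes it easy to pick sensible Auto Scaling thresholds (for example, a target invocations-per-instance that keeps CPU below a chosen ceiling).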

Question 8

A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data?

A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries.
B. Use AWS Glue to catalogue the data and Amazon Athena to run queries.
C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries.
D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries.

Correct Answer: B. Use AWS Glue to catalogue the data and Amazon Athena to run queries.

Using AWS Glue to catalog the data and Amazon Athena to run SQL queries requires the LEAST effort because it is a serverless, schema-on-read approach that works with both structured and unstructured data in S3 without prior transformation. A Glue crawler automatically scans S3, infers the schema, and creates a data catalog, so no manual schema definition is needed. Athena then runs SQL queries directly against the S3 data using the Glue catalog, with no need to move or transform data beforehand. The solution is entirely serverless and pay-per-query, and requires minimal configuration: Glue handles discovering the data structure, while Athena handles querying with standard SQL, making this the lowest-effort path to SQL query capability.

Why correct:
- A Glue crawler automatically catalogs the data (no manual schema definition).
- Athena queries S3 directly, without ETL or data movement.
- Serverless approach with no infrastructure to manage.
- Works with both structured and unstructured data.
- The schema-on-read model minimizes upfront effort; no transformation is required before querying.
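To make the two steps concrete, the sketch below assembles the request parameters for a Glue crawler and an Athena query. All names, ARNs, and S3 paths are hypothetical, and the boto3 calls are commented out so the example runs without AWS credentials; the point is how little configuration each service needs.

```python
# Step 1: a Glue crawler scans the S3 path and infers the schema.
# Hypothetical role ARN, database, and bucket names throughout.
crawler_params = {
    "Name": "s3-data-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "DatabaseName": "manufacturing",
    "Targets": {"S3Targets": [{"Path": "s3://company-data-lake/raw/"}]},
}
# glue = boto3.client("glue")
# glue.create_crawler(**crawler_params)
# glue.start_crawler(Name=crawler_params["Name"])

# Step 2: once the catalog is populated, Athena queries the S3 data in place
# with standard SQL -- no ETL, no data movement.
query_params = {
    "QueryString": "SELECT sensor_id, AVG(reading) AS avg_reading "
                   "FROM readings GROUP BY sensor_id",
    "QueryExecutionContext": {"Database": "manufacturing"},
    "ResultConfiguration": {
        "OutputLocation": "s3://company-data-lake/athena-results/"},
}
# boto3.client("athena").start_query_execution(**query_params)
```

Compare this with options A, C, and D, each of which requires provisioning a database or writing transformation code before the first query can run.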

Question 9

A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large with millions of data points and is hosted in an Amazon S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on the notebook instance. Which approach allows the Specialist to use all the data to train the model?

A. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
B. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the instance. Train on a small amount of the data to verify the training code and hyperparameters. Go back to Amazon SageMaker and train using the full dataset.
C. Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible with Amazon SageMaker. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
D. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to train the full dataset.

Correct Answer: A. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.

The Specialist should load a small subset of the data locally to validate the training code and hyperparameters, then launch a SageMaker training job that uses Pipe input mode to train on the full dataset from S3. Pipe mode streams data from S3 to the training instance on the fly, avoiding the need to download the entire dataset into local storage or memory. This stays within the 5 GB EBS volume limit and avoids the hours-long download. Pipe input mode is designed specifically for large datasets and distributed training scenarios where loading everything upfront is impractical, which makes this two-phase approach (local validation, then streaming training) the most efficient and practical solution.

Why correct:
- Pipe input mode streams data from S3, so no full download is required.
- Local validation confirms the code is correct before scaling up.
- Avoids exceeding the 5 GB EBS volume limit.
- Designed for large datasets that do not fit on a single instance.
- Enables training on millions of data points without local storage constraints.

Why others are incorrect:
- B (EC2 Deep Learning AMI): Requires launching and managing additional EC2 infrastructure, does not solve the storage problem, and adds operational overhead.
- C (Glue + Pipe mode): AWS Glue is an ETL service, not a tool for validating training code; Pipe mode is right, but the Glue step is unnecessary and adds complexity.
- D (local validation + EC2 AMI): Still requires managing EC2 infrastructure and forgoes SageMaker managed training; more complex than A.
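The second phase of this approach boils down to one setting in the training job. The sketch below assembles `create_training_job` parameters with `TrainingInputMode` set to `Pipe`; the job name, ECR image, role ARN, and S3 URIs are hypothetical, and the boto3 call is commented out so the example runs without AWS credentials.

```python
# Hypothetical names, ARNs, and S3 URIs throughout.
training_job_params = {
    "TrainingJobName": "video-recs-full-dataset",
    "AlgorithmSpecification": {
        "TrainingImage":
            "123456789012.dkr.ecr.us-east-1.amazonaws.com/video-recs:latest",
        "TrainingInputMode": "Pipe",  # stream from S3 instead of downloading
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerTrainingRole",
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://video-recs-data/train/",
            "S3DataDistributionType": "ShardedByS3Key",
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://video-recs-data/output/"},
    "ResourceConfig": {
        "InstanceType": "ml.p3.2xlarge",
        "InstanceCount": 1,
        # The volume only buffers streamed data; it need not hold the dataset.
        "VolumeSizeInGB": 50,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
}
# boto3.client("sagemaker").create_training_job(**training_job_params)
```

With `File` mode, by contrast, `VolumeSizeInGB` would have to be sized for the entire dataset, which is exactly the constraint the Specialist is trying to avoid.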

Question 10

A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is stored in Amazon RDS. Which approach should the Specialist use for training a model using that data?

A. Write a direct connection to the SQL database within the notebook and pull data in.
B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook.
C. Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in.
D. Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access.

Correct Answer: B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook.

The Specialist should push the data from Amazon RDS to Amazon S3 using AWS Data Pipeline, then provide the S3 location to the SageMaker notebook. While pulling data directly from RDS (option A) is possible, the recommended practice for SageMaker training is to stage data in S3 first. AWS Data Pipeline provides a simple, orchestrated way to move data from RDS to S3, placing it in the optimal format and location for SageMaker training jobs. SageMaker is designed to work with S3 as its primary data source, which offers better performance and cost efficiency than direct database connections. Staging also decouples the training process from the live database, preventing performance impact on production systems.

Why correct:
- AWS Data Pipeline provides managed data movement from RDS to S3.
- S3 is the native and optimal data source for SageMaker.
- Decouples training from the production database.
- Data Pipeline handles scheduling and orchestration.
- Follows AWS best practices for SageMaker workflows and enables scalable, repeatable training runs.
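The staging step itself is just "run a SELECT, write CSV, land it in S3." The sketch below demonstrates that shape with an in-memory SQLite database standing in for the RDS source (table name, columns, and bucket are hypothetical); in practice AWS Data Pipeline performs the equivalent export on a schedule, and the commented `put_object` call marks where the CSV would land in S3.

```python
import csv
import io
import sqlite3

# In-memory SQLite stands in for the RDS SQL source in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE training_data (feature_a REAL, feature_b REAL, label INTEGER)")
conn.executemany("INSERT INTO training_data VALUES (?, ?, ?)",
                 [(0.1, 1.2, 0), (0.4, 0.9, 1), (0.7, 0.3, 1)])

def export_table_to_csv(conn, table):
    """Dump a table to CSV text, a format SageMaker training jobs read from S3."""
    cur = conn.execute(f"SELECT * FROM {table}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur.fetchall())
    return buf.getvalue()

csv_body = export_table_to_csv(conn, "training_data")
# boto3.client("s3").put_object(Bucket="ml-training-data",
#                               Key="staged/training_data.csv",
#                               Body=csv_body.encode())
```

Once the CSV is in S3, the notebook only needs the `s3://` URI, so repeated training runs never touch the production database.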
