AWS MLS-C01 Free Practice Questions — Page 2

Machine Learning - Specialty • 5 questions • Answers & explanations included

Question 6

A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a corporate VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However, the ML Specialist cannot find the Amazon SageMaker notebook instance's EBS volume or Amazon EC2 instance within the VPC. Why is the instance not visible in the VPC?

A. Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but they run outside of VPCs.
B. Amazon SageMaker notebook instances are based on the Amazon ECS service within customer accounts.
C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts.
D. Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service accounts.

Correct Answer: C. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts.

Amazon SageMaker notebook instances are based on EC2 instances that run within AWS service accounts, not customer accounts. This is why the ML Specialist cannot see the notebook instance or its EBS volume in their own VPC: the underlying EC2 infrastructure is managed by AWS in a separate service account. While the notebook instance appears as a resource in the customer's SageMaker console and can access resources in the customer's VPC (via an attached ENI), the compute instance itself exists outside the customer's account. This architecture lets AWS manage the infrastructure, apply patches, and handle maintenance without exposing those details to the customer. Because the EBS volume is not visible in the customer's EC2 console, the Specialist cannot snapshot it directly; instead, important data should be copied from the notebook instance to durable storage such as Amazon S3.

Why correct:
- SageMaker notebook instances are managed EC2 instances running in AWS service accounts, not the customer account.
- This explains why the instance is not visible in the customer's VPC or EC2 console.
- Aligns with the AWS managed-service architecture pattern.

Why others are incorrect:
- A (runs outside of VPCs): Incorrect. SageMaker notebook instances do interact with customer VPCs through an attached ENI; the underlying EC2 instance simply lives in a service account.
- B (based on ECS in customer accounts): Incorrect. SageMaker uses EC2, not ECS, and the instances are not in customer accounts.
- D (ECS in service accounts): Incorrect. The service-account part is right, but SageMaker notebook instances run on EC2, not ECS.
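Since the EBS volume cannot be snapshotted from the customer account, the practical safeguard is to copy important files to S3 from within the notebook itself. The sketch below (hypothetical bucket name, prefix, and paths) builds the local-file-to-S3-key mapping using only the standard library, with the actual boto3 upload call left commented so the example runs anywhere; a throwaway temp directory stands in for the notebook's `/home/ec2-user/SageMaker` directory.

```python
import os
import tempfile

# Hypothetical backup target; substitute your own bucket and prefix.
BUCKET = "my-notebook-backups"
PREFIX = "notebook-instance-1/"

def plan_backup(root_dir, prefix=PREFIX):
    """Map every file under root_dir to the S3 key it would be copied to."""
    plan = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            rel = os.path.relpath(local_path, root_dir)
            plan.append((local_path, prefix + rel.replace(os.sep, "/")))
    return plan

# Demo with a temp directory standing in for the notebook's working directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "data"))
    for rel in ("train.csv", os.path.join("data", "model.tar.gz")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("placeholder")
    plan = plan_backup(root)
    for local_path, key in sorted(plan, key=lambda p: p[1]):
        # boto3.client("s3").upload_file(local_path, BUCKET, key)  # real upload
        print(key)
```

The same pattern can be wired into a notebook lifecycle configuration so backups happen on a schedule rather than ad hoc.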

Question 7

A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker. The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant. Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test?

A. Review SageMaker logs that have been written to Amazon S3 by leveraging Amazon Athena and Amazon QuickSight to visualize logs as they are being produced.
B. Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker.
C. Build custom Amazon CloudWatch Logs and then leverage Amazon ES and Kibana to query and visualize the log data as it is generated by Amazon SageMaker.
D. Send Amazon CloudWatch Logs that were generated by Amazon SageMaker to Amazon ES and use Kibana to query and visualize the log data.

Correct Answer: B. Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker.

An Amazon CloudWatch dashboard is the most straightforward way to monitor latency, memory utilization, and CPU utilization during a SageMaker endpoint load test. CloudWatch natively collects these metrics from SageMaker endpoints and displays them in a unified dashboard without additional integration or tooling. The dashboard provides near-real-time visualization of all three requested metrics in a single pane of glass, making it ideal for load-testing analysis. SageMaker publishes these metrics automatically, so no extra instrumentation is required: simply create a dashboard and add the relevant metric widgets. This approach is simpler than the alternatives, which all require standing up additional infrastructure (Athena, Elasticsearch, custom logging).

Why correct:
- CloudWatch natively integrates with SageMaker endpoints.
- Metrics are collected automatically (no custom instrumentation needed).
- Real-time dashboard visualization.
- Includes all three requested metrics (latency, memory, CPU).
- Simplest setup: no additional infrastructure.
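As a concrete illustration, a dashboard like this can be created programmatically. The sketch below builds a CloudWatch dashboard body covering the three metrics; endpoint, variant, and region names are hypothetical, and the `put_dashboard` call is commented out so the snippet runs without AWS credentials. Note that latency (`ModelLatency`) lives in the `AWS/SageMaker` namespace, while instance-level CPU and memory are published under `/aws/sagemaker/Endpoints`.

```python
import json

# Hypothetical endpoint/variant/region; substitute your own.
ENDPOINT = "forecast-endpoint"
VARIANT = "AllTraffic"
REGION = "us-east-1"

def metric_widget(title, namespace, metric, x, y):
    """One dashboard widget tracking a single SageMaker endpoint metric."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 8, "height": 6,
        "properties": {
            "title": title,
            "region": REGION,
            "stat": "Average",
            "period": 60,
            "metrics": [[namespace, metric,
                         "EndpointName", ENDPOINT,
                         "VariantName", VARIANT]],
        },
    }

dashboard_body = json.dumps({
    "widgets": [
        # Invocation latency is published under AWS/SageMaker...
        metric_widget("Model latency", "AWS/SageMaker", "ModelLatency", 0, 0),
        # ...while instance utilization comes from /aws/sagemaker/Endpoints.
        metric_widget("CPU utilization", "/aws/sagemaker/Endpoints",
                      "CPUUtilization", 8, 0),
        metric_widget("Memory utilization", "/aws/sagemaker/Endpoints",
                      "MemoryUtilization", 16, 0),
    ]
})

# boto3.client("cloudwatch").put_dashboard(
#     DashboardName="load-test", DashboardBody=dashboard_body)
```

During the load test, watching all three widgets side by side makes it easy to pick sensible Auto Scaling thresholds (for example, a target invocations-per-instance that keeps CPU below a chosen ceiling).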

Question 8

A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data?

A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries.
B. Use AWS Glue to catalogue the data and Amazon Athena to run queries.
C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries.
D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries.

Correct Answer: B. Use AWS Glue to catalogue the data and Amazon Athena to run queries.

Using AWS Glue to catalog the data and Amazon Athena to run SQL queries requires the LEAST effort because it is a serverless, schema-on-read approach that works with both structured and unstructured data in S3 without prior transformation. A Glue crawler automatically scans S3, infers the schema, and creates a data catalog, so no manual schema definition is needed. Athena then runs SQL queries directly against the S3 data using the Glue catalog, with no need to move or transform data beforehand. The solution is entirely serverless and pay-per-query, and requires minimal configuration: Glue handles discovering the data structure, while Athena handles querying with standard SQL, making this the lowest-effort path to SQL query capability.

Why correct:
- A Glue crawler automatically catalogs the data (no manual schema definition).
- Athena queries S3 directly, without ETL or data movement.
- Serverless approach with no infrastructure to manage.
- Works with both structured and unstructured data.
- The schema-on-read model minimizes upfront effort; no transformation is required before querying.
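To make the two steps concrete, the sketch below assembles the request parameters for a Glue crawler and an Athena query. All names, ARNs, and S3 paths are hypothetical, and the boto3 calls are commented out so the example runs without AWS credentials; the point is how little configuration each service needs.

```python
# Step 1: a Glue crawler scans the S3 path and infers the schema.
# Hypothetical role ARN, database, and bucket names throughout.
crawler_params = {
    "Name": "s3-data-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",
    "DatabaseName": "manufacturing",
    "Targets": {"S3Targets": [{"Path": "s3://company-data-lake/raw/"}]},
}
# glue = boto3.client("glue")
# glue.create_crawler(**crawler_params)
# glue.start_crawler(Name=crawler_params["Name"])

# Step 2: once the catalog is populated, Athena queries the S3 data in place
# with standard SQL -- no ETL, no data movement.
query_params = {
    "QueryString": "SELECT sensor_id, AVG(reading) AS avg_reading "
                   "FROM readings GROUP BY sensor_id",
    "QueryExecutionContext": {"Database": "manufacturing"},
    "ResultConfiguration": {
        "OutputLocation": "s3://company-data-lake/athena-results/"},
}
# boto3.client("athena").start_query_execution(**query_params)
```

Compare this with options A, C, and D, each of which requires provisioning a database or writing transformation code before the first query can run.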

Question 9

A Machine Learning Specialist is developing a custom video recommendation model for an application. The dataset used to train this model is very large with millions of data points and is hosted in an Amazon S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on the notebook instance. Which approach allows the Specialist to use all the data to train the model?

A. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
B. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to the instance. Train on a small amount of the data to verify the training code and hyperparameters. Go back to Amazon SageMaker and train using the full dataset.
C. Use AWS Glue to train a model using a small subset of the data to confirm that the data will be compatible with Amazon SageMaker. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.
D. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Launch an Amazon EC2 instance with an AWS Deep Learning AMI and attach the S3 bucket to train the full dataset.

Correct Answer: A. Load a smaller subset of the data into the SageMaker notebook and train locally. Confirm that the training code is executing and the model parameters seem reasonable. Initiate a SageMaker training job using the full dataset from the S3 bucket using Pipe input mode.

The Specialist should load a small subset of the data locally to validate the training code and hyperparameters, then launch a SageMaker training job that uses Pipe input mode to train on the full dataset from S3. Pipe mode streams data from S3 to the training instance on the fly, avoiding the need to download the entire dataset into local storage or memory. This stays within the 5 GB EBS volume limit and avoids the hours-long download. Pipe input mode is designed specifically for large datasets and distributed training scenarios where loading everything upfront is impractical, which makes this two-phase approach (local validation, then streaming training) the most efficient and practical solution.

Why correct:
- Pipe input mode streams data from S3, so no full download is required.
- Local validation confirms the code is correct before scaling up.
- Avoids exceeding the 5 GB EBS volume limit.
- Designed for large datasets that do not fit on a single instance.
- Enables training on millions of data points without local storage constraints.

Why others are incorrect:
- B (EC2 Deep Learning AMI): Requires launching and managing additional EC2 infrastructure, does not solve the storage problem, and adds operational overhead.
- C (Glue + Pipe mode): AWS Glue is an ETL service, not a tool for validating training code; Pipe mode is right, but the Glue step is unnecessary and adds complexity.
- D (local validation + EC2 AMI): Still requires managing EC2 infrastructure and forgoes SageMaker managed training; more complex than A.
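The second phase of this approach boils down to one setting in the training job. The sketch below assembles `create_training_job` parameters with `TrainingInputMode` set to `Pipe`; the job name, ECR image, role ARN, and S3 URIs are hypothetical, and the boto3 call is commented out so the example runs without AWS credentials.

```python
# Hypothetical names, ARNs, and S3 URIs throughout.
training_job_params = {
    "TrainingJobName": "video-recs-full-dataset",
    "AlgorithmSpecification": {
        "TrainingImage":
            "123456789012.dkr.ecr.us-east-1.amazonaws.com/video-recs:latest",
        "TrainingInputMode": "Pipe",  # stream from S3 instead of downloading
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerTrainingRole",
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://video-recs-data/train/",
            "S3DataDistributionType": "ShardedByS3Key",
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://video-recs-data/output/"},
    "ResourceConfig": {
        "InstanceType": "ml.p3.2xlarge",
        "InstanceCount": 1,
        # The volume only buffers streamed data; it need not hold the dataset.
        "VolumeSizeInGB": 50,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 86400},
}
# boto3.client("sagemaker").create_training_job(**training_job_params)
```

With `File` mode, by contrast, `VolumeSizeInGB` would have to be sized for the entire dataset, which is exactly the constraint the Specialist is trying to avoid.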

Question 10

A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is stored in Amazon RDS. Which approach should the Specialist use for training a model using that data?

A. Write a direct connection to the SQL database within the notebook and pull data in.
B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook.
C. Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in.
D. Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access.

Correct Answer: B. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook.

The Specialist should push the data from Amazon RDS to Amazon S3 using AWS Data Pipeline, then provide the S3 location to the SageMaker notebook. While pulling data directly from RDS (option A) is possible, the recommended practice for SageMaker training is to stage data in S3 first. AWS Data Pipeline provides a simple, orchestrated way to move data from RDS to S3, placing it in the optimal format and location for SageMaker training jobs. SageMaker is designed to work with S3 as its primary data source, which offers better performance and cost efficiency than direct database connections. Staging also decouples the training process from the live database, preventing performance impact on production systems.

Why correct:
- AWS Data Pipeline provides managed data movement from RDS to S3.
- S3 is the native and optimal data source for SageMaker.
- Decouples training from the production database.
- Data Pipeline handles scheduling and orchestration.
- Follows AWS best practices for SageMaker workflows and enables scalable, repeatable training runs.
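The staging step itself is just "run a SELECT, write CSV, land it in S3." The sketch below demonstrates that shape with an in-memory SQLite database standing in for the RDS source (table name, columns, and bucket are hypothetical); in practice AWS Data Pipeline performs the equivalent export on a schedule, and the commented `put_object` call marks where the CSV would land in S3.

```python
import csv
import io
import sqlite3

# In-memory SQLite stands in for the RDS SQL source in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE training_data (feature_a REAL, feature_b REAL, label INTEGER)")
conn.executemany("INSERT INTO training_data VALUES (?, ?, ?)",
                 [(0.1, 1.2, 0), (0.4, 0.9, 1), (0.7, 0.3, 1)])

def export_table_to_csv(conn, table):
    """Dump a table to CSV text, a format SageMaker training jobs read from S3."""
    cur = conn.execute(f"SELECT * FROM {table}")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur.fetchall())
    return buf.getvalue()

csv_body = export_table_to_csv(conn, "training_data")
# boto3.client("s3").put_object(Bucket="ml-training-data",
#                               Key="staged/training_data.csv",
#                               Body=csv_body.encode())
```

Once the CSV is in S3, the notebook only needs the `s3://` URI, so repeated training runs never touch the production database.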
