AWS DAS-C01 Free Practice Questions — Page 2

Data Analytics - Specialty • 5 questions • Answers & explanations included

Question 6

A manufacturing company uses Amazon Connect to manage its contact center and Salesforce to manage its customer relationship management (CRM) data. The data engineering team must build a pipeline to ingest data from the contact center and CRM system into a data lake that is built on Amazon S3. What is the MOST efficient way to collect data in the data lake with the LEAST operational overhead?

A. Use Amazon Kinesis Data Streams to ingest Amazon Connect data and Amazon AppFlow to ingest Salesforce data.
B. Use Amazon Kinesis Data Firehose to ingest Amazon Connect data and Amazon Kinesis Data Streams to ingest Salesforce data.
C. Use Amazon Kinesis Data Firehose to ingest Amazon Connect data and Amazon AppFlow to ingest Salesforce data.
D. Use Amazon AppFlow to ingest Amazon Connect data and Amazon Kinesis Data Firehose to ingest Salesforce data.

Correct Answer: C. Use Amazon Kinesis Data Firehose to ingest Amazon Connect data and Amazon AppFlow to ingest Salesforce data.

Why C is correct: Amazon Kinesis Data Firehose is purpose-built for streaming Amazon Connect data (call records, agent events) directly to S3 with automatic batching, compression, and transformation capabilities, all fully managed. Amazon AppFlow is specifically designed for SaaS application integration, including native Salesforce connectivity with pre-built connectors, automatic schema detection, and data transfer directly to S3. Both services are fully managed with minimal operational overhead.

Why others are wrong:
A. Kinesis Data Streams requires additional consumers and custom code to write to S3, adding operational overhead.
B. Kinesis Data Streams does not natively ingest Salesforce data.
D. AppFlow does not directly support Amazon Connect; Firehose is the native integration.
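To make the Firehose side of option C concrete, here is a minimal sketch of the request payload shape that Firehose's CreateDeliveryStream API accepts for an S3 destination. The stream name, bucket ARN, and role ARN are hypothetical; the code only builds the payload locally and makes no AWS call.

```python
def firehose_s3_config(stream_name: str, bucket_arn: str, role_arn: str) -> dict:
    """Payload shape for a Firehose delivery stream with an extended S3 destination."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "BucketARN": bucket_arn,
            "RoleARN": role_arn,
            # Firehose batches records by size or time before writing S3 objects,
            # which is the "automatic batching" the explanation refers to.
            "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
            "CompressionFormat": "GZIP",
        },
    }

# Hypothetical names, for illustration only.
cfg = firehose_s3_config(
    "connect-ctr-to-s3",
    "arn:aws:s3:::example-data-lake",
    "arn:aws:iam::123456789012:role/firehose-delivery-role",
)
print(cfg["ExtendedS3DestinationConfiguration"]["CompressionFormat"])  # GZIP
```

The buffering hints are the knob that trades S3 object size against delivery latency; the defaults are usually fine for contact-center records.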

Question 7

A company has a data warehouse in Amazon Redshift that is approximately 500 TB in size. New data is imported every few hours and read-only queries are run throughout the day and evening. There is a particularly heavy load with no writes for several hours each morning on business days. During those hours, some queries are queued and take a long time to execute. The company needs to optimize query execution and avoid any downtime. What is the MOST cost-effective solution?

A. Enable concurrency scaling in the workload management (WLM) queue.
B. Add more nodes using the AWS Management Console during peak hours. Set the distribution style to ALL.
C. Use elastic resize to quickly add nodes during peak times. Remove the nodes when they are not needed.
D. Use a snapshot, restore, and resize operation. Switch to the new target cluster.

Correct Answer: A. Enable concurrency scaling in the workload management (WLM) queue.

Why A is correct: Concurrency scaling in Amazon Redshift automatically adds transient cluster capacity when queries are queued, and is specifically designed for read-heavy workloads with periodic spikes. It handles the morning query surge automatically without downtime, and you pay for the additional capacity only when it is actually used (per-second billing for concurrency scaling clusters). This directly addresses the queuing issue with minimal cost and zero downtime.

Why others are wrong:
B. Adding nodes requires manual intervention during peak hours and does not avoid downtime during resize operations; the ALL distribution style is also not cost-effective for 500 TB.
C. Elastic resize involves a brief period of unavailability (usually minutes), which violates the "avoid any downtime" requirement.
D. A snapshot, restore, and resize operation involves significant downtime during the switch to the new cluster.
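Concurrency scaling is enabled per WLM queue. Below is a hedged sketch of the general shape of a `wlm_json_configuration` parameter value with scaling set to `auto` on the read queue; the queue layout, user group name, and concurrency values are illustrative assumptions, not a recommended production configuration.

```python
import json

wlm_config = json.dumps([
    {
        "query_group": [],
        "user_group": ["readers"],       # hypothetical group for read-only analysts
        "query_concurrency": 5,
        "concurrency_scaling": "auto",   # burst to transient clusters when queries queue
    },
    {
        "query_group": [],
        "user_group": [],
        "query_concurrency": 5,          # default queue, no scaling
        "concurrency_scaling": "off",
    },
])

queues = json.loads(wlm_config)
print(queues[0]["concurrency_scaling"])  # auto
```

The JSON string would be applied to the cluster's parameter group; only queues marked `auto` can route queued read queries to concurrency scaling clusters.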

Question 8

A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3. The company’s analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data. The amount of data that is ingested into Amazon S3 has increased substantially over time, and the query latency also has increased. Which solutions could the company implement to improve query performance? (Choose two.)

A. Use MySQL Workbench on an Amazon EC2 instance, and connect to Athena by using a JDBC or ODBC connector. Run the query from MySQL Workbench instead of Athena directly.
B. Use Athena to extract the data and store it in Apache Parquet format on a daily basis. Query the extracted data.
C. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data on a daily basis.
D. Run a daily AWS Glue ETL job to compress the data files by using the .gzip format. Query the compressed data.
E. Run a daily AWS Glue ETL job to compress the data files by using the .lzo format. Query the compressed data.

Correct Answers: B. Use Athena to extract the data and store it in Apache Parquet format on a daily basis. Query the extracted data.; C. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data on a daily basis.

Why B is correct: Apache Parquet is a columnar storage format that significantly improves Athena query performance through better compression and column pruning. By extracting and converting daily, you maintain updated data in an optimized format, reducing query latency dramatically compared to CSV files.

Why C is correct: This solution combines two optimizations: converting to Parquet format (columnar, compressed) and partitioning the data (allowing partition pruning to scan less data). The AWS Glue crawler automatically discovers new partitions daily, keeping the catalog updated. This addresses both the format inefficiency and the growing data volume, providing the best long-term query performance improvement.

Why others are wrong:
A. Using MySQL Workbench does not improve the underlying query performance; it is just a different client.
D & E. While compression helps, .gzip and .lzo do not provide the same performance benefits as the columnar Parquet format, and they do not address the growing data volume through partitioning.
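One low-effort way to do the conversion option B describes is an Athena CTAS (CREATE TABLE AS SELECT) statement that rewrites CSV-backed data as partitioned Parquet. The sketch below builds such a statement; the table, column, and bucket names are hypothetical, and a real job would also handle incremental daily loads.

```python
def ctas_to_parquet(source_table: str, target_table: str, output_location: str) -> str:
    """Build an Athena CTAS statement that writes partitioned, Snappy-compressed Parquet."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        "  parquet_compression = 'SNAPPY',\n"
        f"  external_location = '{output_location}',\n"
        "  partitioned_by = ARRAY['dt']\n"          # partition columns go last in SELECT
        ") AS\n"
        f"SELECT col_a, col_b, dt FROM {source_table}"
    )

sql = ctas_to_parquet("raw_csv_logs", "logs_parquet", "s3://example-bucket/parquet/")
print("PARQUET" in sql)  # True
```

Partition pruning then lets Athena scan only the `dt` values a query actually touches, which is what keeps latency flat as the data grows.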

Question 9

A company has a marketing department and a finance department. The departments are storing data in Amazon S3 in their own AWS accounts in AWS Organizations. Both departments use AWS Lake Formation to catalog and secure their data. The departments have some databases and tables that share common names. The marketing department needs to securely access some tables from the finance department. Which two steps are required for this process? (Choose two.)

A. The finance department grants Lake Formation permissions for the tables to the external account for the marketing department.
B. The finance department creates cross-account IAM permissions to the table for the marketing department role.
C. The marketing department creates an IAM role that has permissions to the Lake Formation tables.

Correct Answers: A. The finance department grants Lake Formation permissions for the tables to the external account for the marketing department.; C. The marketing department creates an IAM role that has permissions to the Lake Formation tables.

Why A is correct: Lake Formation uses a resource sharing model where the data owner (the finance department) must explicitly grant permissions on specific tables to external AWS accounts. This Lake Formation grant is essential for cross-account data access in a Lake Formation-managed environment.

Why C is correct: The marketing department needs an IAM role with appropriate permissions to assume and access the shared Lake Formation tables. This role acts as the identity that accesses the finance department's data through Lake Formation's permission model.

Why B is wrong: While IAM permissions are needed, Lake Formation uses its own permission model for table-level access control, not direct cross-account IAM table permissions. The proper approach is Lake Formation grants plus IAM roles, not pure IAM permissions to tables.
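The finance-side grant from option A can be sketched as the request shape of the Lake Formation `GrantPermissions` API. The account IDs, database, and table names below are hypothetical, and the dict is built locally without making any AWS call.

```python
FINANCE_ACCOUNT = "111111111111"    # data owner; its catalog holds the table
MARKETING_ACCOUNT = "222222222222"  # external account receiving access

grant_request = {
    # For cross-account sharing, the principal is the external account itself;
    # individual roles in that account are then granted access by its own admin.
    "Principal": {"DataLakePrincipalIdentifier": MARKETING_ACCOUNT},
    "Resource": {
        "Table": {
            "CatalogId": FINANCE_ACCOUNT,   # disambiguates shared database/table names
            "DatabaseName": "finance_db",
            "Name": "quarterly_revenue",
        }
    },
    "Permissions": ["SELECT", "DESCRIBE"],
}

print(grant_request["Permissions"])  # ['SELECT', 'DESCRIBE']
```

Note that `CatalogId` names the owning account, which matters here precisely because the two departments have databases and tables with common names.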

Question 10

A company developed a new elections reporting website that uses Amazon Kinesis Data Firehose to deliver full logs from AWS WAF to an Amazon S3 bucket. The company is now seeking a low-cost option to perform infrequent analysis of these logs, with visualizations, in a way that requires minimal development effort. Which solution meets these requirements?

A. Use an AWS Glue crawler to create and update a table in the Glue data catalog from the logs. Use Athena to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.
B. Create a second Kinesis Data Firehose delivery stream to deliver the log files to Amazon OpenSearch Service (Amazon Elasticsearch Service). Use Amazon ES to perform text-based searches of the logs for ad-hoc analyses and use OpenSearch Dashboards (Kibana) for data visualizations.
C. Create an AWS Lambda function to convert the logs into .csv format. Then add the function to the Kinesis Data Firehose transformation configuration. Use Amazon Redshift to perform ad-hoc analyses of the logs using SQL queries and use Amazon QuickSight to develop data visualizations.
D. Create an Amazon EMR cluster and use Amazon S3 as the data source. Create an Apache Spark job to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.

Correct Answer: A. Use an AWS Glue crawler to create and update a table in the Glue data catalog from the logs. Use Athena to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.

Why A is correct: This solution leverages serverless AWS services for minimal operational overhead. An AWS Glue crawler automatically discovers the schema of the WAF logs in S3 and creates and updates tables in the Glue Data Catalog. Athena provides ad-hoc SQL querying against S3 data with pay-per-query pricing (low cost for infrequent analysis). QuickSight connects directly to Athena for visualization with minimal development effort: dashboard creation is point-and-click. This is the most cost-effective and lowest-effort solution.

Why others are wrong:
B. Requires managing an OpenSearch cluster, which adds operational overhead and continuous costs even when not querying.
C. Requires Lambda development for CSV conversion and managing a Redshift cluster (high operational overhead and cost).
D. Requires managing an EMR cluster and developing Spark jobs, adding significant operational complexity and cost.
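Once the crawler has cataloged the logs, an ad-hoc query is a single StartQueryExecution call. The sketch below shows the shape of those parameters; the database, table, and result-bucket names are hypothetical, and the dict is built locally with no call made.

```python
# Parameters of the shape Athena's StartQueryExecution API accepts.
query_params = {
    "QueryString": (
        "SELECT action, COUNT(*) AS requests "
        "FROM waf_logs "                 # hypothetical table created by the Glue crawler
        "GROUP BY action "
        "ORDER BY requests DESC"
    ),
    "QueryExecutionContext": {"Database": "security_logs_db"},
    # Athena writes result files (CSV plus metadata) under this S3 prefix.
    "ResultConfiguration": {"OutputLocation": "s3://example-athena-results/"},
}

print(query_params["QueryExecutionContext"]["Database"])  # security_logs_db
```

Because billing is per data scanned, infrequent queries like this one cost only what they read, which is why option A fits the "low-cost, infrequent" requirement.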
