1. What is Athena?

Amazon Athena is a serverless, interactive query service that analyzes data directly in S3 using standard SQL. There is no infrastructure to manage: point Athena at the data and query it.


Core Concept: Athena = serverless SQL on S3. Data stays in S3. Define the schema in the Glue Data Catalog, then run SQL. Pay per TB scanned (~$5/TB). No servers, no idle costs.
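
Because billing follows bytes scanned, a rough cost estimate is simple arithmetic. The sketch below assumes the ~$5/TB figure from these notes and treats 1 TB as 2^40 bytes; the function name and constants are illustrative, and the calculation ignores any per-query minimums or rounding that actual billing may apply.

```python
# Assumed price from the notes above: about $5 per TB scanned.
PRICE_PER_TB_USD = 5.0
# Assumption: 1 TB is treated as 2**40 bytes for this estimate.
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int, price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Return the approximate USD cost of one query scanning `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # A query that scans 2 TB of raw data.
    print(estimate_query_cost(2 * BYTES_PER_TB))
```

The point of the sketch is that cost depends only on how much data the query reads, not on how long it runs or how large the result is.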


2. Key Characteristics

  1. Serverless, standard SQL (Presto/Trino engine)
  2. Queries S3 directly (no loading)
  3. Uses Glue Data Catalog for schemas
  4. Pay per TB scanned (~$5/TB)
  5. Supports CSV, JSON, Parquet, Avro, ORC
  6. Federated queries: RDS, DynamoDB, Redshift via Lambda connectors
  7. Integrates with CloudTrail, VPC Flow Logs, and ALB logs


3. Cost Optimization

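Best Practice: Convert to Parquet, partition, and compress to cut cost by up to 90%. Athena bills on bytes scanned, so a columnar format (read only the columns queried), partitions (skip data outside the filter), and compression (fewer bytes on disk) each shrink the bill. Use Glue ETL to convert CSV/JSON to Parquet.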
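
To see how the three techniques can compound toward a figure like 90%, here is a small worked example. All fractions are hypothetical inputs chosen for illustration, not measurements; real savings depend on the data and the queries.

```python
def scan_reduction_percent(
    compression_ratio: float,   # compressed size / raw size, e.g. 0.5
    column_fraction: float,     # share of column data the query reads, e.g. 0.4
    partition_fraction: float,  # share of partitions matching the filter, e.g. 0.5
) -> float:
    """Return the percent reduction in bytes scanned versus a full raw scan."""
    for name, value in (
        ("compression_ratio", compression_ratio),
        ("column_fraction", column_fraction),
        ("partition_fraction", partition_fraction),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Each technique removes a share of the remaining bytes, so the factors multiply.
    remaining = compression_ratio * column_fraction * partition_fraction
    return (1.0 - remaining) * 100.0


if __name__ == "__main__":
    # Hypothetical case: files half the size, 40% of columns read, half the partitions hit.
    print(round(scan_reduction_percent(0.5, 0.4, 0.5), 1))
```

With these assumed inputs only about a tenth of the original bytes are scanned, which is where a reduction of roughly 90% comes from.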

4. Federated Queries

  1. Query non-S3 sources via Lambda connectors
  2. Sources: RDS, DynamoDB, Redshift, CloudWatch, JDBC databases
  3. Join S3 + RDS in one SQL query
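
A minimal sketch of what such a join might look like, built as a SQL string in Python. It assumes the connector is referenced with the `"lambda:<function_name>"` catalog prefix; the function name, schema, tables, and columns are all hypothetical placeholders, and a catalog registered under its own name would be referenced by that name instead.

```python
def federated_join_query(
    s3_table: str,
    connector_function: str,
    remote_schema: str,
    remote_table: str,
) -> str:
    """Build SQL that joins an S3-backed table with a table behind a Lambda connector."""
    # Assumption: the connector Lambda is addressed as "lambda:<function_name>".
    remote = f'"lambda:{connector_function}".{remote_schema}.{remote_table}'
    return (
        # Column names below are hypothetical.
        "SELECT o.order_id, o.amount, c.customer_name\n"
        f"FROM {s3_table} AS o\n"
        f"JOIN {remote} AS c\n"
        "  ON o.customer_id = c.customer_id"
    )


if __name__ == "__main__":
    # Hypothetical names: an S3 table `sales.orders` and an RDS connector `rds_connector`.
    print(federated_join_query("sales.orders", "rds_connector", "crm", "customers"))
```

The S3 side is an ordinary Glue Data Catalog table, while the RDS side is resolved at query time by the connector, so both appear in a single statement.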

5. Athena for AWS Logs

Common Athena queries on AWS logs:

CloudTrail: SELECT * FROM trail WHERE eventName = 'DeleteBucket'
Flow Logs: SELECT srcaddr, action FROM flow WHERE action = 'REJECT'
ALB Logs: SELECT status_code, COUNT(*) FROM alb GROUP BY 1

Exam Tip (Athena): "Serverless SQL on S3" = Athena. "Query CloudTrail logs" = Athena. "Pay per TB" = ~$5/TB. "Reduce cost" = Parquet + partitions. "Federated query" = Lambda connectors. Uses Glue Data Catalog.