1. What is Athena?
Amazon Athena is a serverless, interactive query service that analyzes data directly in S3 using standard SQL. No infrastructure — point at data and query.
Core Concept Athena = serverless SQL on S3. Data stays in S3. Define schema in Glue Data Catalog, run SQL. Pay per TB scanned (~$5/TB). No servers, no idle costs.
2. Key Characteristics
- Serverless, standard SQL (Presto/Trino engine)
- Queries S3 directly (no loading)
- Uses Glue Data Catalog for schemas
- Pay per TB scanned (~$5/TB)
- Supports CSV, JSON, Parquet, Avro, ORC
- Federated queries: RDS, DynamoDB, Redshift via Lambda connectors
- Integrates with CloudTrail, VPC Flow Logs, and ALB logs
3. Cost Optimization

Best Practice Convert to Parquet + partition + compress = up to 90% cost reduction. Use Glue ETL to convert CSV/JSON to Parquet.
4. Federated Queries
- Query non-S3 sources via Lambda connectors
- Sources: RDS, DynamoDB, Redshift, CloudWatch, JDBC databases
- Join S3 + RDS in one SQL query
5. Athena for AWS Logs
Common Athena + Logs: CloudTrail: SELECT * FROM trail WHERE eventName = 'DeleteBucket' Flow Logs: SELECT srcaddr, action FROM flow WHERE action = 'REJECT.' ALB Logs: SELECT status_code, COUNT(*) FROM alb GROUP BY 1
Exam Tip Athena: "Serverless SQL on S3" = Athena. "Query CloudTrail logs" = Athena. "Pay per TB" = ~$5/TB. "Reduce cost" = Parquet + partitions. "Federated query" = Lambda connectors. Uses Glue Data Catalog.