A company uses Amazon SageMaker for its ML pipeline in a production environment. The company has large input data sizes up to 1 GB and processing times up to 1 hour. The company needs near real-time latency. Which SageMaker inference option meets these requirements?
Correct Answer: C. Asynchronous inference
Asynchronous inference is designed for large payloads and long processing times while still providing near real-time responses. It queues incoming requests, reads each payload from Amazon S3, and can handle payloads up to 1 GB and processing times up to 1 hour per request, which matches the stated requirements. Real-time inference targets low-latency, interactive traffic, and its payload and timeout limits are far too small to accommodate 1 GB inputs or hour-long processing. Serverless inference has even tighter payload and timeout limits. Batch transform is meant for offline processing of whole datasets and has no near real-time requirement.
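To make the explanation concrete, here is a minimal sketch of submitting a request to an asynchronous endpoint. It assumes a boto3 `sagemaker-runtime` client is passed in, that the input has already been uploaded to S3, and that the endpoint was created with an asynchronous inference configuration. The helper name `submit_async_request`, the endpoint name, and the S3 URIs are hypothetical and only for illustration.

```python
from typing import Any, Dict

# Upper bound used in this sketch: 1 hour, the processing time from the question.
MAX_INVOCATION_TIMEOUT_SECONDS = 3600


def submit_async_request(
    runtime_client: Any,
    endpoint_name: str,
    input_s3_uri: str,
    timeout_seconds: int = MAX_INVOCATION_TIMEOUT_SECONDS,
) -> Dict[str, str]:
    """Queue one request on an asynchronous endpoint (hypothetical helper).

    The payload is not sent in the request body. The endpoint reads it
    from S3, which is what makes very large inputs practical.
    """
    if not input_s3_uri.startswith("s3://"):
        raise ValueError("input_s3_uri must be an S3 URI such as s3://bucket/key")
    if not 1 <= timeout_seconds <= MAX_INVOCATION_TIMEOUT_SECONDS:
        raise ValueError("timeout_seconds must be between 1 and 3600")

    response = runtime_client.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=input_s3_uri,
        ContentType="application/octet-stream",
        InvocationTimeoutSeconds=timeout_seconds,
    )
    # The call returns as soon as the request is queued. The result is
    # written to OutputLocation once processing finishes.
    return {
        "inference_id": response["InferenceId"],
        "output_location": response["OutputLocation"],
    }


# Example usage (requires AWS credentials and an existing async endpoint):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   info = submit_async_request(runtime, "my-async-endpoint", "s3://my-bucket/input.bin")
```

The caller gets back an S3 output location instead of the prediction itself, so it either polls that location or relies on a notification configured on the endpoint. That queued, store-and-retrieve pattern is why this option tolerates hour-long processing where a synchronous call would time out.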