The Amazon Alexa team announced on November 12 that it has migrated the majority of its GPU-based machine-learning inference workloads to AWS Inferentia, Amazon's custom application-specific integrated circuit (ASIC) for inference. The shift from Nvidia GPU hardware to Amazon's own Inferentia chips resulted in 30 percent lower cost and a 25 percent improvement in end-to-end latency on Alexa's text-to-speech workloads.
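
The announcement does not detail Alexa's toolchain, but a migration of this kind generally means recompiling existing models with the AWS Neuron SDK rather than rewriting them. The sketch below assumes a PyTorch model and the torch-neuron package; the model, input shape, and file name are purely illustrative.

```python
# Illustrative only: Alexa's actual models and toolchain are not described here.
# A typical Inferentia migration compiles an existing PyTorch model with the
# AWS Neuron SDK (torch-neuron), then runs the compiled artifact on an Inf1 instance.
import torch
import torch_neuron  # registers the torch.neuron namespace

# Placeholder network standing in for a text-to-speech model.
model = torch.nn.Sequential(
    torch.nn.Linear(80, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 80),
)
model.eval()

# Compile for Inferentia using a representative example input.
example = torch.rand(1, 80)
neuron_model = torch.neuron.trace(model, example_inputs=[example])
neuron_model.save("model_neuron.pt")

# On an Inf1 instance, the compiled model loads and runs like any TorchScript module.
loaded = torch.jit.load("model_neuron.pt")
output = loaded(example)
```

Because the compiled artifact behaves like a regular TorchScript module, serving code can often stay largely unchanged, which is one reason such a migration can deliver cost and latency gains without a full rewrite.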