Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage