High network communication cost for synchronizing weights and,gradients in geo-distributed data analysis consumes the benefits of,advancement in computation and optimization techniques. Many,quantization methods for weight, gradient or both have been proposed in recent years where weight-quantized model suffers from,error related to weight dimension and gradient-quantized method,suffers from slow convergence rate by a factor related to the gradient quantization resolution and gradient dimension. All these,methods have been proved to be infeasible in terms of distributed,training across multiple data centers all over the world. Moreover,recent studies show that communicating over WANs can significantly degrade DNN model performance by upto 53.7,×,because,of unstable and limited WAN bandwidth. Our goal in this work,is to design a geo-distributed Deep-Learning system that (1) ensures efficient and faster communication over LAN and WAN and,(2) maintain accuracy and convergence for complex DNNs with,billions of parameters. In this paper, we introduce WeightGrad,which acknowledges the limitations of quantization and provides,loss-aware weight-quantized networks with quantized gradients,for local convergence and for global convergence it dynamically,eliminates insignificant communication between data centers while,still guaranteeing the correctness of DNN models. Our experiments,on our developed prototypes of WeightGrad running across 3 Amazon EC2 global regions and on a cluster that emulates EC2 WAN,bandwidth show that WeightGrad provides 1.06% gain in top-1,accuracy, 5.36,×,speedup over baseline and 1.4,×,-2.26,×,over the four,state-of-the-art distributed ML systems.