Improving Transformer Optimization Through Better Initialization