Improving Multimodal Named Entity Recognition via Entity Span Detection with Unified Multimodal Transformer