Real-Time Sign Language Detection Using Human Pose Estimation

ECCV 2020

This video is a presentation of the paper "Real-Time Sign Language Detection Using Human Pose Estimation", presented at SLRTP 2020 and in the ECCV 2020 Demo Track. [English captions available]

Sign language users face several problems when videoconferencing. One major problem is that current applications detect speakers only by voice activity, making it hard for signers to "get the floor". Furthermore, in group calls it is cognitively exhausting to constantly look around to see whether someone has started signing.

We present a lightweight, real-time sign language detection app that connects to various videoconferencing applications and can set the user as the "speaker" when they sign. The app leverages fast pose estimation and sign language detection models running in the browser with tf.js, enabling it to work reliably and in real time on the CPU. When the user is detected to be signing, we pass inaudible audio (20 kHz) through a virtual microphone, which any videoconferencing application then picks up as if the user were "speaking."

We believe videoconferencing applications should be accessible to everyone, hope this work is a step in that direction, and hope our app empowers signers to use whichever videoconferencing application they prefer more conveniently.

References:
[1] Sign Language Detection "in the Wild" with Recurrent Neural Networks
[2] Extending the Public DGS Corpus in Size and Depth
[3] OpenPose: realtime multi-person 2D pose estimation using Part Affinity Fields
[4] Towards accurate multi-person pose estimation in the wild
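The detection idea described above can be sketched as follows. This is a simplified heuristic, not the paper's model: the actual app feeds pose-derived motion features to a small learned classifier in tf.js, whereas this sketch just measures how much the estimated pose landmarks move between frames and thresholds that motion; the threshold value is an assumption for illustration.

```javascript
// Sketch: treat sustained pose-landmark motion between frames as a
// "signing" signal. The real app uses a learned model on such features;
// this heuristic only illustrates the underlying signal.
function meanLandmarkMotion(prev, curr) {
  // prev, curr: arrays of {x, y} pose keypoints, normalized to [0, 1]
  let total = 0;
  for (let i = 0; i < curr.length; i++) {
    total += Math.hypot(curr[i].x - prev[i].x, curr[i].y - prev[i].y);
  }
  return total / curr.length;
}

// Hypothetical per-frame motion threshold (not from the paper).
const MOTION_THRESHOLD = 0.02;

function isSigning(prevPose, currPose) {
  return meanLandmarkMotion(prevPose, currPose) > MOTION_THRESHOLD;
}
```

In the app, the per-frame decision would additionally be smoothed over time (e.g. by a recurrent model, as in reference [1]) so that brief pauses between signs do not drop the user's "speaker" status.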
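The "speaking" trigger relies on a tone above the typical range of human hearing. A minimal sketch of generating such a tone is below; in the browser this would be played through the Web Audio API into a virtual microphone, but here we only compute the raw samples. The 44.1 kHz sample rate and 20 ms buffer length are assumptions for illustration.

```javascript
// Sketch: compute one short buffer of a 20 kHz sine tone, the inaudible
// audio the app passes through a virtual microphone when signing is
// detected. Plumbing into an actual audio device is omitted.
const SAMPLE_RATE = 44100; // Hz; common audio sample rate (assumption)
const TONE_FREQ = 20000;   // 20 kHz, inaudible to most listeners
const DURATION_S = 0.02;   // 20 ms buffer (assumption)

function makeToneBuffer(freq, durationS, sampleRate) {
  const n = Math.round(durationS * sampleRate);
  const samples = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    // Sample the sine wave at time i / sampleRate seconds.
    samples[i] = Math.sin(2 * Math.PI * freq * (i / sampleRate));
  }
  return samples;
}

const tone = makeToneBuffer(TONE_FREQ, DURATION_S, SAMPLE_RATE);
```

Because videoconferencing applications switch the active speaker based on detected voice activity, any signal on the microphone channel suffices; an ultrasonic tone achieves this without disturbing other participants.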