Voice-driven services (VDS) are being used in a variety of applications
ranging from smart home control to payments using digital assistants. The input
to such services is often captured via an open voice channel, e.g., using a
microphone, in an unsupervised setting. One of the key operational security
requirements in such a setting is the freshness of the input speech. We present
AEOLUS, a security overlay that proactively embeds a dynamic acoustic nonce at
the time of user interaction, and detects the presence of the embedded nonce in
the recorded speech to ensure freshness. We demonstrate that the acoustic nonce
can (i) be reliably embedded and retrieved, and (ii) be non-disruptive (and even
imperceptible) to a VDS user. Optimal parameters (the acoustic nonce’s operating
frequency, amplitude, and bitrate) are determined for (i) and (ii) from a
practical perspective. Experimental results show that AEOLUS yields 0.5% FRR at
0% FAR for speech re-use prevention up to a distance of 4 meters in three
real-world environments with different background noise levels. We also conduct
a user study with 120 participants, which shows that the acoustic nonce does
not degrade overall user experience for 94.16% of speech samples, on average,
in these environments. AEOLUS can therefore be used in practice to prevent
speech re-use and ensure the freshness of speech input.
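
The embed-then-detect idea described above can be illustrated with a minimal
sketch. Everything in it is an assumption made for illustration: it uses simple
binary frequency-shift keying (one tone per bit) and Goertzel tone detection,
which is not necessarily the modulation AEOLUS uses, and the sample rate, tone
frequencies, amplitude, bitrate, and function names are hypothetical
placeholders rather than the tuned parameters reported in the paper. Mixing is
simulated by adding samples in software instead of playing the nonce over a
loudspeaker.

```python
import math
import secrets

# All constants are illustrative assumptions, not AEOLUS's tuned values.
SAMPLE_RATE = 16_000   # samples per second
BIT_SAMPLES = 800      # samples per nonce bit (50 ms, i.e. 20 bits/s)
FREQ_ZERO = 4_000.0    # tone (Hz) that encodes a 0 bit
FREQ_ONE = 5_000.0     # tone (Hz) that encodes a 1 bit
AMPLITUDE = 0.05       # nonce amplitude relative to full scale
MIN_POWER = 1.0        # below this, treat a frame as carrying no nonce


def generate_nonce(num_bits):
    """Draw a fresh, unpredictable bit sequence for one interaction."""
    return [secrets.randbits(1) for _ in range(num_bits)]


def embed_nonce(speech, bits):
    """Simulate over-the-air mixing: add one tone per bit to the speech."""
    needed = len(bits) * BIT_SAMPLES
    mixed = list(speech) + [0.0] * max(0, needed - len(speech))
    for i, bit in enumerate(bits):
        freq = FREQ_ONE if bit else FREQ_ZERO
        for k in range(BIT_SAMPLES):
            phase = 2.0 * math.pi * freq * k / SAMPLE_RATE
            mixed[i * BIT_SAMPLES + k] += AMPLITUDE * math.sin(phase)
    return mixed


def goertzel_power(frame, freq):
    """Squared DFT magnitude of `frame` at `freq` (Goertzel algorithm)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def extract_nonce(recording, num_bits):
    """Return the decoded bits, or None if any frame lacks a nonce tone."""
    if len(recording) < num_bits * BIT_SAMPLES:
        return None
    bits = []
    for i in range(num_bits):
        frame = recording[i * BIT_SAMPLES:(i + 1) * BIT_SAMPLES]
        p0 = goertzel_power(frame, FREQ_ZERO)
        p1 = goertzel_power(frame, FREQ_ONE)
        if max(p0, p1) < MIN_POWER:
            return None  # neither tone present: no nonce in this frame
        bits.append(1 if p1 > p0 else 0)
    return bits


def is_fresh(recording, expected_bits):
    """Accept the recording only if it carries the nonce issued just now."""
    return extract_nonce(recording, len(expected_bits)) == expected_bits
```

In this sketch, a service would call `generate_nonce` for each interaction,
play the corresponding tones while the user speaks, and pass the captured audio
to `is_fresh`. A replayed recording carries either an older nonce or none at
all, so the comparison fails. A practical system would also have to handle
synchronization, channel distortion, and background noise, which this sketch
ignores.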

Authors: Yangyong Zhang, Maliheh Shirvanian, Sunpreet S. Arora, Jianwei Huang, Guofei Gu

