A new optical technique extracts audio information from high-speed video recordings by using an image-matching process based on the vibrations caused by sound waves.
The technique is based on the fact that sound waves are mechanical waves that cause the air to vibrate as they travel, the paper notes. That vibration can in turn cause objects in the sound's path to vibrate, especially if the objects are lightweight, thin, and flexible, such as a piece of paper. The vibrations, although usually small in amplitude, can be detected and analyzed algorithmically, and the audio can be reconstructed from those calculations.
The authors used a subset-based image-correlation approach to detect the motions of points on the surface of an object, capturing target images with a high-speed camera and applying the Gauss-Newton algorithm and a few other measures to achieve very fast and highly accurate image matching. Because the detected vibrations are directly related to sound waves, a simple model was used to reconstruct the original audio information of the sound waves.
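To make the matching step concrete, here is a minimal Python sketch of subset-based matching solved with Gauss-Newton iterations. It is an illustration under stated assumptions, not the authors' implementation: it estimates only a pure translation of one square subset, uses bilinear interpolation, and omits the "other measures" the paper mentions. The function names (`bilinear_sample`, `match_subset`, `displacements_to_audio`) and all parameters are hypothetical. The article does not describe the "simple model" used for audio reconstruction, so the last function is a stand-in assumption that simply centers and normalizes the displacement series.

```python
import numpy as np

def bilinear_sample(img, ys, xs):
    """Sample img at fractional (ys, xs) coordinates with bilinear interpolation."""
    y0 = np.clip(np.floor(ys).astype(int), 0, img.shape[0] - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, img.shape[1] - 2)
    fy, fx = ys - y0, xs - x0
    return ((1 - fy) * (1 - fx) * img[y0, x0]
            + (1 - fy) * fx * img[y0, x0 + 1]
            + fy * (1 - fx) * img[y0 + 1, x0]
            + fy * fx * img[y0 + 1, x0 + 1])

def match_subset(ref, cur, center, half=10, iters=30, tol=1e-5):
    """Estimate the (dy, dx) shift of a square subset of ref inside cur.

    Assumption: the subset only translates (no rotation or stretch).
    Minimizes sum((ref(x) - cur(x + p))**2) over p with Gauss-Newton steps.
    """
    cy, cx = center
    ys, xs = np.mgrid[cy - half:cy + half + 1,
                      cx - half:cx + half + 1].astype(float)
    template = ref[cy - half:cy + half + 1, cx - half:cx + half + 1]
    gy, gx = np.gradient(cur)      # derivatives along rows (y) and columns (x)
    p = np.zeros(2)                # current estimate of (dy, dx)
    for _ in range(iters):
        warped = bilinear_sample(cur, ys + p[0], xs + p[1])
        residual = (template - warped).ravel()
        # Jacobian of the warped subset with respect to (dy, dx)
        jac = np.stack([bilinear_sample(gy, ys + p[0], xs + p[1]).ravel(),
                        bilinear_sample(gx, ys + p[0], xs + p[1]).ravel()],
                       axis=1)
        # Gauss-Newton step: least-squares solution of jac @ step = residual
        step, *_ = np.linalg.lstsq(jac, residual, rcond=None)
        p += step
        if np.linalg.norm(step) < tol:
            break
    return p

def displacements_to_audio(displacements):
    """Stand-in reconstruction (an assumption, not the paper's model):
    remove the mean and scale the series to the range [-1, 1]."""
    d = np.asarray(displacements, dtype=float)
    d = d - d.mean()
    peak = np.max(np.abs(d))
    return d / peak if peak > 0 else d
```

In use, one would call `match_subset(first_frame, frame, center)` for every frame of the recording, keep one displacement component per frame, and pass the resulting series to `displacements_to_audio`. Each video frame yields one sample, so the waveform's sample rate equals the camera's frame rate.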
While other recent work in the area reports on more sophisticated techniques to compute motion signals, the authors chose a simpler image-matching approach to measure vibration. Because light can travel through air considerably farther than sound and can pass through glass, they anticipate that the technique may find applications such as the passive detection of conversations inside a building from far away. "We are currently improving the technique to increase its accuracy and sensitivity, make the measurements in real time, and remove interference from other sources," said Zhaoyang Wang of the Department of Engineering at the Catholic University of America.
"One of the intriguing aspects of the paper is the ability to recover spoken words from a video of objects in the room," said journal Associate Editor Reiner Eschbach, a Research Fellow at Xerox Corp. "The paper shows that the sound creates minute vibrations in objects and that these vibrations, given the right equipment, can be picked up from a video signal. This is an interesting foray into a new application space and will, in my view, trigger interesting research in the field."