[ Speech2Spikes ]
S2S / 1001
Sequential stages of data processing (16 kHz 16-bit raw audio, Sliding Discrete Fourier Transform (SDFT), log-Mel features, Step-Forward spike encoding)
Neuromorphic processors mimic the neural structure and parallel processing capabilities of the human brain, enabling more efficient computing for pattern recognition, computer vision, and other AI applications. Despite the maturity and availability of speech recognition systems, speech recognition has yet to be widely deployed on neuromorphic systems. The obstacle is a mismatch in representation: these chips operate on sparse spikes, whereas raw audio is a continuous, high-resolution signal.
To translate between the two, we developed Speech2Spikes, an efficient audio processing pipeline that encodes recorded audio into spikes and is suitable for real-time operation with low-power neuromorphic processors. Speech2Spikes consists of several sequential transformations, each extracting specific information from the underlying signal. These transformations are simple, readily accelerated in hardware, and can each be implemented as a filter for sample-by-sample processing.
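To make the "filter for sample-by-sample processing" idea concrete, the following is a minimal Python sketch of two of the stages named above, written as stateful filters that consume one sample per call. It is an illustration under stated assumptions, not the Speech2Spikes implementation: the class names `SlidingDFT` and `StepForwardEncoder`, the window length, and the threshold value are hypothetical, the step-forward rule shown is the generic textbook variant (at most one spike per sample), and the log-Mel stage is omitted.

```python
import cmath
import math
from collections import deque


class SlidingDFT:
    """Sliding DFT over the most recent n samples, updated one sample at a time.

    Hypothetical helper for illustration; n is an assumed window length.
    """

    def __init__(self, n):
        self.n = n
        self.window = deque([0.0] * n, maxlen=n)  # last n samples, oldest first
        self.bins = [0j] * n                      # current DFT of the window
        # Per-bin rotation factors e^{j*2*pi*k/n}
        self.twiddles = [cmath.exp(2j * math.pi * k / n) for k in range(n)]

    def step(self, x):
        """Push one sample and return the updated DFT bins."""
        oldest = self.window[0]
        self.window.append(x)  # the deque drops the oldest sample
        delta = x - oldest
        # Recurrence: S_k(t) = (S_k(t-1) + x(t) - x(t-n)) * e^{j*2*pi*k/n}
        self.bins = [(b + delta) * w for b, w in zip(self.bins, self.twiddles)]
        return self.bins


class StepForwardEncoder:
    """Generic step-forward encoder: emits +1, -1, or 0 for each input value.

    Assumed variant: a running baseline moves by one threshold per spike.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.baseline = None

    def step(self, x):
        if self.baseline is None:
            self.baseline = x  # the first value only initializes the baseline
            return 0
        if x > self.baseline + self.threshold:
            self.baseline += self.threshold
            return 1
        if x < self.baseline - self.threshold:
            self.baseline -= self.threshold
            return -1
        return 0


# Usage: encode a short, made-up feature trace one value at a time.
encoder = StepForwardEncoder(threshold=0.5)
spikes = [encoder.step(v) for v in [0.0, 0.2, 0.8, 1.6, 1.5, 0.3]]
print(spikes)  # [0, 0, 1, 1, 0, -1]
```

Because each stage keeps only a small amount of state (a window of recent samples, or a single baseline value) and does constant work per bin or per feature on every call, stages of this shape can be chained and run on streaming audio without buffering whole utterances.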
Published in NICE ’23 (https://doi.org/10.1145/3584954.3584995)