Speaker normalization using cortical strip maps: A neural model for steady state vowel categorization

Author(s): Ames, H. | Grossberg, S. |

Year: 2008

Citation: The Journal of the Acoustical Society of America 2008;124(6):3918-36.

Abstract: Auditory signals of speech are speaker dependent, but representations of language meaning are speaker independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by adaptive resonance theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [Peterson, G. E., and Barney, H.L., J. Acoust. Soc. Am. 24, 175-184 (1952).] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.

Topics: Speech and Hearing, Applications: Human-Machine Interface, Models: Fuzzy ARTMAP,

PDF download

Cross References