Neural network and nearest neighbor comparison of speaker normalization methods for vowel recognition

Author(s): Carpenter, G.A. | Govindarajan, K.K.

Year: 1993

Citation: In S. Gielen & B. Kappen (Eds.), Proceedings of the International Conference on Artificial Neural Networks (ICANN 93), London, UK: Springer Verlag, 412-415.

Abstract: Fuzzy ARTMAP and K-Nearest Neighbor (K-NN) categorizers were used to evaluate intrinsic and extrinsic normalization methods by training and testing on disjoint sets of speakers of the Peterson-Barney database. Intrinsic methods included one nonscaled, four psychophysical scales (bark, bark with end-correction, mel, ERB), and three log scales, each tested on four different combinations of the frequencies F0, F1, F2, F3. Four extrinsic schemes were tested in conjunction with the intrinsic methods: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). Categorizers showed similar trends, with K-NN performing better but requiring more storage. The optimal intrinsic method was bark scale, or bark with end-correction, using differences between all frequencies (BDA). The order of performance for extrinsic methods was LT, CSi, LS, and CS, with ARTMAP performing best using BDA, and K-NN choosing psychophysical measures for all except CSi.
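
To make the terminology concrete, the following is a minimal Python sketch of one intrinsic step (bark conversion followed by frequency differences) and one extrinsic step (per-frequency centroid subtraction, in the spirit of CSi). The abstract does not give formulas, so everything here is an assumption: the bark conversion uses Traunmüller's commonly cited approximation with its end-correction, "differences between all frequencies" is read as all pairwise differences, and the function names `hz_to_bark`, `bark_differences`, and `subtract_speaker_centroid` are hypothetical. The paper's exact definitions may differ.

```python
def hz_to_bark(f_hz, end_correction=False):
    """Convert Hz to bark (assumed: Traunmueller's approximation)."""
    z = 26.81 * f_hz / (1960.0 + f_hz) - 0.53
    if end_correction:
        # Assumed end-correction for the low and high ends of the scale.
        if z < 2.0:
            z += 0.15 * (2.0 - z)
        elif z > 20.1:
            z += 0.22 * (z - 20.1)
    return z


def bark_differences(freqs_hz, end_correction=False):
    """All pairwise bark differences among the given frequencies
    (one possible reading of 'differences between all frequencies')."""
    z = [hz_to_bark(f, end_correction) for f in freqs_hz]
    return [z[j] - z[i] for i in range(len(z)) for j in range(i + 1, len(z))]


def subtract_speaker_centroid(tokens):
    """Subtract one speaker's mean from each feature dimension
    (a CSi-style extrinsic normalization, assumed form)."""
    n = len(tokens)
    dims = len(tokens[0])
    centroid = [sum(t[d] for t in tokens) / n for d in range(dims)]
    return [[t[d] - centroid[d] for d in range(dims)] for t in tokens]
```

For a single vowel token, `bark_differences([f0, f1, f2, f3])` yields six features; stacking the tokens of one speaker and passing them to `subtract_speaker_centroid` removes that speaker's average along each feature before classification.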

Topics: Machine Learning, Speech and Hearing, Models: Fuzzy ARTMAP
