Evaluation of speaker normalization methods for vowel recognition using fuzzy ARTMAP and K-NN

Author(s): Carpenter, G.A. | Govindarajan, K.K. |

Year: 1993

Citation: Proceedings of the World Congress on Neural Networks (WCNN-93), Vol. III, pp. 397-404

Abstract: A procedure that uses fuzzy ARTMAP and K-Nearest Neighbor (K-NN) categorizers to evaluate intrinsic and extrinsic speaker normalization methods is described. Each classifier is trained on preprocessed, or normalized, vowel tokens from about 30% of the speakers of the Peterson-Barney database, then tested on data from the remaining speakers. Intrinsic normalization methods included one nonscaled representation, four psychophysical scales (bark, bark with end-correction, mel, ERB), and three log scales, each tested on four different combinations of the fundamental (F0) and the formants (F1, F2, F3). For each scale and frequency combination, four extrinsic speaker adaptation schemes were tested: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). A total of 32 intrinsic and 128 extrinsic methods were thus compared. Fuzzy ARTMAP and K-NN showed similar trends, with K-NN performing somewhat better and fuzzy ARTMAP requiring about 1/10 as much memory. The optimal intrinsic normalization method was the bark scale, or bark with end-correction, using the differences between all frequencies (Diff All). The order of performance for the extrinsic methods was LT, CSi, LS, and CS, with fuzzy ARTMAP performing best using the bark scale with Diff All, and K-NN choosing psychophysical measures for all except CSi.
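The four extrinsic adaptation schemes named in the abstract (CS, CSi, LS, LT) can be sketched as simple per-speaker transforms of a token matrix. The array layout, function names, and the choice of zero-mean/unit-variance for the LT case are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch of the four extrinsic speaker-normalization schemes.
# Rows of `tokens` are vowel tokens from one speaker; columns are
# frequency channels (F0, F1, F2, F3) in Hz. All details here are
# assumptions for illustration, not the paper's exact procedure.
import numpy as np

def cs(tokens):
    # Centroid subtraction across all frequencies:
    # one grand mean per speaker, subtracted everywhere.
    return tokens - tokens.mean()

def csi(tokens):
    # Centroid subtraction for each frequency:
    # a separate mean per frequency channel.
    return tokens - tokens.mean(axis=0)

def ls(tokens):
    # Linear scale: divide by a single speaker-specific factor
    # (here, the grand mean -- one simple choice).
    return tokens / tokens.mean()

def lt(tokens):
    # Linear transformation: a per-channel affine transform
    # (here, zero mean and unit variance per frequency).
    return (tokens - tokens.mean(axis=0)) / tokens.std(axis=0)

tokens = np.array([[120.0, 730.0, 1090.0, 2440.0],
                   [125.0, 270.0, 2290.0, 3010.0]])
normalized = csi(tokens)  # per-channel means of result are zero
```

After CSi or LT, each frequency channel has zero mean for that speaker, which is the sense in which the transform removes speaker-dependent offsets before classification.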

Topics: Machine Learning, Speech and Hearing, Models: Fuzzy ARTMAP
