DSpace Repository

Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy

dc.creator Lucey, Simon
dc.creator Chen, Tsuhan
dc.date 2003
dc.date.accessioned 2017-11-14T14:07:06Z
dc.date.available 2017-11-14T14:07:06Z
dc.identifier.uri http://hdl.handle.net/123456789/3411
dc.description.abstract In this paper an in-depth analysis is undertaken into effective strategies for integrating the audio and visual modalities for the purposes of text-dependent speaker recognition. Our work is based on the well-known hidden Markov model (HMM) classifier framework for modelling speech. A framework is proposed to handle the mismatch between train and test observation sets, so as to provide effective classifier combination performance between the acoustic and visual HMM classifiers. From this framework, it can be shown that strategies for combining independent classifiers, such as the weighted product or sum rules, naturally emerge depending on the influence of the mismatch. Based on the assumption that poor performance in most audio-visual speaker recognition applications can be attributed to train/test mismatches, we propose that the main impetus of practical audio-visual integration is to dampen the independent errors resulting from the mismatch, rather than to model any bimodal speech dependencies. To this end, a strategy based on theory and empirical evidence is recommended, using a hybrid between the weighted product and weighted sum rules in the presence of varying acoustic noise. Results are presented on the M2VTS database.
dc.format application/pdf
dc.title Improved Audio-Visual Speaker Recognition via the Use of a Hybrid Combination Strategy
dc.type generic
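The weighted product and weighted sum rules mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the single weight `alpha`, and the boolean `noisy` switch for selecting between the two rules are all assumptions introduced here for clarity. The inputs are per-modality log-likelihoods from the acoustic and visual HMM classifiers.

```python
import math

def weighted_product(log_pa, log_pv, alpha):
    # Weighted product rule: a weighted linear combination of the
    # log-likelihoods (a geometric mean in the probability domain).
    return alpha * log_pa + (1.0 - alpha) * log_pv

def weighted_sum(log_pa, log_pv, alpha):
    # Weighted sum rule: a weighted linear combination of the
    # likelihoods themselves, returned in the log domain.
    return math.log(alpha * math.exp(log_pa) + (1.0 - alpha) * math.exp(log_pv))

def hybrid_score(log_pa, log_pv, alpha, noisy):
    # Hypothetical hybrid strategy: when acoustic conditions are judged
    # noisy (i.e. a train/test mismatch is likely), the more forgiving
    # sum rule is used; otherwise the product rule is used. How the
    # switch (or interpolation) is actually made is specified in the
    # paper, not here.
    if noisy:
        return weighted_sum(log_pa, log_pv, alpha)
    return weighted_product(log_pa, log_pv, alpha)
```

The sum rule tends to dampen a single badly mismatched classifier (one small likelihood cannot drive the combined score to zero), whereas the product rule rewards agreement between the two modalities, which motivates switching between them as acoustic noise varies.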

