Combining Independent Component Analysis and Sound Stream Segregation


This paper reports the issues and results of the AI Challenge "Understanding Three Simultaneous Speeches". First, the issues of the Challenge are revisited. We emphasize the importance of fusing information from various attributes of speech (sounds) when separating speech from a mixture of sounds. This emphasis is supported by comparing two methods of speech separation: the computational auditory scene analysis approach, which employs the attributes of the sound sources and of the sound-transmitting channel, and the blind source separation approach, which dispenses with these attributes. Although these two approaches are usually considered opposites with regard to whether sound attributes are used, we conclude that they differ instead in how they use sound attributes. Next, a new algorithm for information fusion is proposed. Sound attributes extracted by tracking harmonic structures and sound source directions, as well as by independent component analysis, are fused according to a sound ontology. Finally, we report the error reduction rates for 1-best and 10-best word recognition of each speaker, measured on 200 mixtures of isolated-word utterances by two women and one man.

Proceedings of IJCAI-99 Workshop on Computational Auditory Scene Analysis