ica2016 312 saruwatari

Flexible Microphone Array Based on Multichannel Nonnegative Matrix Factorization and Statistical Signal Estimation

Hiroshi Saruwatari, Kazuma Takata

(The University of Tokyo, JAPAN)

Nobutaka Ono (NII, JAPAN),

Shoji Makino (University of Tsukuba, JAPAN)

Acoustic Array Systems: Paper ICA2016-312

Outline

Introduction of rescue robot audition

Conventional approaches (ICA, IVA, Rank-1 MNMF)

Informed source separation and its problem

Ego-noise basis mismatch problem solution

Speech ambiguity problem solution

Experimental evaluation

Conclusion


Introduction: Rescue Robot Audition

The robot is designed to detect victims' speech in a disaster area.

Its flexible body twists and moves, driven by vibration motors.

It carries multiple microphones along its body.

• Thus, the microphone positions are always unknown.

• Self-vibration generates harmful noise (so-called ego-noise).

This is one instance of the distributed microphone array problem.

What is a hose-shaped rescue robot?

[Figure: hose-shaped rescue robot with its microphones and vibrators labeled]

How to solve? Use Blind Source Separation

[Figure: Source → Mixing → Observation → Separation → Separated, with parts labeled "Unknown" and "Known"]

x = As,  y = Wx   (W: demixing matrix)

Conventional: ICA or independent vector analysis (IVA), which separates the sources based on their statistical independence.

We assume that the mixing system A is linear and time-invariant.

This is a simultaneous estimation problem for W and the statistical models of the sources.

[Figure: source models (p.d.f.s) for S1 (speech) and S2 (ego-noise), and the low-rank source spectrogram]
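The mixing and demixing relations x = As and y = Wx can be illustrated with a minimal sketch. It assumes an instantaneous two-source mixture and an oracle W = A^{-1}; the matrix values and the Laplace-distributed sources are hypothetical stand-ins, and a real blind method has to estimate W without knowing A.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical sources (rows), stand-ins for speech and ego-noise.
s = rng.laplace(size=(2, 1000))

# Hypothetical time-invariant mixing matrix A (unknown in practice).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])

x = A @ s                 # observation: x = A s

# Oracle demixing matrix (assumption: A is known here, only for illustration).
W = np.linalg.inv(A)
y = W @ x                 # separated signals: y = W x

print(np.allclose(y, s))  # the oracle W recovers the sources
```

Blind source separation replaces the oracle inverse with a W estimated from x alone, using a statistical model of the sources.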


In this study, we focus our attention on…

Rank-1 MNMF (Independent Low-Rank Matrix Analysis),

which separates the sources by estimating the demixing matrix W and a low-rank source spectrogram model via nonnegative matrix factorization (NMF) [Lee, 2001].

Rank-1 MNMF [Kitamura, Saruwatari et al., IEEE Trans. ASLP 2016]

[Figure: demixing matrix W and low-rank source model TV; W and TV are estimated simultaneously]


Rank-1 MNMF (Independent Low-Rank Matrix Analysis)

Pros & Cons:

• All parameters can be updated via the auxiliary-function method (an EM-like algorithm), which preserves the nonnegativity of T and V.

• The cost function never increases from one iteration to the next. Thus, this is a convergence-guaranteed algorithm, unlike ICA!

• The result still depends on the initial values of the parameters, which motivates the "informed" approach.
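The behavior listed above can be seen in a small single-channel sketch. It fits a nonnegative matrix X ≈ TV under the Itakura-Saito divergence with square-root multiplicative updates, a standard auxiliary-function form for IS-NMF. This is an assumption-laden illustration, not the authors' multichannel algorithm: the demixing matrix W is omitted, and the toy data, sizes, and function names are hypothetical.

```python
import numpy as np

def is_divergence(X, Y):
    """Itakura-Saito divergence between positive matrices, summed over entries."""
    R = X / Y
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(X, n_bases, n_iter=30, seed=0):
    """Fit X ~= T @ V under the IS divergence with auxiliary-function updates."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    T = rng.uniform(0.1, 1.0, size=(n_freq, n_bases))   # basis
    V = rng.uniform(0.1, 1.0, size=(n_bases, n_time))   # activation
    costs = [is_divergence(X, T @ V)]
    for _ in range(n_iter):
        Y = T @ V
        # Multiplicative factors are positive, so T and V stay nonnegative.
        T *= np.sqrt(((X / Y**2) @ V.T) / ((1.0 / Y) @ V.T))
        Y = T @ V
        V *= np.sqrt((T.T @ (X / Y**2)) / (T.T @ (1.0 / Y)))
        costs.append(is_divergence(X, T @ V))
    return T, V, costs

# Toy "power spectrogram": a rank-2 model with multiplicative noise.
rng = np.random.default_rng(1)
X = (rng.uniform(0.1, 1.0, (16, 2)) @ rng.uniform(0.1, 1.0, (2, 40))) \
    * rng.exponential(1.0, (16, 40)) + 1e-6
T, V, costs = is_nmf(X, n_bases=2)
```

The list `costs` can be inspected to check that the divergence does not increase. Running with a different `seed` generally gives a different T and V, which is the sensitivity to initialization noted in the last bullet.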

The cost function of Rank-1 MNMF, which is to be minimized, consists of two parts:

• an independence measure between the sources (for W);

• a low-rank approximation of the sources (for T and V).

(Note: both are based on the Itakura-Saito (IS) divergence.)
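For reference, the cost function described here is commonly written as follows in the Rank-1 MNMF literature. This is a sketch up to constant terms. The indices i (frequency bin), j (time frame), n (source), and k (basis), and the symbols y, t, v, and W_i, are notation assumed here rather than taken from these slides.

```latex
\mathcal{Q} = \sum_{i,j} \Biggl[ \sum_{n} \Biggl( \frac{|y_{ij,n}|^{2}}{\sum_{k} t_{ik,n} v_{kj,n}}
  + \log \sum_{k} t_{ik,n} v_{kj,n} \Biggr) - 2 \log \lvert \det W_i \rvert \Biggr],
\qquad y_{ij} = W_i \, x_{ij}
```

The log-determinant term and the dependence of y on W_i drive the update of W, while the ratio and log terms equal, up to constants, the IS divergence between |y|^2 and the low-rank model TV.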

[Figure: source model in Rank-1 MNMF, shown as a basis and an activation; a typical ego-noise basis is trained by NMF in advance]

Toward Informed Source Separation

A typical ego-noise basis is trained by NMF in advance.

With this part of the bases fixed, we estimate the remaining parameters and W.
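The idea of fixing part of the bases can be sketched in a single-channel setting. The pre-trained ego-noise basis is held constant while the speech basis and all activations are updated under the IS divergence. The demixing matrix W is left out, and the function name, sizes, and random data are hypothetical, so this shows only the supervised-NMF part of the approach.

```python
import numpy as np

def supervised_is_nmf(X, T_noise, n_speech, n_iter=30, seed=0):
    """IS-NMF in which the columns of T_noise stay fixed (pre-trained)."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    n_noise = T_noise.shape[1]
    T_speech = rng.uniform(0.1, 1.0, size=(n_freq, n_speech))
    V = rng.uniform(0.1, 1.0, size=(n_speech + n_noise, n_time))
    for _ in range(n_iter):
        T = np.hstack([T_speech, T_noise])
        Y = T @ V
        # Update only the speech bases; the ego-noise bases are not touched.
        V_speech = V[:n_speech]
        T_speech *= np.sqrt(((X / Y**2) @ V_speech.T) / ((1.0 / Y) @ V_speech.T))
        T = np.hstack([T_speech, T_noise])
        Y = T @ V
        # Update the activations of both speech and ego-noise.
        V *= np.sqrt((T.T @ (X / Y**2)) / (T.T @ (1.0 / Y)))
    return T_speech, V

rng = np.random.default_rng(2)
T_noise = rng.uniform(0.1, 1.0, (16, 3))          # stands in for a trained basis
T_noise_before = T_noise.copy()
X = rng.exponential(1.0, (16, 40)) + 1e-6         # toy power spectrogram
T_speech, V = supervised_is_nmf(X, T_noise, n_speech=2)
```

The first `n_speech` rows of `V` are the speech activations and the remaining rows are the ego-noise activations.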

[Figure: basis matrix split into a speech basis and an ego-noise basis, together with the activation; two parts are marked "(unknown)"]


[Problem 1] Ego-noise time-variance (ego-noise mismatch problem)

[Problem 2] Unknown speech (speech model ambiguity problem)


Statistical Signal Estimation

[Flow: observed signal → supervised Rank-1 MNMF (rough separation) → statistical postfilter [Breithaupt, 2010] → estimated target signal, estimated ego-noise, and certainty]

A chi distribution (a sparse p.d.f.) is used as the prior of the target signal.

Its sparseness can be estimated empirically from the data via higher-order statistics [Murota, Saruwatari, ICASSP2014].

Thanks to the sparse prior, we can obtain a more accurate separation together with its certainty.

Certainty: I(f,t) = 1 if G(f,t) > 0.8, and 0 otherwise. This binary mask extracts, from the estimated interference signal, the components that seldom overlap with the target signal.
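The thresholding rule can be sketched in a few lines. The slides do not define G beyond the 0.8 threshold, so here it is treated as a per-bin value produced by the statistical postfilter; the example numbers and the names `certainty_mask`, `noise_est`, and `sampled` are hypothetical.

```python
import numpy as np

def certainty_mask(G, threshold=0.8):
    """Binary certainty: 1 where G(f, t) exceeds the threshold, else 0."""
    return (G > threshold).astype(float)

# Hypothetical 2x2 time-frequency values of G and of the estimated ego-noise.
G = np.array([[0.95, 0.30],
              [0.80, 0.81]])
noise_est = np.array([[4.0, 2.0],
                      [1.0, 3.0]])

I = certainty_mask(G)      # [[1, 0], [0, 1]]: 0.80 is not strictly above 0.8
sampled = I * noise_est    # keeps only the bins judged reliable
```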


Problem 1: Ego-Noise Mismatch Solution

We sample reliable components of the ego-noise spectrogram using the certainty I.

Next, we obtain a smoothed "time-frequency deformation function" between the sampled spectrogram and the original supervised ego-noise basis.

A time-invariant all-pole model is used as the deformation function.
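A minimal sketch of how an all-pole model can act as a frequency-wise deformation of a basis is given below. The parametrization 1 / |1 + Σ_p a_p e^{-jωp}|^2, the uniform frequency grid from 0 to π, and the function names are assumptions made for illustration; the slides state only that a time-invariant all-pole model of a given order is used.

```python
import numpy as np

def all_pole_power_response(a, n_freq):
    """Power response 1 / |1 + sum_p a[p-1] exp(-j w p)|^2 on n_freq bins in [0, pi]."""
    a = np.asarray(a, dtype=complex)
    omega = np.linspace(0.0, np.pi, n_freq)
    p = np.arange(1, len(a) + 1)
    denom = 1.0 + np.exp(-1j * np.outer(omega, p)) @ a
    return 1.0 / np.abs(denom) ** 2

def deform_basis(B, a):
    """Scale each frequency row of basis B; equivalent to np.diag(d) @ B."""
    d = all_pole_power_response(a, B.shape[0])
    return d[:, None] * B

B = np.ones((5, 2))                          # hypothetical ego-noise basis
d = all_pole_power_response([-0.5], 5)       # order-1 model, emphasizes low frequencies
B_deformed = deform_basis(B, [-0.5])
B_same = deform_basis(B, [0.0, 0.0])         # zero coefficients leave B unchanged
```

Because the model has only as many free coefficients as its order, the resulting deformation is smooth across frequency.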

[Equation annotations: diagonal matrix of deformation entries; supervised ego-noise basis; ego-noise activation; KL divergence; order of the all-pole model]

This can be solved as an extended NMF optimization.
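One ingredient of such an optimization can be sketched as a masked KL-NMF activation update with a fixed basis, where only the bins selected by a binary mask contribute to the cost. This is the standard weighted multiplicative update, not the authors' full algorithm: the update of the all-pole weights is omitted, and the names `masked_kl`, `update_activation`, and the toy data are hypothetical.

```python
import numpy as np

def masked_kl(X, Y, M, eps=1e-12):
    """Generalized KL divergence between X and Y, counted only where M == 1."""
    Y = np.maximum(Y, eps)
    return float(np.sum(M * (X * np.log(X / Y) - X + Y)))

def update_activation(X, B, H, M, eps=1e-12):
    """One multiplicative update of H with the basis B kept fixed."""
    Y = np.maximum(B @ H, eps)
    return H * (B.T @ (M * X / Y)) / np.maximum(B.T @ M, eps)

rng = np.random.default_rng(3)
B = rng.uniform(0.1, 1.0, (16, 3))                     # fixed (e.g. deformed) basis
X = B @ rng.uniform(0.1, 1.0, (3, 40)) + 1e-6          # toy ego-noise spectrogram
M = (rng.uniform(size=X.shape) > 0.3).astype(float)    # stand-in for the certainty I
H = rng.uniform(0.1, 1.0, (3, 40))

costs = [masked_kl(X, B @ H, M)]
for _ in range(30):
    H = update_activation(X, B, H, M)
    costs.append(masked_kl(X, B @ H, M))
```

In this sketch the masked cost does not increase across iterations, and bins with M = 0 have no influence on H.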

[Figure: power spectrum versus frequency]

Problem 1: Ego-Noise Mismatch Solution (continued)

Denoting the KL cost function by J, its auxiliary function is given by [equation not recovered].

[Equations not recovered: update of the activation; update of the all-pole-model weight]

Problem 2: Speech Model Ambiguity Solution

The output of the statistical postfilter is a sparse estimate of S.

Using this output, we can re-estimate a sparsity-aware speech basis.

We use it as the initial value of the speech basis in Rank-1 MNMF.

[Equation annotations: speech basis; speech activation; IS-divergence]

[Figure: time-frequency spectrograms of the output of Rank-1 MNMF, the sparse speech spectrogram, and its sparse low-rank approximation]

Simulation Experiment

Experimental conditions

Microphones: 8 channels on a 3-m-long hose-shaped robot

Speech: male and female speech with real-recorded impulse responses

Ego-noise: real recordings from the moving hose-shaped robot (2 patterns)

Training: ego-noise matched with the mixed ego-noise (2 patterns) and mismatched (3 patterns)

Evaluation: SDR improvement (both SNR and distortion are considered)

Input SDR: 0 dB, -5 dB, -10 dB

Comparison: IVA, PSNMF (single-channel supervised NMF), Rank-1 MNMF (no supervision)

[Figure: the estimated signal shown with its true target, interference, and artificial distortion components]

A higher SDR indicates better separation.
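The SDR can be computed as in the following sketch, which assumes the estimated signal has already been decomposed into a target component, an interference component, and an artifact (artificial distortion) component. The decomposition itself is not shown, and the function names and signal values are hypothetical.

```python
import numpy as np

def sdr_db(target, interference, artifacts):
    """Signal-to-distortion ratio in dB: target energy over total error energy."""
    error = interference + artifacts
    return 10.0 * np.log10(np.sum(target**2) / np.sum(error**2))

def sdr_improvement_db(output_sdr, input_sdr):
    """SDR improvement: output SDR minus input SDR, both in dB."""
    return output_sdr - input_sdr

# Hypothetical components: target energy 100, error energy 1, so about 20 dB.
target = np.ones(100)
interference = 0.1 * np.ones(100)
artifacts = np.zeros(100)

out_sdr = sdr_db(target, interference, artifacts)
gain = sdr_improvement_db(out_sdr, -5.0)   # e.g. for an input SDR of -5 dB
```

Since the error term sums interference and artifacts, the measure penalizes both residual noise and distortion introduced by the processing.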


Example of Typical SDR Improvement

[Figure: SDR improvement [dB] at Step (1) to Step (4); the processing chain consists of supervised Rank-1 MNMF and the statistical postfilter, before and after basis deformation and initialization]

SDR increases through each processing step.

The combination of the processing steps is effective.


Comparison with Competitors

The proposed method outperforms the conventional methods in both the matched and the mismatched cases, although the mismatched case is inferior to the matched case.

[Figure: results for the conventional and proposed methods]

Conclusion

We proposed a new informed source separation method for the flexible microphone array system, based on supervised Rank-1 MNMF and statistical speech enhancement.

To reduce the mismatch problem, we proposed an algorithm in which an all-pole model is estimated to deform the bases, using the reliable spectral components sampled by the statistical signal enhancement method.

Experiments with actual sounds recorded in the rescue robot showed that the proposed method outperforms the conventional methods.

Thank you for your attention!
