LISTA Showcase


This page contains sound examples that demonstrate intelligibility-enhancement techniques developed both within and outside the EU-funded Listening Talker (LISTA) project.

The table below lists the modifications, distinguishing baseline conditions, LISTA contributions and external contributions. Each modification has a short description, together with an indication of whether it operates on synthetic or natural speech and whether the process is noise-dependent. Sound examples for 10 utterances are available for two masker types, a stationary speech-shaped noise (SSN) masker and a fluctuating competing speaker (CS) masker, at three signal-to-noise ratios. For each masker, the S+N button plays the speech + noise mixture, while the S button plays the modified speech in isolation. Speech-only signals are re-equalised to a fixed RMS.
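As a rough illustration of the two level operations mentioned above (rescaling a signal to a fixed RMS, and mixing speech with a masker at a given signal-to-noise ratio), a minimal Python sketch might look as follows. It is not the code used to prepare the examples on this page: the function names, the target RMS of 0.05, the SNR of -4 dB and the synthetic test signals are all hypothetical, and the SNR is assumed here to be a simple ratio of whole-utterance RMS levels.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def scale_to_rms(samples, target_rms):
    """Return a copy of `samples` rescaled so that its RMS equals `target_rms`."""
    current = rms(samples)
    if current == 0.0:
        raise ValueError("cannot rescale a silent signal")
    gain = target_rms / current
    return [s * gain for s in samples]


def mix_at_snr(speech, masker, snr_db):
    """Scale `masker` so that the speech-to-masker RMS ratio is `snr_db`, then add.

    Assumption: SNR is defined over the whole utterance, not per frame.
    """
    if len(speech) != len(masker):
        raise ValueError("speech and masker must have the same length")
    masker_rms = rms(speech) / (10.0 ** (snr_db / 20.0))
    scaled_masker = scale_to_rms(masker, masker_rms)
    return [s + m for s, m in zip(speech, scaled_masker)]


# Hypothetical stand-ins for real recordings: a 220 Hz tone as "speech"
# and seeded Gaussian noise as the "masker", one second at 16 kHz.
rng = random.Random(0)
speech = [math.sin(2.0 * math.pi * 220.0 * n / 16000.0) for n in range(16000)]
masker = [rng.gauss(0.0, 1.0) for _ in range(16000)]

speech_only = scale_to_rms(speech, 0.05)          # hypothetical fixed RMS
mixture = mix_at_snr(speech_only, masker, -4.0)   # hypothetical SNR in dB
```

Scaling the masker rather than the speech keeps the speech level identical between the speech-only and the mixed versions, which makes the two directly comparable when listening.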

Most of these techniques have been assessed in two large-scale evaluations [1, 2].

Select an utterance:
The birch canoe slid on the smooth planks
Modification | Approach | TTS | Noise dep. | Mixed with SSN | Mixed with CS | Ref.
Plain | Neutral speech | | | S+N / S | S+N / S |
Lombard | Lombard speech | | | S+N / S | S+N / S |
TTS | HMM-based text-to-speech | | | S+N / S | S+N / S |
OptGP | Glimpse-optimised spectral reweighting | | | S+N / S | S+N / S | 3
phoneGain | Speech-model-based phone energy scaling | | | S+N / S | S+N / S | 4
SelBoost | Boost just-audible regions | | | S+N / S | S+N / S | 5
spectralShift | Speech-model-based band energy equalisation | | | S+N / S | S+N / S | 4
SSDRC | Spectral shaping + DRC | | | S+N / S | S+N / S | 6
TMDRC | Harmonic model tilt modification + DRC | | | S+N / S | S+N / S | 7
TTSGP | Glimpse-optimised TTS | | | S+N / S | S+N / S | 8, 9
TTSLomb | TTS adapted to Lombard | | | S+N / S | S+N / S | 10
F0-Shift | Optimise energetic masking by shifting F0 | | | S+N / S | S+N / S | 11
GCRetime | Modify local speech rate + preserve information | | | S+N / S | S+N / S | 12
PSSDRC-syn | HMM synthesis + noise-independent modifications at vocoder level | | | S+N / S | S+N / S | 2
phoneLLabso | ASR-based phone energy equalisation | | | S+N / S | S+N / S | 13
phoneLLdscr | ASR-based contextual phone energy equalisation | | | S+N / S | S+N / S | 13
TTSLGP-DRC | Lombard adaptation, spectral envelope optimisation + DRC | | | S+N / S | S+N / S | 9
uwSSDRCt | SSDRC + time-scaling, vowel space expansion and transients enhancement | | | S+N / S | S+N / S | 14
OptSII | SII-optimised spectral reweighting | | | S+N / S | S+N / S | 15
AdaptDRC | SII-based adaptive DRC | | | S+N / S | S+N / S | 2
C2H-TTS | Phonetic contrast maximisation | | | S+N / S | S+N / S | 2
GlottLombard | Lombard adaptation using glottal inverse filtering + DRC and formant sharpening | | | S+N / S | S+N / S | 2
IWFEMD | Modification of intrinsic mode function of empirical mode decomposition | | | S+N / S | S+N / S | 2
on/offset | Temporal energy reallocation to consonant bursts and vocalic onsets | | | S+N / S | S+N / S | 2
OptimalSII | SII-optimised time-invariant spectral reallocation | | | S+N / S | S+N / S | 2
RESSYSMOD | Loudness increase based on source-filter modelling features | | | S+N / S | S+N / S | 2
SBM | Spectral binary mask application | | | S+N / S | S+N / S | 2
SEO | Spectral energy optimisation by emphasising perceptually motivated acoustic features | | | S+N / S | S+N / S | 2
SINCoFETS | Local time-scaling + DRC + psychoacoustic adaptive equalisation | | | S+N / S | S+N / S | 2
SSS | Temporal energy reallocation based on steady-state suppression | | | S+N / S | S+N / S | 2

Further information

A special issue of Computer Speech and Language, currently in preparation, brings together some of the modifications presented in the table (a link will be posted here when available).

See also the programme of the Listening Talker workshop (Edinburgh, 2-3 May 2012), which provided inspiration for the development of the current list.

Credits

The LISTA project was funded by the Future and Emerging Technologies programme within the 7th Framework Programme for Research of the European Commission, FET-Open grant number 256230.

References

  1. Cooke, M., Mayo, C., Valentini-Botinhao, C., Stylianou, Y., Sauert, B., & Tang, Y. (2013). Evaluating the intelligibility benefit of speech modifications in known noise conditions. Speech Comm., 55, 572-585.
  2. Cooke, M., Mayo, C., & Valentini-Botinhao, C. (submitted). Intelligibility-enhancing speech modifications: the Hurricane Challenge. Interspeech, Lyon, France.
  3. Tang, Y., & Cooke, M. (2012). Optimised spectral weightings for noise-dependent speech intelligibility enhancement. Proc. Interspeech, Portland, USA, 955-958.
  4. Petkov, P. N., Kleijn, W. B., & Henter, G. E. (2012). Enhancing subjective speech intelligibility using a statistical model of speech. Proc. Interspeech, Portland, USA, 166-169.
  5. Tang, Y., & Cooke, M. (2010). Energy reallocation strategies for speech enhancement in known noise conditions. Proc. Interspeech, Tokyo, Japan, 1636-1639.
  6. Zorila, T.-C., Kandia, V., & Stylianou, Y. (2012). Speech-in-noise intelligibility improvement based on spectral shaping and dynamic range compression. Proc. Interspeech, Portland, USA, 635-638.
  7. Erro, D., Stylianou, Y., Navas, E., & Hernaez, I. (2012). Implementation of simple spectral techniques to enhance the intelligibility of speech using a harmonic model. Proc. Interspeech, Portland, USA, 639-642.
  8. Valentini-Botinhao, C., Maia, R., Yamagishi, J., King, S., & Zen, H. (2012). Cepstral analysis based on the Glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise. Proc. ICASSP, Kyoto, Japan, 3997-4000.
  9. Valentini-Botinhao, C., Yamagishi, J., & King, S. (2012). Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise. Proc. Interspeech, Portland, USA, 631-634.
  10. Valentini-Botinhao, C., Yamagishi, J., & King, S. (2012). Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise. Proc. SAPA Workshop, Portland, USA.
  11. Villegas, J., & Cooke, M. (2012). Maximising objective speech intelligibility by local f0 modulation. Proc. Interspeech, Portland, USA, 1704-1707.
  12. Aubanel, V., & Cooke, M. (submitted). Information-preserving temporal reallocation of speech in the presence of fluctuating maskers. Interspeech, Lyon, France.
  13. Petkov, P. N., Henter, G. E., & Kleijn, W. B. (2013). Maximizing phoneme recognition accuracy for enhanced speech intelligibility in noise. IEEE T. Audio Speech, 21(5), 1035-1045.
  14. Godoy, E., & Stylianou, Y. (submitted). Increasing speech intelligibility via spectral shaping with frequency warping and dynamic range compression plus transient enhancement. Interspeech, Lyon, France.
  15. Sauert, B., & Vary, P. (2011). Near end listening enhancement considering thermal limit of mobile phone loudspeakers. Proc. Conf. on Elektronische Sprachsignalverarbeitung (ESSV), Aachen, Germany, 333-340.

Last update: 22nd May 2013 by Vincent Aubanel