Abstract

Hidden Markov model (HMM) based statistical parametric speech synthesis (SPSS) has become a very popular method for text-to-speech conversion. The method gives good performance in terms of naturalness, and reproduces many of the acoustic characteristics of the human voice. Unfortunately, the method is also rather opaque, in that the acoustic relevance, or "meaning", of each of the huge number of model parameters is far from obvious. Recent work has shown that it is possible to integrate articulatory features within an HMM-based SPSS system, and to control the synthetic speech in terms of those articulatory features. Articulatory measurement data reflects the physical speech production mechanism, which offers an easily understood domain for controlling synthetic speech, but which is also relatively inconvenient to acquire. Formant features have a straightforward relationship to vocal tract configurations but, unlike articulatory data, can easily be estimated directly from recorded speech waveforms. The purpose of this paper is to investigate the integration of formant features into state-of-the-art HMM-based SPSS. By modeling the relationship between formant features and the spectral features of the synthesis vocoder, we aim to introduce control over the synthesized speech via formant features that are predicted in parallel during the synthesis process. We have conducted two categorical perception experiments to evaluate and analyse the controllability of this approach quantitatively. The results we present show that the synthetic speech may be very readily controlled in terms of simple formant features, and that prior phonetic knowledge of formants may be easily applied.