Abstract
Spectral envelopes of speech signals are typically estimated under stationarity assumptions that do not always hold. The Adaptive Quasi-Harmonic Model (AQHM), a non-stationary signal model, can capture the time-varying quasi-harmonics of voiced speech. This paper proposes using AQHM in a multi-layer scheme that yields a high-resolution time-frequency representation of speech. This representation is then used to recover the evolving spectral envelope through a time-frequency spectral envelope estimation algorithm related to the Papoulis-Gerchberg algorithm for data extrapolation. Results on voiced speech sounds show that the estimated spectral envelopes are smoother than those obtained with state-of-the-art spectral envelope estimators, while preserving the important details of the speech spectrum.