So I've been working on a project to re-implement the VOCALOID1 engine.
I'm basing it on the description in Jordi Bonada's PhD thesis, "Voice
Processing and Synthesis by Performance Sampling and Spectral Models",
rather than the original papers, because the thesis is more detailed,
easier to follow, and also describes the VOCALOID2 engine.
After a lot of trouble getting TWM f0 estimation to work, I've
finally gotten to implementing MFPA. Amazingly, it seems to have
worked on the first try.
Compare my results:
https://i.ibb.co/dsvgv0fd/Screen-Shot-2026-03-02-at-3-54-48-PM.png
To the results in the thesis:
https://i.ibb.co/C3fjdWVd/Screen-Shot-2026-03-02-at-3-55-09-PM.png