I’m not really up to date on voice synthesis. Have we reached the point where a handful of voice actors can supply enough training data to train a model of this quality from scratch?
Or is this a case of them using those voice actors to fine-tune a pretrained model and just keeping quiet about it?