----------------------------> Model Architecture <-----------------------


Figure 1. Schematic diagram of the proposed JES-StarGAN framework. Blue boxes represent the modules involved in training, and yellow boxes represent the pre-trained modules.

-----------------------------> Speech Samples <---------------------------

Experimental Setup:

Baseline: StarGAN-VC [1]
Proposed Method: JES-StarGAN, a joint emotional style and speaker identity conversion framework.

The samples are from four speakers (two male and two female) and cover three emotions (neutral, happy, and sad).


Source | StarGAN-VC | JES-StarGAN | Target
Neutral
Happy
Sad

[1] Kameoka, Hirokazu, et al. "StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks." 2018 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2018.