I'm a student studying machine learning for audio. I'm using MATLAB with a few audio processing toolboxes, such as MIRtoolbox, and I have read the paper whose approach I'm following. When I use the database I built in MATLAB to train Kaldi, the results are incorrect (the phoneme strings do not match what I expected).
I use MIRtoolbox to find the phoneme string of a music file and save it to a .phn file.
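For reference, here is a minimal Python sketch of writing labelled segments to a .phn file. It assumes a TIMIT-style layout (one `start end label` line per segment, with times given in samples); `format_phn` and `write_phn` are hypothetical helper names, so check which format your Kaldi data preparation actually expects.

```python
def format_phn(segments, sample_rate):
    """Format (start_s, end_s, label) tuples as 'start end label' lines.

    Times in seconds are converted to sample indices, as in TIMIT-style
    .phn files (an assumption about the layout the pipeline expects).
    """
    return "".join(
        f"{round(start_s * sample_rate)} {round(end_s * sample_rate)} {label}\n"
        for start_s, end_s, label in segments
    )


def write_phn(segments, path, sample_rate):
    """Write the formatted segments to a .phn file."""
    with open(path, "w") as f:
        f.write(format_phn(segments, sample_rate))
```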
The steps are: audio -> MFCC -> clustering (frame by frame, using MATLAB's kmeans rather than the SOM Toolbox).
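The frame-level clustering step can be sketched as plain k-means over MFCC vectors. This is a minimal Python stand-in for MATLAB's kmeans, assuming the MFCC frames are already computed (one list of coefficients per frame); `kmeans` here is an illustrative function, not the MATLAB or MIRtoolbox API.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(frames, k, iterations=100, seed=0):
    """Cluster feature vectors (e.g. MFCC frames) into k groups.

    Returns one cluster index per frame and the final centroids.
    Raises ValueError if k is larger than the number of frames.
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct frames chosen at random.
    centroids = [list(c) for c in rng.sample(frames, k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every frame.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(f, centroids[j]))
            for f in frames
        ]
        if new_labels == labels:
            break  # converged: no frame changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In MATLAB this corresponds roughly to `idx = kmeans(X, 3)`, where each row of `X` is one MFCC frame.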
I tried this on the yesno file 0_1_0_1_0_0_0_0.wav (no yes no yes no no no no) with 3 clusters and got the following phoneme string:
0.00-5.58 mp_2 mp_3 mp_1 mp_2 mp_3 mp_2 mp_3 mp_1 mp_2 mp_3 mp_1 mp_3 mp_2 mp_3 mp_1 mp_2 mp_3 mp_1 mp_2 mp_3 mp_1 mp_2
where mp_1 = yes, mp_2 = sil, and mp_3 = no. This is wrong: the file should read no yes no yes no no no no. When I train with it in Kaldi, the error rate is very high.
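Frame-level labels tend to flicker between clusters, which may be one reason the string above alternates so much. A minimal sketch, assuming one cluster index per frame, of turning frame labels into a phoneme string: runs shorter than a hypothetical `min_frames` threshold are absorbed into the preceding run before equal neighbours are merged. This smoothing step is an illustration, not part of the original pipeline.

```python
def runs(labels):
    """Run-length encode a sequence into [label, length] pairs."""
    out = []
    for lab in labels:
        if out and out[-1][0] == lab:
            out[-1][1] += 1
        else:
            out.append([lab, 1])
    return out


def labels_to_phonemes(labels, names, min_frames=1):
    """Turn per-frame cluster indices into a phoneme string.

    Runs shorter than min_frames are absorbed into the preceding run
    (a crude smoothing step), then equal neighbours are merged.
    `names` maps cluster index -> phoneme label.
    """
    merged = []
    for lab, n in runs(labels):
        if merged and (n < min_frames or merged[-1][0] == lab):
            merged[-1][1] += n
        else:
            merged.append([lab, n])
    return [names[lab] for lab, _ in merged]
```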
So I changed the method to find the phonemes by segmenting the audio file first.
The steps are now: audio -> MFCC -> similarity matrix -> novelty -> peaks -> segmentation -> clustering.
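The segmentation steps can be sketched in a few functions, following Foote's checkerboard-kernel novelty idea. This is a minimal Python sketch, not MIRtoolbox's implementation: cosine similarity, the `half_width` kernel size and the simple `threshold` peak picking are my assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def similarity_matrix(frames):
    """Frame-by-frame self-similarity matrix."""
    return [[cosine(a, b) for b in frames] for a in frames]


def novelty(sim, half_width):
    """Slide a checkerboard kernel of size 2*half_width along the diagonal.

    Same-side blocks (past/past, future/future) count +1 and cross blocks -1,
    so the score peaks where the past and the future are each self-similar
    but unlike each other. Frames where the kernel does not fit stay at 0.
    """
    n = len(sim)
    curve = [0.0] * n
    offsets = range(-half_width, half_width)
    for t in range(half_width, n - half_width + 1):
        total = 0.0
        for i in offsets:
            for j in offsets:
                sign = 1.0 if (i < 0) == (j < 0) else -1.0
                total += sign * sim[t + i][t + j]
        curve[t] = total
    return curve


def pick_peaks(curve, threshold):
    """Indices of local maxima of the novelty curve above a threshold."""
    return [
        t for t in range(1, len(curve) - 1)
        if curve[t] > threshold and curve[t] >= curve[t - 1] and curve[t] > curve[t + 1]
    ]


def segment_means(frames, peaks):
    """Split frames at the peaks and average the features inside each segment."""
    bounds = [0] + peaks + [len(frames)]
    means = []
    for start, end in zip(bounds, bounds[1:]):
        seg = frames[start:end]
        means.append([sum(col) / len(seg) for col in zip(*seg)])
    return bounds, means
```

The per-segment mean vectors can then be clustered with any k-means routine, so each segment (rather than each frame) gets one label.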
This gives the phonemes:
0.00-5.58 mp_2 mp_3 mp_1 mp_3 mp_1 mp_3 mp_2
This string differs from the incorrect one I got before, and training in Kaldi runs well with it.
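To put a number on "incorrect", each string can be compared with the word sequence encoded in the yesno filename using edit distance, the same idea behind the word error rate Kaldi reports. A minimal Python sketch; the helper names are hypothetical, and silence tokens are simply dropped before comparing.

```python
def reference_from_filename(name):
    """Expected words for a yesno file: '0' means no, '1' means yes."""
    stem = name.rsplit(".", 1)[0]
    return ["yes" if d == "1" else "no" for d in stem.split("_")]


def edit_distance(hyp, ref):
    """Minimum number of substitutions, insertions and deletions (Levenshtein)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # hyp word is extra
                           cur[j - 1] + 1,           # ref word is missing
                           prev[j - 1] + (h != r)))  # match or substitution
        prev = cur
    return prev[-1]


# Cluster-to-word mapping as given in the post.
NAMES = {"mp_1": "yes", "mp_2": "sil", "mp_3": "no"}


def to_words(phn_string):
    """Map an mp_* string to words, dropping silence."""
    return [NAMES[p] for p in phn_string.split() if NAMES[p] != "sil"]


ref = reference_from_filename("0_1_0_1_0_0_0_0.wav")
hyp = to_words("mp_2 mp_3 mp_1 mp_3 mp_1 mp_3 mp_2")
print(edit_distance(hyp, ref))  # → 3
```

For the segment-based string the distance is 3: the last three "no" words are missing, presumably because adjacent identical words end up in a single segment.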
The problem is this: from the papers I have read, music has 1024 to 2244 phonemes. I tried increasing the number of clusters to 8, 16, 32, ..., 1024, but the results are wrong, and the Kaldi training error rate goes up to about 90%.
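A quick sanity check on the cluster counts: k-means cannot estimate more clusters than it has data points, and a single short file yields few frames. The sketch below assumes an 8 kHz sample rate (the usual rate for the yesno recordings), a 25 ms window and a 10 ms hop; these are common MFCC settings, not necessarily the ones used here.

```python
def frame_count(duration_s, sample_rate, win_s=0.025, hop_s=0.010):
    """Number of full analysis frames for a signal of the given length."""
    n_samples = round(duration_s * sample_rate)
    win = round(win_s * sample_rate)
    hop = round(hop_s * sample_rate)
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop


n_frames = frame_count(5.58, 8000)  # the 5.58 s yesno file, assuming 8 kHz
print(n_frames)  # → 556
for k in (3, 8, 16, 32, 1024):
    # Average number of frames available to estimate each cluster.
    print(k, n_frames / k)
```

Under these assumptions the file has 556 frames, so k = 1024 already exceeds the number of frames, and after segmentation there are only 7 segments to cluster.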
I also tried changing the MFCC configuration, but it did not help.
Could you recommend a way to find the phoneme string of music?