Discussion:
Saving Retained Audio
Daniel Bigham
2009-06-24 17:43:01 UTC
I am attempting to save the retained audio of a recognition event to a WAV
file so that I can do "pitch detection" on that WAV file to try and figure
out whether the speaker is male or female.

Unfortunately, when I listen to the retained audio, it sounds like a
chipmunk... why is this? Has the retained audio been massaged / normalized?

Sample code:

SpMemoryStream memStream = result.Audio(0, -1);
SpFileStream fs = new SpFileStream();
fs.Open("C:\\test.wav", SpeechStreamFileMode.SSFMCreateForWrite, true);

object data;

while (true)
{
    int numBytes = memStream.Read(out data, 128 * 1024);

    if (numBytes == 0)
    {
        break;
    }

    fs.Write(data);
}

fs.Close();
Robert Carnegie
2009-07-22 13:37:46 UTC
On Jun 24, 6:43 pm, Daniel Bigham wrote:
> I am attempting to save the retained audio of a recognition event to a WAV
> file so that I can do "pitch detection" on that WAV file to try and figure
> out whether the speaker is male or female.
> Unfortunately, when I listen to the retained audio, it sounds like a
> chipmunk... why is this? Has the retained audio been massaged / normalized?
Maybe playing at the wrong speed? You could still detect a pitch
difference, perhaps. However, it's unlikely to be reliable in
distinguishing men from women, except in your own family.
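
The "wrong speed" guess can be illustrated outside SAPI. A WAV header stores a sample rate and a player trusts it: if samples captured at one rate are written under a header that claims a higher rate, playback is faster and higher-pitched, which is the chipmunk effect. The sketch below uses Python's standard `wave` module rather than the SAPI classes in the question. The 16 kHz and 22.05 kHz rates and the helper names (`write_tone`, `relabel`, `duration_seconds`, `apparent_pitch`) are illustrative assumptions, not values confirmed anywhere in this thread.

```python
import io
import math
import struct
import wave


def write_tone(rate_recorded, rate_in_header, freq_hz=200.0, seconds=1.0):
    """Return WAV bytes: a sine sampled at rate_recorded but labelled rate_in_header."""
    n = int(rate_recorded * seconds)
    samples = [
        int(12000 * math.sin(2 * math.pi * freq_hz * i / rate_recorded))
        for i in range(n)
    ]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)               # mono
        w.setsampwidth(2)               # 16-bit PCM
        w.setframerate(rate_in_header)  # the rate the player will believe
        w.writeframes(struct.pack("<%dh" % n, *samples))
    return buf.getvalue()


def relabel(wav_bytes, true_rate):
    """Rewrite only the header sample rate; the PCM data is left untouched."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        params = r.getparams()
        frames = r.readframes(r.getnframes())
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setparams(params)
        w.setframerate(true_rate)
        w.writeframes(frames)
    return out.getvalue()


def duration_seconds(wav_bytes):
    """Playback duration implied by the header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return r.getnframes() / r.getframerate()


def apparent_pitch(true_pitch_hz, rate_recorded, rate_in_header):
    """Pitch heard when the header rate differs from the recording rate."""
    return true_pitch_hz * rate_in_header / rate_recorded


if __name__ == "__main__":
    mislabelled = write_tone(16000, 22050)
    print("mislabelled duration:", duration_seconds(mislabelled))
    print("relabelled duration: ", duration_seconds(relabel(mislabelled, 16000)))
    print("apparent pitch of a 200 Hz tone:", apparent_pitch(200.0, 16000, 22050))
```

If a mismatch of this kind is the cause, making the output file's declared format match the format of the retained audio stream before writing should restore normal pitch. Whether that is what happens in the code above is not confirmed in this thread.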

I'm afraid I'm at the level of not understanding why I have a modern
computer audio system and good microphone and I ask the speech system
to replay the word that it didn't understand from me and it sounds
like really bad FM radio with the battery running out, so even I can't
tell what I said. Maybe that's the intention? To persuade me to
forgive poor recognition performance because apparently the audio
configuration is somehow deficient?
Daniel Bigham
2009-08-27 12:52:04 UTC
I'm only intending to use the system within a single family, so hopefully the
pitch difference between my voice and my wife's voice will be large enough!
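
For reference, a minimal sketch of the kind of "pitch detection" meant here: estimate the fundamental frequency of one short frame by finding the lag at which the signal best correlates with a shifted copy of itself. This is a generic autocorrelation method, not anything provided by SAPI, and the function name `estimate_pitch_hz` and the 60 to 400 Hz search range are assumptions chosen for illustration. Real speech would also need framing and a check that the frame is voiced.

```python
import math


def estimate_pitch_hz(samples, rate, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one frame by autocorrelation.

    samples: sequence of numbers (one mono frame), rate: samples per second.
    fmin/fmax bound the search; they are illustrative defaults.
    """
    lag_min = int(rate / fmax)
    lag_max = int(rate / fmin)
    if len(samples) <= lag_max:
        raise ValueError("frame too short for the requested fmin")

    # Remove any DC offset so it does not dominate the correlation.
    mean = sum(samples) / len(samples)
    x = [s - mean for s in samples]

    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with itself shifted by `lag` samples.
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag


if __name__ == "__main__":
    rate = 8000
    low = [math.sin(2 * math.pi * 120 * i / rate) for i in range(2048)]
    high = [math.sin(2 * math.pi * 220 * i / rate) for i in range(2048)]
    print("low tone estimate: ", estimate_pitch_hz(low, rate))
    print("high tone estimate:", estimate_pitch_hz(high, rate))
```

The estimate scales with the `rate` passed in, so a file whose header states the wrong sample rate shifts every estimate by the same factor. Two voices could still be compared against each other in that case, but the absolute values would be off.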

I'm a little disappointed that Microsoft hasn't replied to this question...
it seems like it should have a straightforward answer.