Canary Speech publishes requirements and recommendations for producing audio files that contain voice. Following them keeps audio quality high enough that vocal features are not lost to sampling errors or compression, which keeps the analysis as accurate as possible.
When uploading a recording, prefix the raw PCM bytes with a WAV header so that Canary Speech can parse the audio encoding. The HTTP request must include a Content-Type header set to any variation of the audio/wav MIME type.
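As a minimal sketch of the step above, the snippet below uses Python's standard `wave` module to prefix raw PCM bytes with a WAV header and prepares the Content-Type header. The function name `pcm_to_wav`, its default parameters, and the silent test signal are illustrative assumptions, not part of Canary Speech's documentation; the upload request itself is left out because the endpoint is not described here.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Prefix raw PCM samples with a WAV (RIFF) header.

    Hypothetical helper: the defaults mirror the minimum configuration
    (16,000 samples per second, 2 bytes per sample, 1 channel).
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 = 16 bits
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


# One second of 16-bit mono silence at 16 kHz stands in for a real recording.
raw_pcm = b"\x00\x00" * 16000
wav_bytes = pcm_to_wav(raw_pcm)

# Header to send alongside wav_bytes as the request body.
headers = {"Content-Type": "audio/wav"}
```

The header written by `wave` records the channel count, sample width, and sample rate, which is the information a receiver needs to interpret the PCM bytes that follow.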
Minimum Configuration | |
---|---|
Codec | Uncompressed (WAV) |
Sample Rate | 16,000 samples per second |
Bit Depth | 16 bits (2 bytes) per sample |
Channel Count | 1 per speaker |
Recommended Configuration | |
---|---|
Codec | Uncompressed (WAV) |
Sample Rate | 48,000 samples per second |
Bit Depth | 16 bits (2 bytes) per sample |
Channel Count | 1 per speaker |
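The tables above can be turned into a local pre-upload check. The sketch below is one possible approach using Python's standard `wave` module; the function name `check_wav` and the `speakers` parameter are hypothetical. It assumes that the minimum sample rate is a lower bound, that bit depth should be exactly 16 bits (both tables list that value), and that "1 per speaker" means the channel count should equal the number of speakers.

```python
import io
import wave

MIN_SAMPLE_RATE = 16_000          # samples per second (minimum table)
RECOMMENDED_SAMPLE_RATE = 48_000  # samples per second (recommended table)
SAMPLE_WIDTH_BYTES = 2            # 16 bits per sample


def check_wav(data: bytes, speakers: int = 1) -> list[str]:
    """Return a list of problems; an empty list means the minimum is met."""
    try:
        wav = wave.open(io.BytesIO(data), "rb")
    except (wave.Error, EOFError) as exc:
        # The wave module reads PCM data only, so other codecs end up here.
        return [f"not a readable uncompressed WAV file: {exc}"]

    problems = []
    with wav:
        rate = wav.getframerate()
        if rate < MIN_SAMPLE_RATE:
            problems.append(f"sample rate {rate} is below {MIN_SAMPLE_RATE}")
        width = wav.getsampwidth()
        if width != SAMPLE_WIDTH_BYTES:
            problems.append(f"bit depth is {width * 8}, expected 16")
        channels = wav.getnchannels()
        if channels != speakers:
            problems.append(
                f"{channels} channel(s) for {speakers} speaker(s)")
    return problems


def meets_recommendation(data: bytes, speakers: int = 1) -> bool:
    """True if the file passes the minimum check and is sampled at 48 kHz or more."""
    if check_wav(data, speakers):
        return False
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate() >= RECOMMENDED_SAMPLE_RATE
```

For sizing, uncompressed single-channel audio at the minimum configuration takes 16,000 × 2 = 32,000 bytes per second, and at the recommended configuration 48,000 × 2 = 96,000 bytes per second.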