Canary Speech has published several requirements and recommendations for producing audio files that contain voice. Following them keeps audio quality high enough that vocal features are not lost to sampling errors or compression, which helps keep the analysis as accurate as possible.
When uploading a recording, the raw PCM bytes must be prefixed by a WAV header so that Canary Speech can parse the audio encoding. The HTTP request must include a Content-Type header set to any variation of the audio/wav MIME type.
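
The following is a minimal Python sketch of both requirements, using only the standard library: it prefixes raw PCM samples with a WAV header and prepares an HTTP request whose Content-Type is audio/wav. The function names, the `UPLOAD_URL` placeholder, and the use of POST are assumptions made for illustration; the actual endpoint, HTTP method, and any authentication headers must come from the Canary Speech API reference.

```python
import io
import urllib.request
import wave

# Hypothetical placeholder, NOT a real Canary Speech endpoint.
UPLOAD_URL = "https://api.example.com/upload"


def pcm_to_wav_bytes(pcm, sample_rate=16_000, sample_width=2, channels=1):
    """Prefix raw PCM samples with a WAV (RIFF) header.

    Defaults mirror the minimum configuration: 16,000 samples per second,
    2 bytes (16 bits) per sample, one channel.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def build_upload_request(wav_bytes, url=UPLOAD_URL):
    """Build (but do not send) a request carrying the WAV payload.

    POST is an assumption; authentication is deliberately omitted.
    """
    return urllib.request.Request(
        url,
        data=wav_bytes,
        method="POST",
        headers={"Content-Type": "audio/wav"},
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. That step is left out here because the real URL and credentials are not part of this sketch.
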
| Minimum Configuration | |
|---|---|
| Codec | Uncompressed (WAV) |
| Sample Rate | 16,000 samples per second |
| Bit Depth | 16 bits (2 bytes) per sample |
| Channel Count | 1 per speaker |

| Recommended Configuration | |
|---|---|
| Codec | Uncompressed (WAV) |
| Sample Rate | 48,000 samples per second |
| Bit Depth | 16 bits (2 bytes) per sample |
| Channel Count | 1 per speaker |
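
As an illustration of how the tables above could be checked before uploading, the sketch below reads a WAV header with Python's standard `wave` module and reports any values that fall below the minimum configuration. The function name `check_wav` and its `speakers` parameter are hypothetical. The sketch only flags values below the stated minimums, since the tables do not say whether bit depths above 16 are accepted.

```python
import io
import wave

MIN_SAMPLE_RATE = 16_000          # samples per second (minimum configuration)
RECOMMENDED_SAMPLE_RATE = 48_000  # samples per second (recommended configuration)
MIN_SAMPLE_WIDTH = 2              # bytes per sample, i.e. 16 bits


def check_wav(source, speakers=1):
    """Return a list of problems found in a WAV file's header.

    `source` is a file path or a binary file-like object. An empty list
    means the minimum configuration is met. `wave.open` raises
    `wave.Error` if the data is not a WAV file it can parse.
    """
    with wave.open(source, "rb") as wav_file:
        rate = wav_file.getframerate()
        width = wav_file.getsampwidth()
        channels = wav_file.getnchannels()

    problems = []
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} is below {MIN_SAMPLE_RATE}")
    if width < MIN_SAMPLE_WIDTH:
        problems.append(f"bit depth {width * 8} is below {MIN_SAMPLE_WIDTH * 8}")
    if channels != speakers:
        problems.append(f"{channels} channel(s) for {speakers} speaker(s)")
    return problems
```

For sizing, uncompressed audio takes sample rate × bytes per sample × channels bytes per second: 16,000 × 2 × 1 = 32,000 bytes per second at the minimum configuration, and 48,000 × 2 × 1 = 96,000 bytes per second at the recommended one, plus the WAV header.
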