Canary Speech

Recording Guidelines

Audio Format

Canary Speech has published several requirements and recommendations for producing audio files that contain voice. Following them ensures that audio quality is high enough that vocal features are not lost to sampling errors or compression, which in turn keeps the analysis as accurate as possible.

Minimum Configuration
Codec Uncompressed (PCM, WAV)
Sample Rate 16,000 samples per second
Bit Depth 16 bits (2 bytes) per sample
Channel Count 1 per speaker

Recommended Configuration
Codec Uncompressed (PCM, WAV)
Sample Rate 48,000 samples per second
Bit Depth 16 bits (2 bytes) per sample
Channel Count 1 per speaker
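
As a rough illustration of the two configurations above, the following Python sketch checks a WAV file against the minimum configuration using only the standard-library wave module. The names check_wav and make_test_wav are hypothetical helpers, not part of any Canary Speech tooling. The sketch also assumes that the 16,000 samples-per-second figure is a lower bound and that bit depth must be exactly 16 bits; neither interpretation is confirmed by the guidelines themselves.

```python
import io
import wave

MIN_SAMPLE_RATE = 16_000   # samples per second (minimum configuration)
REQUIRED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16 bits


def check_wav(source, speakers=1):
    """Return a list of problems found in a WAV file (empty if none).

    `source` is a file path or a binary file-like object.
    `speakers` is the number of speakers, so one channel is expected per speaker.
    """
    try:
        wav = wave.open(source, "rb")
    except (wave.Error, EOFError) as exc:
        # The wave module only reads uncompressed PCM WAV data.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    with wav:
        if wav.getframerate() < MIN_SAMPLE_RATE:
            problems.append(f"sample rate {wav.getframerate()} is below {MIN_SAMPLE_RATE}")
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            problems.append(f"bit depth is {wav.getsampwidth() * 8}, expected 16")
        if wav.getnchannels() != speakers:
            problems.append(f"{wav.getnchannels()} channel(s) for {speakers} speaker(s)")
    return problems


def make_test_wav(sample_rate, sample_width=2, channels=1, seconds=0.1):
    """Build a short silent PCM WAV in memory, for trying out check_wav."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        frame_count = int(sample_rate * seconds)
        wav.writeframes(bytes(frame_count * channels * sample_width))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_wav(make_test_wav(48_000)))  # recommended configuration: prints []
    print(check_wav(make_test_wav(8_000)))   # reports a sample rate below the minimum
```

A check like this only inspects the WAV header. It cannot tell whether the audio was upsampled from a lower-quality source or converted from a compressed format before being saved as PCM.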

Naming Convention

In addition to the file format, Canary Speech publishes a recommended convention for file names. Although the convention is not required, it reduces the likelihood of filename collisions and aids debugging should manual review ever become necessary.

  • Format: <Subject-Name>_<Timestamp>_<UUID>.wav
      • Subject-Name: The name of the subject from which the audio was recorded.
      • Timestamp: The URL-friendly ISO-8601 timestamp of when the audio was recorded. (UTC timestamps are recommended as they require no modification to be URL-friendly.)
      • UUID: A randomly generated UUIDv4 code.

Example:

PN12345_2022-01-15T13:30:00Z_15243c72-f23c-11ec-b939-0242ac120002.wav
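
The convention above could be produced along these lines. This is a minimal Python sketch using only the standard library; recording_filename is a hypothetical helper name, and the choice to reject timestamps without a time zone is an assumption made here so that the conversion to UTC is unambiguous.

```python
import uuid
from datetime import datetime, timezone


def recording_filename(subject_name, recorded_at=None, recording_id=None):
    """Build a name of the form <Subject-Name>_<Timestamp>_<UUID>.wav.

    `recorded_at` defaults to the current time and is converted to UTC.
    `recording_id` defaults to a freshly generated random UUIDv4.
    """
    if recorded_at is None:
        recorded_at = datetime.now(timezone.utc)
    if recorded_at.tzinfo is None:
        # Assumption: refuse naive datetimes rather than guess their time zone.
        raise ValueError("recorded_at must be timezone-aware")
    timestamp = recorded_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    if recording_id is None:
        recording_id = uuid.uuid4()
    return f"{subject_name}_{timestamp}_{recording_id}.wav"


if __name__ == "__main__":
    when = datetime(2022, 1, 15, 13, 30, tzinfo=timezone.utc)
    # The UUID part differs on every call because it is randomly generated.
    print(recording_filename("PN12345", when))
```

Because underscores separate the three fields, a subject name that itself contains an underscore would make the file name harder to split back into its parts.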

Further Reading