Sunday, March 31, 2013

More About High Sample Rates And Better Than CD Sound Quality


You may have read my November 2012 post about sound quality. I discussed ultrasound, high sample rates and expensive speakers that can reproduce ultrasound (AKA ultrasonics, ultrasonic energy). I cited various facts about music production and human hearing to make my point that, while not exactly a scam, speakers with ultrasonic treble extension aren't a good investment. Although there is evidence that humans can perceive some ultrasound, there isn't any evidence that it plays a role in enjoying recorded music.

This past week someone on one of my Linux audio mailing lists (or was it an Ardour list?) linked to a fantastic article at xiph.org that goes into considerable depth on related issues, especially the questions around high sample rates and high-resolution digital audio. I learned a lot from this article. The author also explains, in the clearest of terms, certain things I already knew. I want to comment briefly on some of them, tip my hat to him, and digest a few of his points for those who don't have time to read the full article.

The author, who goes by Monty, makes a very convincing argument for CD quality playback (16bit/44.1kHz). According to him (and I now agree with him), 16/44.1 is the way to go for digital playback. When it comes to listening to recorded music, there is no need to employ more demanding digital formats.

The article is called 24/192 Music Downloads ...And Why They Make No Sense.[1] It is long (by WWW standards) and covers a lot more than the title implies. Monty delves into the methodology of audiological research and ABX listening tests, among other things. He does this to ground all of his conclusions in science, not pseudo-science or magic. It's a long read, so the impatient will want to take my word that A) he knows what he's talking about and B) there are a lot of claims being made about digital audio (by vendors, enthusiasts and even professionals) that are not grounded in science.

Here are a few highlights from Monty's piece, curated (and sometimes amplified) by me.

  1. Humans cannot hear the difference between CDs (or CD quality files) and more expensive or resource-intensive formats like SACD, DVD-A and “better than CD” digital downloads. Do not pay for them unless you have another reason, like bonus content or an improved master recording.

    For some time I believed that people could hear the difference between 16bit and 24bit, although I had never heard it myself. My ears were right and I was wrong.[2] As with astronomical sample rates, the science says we can't hear it.

    It turns out that the folks at Sony and Philips made a very good choice when they chose 16/44.1 as the specs for Compact Disc audio. A 44.1kHz sample rate captures frequencies up to 22.05kHz (half the sample rate), which covers the full pitch range of human hearing, roughly 20Hz to 20kHz. Sixteen bits provide about 96dB of dynamic range, the broadest range that is practical for playback. It captures and reproduces everything we can hear, and it does not waste precious bits storing what we can't.

  2. Digital recordings are not quantized when played back. This is counter-intuitive until you look at what a digital-to-analog converter (DAC) actually does. The playback device's DAC connects the dots of the digital recording, dots that are extremely close together. In doing so it creates an analog wave that is smooth and continuous.[3] If the recording is 16/44.1 or better and the DAC is of good quality, nothing audible that went into the recording is missing from the resulting analog wave. The image of jagged, blocky, brittle sound from CDs is wrong. Some DACs are poor, but that is not a failing of 16/44.1 or the Compact Disc. Pumping more data through a crappy DAC probably won't help.

  3. 16bit/44.1kHz is good enough for listening/playback, but production is different. There are practical advantages to making initial recordings in 24bit, mixing at 24bit and using higher sample rates for various types of processing. This is because the extra data can protect against certain problems, not because the recording engineer can hear the extra data.[4]

  4. Lossy formats like MP3, AAC and Ogg Vorbis can sound very good, right up to being indistinguishable from the uncompressed original. To preserve the music this way, the lossy encoding must be done only once, with reasonable settings and quality software. Older encoding software (or new software that uses old encoding methods) may harm the sound. It is never a good idea to convert a lossy recording to a different lossy format. If you need a different format or file size, go back to an uncompressed or otherwise lossless copy for your source.
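
To make the second point above a little more concrete, here is a minimal Python sketch (standard library only). It assumes an ideal reconstruction filter and ignores quantization. An ideal DAC does not literally draw straight lines between the dots. It produces the one band-limited waveform that passes through all of them, which can be modeled as a sum of sinc functions (Whittaker-Shannon interpolation). The sketch samples a 15kHz tone at 44.1kHz, only about three samples per cycle, and compares that reconstruction with literal straight lines at points halfway between samples. The tone, buffer length and probe points are arbitrary choices for illustration. Real DACs approximate the ideal with oversampling and finite filters, so treat this as an illustration of the principle, not of any particular chip.

```python
import math

FS = 44_100        # CD sample rate in Hz
TONE_HZ = 15_000   # test tone near the top of the audible band, ~3 samples per cycle
N = 8_001          # number of stored samples (about 0.18 s)

# The "dots": the sample values that would be stored (quantization ignored here).
samples = [math.sin(2 * math.pi * TONE_HZ * k / FS) for k in range(N)]


def true_value(t):
    """The original continuous tone at time t, measured in sample periods."""
    return math.sin(2 * math.pi * TONE_HZ * t / FS)


def sinc(x):
    """Normalized sinc, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_reconstruct(t):
    """Whittaker-Shannon interpolation: the band-limited curve through every sample."""
    return sum(s * sinc(t - k) for k, s in enumerate(samples))


def straight_lines(t):
    """Literal straight lines between neighboring dots, for comparison only."""
    k = math.floor(t)
    frac = t - k
    return samples[k] * (1 - frac) + samples[k + 1] * frac


# Probe halfway between samples, far from the buffer ends, because a finite
# sum of sinc terms is least accurate near the edges of the buffer.
center = N // 2
probes = [center + i + 0.5 for i in range(-20, 20)]

sinc_err = max(abs(sinc_reconstruct(t) - true_value(t)) for t in probes)
line_err = max(abs(straight_lines(t) - true_value(t)) for t in probes)

print(f"worst error, sinc reconstruction: {sinc_err:.6f}")
print(f"worst error, straight lines:      {line_err:.6f}")
```

At this frequency the straight-line version misses the true waveform badly between samples. The sinc sum lands very close to it, and the small remaining error comes from truncating the sum to a finite buffer.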

Monty says way more than this, as do the sources he cites. Thank you, Monty, for making the world a little safer for those of us who value pragmatism and reality.


[1] Don't be thrown by the fact that the title and URL look unrelated. The title really is "24/192 Music Downloads ...And Why They Make No Sense" and the URL really does end with the file name "neil-young.html." You will understand why as soon as you start reading the article.

[2] I believed this because I misunderstood something I read in an interview with Roger Nichols.  He said something along the lines of "there is never any reason to record in 16bit" and I took it to mean 24bit should be preserved all the way through to the consumer.  In retrospect I suspect he was only talking about production, or maybe even just recording.   

[3] Imagine a trombone sliding between notes. The sound does not stop or jump unless the musician tells the horn to do so. The analog side of the DAC does not stop between samples.

[4] Sometimes this takes the form of headroom or some other kind of virtual padding. At other times it gives processing systems more data to work with, providing a more detailed result, even though no one could hear the extra data/detail itself before or after processing. This is an unusual case in which things we cannot hear are, in fact, useful. One may hear the superior results of the data-rich processing, but not the additional data itself. For example, a 24bit and a 16bit copy of the processed output would sound the same, but output from processing done with less data would sound different.
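
Footnote [4] can be illustrated with a toy model. The Python sketch below (standard library only) runs a hypothetical chain of gain changes that nets out to 0dB over a 0.1 second, 997Hz test tone. It runs the chain twice: once storing every intermediate result at 16 bits, and once storing them at 24 bits and rounding to 16 bits only at the end. The gain values, the tone and the plain undithered rounding are all assumptions made for illustration. Many DAWs process in floating point and dither when reducing bit depth. The sketch shows the mechanism, not an audibility threshold. Rounding errors made while the signal is turned down get amplified when it is turned back up, and they pile up stage after stage. The 24-bit path leaves little more than the single final rounding.

```python
import math

FS = 44_100
N = 4_410                                  # 0.1 s of audio
GAINS_DB = [-9, 4, -7, 3, -5, 2, -8, 20]   # hypothetical edit chain, nets out to 0 dB


def quantize(x, bits):
    """Round to the nearest step of a signed fixed-point format (no dither)."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels


def process(signal, bits=None):
    """Apply each gain stage in turn; if bits is set, store every stage at that depth."""
    out = list(signal)
    for db in GAINS_DB:
        gain = 10 ** (db / 20)
        out = [x * gain for x in out]
        if bits is not None:
            out = [quantize(x, bits) for x in out]
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# A 997 Hz tone at -6 dBFS, recorded at 24 bits.
source = [quantize(0.5 * math.sin(2 * math.pi * 997 * n / FS), 24) for n in range(N)]

ideal = process(source)                                          # float64, no fixed-point rounding
low_res = process(source, bits=16)                               # every stage kept at 16 bits
high_res = [quantize(x, 16) for x in process(source, bits=24)]   # 24-bit stages, 16-bit delivery

rms_low = rms([a - b for a, b in zip(low_res, ideal)])
rms_high = rms([a - b for a, b in zip(high_res, ideal)])

print(f"16-bit all the way:         error {20 * math.log10(rms_low):.1f} dBFS")
print(f"24-bit stages, 16-bit file: error {20 * math.log10(rms_high):.1f} dBFS")
```

Both results are 16bit files, yet the one processed with more bits ends up measurably closer to the ideal. That is the footnote's point: the extra data earned its keep during processing, not in the delivered file.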
