Experimenting With Spatial Audio For Audiobooks

Pt. 1 - Recording Narration

For my senior project, I am experimenting with spatial audio and how that technology can be brought into the world of audiobooks. Will audiobooks benefit from a surround sound environment, or will the added sound become too distracting and take away from the story? Or will the audio design make the story more immersive and enjoyable? I drew on my love of science fiction for this project and set out to capture H. G. Wells’s War of the Worlds in what I am calling a spatial audiobook.

I first needed to find a narrator who would be willing to take the time to help me record my arrangement of the story. This in itself is a challenge, because not everyone can read and speak clearly for extended periods of time, especially with a live microphone capturing every mistake. Over the course of a week, I put together an audition packet for potential narrators to read from. I wasn’t necessarily listening for the perfect tone or radio voice; rather, I was listening for how each of them read the story and how they conveyed emotion for the characters and situations in the book. I ultimately cast a good friend of mine, Liam Knoll, who happened to be interested in adding voice work to his resume.

While Liam was familiar with voice production, he had not taken on a role like this before. To prepare him as much as I could, I asked him to read over my version of the book and to listen to the official audiobook on Audible. To avoid vocal strain, we decided to split the 12 chapters we planned to record across three sessions, which meant I had to be very precise with my microphone placement, preamp settings, and compression settings every time. Everything had to sound nearly the same in the end. If you listen closely, chapter by chapter, there are still some slight discrepancies in the tone of the narration. These could be due to any number of factors, such as how far Liam sat from the microphone or whether he sounded nasal that day. There is a significant difference between the chapters recorded during session 1 (chapters 1 - 8) and session 2 (chapters 9 - 11), mostly caused by Liam’s confidence level and his position in the vocal booth.

I think Liam would agree that with each chapter he became less of a narrator and more of a storyteller, which any professional narrator would tell you is the key to a good audiobook. His mistakes early on centered on pronunciation and flow, simply because he was unfamiliar with the text. As he became more familiar with it, his biggest issue was drifting from the written sentence structure, caused by his immersion in and excitement for the story.

During our last session, I wanted to focus on going back and fixing any errors and discrepancies. However, I soon realized that this was a fruitless endeavor. As I mentioned previously, there are slight differences between recordings from different sessions, and they became very obvious when I tried to mix revised lines in with the original recordings. I decided to mask the mistakes as best as I could rather than replace them, simply because the difference was so stark that it pulled the listener (aka me) out of the story. I would rather have a few mistakes in this situation, because the focus of the project isn’t the narration; it is what spatial audio can bring to the world of audiobooks.

The biggest thing I would recommend to anyone trying to record this type of dialog is to catch mistakes as they happen. Pay very close attention to the script and follow along with the narrator. Liam was actually very grateful when I spoke up and asked him to redo lines. Having the confidence to speak up in the studio is key to being a great engineer. This is something I am still working on, but I know that during our last two sessions I got better at catching simple mistakes as they happened, which saved me time later on while editing.

Pt. 2 - Pre-Production

Yes, I know, this should be part 1. However, I wanted to showcase Liam first. Without him, this project would never have happened, and I can confidently say it would not have turned out as well as it did without his voice. Thank you, Liam!

Anyway, let's discuss the exciting world of pre-production!

Special shoutout to Nat Hickman, who did the cover art for me - @natnarratives

I was responsible for planning, organizing, recording, mixing, and delivering every aspect of this project. The only outside source I needed was Liam. For the sake of time, I arranged a shorter version of Book I, cutting out anything I deemed boring or unnecessary, including all of chapter 7. Book I also contains more action than Book II and thus offers more opportunities for creative sound design.

The biggest challenge for this project was time. Even if the final product turned out poorly (which it did not), I wanted to at least be able to say I finished this experiment and answered the question: can audiobooks be creatively transformed into a spatial audio format? To finish this project as quickly and smoothly as possible, I spent a decent amount of time on pre-production. I scheduled every major aspect of this project and even planned for delays. I cannot express how much this helped me keep the project on track, not to mention keep me sane.

Major milestones included finishing pre-production, finding a narrator, recording dialog, and lastly mixing in a format I was unfamiliar with, along with all the fun editing and organization that comes in between.

Speaking of organization, I'd recommend creating a two-column script for anything like this. Similar to a ghost clip or cue session for film or TV, a two-column script maps sounds to specific actions. In this case, however, the sounds need to "synchronize" with specific lines in the book rather than moments in a video. One side of the script has excerpts from the book, specifically lines or passages that I thought could be creatively turned into sound. The other side has notes or ideas on how those sounds should be created.

The two-column script also assisted me in planning out what sounds I needed to record / sample. Being able to re-read lines quickly without having to scan the book made searching for samples quite easy. Plus it gave me a list to reference while out recording background ambiance, foley, and my own sound design. 
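If the two-column script lives in a spreadsheet or tab-separated text file instead of a notebook, pulling out that recording list can be automated. The Python sketch below is a hypothetical example: the column names, the RECORD/SAMPLE/DESIGN tags, and the placeholder rows are invented for illustration and are not taken from the project's actual script.

```python
import csv
import io

# Hypothetical rows: left column = book excerpt (placeholder labels here),
# right column = sound idea, prefixed with an assumed RECORD/SAMPLE/DESIGN tag.
SCRIPT_TSV = (
    "excerpt\tsound_idea\n"
    "(cylinder opening)\tRECORD: metal pot lid scraping\n"
    "(crowd at the pit)\tSAMPLE: nervous walla from a library\n"
    "(heat-ray fires)\tDESIGN: layered synth patches\n"
)


def load_script(text: str) -> list[tuple[str, str]]:
    """Parse a tab-separated two-column script into (excerpt, sound_idea) pairs."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [(row["excerpt"], row["sound_idea"]) for row in reader]


def checklist(entries: list[tuple[str, str]], tag: str) -> list[str]:
    """Return the sound ideas carrying a given tag, with the tag stripped off."""
    prefix = tag.upper() + ":"
    return [
        idea[len(prefix):].strip()
        for _, idea in entries
        if idea.upper().startswith(prefix)
    ]


if __name__ == "__main__":
    entries = load_script(SCRIPT_TSV)
    print(checklist(entries, "record"))
```

Running it prints `['metal pot lid scraping']`, a field-recording list filtered straight from the script; the same call with `"sample"` would give a library shopping list instead.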

The image to the right is the final two-column script I ended up referencing constantly. When creating Logic sessions for the full chapter mixes, I made markers where each of these lines occurred in the narration.

Here are some images of my notebook and copy of War of the Worlds. The scratchings in the notebook are a rough two-column script I made while re-reading the story.

During this time I also researched publishing and how I would go about getting something like this on Audible. For starters, to avoid licensing issues, I chose War of the Worlds because it is in the public domain. This means I would not need to seek permission from the author, publisher, or estate to do an audio adaptation of the work. However, this became a small issue for publishing.

Audible has some pretty odd rules when it comes to publishing stories that are in the public domain. To keep everyone from cashing in on stories that are not theirs, anything in the public domain has to include some sort of addition, for example the original text with a biography of the author attached. Also, for a book to be on Audible, it first needs to be available as a text/Kindle version on Amazon with these same additions. Long story short, I could make the argument that my mix and arrangement of the story are unique. However, Amazon does not have anything in place for audio augmentations and would deny me on the spot. I'm sure I could climb the corporate ladder and fix this, but I think it would be easier to just upload the 5.1 mixes to YouTube for everyone to enjoy.

Pt. 3 - Sourcing and Recording Sounds for Sound Design

I based a lot of my sound design on Jeff Wayne's Musical Version of The War of the Worlds. To be honest, his album is pretty much the idea behind this whole project. I love the parts in the album where he took passages from the original text and in a way made them lyrics. 

Major influences can be heard in my interpretation of the cylinder opening, as well as the infamous ULLA and ALOO. I love the way Jeff Wayne incorporates these sounds as motifs in the album. There was even a time I thought about manipulating the opening bassline from The Artilleryman and the Fighting Machine into one of my chapters, but figured that would be too much. 

Influences aside, I enjoyed creating my own fighting machines from the sounds that I had. 

As for sound design, there wasn't a whole lot that I had to create from scratch. Most of the sounds heard throughout the project are ambiances or background walla with varying emotions. I sourced sounds like rifle fire, artillery, and some of the explosions from sample libraries (or already had them in my personal collection). The biggest things I actually had to design were the tripods and the heat-ray.

The heat-ray is made out of various synth patches, some of which I created myself on a Korg Monologue. Mixed in are some sine wave pulses for the shorter "gun-like" blast. For the actual "fire beam" I looped some of the more distorted patches to make the ray seem like an actual sword of fire. I also mixed in some flamethrower samples to add a flowing wind element. 
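The synth patches themselves can't really be captured in a few lines of code, but the simplest layer described above, a short sine pulse for the "gun-like" blast, is easy to sketch. The following minimal Python example uses only the standard library; the 220 Hz frequency, decay rate, pulse spacing, and 48 kHz sample rate are assumptions chosen for illustration, not settings from the project.

```python
import math
import struct
import wave

SAMPLE_RATE = 48_000  # assumed session sample rate


def sine_pulse(freq_hz: float = 220.0, duration_s: float = 0.25,
               decay: float = 12.0, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """A sine wave shaped by an exponential decay envelope, samples in [-1, 1]."""
    n = round(duration_s * sample_rate)
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate)
        * math.exp(-decay * i / sample_rate)
        for i in range(n)
    ]


def pulse_train(count: int = 3, gap_s: float = 0.08, **pulse_kwargs) -> list[float]:
    """Several pulses separated by silence, for a rapid-fire blast."""
    sample_rate = pulse_kwargs.get("sample_rate", SAMPLE_RATE)
    silence = [0.0] * round(gap_s * sample_rate)
    out: list[float] = []
    for _ in range(count):
        out.extend(sine_pulse(**pulse_kwargs))
        out.extend(silence)
    return out


def write_wav(path, samples: list[float], sample_rate: int = SAMPLE_RATE) -> None:
    """Write samples as mono 16-bit PCM, clipping anything outside [-1, 1]."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
```

Something like `write_wav("heat_ray_pulse.wav", pulse_train(count=3))` would produce a mono file to layer under synth patches in a DAW. The exponential envelope is the design choice that matters here: each pulse dies away quickly, so it reads as a shot rather than a sustained tone.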

The tripod is made out of various engine, hydraulic, and piston sounds, all combined rhythmically to create a sense of legs stomping through a field. In formats with speakers above the listener, like 7.1.4, most of the engine noises come from above, while the ground debris and stomping are at ear level. The aloos and ullas also come from above, where the cockpit of the tripod would be located. Those were made using a talkbox and the Monologue.

This is a video of me experimenting with a metal pot, which would become the sound of the cylinder opening.

Here is a video of me trying to record muddy footsteps. 

Mt. Gilead State Park is where I did all of my field recordings. On this day, it wasn't too windy and the water below couldn't be heard. In the project, you will hear this spot multiple times, especially when the narrator is in or near a forest. Off to the left of this picture is a flat field. I wanted to record this spot as well, but as I was climbing down the hill a firetruck pulled up and emptied its water tank into the stream. Turns out this is very loud. Luckily, my good friend Sam Wallace shared some sounds and ambiances with me from his new sample library. A good number of the factory noises he recorded can be heard in chapter 5 - The Heat Ray, specifically when the metal rod is raised and lowered in the pit.

Two sample libraries I used quite often were the Soundopolis Halloween 101 pack and the Free Field Recordings from Luftrum pack. I have had both for a long time and never thought they would come in handy for something like this. A lot of the walla came from the Halloween pack, while more of the chaotic weather came from the Luftrum pack. There is also some spooky Halloween music that I added during the reveal of the Martians. All of the other music heard (chapters 1, 9, and 11) I quickly recorded myself using a Yamaha DX7 MkII. I found that the glassiness of the FM synthesis provided the perfect sense of curiosity and suspense.

Pt. 4 - Mixing for Spatial Audio

Before this project, I was relatively unfamiliar with spatial audio. Right now, it seems to be a format only a few can afford to enjoy. However, Logic Pro makes it very easy to mix and listen to this format, even without an optimal setup. The Dolby Atmos Renderer in Logic allows you to mix in any surround sound format, including binaural. Binaural audio is still composed of two channels (left and right); however, the Dolby renderer processes those two channels to create a sense of depth, allowing anyone with headphones to enjoy spatial audio mixes.

If I had more time, I would have focused on creating a specific mix for each format. However, due to the time constraints of my class, I focused on creating an awesome 7.1.4 mix for each chapter. For the other two formats, I let the Dolby renderer work its magic and used it to bounce each chapter down to 5.1 and binaural. That does not mean these two formats sound bad compared to their 7.1.4 siblings. I would argue that the 5.1 format sounds almost identical to the 7.1.4 format, the key difference being that the 5.1 mixes sound slightly less open.

The binaural mixes also sound great; the only issue is that they can get quite muffled at times. This is because I made the narration the only sound coming out of the center speaker. This keeps the narration audible at all times and lets the listener easily focus on it when they want to. However, the idea of a center speaker does not translate to binaural audio. I noticed that when I soloed the narration while listening binaurally, the renderer added some sort of reverb to make it sound in front of me, rather than just in the center. If I were to redo this project, this is definitely something I would research more.

I don't have a 7.1.4 setup in my personal home studio, so while mixing and designing the chapters I listened binaurally. Later, I would go into Studio C at Capital University to listen in 7.1.4. For the most part, mixing binaurally translated decently to 7.1.4.

In Logic, you can set the panners to stereo, surround, or 3D object. The surround panners work how you would expect them to. They allow you to pivot mono or stereo tracks 360 degrees, with parameters including elevation, spread, and diversity, as well as an LFE level fader. You can also stop the signal from being sent to any of the speakers by clicking on them in the dialog box. The 3D object panner creates an object in the Dolby renderer (you can have a total of 118) and lets you choose a left/right position, back/front position, and elevation. In the video above you can see multiple objects being automated in the renderer to create the heat ray.
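To make the 3D object panner's three axes concrete, here is a hypothetical Python model of an object position plus a simple linear automation sweep, similar in spirit to moving a heat-ray object across the room. The coordinate ranges, class names, and `sweep` function are assumptions for illustration only; they are not Logic's or Dolby's API. Only the 118-object cap comes from the paragraph above.

```python
from dataclasses import dataclass, field

MAX_OBJECTS = 118  # object limit mentioned above for Logic's Dolby Atmos renderer


def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))


@dataclass
class ObjectPosition:
    # Hypothetical normalized ranges, not Logic's documented parameter values:
    # x: -1 = far left, +1 = far right
    # y: -1 = back, +1 = front
    # z:  0 = ear level, 1 = directly overhead
    x: float = 0.0
    y: float = 1.0
    z: float = 0.0

    def __post_init__(self) -> None:
        # Keep every coordinate inside the assumed room boundaries.
        self.x = clamp(self.x, -1.0, 1.0)
        self.y = clamp(self.y, -1.0, 1.0)
        self.z = clamp(self.z, 0.0, 1.0)


@dataclass
class AtmosSession:
    objects: dict[str, ObjectPosition] = field(default_factory=dict)

    def add_object(self, name: str, pos: ObjectPosition) -> None:
        # Refuse new objects once the object budget is used up.
        if name not in self.objects and len(self.objects) >= MAX_OBJECTS:
            raise ValueError(f"object limit of {MAX_OBJECTS} reached")
        self.objects[name] = pos


def sweep(start: ObjectPosition, end: ObjectPosition,
          steps: int) -> list[ObjectPosition]:
    """Evenly spaced automation points from start to end, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    points = []
    for i in range(steps):
        t = i / (steps - 1)
        points.append(ObjectPosition(
            start.x + (end.x - start.x) * t,
            start.y + (end.y - start.y) * t,
            start.z + (end.z - start.z) * t,
        ))
    return points
```

For example, `sweep(ObjectPosition(-1.0, 1.0, 0.5), ObjectPosition(1.0, -1.0, 0.5), 5)` moves an object from front left to back right at half height, and its middle point lands at the center of the room under these assumed coordinates. In a real session this movement is drawn as automation on the panner; the sketch only shows the kind of data that automation represents.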

3D objects and their movements can be heard clearly when automated. Even with just a normal set of headphones, objects can be heard all around you. If you haven't had the chance to experience the wonder of Dolby Atmos, I'd highly recommend doing it whenever the opportunity arises. That is not to say I didn't have some difficulties with it.

I am a huge fan of routing folders in any DAW, but 3D object tracks do not let you route them to anything besides the master. In hindsight, this makes sense because objects need to be rendered before you can hear them, but this adjustment in workflow took some time to get used to. It did not help that Logic actually lets you make track stacks (its version of folders) with 3D objects, but will sneakily change the panner back to stereo or surround without telling you. There were also times, specifically when I was in Studio C, when surround tracks would become a different type of surround track. The panner would look the same, but the options to choose which speakers the sound came out of, as well as the LFE fader, were gone. I'm not sure if this was due to different versions of Logic or something else, but either way it became very annoying when I was trying to send a signal to the sub and the option to do so had been removed.

If you're reading this and have an interest in working with Dolby Atmos, I would say do it and get as creative as possible. But first, do some research on your DAW and how the Dolby renderer works with it. This tech is still very new, and it seems to work differently in just about every DAW. Do your research and save yourself from inevitable frustration.

Pt. 5 - Do Audiobooks Benefit From a Surround Sound Environment? 

I enjoy classic science fiction novels. The worlds that authors were able to create, and sometimes even predict, are full of wonder and mystery. It is hard for me to think of War of the Worlds without picturing the three-legged machines that marched over England, shooting out their heat rays and destroying any living creature they came across. The point of this project was to bring that world into the audio realm, making the story even more immersive for the listener.

In my opinion, spatial audiobooks will become very popular as spatial audio in general becomes more mainstream. While there is certainly a limit to how much sound design you can add over narration, I never found myself becoming distracted from the story. If anything, the sound design began to tell the story without the need for the narrator. I predict that, much like musicians are starting to focus on spatial audio, authors will do the same with their stories, leaving gaps between passages or creating entirely new renditions for immersive spatial audiobooks. To my surprise, one of my best friends also created a spatial audiobook for their senior project using Where the Wild Things Are by Maurice Sendak. Their project can be listened to here, and it further supports my belief that spatial audiobooks should be taken more seriously.

If I were to do this project all over again, there is not much I would change. If anything, I would have extended it to the full scope of Book I, though I would definitely have needed more time. The biggest challenge I faced was time. I knew going in that it was going to take a lot of effort and planning to make something I was proud of. In the end, I did just that: I scheduled everything out, for the most part everything stayed on schedule, and I created something unique and awesome. What I did not account for was how much time the rest of my life would take up. My classmates would agree that this semester, which was only eight weeks long, has by far been the most stressful and depressing part of our college lives. Despite that, we all made it. Whether it was our mutual frustration or stubbornness, we all made something to be proud of while navigating our personal lives at the same time.

Would I do something like this again? Yes, but on my own scale and timeline. I do after all have a whole shelf of potential spatial audiobooks that I believe anyone should be able to enjoy.