26 December 2009

Auralization: Fundamentals of Acoustics, Modelling, Simulation, Algorithms and Acoustic VR

I received Michael Vorländer's book for X-mas. This Springer publication is a great college-level introduction to the science and technology of 3D sound rendering. Reading chapter 1 took me back to my classical dynamics class at the USAFA! Taylor series expansion (ignoring the higher-order terms), wave equations (PDEs) and matrix multiplication await the reader. I recently finished David Gelernter's 1991 book (232 pp), Mirror Worlds, in which he talks about LifeStreams that feed the Mirror Worlds to be built in the not-so-distant future. Anyway, I am investigating the relative maturity of 3D acoustics, graphics and haptics from the perspective of computer hardware (i.e., sound cards, graphics cards, etc.) and software engines (i.e., OpenAL and H3D, both APIs, and X3D). Vorländer writes that realtime binaural rendering is about ten years away. He treats performance in milliseconds and scene density as the driving factors for plausible perception of binaural sounds synthesized from virtual-reality interaction among objects driven by the environment or by avatars controlled by humans. Several ISO standards are referenced for the material coefficients used in sound synthesis algorithms, and the annex has tables of these coefficients by center frequency.
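
To make the material-coefficient idea concrete for myself, here is a minimal sketch of how an item's acoustic data might be stored and looked up, assuming absorption coefficients tabulated at the usual octave-band center frequencies (125 Hz through 4 kHz). The material names and values below are illustrative placeholders, not figures from the book's annex or any ISO table.

    # Illustrative absorption coefficients per octave-band center frequency (Hz).
    # Values are placeholders, not taken from Vorlander's annex or any ISO table.
    CENTER_FREQUENCIES = (125, 250, 500, 1000, 2000, 4000)

    MATERIALS = {
        "painted_concrete": (0.01, 0.01, 0.02, 0.02, 0.02, 0.03),
        "carpet_on_pad":    (0.08, 0.24, 0.57, 0.69, 0.71, 0.73),
        "abs_plastic_key":  (0.02, 0.03, 0.03, 0.04, 0.05, 0.05),
    }

    def absorption(material, frequency_hz):
        """Return the absorption coefficient for the band nearest frequency_hz."""
        coefficients = MATERIALS[material]
        # Pick the nearest tabulated center frequency (a simple proxy for octave-band assignment).
        band = min(range(len(CENTER_FREQUENCIES)),
                   key=lambda i: abs(CENTER_FREQUENCIES[i] - frequency_hz))
        return coefficients[band]

    print(absorption("carpet_on_pad", 800))  # falls in the 1 kHz band -> 0.69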

I just want to know how to approach the requirements for virtual objects to carry graphics, acoustics and haptics encodings in a single data structure, namely a scene graph per traded item (a rough sketch of what that might look like follows below). Someone's LifeGraph software should properly integrate that data into its NXD and render sound, light and force feedback when the object is touched by an owner wearing haptic gear. If the computing machinery needs dedicated chipsets for realtime photorealism, binaural sound, and force feedback because a quad-core processor will not do, then how do you figure that out analytically? I haven't yet read any analysis of rendering algorithms in the context of instruction set architectures and I/O devices that explains this.
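
For my own thinking, here is a minimal sketch of such a per-item data structure. The names (VisualProps, AcousticProps, HapticProps, TradedItem) are hypothetical and are not X3D, H3D or OpenAL constructs; this is just one way the three encodings could live side by side in one scene-graph node.

    from dataclasses import dataclass, field

    # Hypothetical per-item node bundling the three encodings; not an X3D/H3D type.
    @dataclass
    class VisualProps:
        mesh_uri: str                     # reference to the geometry/texture assets
        diffuse_rgb: tuple = (0.8, 0.8, 0.8)

    @dataclass
    class AcousticProps:
        # Coefficients keyed by octave-band center frequency, like the annex tables.
        absorption: dict = field(default_factory=dict)   # e.g. {125: 0.02, ..., 4000: 0.05}
        scattering: dict = field(default_factory=dict)

    @dataclass
    class HapticProps:
        stiffness_n_per_m: float = 0.0    # spring constant for force feedback
        damping: float = 0.0
        friction: float = 0.0

    @dataclass
    class TradedItem:
        name: str
        visual: VisualProps
        acoustic: AcousticProps
        haptic: HapticProps
        children: list = field(default_factory=list)     # scene-graph nesting

    keyboard = TradedItem(
        name="Dell keyboard",
        visual=VisualProps(mesh_uri="assets/keyboard.x3d"),
        acoustic=AcousticProps(absorption={125: 0.02, 1000: 0.04, 4000: 0.05}),
        haptic=HapticProps(stiffness_n_per_m=300.0, damping=0.5),
    )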

For example, if my mirror world has a Dell keyboard in it and I want to interact with it by typing on it via my true-to-life avatar, using haptic gloves and wearing binaural headphones, will I hear the click and clack of typing because Dell properly encoded the material coefficients for acoustics and the keys' spring constants for haptics? The sound of typing would be synthesized as a function of how I type, in a room, on a desk or in my lap, with my stereo playing electronica in the background. The LifeGraph software just has to sense that the user has the gloves and binaural headphones attached to the computing machinery executing the simulation of the virtual reality/world immersion and interaction. Should I be able to feel my way around any of my rooms of stuff when I don't have a visual display connected? I mean, I could be wearing a microphone, enter my mirror world at the foyer of my summer home, speak, and expect the LifeGraph sound chipset to analyze the sound radiation and reproduce the direct reflections and reverberation from the traded items and the room's construction materials in place. Interpreting the sound, and knowing my home, I could figure out where I am as I steer my avatar through the house by echoes. That's a bit far-fetched, but it's plausible with the property encodings of the house and everything in it, including my avatar. A toy sketch of the key-click idea follows below.
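
Here is a toy sketch of that key-click idea, assuming nothing more than a mass-spring model of a key: the natural frequency comes from an encoded contact stiffness and key mass, and the click is an exponentially damped sinusoid. This is only an illustration of driving synthesis from encoded physical properties, not the auralization algorithms Vorländer describes, and the parameter values are invented.

    import math
    import struct
    import wave

    SAMPLE_RATE = 44100

    def key_click(contact_stiffness_n_per_m=5.0e4, key_mass_kg=0.002,
                  decay_s=0.02, duration_s=0.1):
        """Synthesize one key click as a damped sinusoid from encoded key properties."""
        # Natural frequency of the mass-spring contact: f = (1 / 2*pi) * sqrt(k / m).
        freq_hz = math.sqrt(contact_stiffness_n_per_m / key_mass_kg) / (2.0 * math.pi)
        samples = []
        for n in range(int(SAMPLE_RATE * duration_s)):
            t = n / SAMPLE_RATE
            amplitude = math.exp(-t / decay_s)              # energy lost to damping
            samples.append(amplitude * math.sin(2.0 * math.pi * freq_hz * t))
        return samples

    def write_wav(path, samples):
        """Write mono 16-bit PCM so the click can be auditioned in any player."""
        with wave.open(path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(SAMPLE_RATE)
            frames = b"".join(struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
                              for s in samples)
            wav.writeframes(frames)

    write_wav("key_click.wav", key_click())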