Re: [AH] Synthesis futures

From Jim Michmerhuizen
Sent Thu, Jun 21st 2018, 16:44

On 06/21/2018 10:43 AM, Royce Lee wrote:
> The voice was nice, and much of that seems directly relevant to the 
> kind of synthesis and sound quality that we like.
> Perhaps even more astounding were the snippets of concert piano music. 
> I couldn't tell from the paper and website whether the sounds were 
> synthesized or merely recordings of the performance... but I believe 
> they were re-synthesized.
> I also thought that the fact that neural networks operated at the 
> sample level was of interest to us...given our, or my, general feeling 
> that most digital synthesis has a samey, FM feel to it. Don't get me 
> wrong, I love FM, but I love FM for its limitations mostly.  Perhaps 
> this approach would finally allow digital synthesis to break out of 
> being a poor stepchild to analogue.
>
As I read it, everything in the piano snippets was synthetic -- both the 
sounds and the notes.

*This is huge.*  From the beginning of audio synthesis (in 1897, with 
Thaddeus Cahill's "Telharmonium") until now, nothing has integrated the 
two domains of auditory experience: the realm of < 20 Hz and the realm 
of > 20 Hz.

Performance and composition, for us humans, live in the realm of < 20 Hz; 
timbre, inflection, and pitch live in the realm of > 20 Hz.  (20 Hz is a 
period of 50 ms: events spaced farther apart than that are heard as 
rhythm and phrasing, while events closer together fuse into pitch and 
timbre.)

The methods described in the DeepMind paper, as I read them, do not 
distinguish between these two realms.  That is how, for example, those 
methods can wind up generating gibberish speech: sound that is 
phonetically convincing but carries no words.  The music in the piano 
samples was gibberish in the same way; but it sure sounded nice.
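
For anyone who hasn't read the paper, here is the shape of that 
sample-level loop, as a minimal Python sketch.  To be clear, this is not 
DeepMind's code: the "model" here is a toy placeholder standing in for 
their trained network, though the 16 kHz rate and the 256 quantization 
levels (8-bit mu-law) are the settings the paper reports.

    import numpy as np

    SAMPLE_RATE = 16000   # samples per second, as in the paper
    QUANT_LEVELS = 256    # 8-bit mu-law quantization, as in the paper

    def next_sample_distribution(history):
        # Stand-in for the trained network: given every sample generated
        # so far, return a probability distribution over the 256 possible
        # next values.  The real model conditions on thousands of past
        # samples; this toy just returns a uniform distribution.
        return np.full(QUANT_LEVELS, 1.0 / QUANT_LEVELS)

    def generate(seconds=1.0, rng=np.random.default_rng(0)):
        audio = []
        for _ in range(int(seconds * SAMPLE_RATE)):
            probs = next_sample_distribution(audio)
            audio.append(rng.choice(QUANT_LEVELS, p=probs))  # draw one sample
        return np.array(audio)

The point of the sketch is that there is exactly one loop and one 
distribution per sample.  Nothing in it separates the < 20 Hz realm from 
the > 20 Hz realm; whatever rhythm, melody, and timbre come out, they 
come out of the same process.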

Imagine the same -- or similar -- learning being given the audio of a 
Wagner opera.  Or the complete Beatles albums.

It would generate the music *and its instruments* in one pass.  
Including, for example from Ravel's "Bolero", all of the goofy  
combinations of orchestral instruments, plus whatever new instruments it 
might invent on the spot.

Yikes.

Michmerhuizen