From: Andrew Horton
Sent: Fri, Jun 22, 2018, 13:46
I love that if you get slightly OT talking about relevant technical stuff, the mods will come in and panic-squash the conversation. But this inane bullshit can go on for weeks, apparently.

On Thu, Jun 21, 2018 at 9:54 PM <xxxxxxx@xxxxxxx.xxx> wrote:
>
> Some may remember the voices of Jane Barbe and Pat Fleet... the old-fashioned analog way ;):
>
> https://www.youtube.com/watch?v=0IHzWWMzqmI
>
> On June 21, 2018 at 10:43 AM Royce Lee <xxxxxxxxxx@xxxxx.xxx> wrote:
>
> The voice was nice, and much of that seems directly relevant to the kind of synthesis and sound quality that we like.
>
> Perhaps even more astounding were the snippets of concert piano music. I couldn't tell from the paper and website whether the sounds were synthesized or merely the performance... but I believe the sounds were re-synthesized.
>
> I also thought that the fact that the neural network operates at the sample level was of interest to us, given our, or my, general feeling that most digital synthesis has a samey, FM feel to it. Don't get me wrong, I love FM, but I love FM mostly for its limitations. Perhaps this approach would finally allow digital synthesis to break out of being a poor stepchild to analogue.
>
> On Thu, Jun 21, 2018 at 8:43 AM, John Emond <xxx.xxx@xxxxxx.xxx> wrote:
>>
>> At Bell Northern Research (BNR) we had as many people as possible recite a script. This included the tri-corporate: BNR, Northern Telecom, and Bell Canada. There was a phone number (of course) to call and recite. The result was voice dialing and voice menu navigation. As might be expected, recognition of the numeral 4 from Chinese speakers was problematic.
>>
>> Cheers,
>>
>> John
>>
>> Monde Synthesizer gives you More
>> www.mondesynthesizer.com
>>
>> On Jun 21, 2018, at 2:39 AM, annika morgan <xxxxxx.x.xxxxxx@xxxxx.xxx> wrote:
>>
>> I'm more familiar with machine learning on data patterns in a security engineering context, where we train on known-good and known-bad data sets over various time intervals so that our systems increase the probability of detecting and correlating anomalous outliers based on behavioral analytics.
>>
>> In the case of "training time" based on voice, I'm curious how long it takes and how many voice samples are needed before they are able to create a representative voice model.
>>
>> Some security tools on the market today that use data science to detect anomalies require many millions of known-good and known-bad sample files to be fed into a machine learning process in order to build a reliable baseline. In this case I'm curious how many voice samples are required. Could I, for instance, feed 1 million hours of already-captioned YouTube videos into this thing and train on voice + text to get a sample set reliable enough to reproduce voice, or do I have to pay 10,000 people to come in and read 100 pre-prepared scripts?
>>
>> I'm mostly curious what level of effort their training exercise requires. I should have been more specific, apologies.
>>
>> On Wed, Jun 20, 2018 at 10:40 PM Mike Perkowitz <xxxx@xxxxxxxxx.xxx> wrote:
>>>
>>> It's a machine learning algorithm, so "training" is when the algorithm examines examples of the thing it's going to model. So "at training time..." means that the algorithm is given recordings of human speakers, which it analyzes to produce a model that can spit out the kinds of sounds they demonstrate.
>>>
>>> I think "training" in the context of a speech recognition tool like Dragon NaturallySpeaking refers to the time the user has to spend teaching the tool to recognize their voice. Totally different :)
>>>
>>> On Wed, Jun 20, 2018 at 9:19 PM, annika morgan <xxxxxx.x.xxxxxx@xxxxx.xxx> wrote:
>>>>
>>>> "At training time, the input sequences are real waveforms recorded from human speakers. After training, we can sample the network to generate synthetic utterances."
>>>>
>>>> I'm curious what training time means exactly.
>>>>
>>>> On Wed, Jun 20, 2018 at 7:44 PM Royce Lee <xxxxxxxxxx@xxxxx.xxx> wrote:
>>>>>
>>>>> This is probably old news to some of you, but it seems like this kind of software would make for good music synthesis.
>>>>>
>>>>> https://deepmind.com/blog/wavenet-generative-model-raw-audio/
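[Editor's note] For anyone curious what "train, then sample" means mechanically, here is a toy sketch in plain NumPy. It is not WaveNet (which uses a deep stack of dilated convolutions) and the waveform is synthetic rather than recorded speech; it only illustrates the shape of the idea from the quoted sentence: at training time the model is fit on real samples, and afterwards it is free-run, feeding each predicted audio sample back in as context.

```python
import numpy as np

# Toy stand-in for a "real waveform recorded from human speakers":
# a decaying 220 Hz sine at an assumed 8 kHz sample rate.
sr = 8000
t = np.arange(sr) / sr
waveform = np.exp(-3 * t) * np.sin(2 * np.pi * 220 * t)

ORDER = 16  # predict each sample from the previous 16

# "Training time": build (context -> next sample) pairs from the real
# audio and fit a linear autoregressive predictor by least squares.
X = np.stack([waveform[i:i + ORDER] for i in range(len(waveform) - ORDER)])
y = waveform[ORDER:]
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

# "After training": seed with a snippet of real audio, then free-run
# the model, feeding each prediction back in as the next context.
generated = list(waveform[:ORDER])
for _ in range(1000):
    context = np.array(generated[-ORDER:])
    generated.append(float(context @ coeffs))

generated = np.array(generated)
print(generated.shape)  # (1016,): ORDER seed samples + 1000 generated
```

The sample-level feedback loop in the second half is the part Royce was pointing at: nothing in the generator knows about notes or oscillators, it just emits one audio sample at a time conditioned on the samples before it.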