From Andrew Horton, sent Fri, Jun 22, 2018, 16:58
Maybe that was it

On Fri, Jun 22, 2018 at 11:54 AM Mike Perkowitz <xxxx@xxxxxxxxx.xxx> wrote:
> what, here? I don't remember the last time "the mods" quashed any technical discussion at all. sometime in the 00s maybe? particularly since we decided, what the heck, digital synthesis is ok too, it's pretty wide open. or do you consider me telling someone not to be an asshole "quashing technical stuff"? :)
>
> On Fri, Jun 22, 2018 at 6:48 AM, A. Horton <xxxxxx.xxxxxx@xxxxx.xxx> wrote:
>> I love that if you get slightly OT talking about relevant technical stuff, the mods will come in and panic-squash the conversation. But this inane bullshit can go on for weeks, apparently.
>>
>> On Thu, Jun 21, 2018 at 9:54 PM <xxxxxxx@xxxxxxx.xxx> wrote:
>>> Some may remember the voices of Jane Barbe and Pat Fleet... the old-fashioned analog way ;):
>>>
>>> https://www.youtube.com/watch?v=0IHzWWMzqmI
>>>
>>> On June 21, 2018 at 10:43 AM Royce Lee <xxxxxxxxxx@xxxxx.xxx> wrote:
>>>> The voice was nice, and much of that seems directly relevant to the kind of synthesis and sound quality that we like.
>>>>
>>>> Perhaps even more astounding were the snippets of concert piano music. I couldn't tell from the paper and website whether the sounds were synthesized or merely the performance... but I believe the sounds were re-synthesized.
>>>>
>>>> I also thought that the fact that the neural network operates at the sample level was of interest to us, given our (or my) general feeling that most digital synthesis has a samey, FM feel to it. Don't get me wrong, I love FM, but I love FM mostly for its limitations. Perhaps this approach would finally allow digital synthesis to break out of being a poor stepchild to analogue.
>>>>
>>>> On Thu, Jun 21, 2018 at 8:43 AM, John Emond <xxx.xxx@xxxxxx.xxx> wrote:
>>>>> At Bell-Northern Research (BNR) we had as many people as possible recite a script. This included the tri-corporate: BNR, Northern Telecom, and Bell Canada. There was a phone number (of course) to call and recite. The result was voice dialing and voice menu navigation. As might be expected, recognition of the numeral 4 from Chinese speakers was problematic.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> John
>>>>>
>>>>> Monde Synthesizer gives you More
>>>>> www.mondesynthesizer.com
>>>>>
>>>>> On Jun 21, 2018, at 2:39 AM, annika morgan <xxxxxx.x.xxxxxx@xxxxx.xxm> wrote:
>>>>>> I'm more familiar with machine learning on data patterns in a security engineering context, where we train on known-good and known-bad data sets over various time intervals so that our systems can better detect and correlate anomalous outliers based on behavioral analytics.
>>>>>>
>>>>>> In the case of "training time" based on voice, I'm curious how long it takes and how many voice samples are needed before they can create a representative voice model.
>>>>>>
>>>>>> Some security tools on the market today that use data science to detect anomalies require many millions of known-good and known-bad sample files to be fed into a machine learning process in order to build a reliable baseline. In this case I'm curious how many voice samples are required. Could I, for instance, input 1 million hours of already-captioned YouTube videos into this thing and train on voice + text to get a reliable enough sample set to reproduce voice, or do I have to pay 10,000 people to come in and read 100 pre-prepared scripts?
>>>>>>
>>>>>> I'm mostly curious what level of effort their training exercise requires. I should have been more specific, apologies.
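Annika's "train on known good, flag outliers" baseline idea can be sketched in a few lines. This is a toy illustration with made-up traffic numbers and a crude standard-deviation threshold, not how any particular security product works:

```python
from statistics import mean, stdev

def train(known_good):
    # Learn a baseline (mean and spread) from known-good samples only.
    return mean(known_good), stdev(known_good)

def is_anomalous(sample, baseline, k=3.0):
    # Flag anything more than k standard deviations from the baseline.
    mu, sigma = baseline
    return abs(sample - mu) > k * sigma

# Hypothetical feature: requests per minute from one host.
good = [40, 42, 38, 45, 41, 39, 43, 44]
baseline = train(good)
print(is_anomalous(41, baseline))   # typical traffic -> False
print(is_anomalous(400, baseline))  # far outside baseline -> True
```

Real behavioral-analytics systems do this over many features and time windows at once, which is exactly why they need the millions of labeled samples annika mentions.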
>>>>>> On Wed, Jun 20, 2018 at 10:40 PM Mike Perkowitz <xxxx@xxxxxxxxx.xxx> wrote:
>>>>>>> it's a machine learning algorithm, so "training" is when the algorithm examines examples of the thing it's going to model. so "at training time..." means that the algorithm is given recordings of human speakers, which it analyzes to produce a model that can spit out the kinds of sounds they demonstrate.
>>>>>>>
>>>>>>> I think "training" in the context of a speech recognition tool like Dragon Naturally refers to the time the user has to spend teaching the tool to recognize their voice. totally different :)
>>>>>>>
>>>>>>> On Wed, Jun 20, 2018 at 9:19 PM, annika morgan <xxxxxx.x.xxxxxx@xxxxl.com> wrote:
>>>>>>>> "At training time, the input sequences are real waveforms recorded from human speakers. After training, we can sample the network to generate synthetic utterances."
>>>>>>>>
>>>>>>>> I'm curious what training time means exactly.
>>>>>>>>
>>>>>>>> On Wed, Jun 20, 2018 at 7:44 PM Royce Lee <xxxxxxxxxx@xxxxx.xxx> wrote:
>>>>>>>>> This is probably old news to some of you, but it seems like this kind of software would make for good music synthesis.
>>>>>>>>>
>>>>>>>>> https://deepmind.com/blog/wavenet-generative-model-raw-audio/
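The train-then-sample split Mike describes can be illustrated with a deliberately tiny stand-in model: a character-level Markov chain, nothing like WaveNet's neural network, and the training corpus here is made up. But the shape is the same: fit on real examples once, then draw synthetic sequences from the learned model:

```python
import random
from collections import defaultdict

def train(corpus):
    # "Training time": examine real examples, recording which
    # character follows which.
    model = defaultdict(list)
    for cur, nxt in zip(corpus, corpus[1:]):
        model[cur].append(nxt)
    return model

def sample(model, seed, length=20):
    # "After training": generate a synthetic sequence by repeatedly
    # sampling a successor from the learned model.
    out = seed
    for _ in range(length):
        followers = model.get(out[-1])
        if not followers:
            break
        out += random.choice(followers)
    return out

model = train("the cat sat on the mat")  # made-up training corpus
print(sample(model, "t"))                # e.g. "the mat sat on the c"
```

For WaveNet the examples are raw waveforms rather than characters and the model is a deep network, but "training" and "sampling" mean the same two phases.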