Re: [AH] Synthesis futures

From Andrew Horton
Sent Fri, Jun 22nd 2018, 16:58

Maybe that was it
On Fri, Jun 22, 2018 at 11:54 AM Mike Perkowitz <xxxx@xxxxxxxxx.xxx> wrote:
>
>
> what, here? I don't remember the last time "the mods" quashed any technical discussion at all. sometime in the 00s maybe? particularly since we decided, what the heck, digital synthesis is ok too, it's pretty wide open. or do you consider me telling someone not to be an asshole "quashing technical stuff"? :)
>
>
>
> On Fri, Jun 22, 2018 at 6:48 AM, A. Horton <xxxxxx.xxxxxx@xxxxx.xxx> wrote:
>>
>> I love that if you get slightly OT talking about relevant technical
>> stuff, the mods will come in and panic-squash the conversation. But
>> this inane bullshit can go on for weeks, apparently.
>> On Thu, Jun 21, 2018 at 9:54 PM <xxxxxxx@xxxxxxx.xxx> wrote:
>> >
>> > Some may remember the voices of Jane Barbe and Pat Fleet... the old-fashioned analog way ;):
>> >
>> > https://www.youtube.com/watch?v=0IHzWWMzqmI
>> >
>> > On June 21, 2018 at 10:43 AM Royce Lee <xxxxxxxxxx@xxxxx.xxx> wrote:
>> >
>> > The voice was nice, and much of that seems directly relevant to the kind of synthesis and sound quality that we like.
>> > Perhaps even more astounding were the snippets of concert piano music. I couldn't tell from the paper and website whether the sounds were synthesized or merely the performance... but I believe the sounds were re-synthesized.
>> > I also thought that the fact that the neural network operates at the sample level was of interest to us... given our, or my, general feeling that most digital synthesis has a samey, FM feel to it. Don't get me wrong, I love FM, but I love FM mostly for its limitations. Perhaps this approach would finally allow digital synthesis to break out of being a poor stepchild to analogue.
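>> >
>> > (To make "sample level" concrete: the network spits out audio one sample at a time, each prediction conditioned on everything it has already produced. A toy Python sketch of that loop -- not the actual WaveNet code, and ToyModel is a made-up stand-in for a trained network:)
>> >
>> >   import numpy as np
>> >
>> >   class ToyModel:                      # stand-in for a trained network
>> >       def predict(self, context):
>> >           # the real model outputs a distribution over the next sample;
>> >           # here we just return a decayed copy of the last one
>> >           return 0.99 * context[-1]
>> >
>> >   def generate(model, seed, n_samples, context_len=1024):
>> >       audio = list(seed)
>> >       for _ in range(n_samples):       # one audio sample per step
>> >           audio.append(model.predict(np.array(audio[-context_len:])))
>> >       return np.array(audio)
>> >
>> >   out = generate(ToyModel(), seed=[1.0], n_samples=16000)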
>> >
>> > On Thu, Jun 21, 2018 at 8:43 AM, John Emond <xxx.xxx@xxxxxx.xxx> wrote:
>> >>
>> >> At Bell Northern Research (BNR) we had as many people as possible recite a script. This included the tri-corporate: BNR, Northern Telecom, and Bell Canada. There was a phone number (of course) to call and recite. The result was voice dialing and voice menu navigation. As might be expected, recognition of the numeral 4 as spoken by Chinese speakers was problematic.
>> >>
>> >> Cheers,
>> >>
>> >> John
>> >>
>> >> Monde Synthesizer gives you More
>> >> www.mondesynthesizer.com
>> >>
>> >> On Jun 21, 2018, at 2:39 AM, annika morgan <xxxxxx.x.xxxxxx@xxxxx.xxm> wrote:
>> >>
>> >> I’m more familiar with machine learning on data patterns in a security engineering context, where we train on known-good and known-bad data sets over various time intervals so that our systems have a better chance of detecting and correlating anomalous outliers through behavioral analytics.
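>> >>
>> >> (Roughly this shape, as a toy sketch -- scikit-learn with made-up feature data, nothing from a real product:)
>> >>
>> >>   import numpy as np
>> >>   from sklearn.ensemble import RandomForestClassifier
>> >>
>> >>   good = np.random.rand(1000, 8)         # known-good feature vectors
>> >>   bad = np.random.rand(1000, 8) + 0.5    # known-bad, shifted a bit
>> >>   X = np.vstack([good, bad])
>> >>   y = np.array([0] * 1000 + [1] * 1000)  # 0 = benign, 1 = anomalous
>> >>
>> >>   clf = RandomForestClassifier().fit(X, y)   # the "training" step
>> >>   p_bad = clf.predict_proba(X[:5])[:, 1]     # anomaly probability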
>> >>
>> >> In the case of “training time” based on voice, I’m curious how long it takes and how many voice samples are needed before they can create a representative voice model.
>> >>
>> >> Some security tools on the market today that use data science to detect security anomalies require many millions of known-good and known-bad sample files to be fed into a machine learning process in order to build a reliable baseline. In this case I’m curious how many voice samples are required. For instance, could I input 1 million hours of already-captioned YouTube videos into this thing and train on voice + text to get a reliable enough sample set to reproduce voice, or do I have to pay 10,000 people to come in and read 100 pre-prepared scripts?
>> >>
>> >> I’m mostly curious what level of effort their training exercise requires. I should have been more specific, apologies.
>> >>
>> >> On Wed, Jun 20, 2018 at 10:40 PM Mike Perkowitz <xxxx@xxxxxxxxx.xxx> wrote:
>> >>>
>> >>>
>> >>> it's a machine learning algorithm, so "training" is when the algorithm examines examples of the thing it's going to model. so "at training time..." means that the algorithm is given recordings of human speakers, which it analyzes to produce a model that can spit out the kinds of sounds they demonstrate.
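>> >>>
>> >>> a toy numpy sketch of that train-then-sample split, using a trivial linear predictor instead of a neural net -- just to show the shape, nothing like the real system:
>> >>>
>> >>>   import numpy as np
>> >>>
>> >>>   # "training time": learn to predict the next sample from the last k
>> >>>   k = 64
>> >>>   wave = np.sin(np.linspace(0, 400 * np.pi, 16000))  # stand-in recording
>> >>>   X = np.stack([wave[i:i + k] for i in range(len(wave) - k)])
>> >>>   y = wave[k:]
>> >>>   w, *_ = np.linalg.lstsq(X, y, rcond=None)
>> >>>
>> >>>   # after training: no input recording needed, just sample the model
>> >>>   out = list(wave[:k])
>> >>>   for _ in range(8000):
>> >>>       out.append(float(np.dot(w, out[-k:])))  # one new sample at a time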
>> >>>
>> >>> I think "training" in the context of a speech recognition tool like Dragon NaturallySpeaking refers to the time the user has to spend teaching the tool to recognize their voice. totally different :)
>> >>>
>> >>>
>> >>> On Wed, Jun 20, 2018 at 9:19 PM, annika morgan <xxxxxx.x.xxxxxx@xxxxl.com> wrote:
>> >>>>
>> >>>> “At training time, the input sequences are real waveforms recorded from human speakers. After training, we can sample the network to generate synthetic utterances.”
>> >>>>
>> >>>> I’m curious what training time means, exactly.
>> >>>>
>> >>>> On Wed, Jun 20, 2018 at 7:44 PM Royce Lee <xxxxxxxxxx@xxxxx.xxx> wrote:
>> >>>>>
>> >>>>>
>> >>>>> This is probably old news to some of you, but it seems like this kind of software would make for good music synthesis.
>> >>>>>
>> >>>>> https://deepmind.com/blog/wavenet-generative-model-raw-audio/
>> >>>
>> >>>
>> >
>> >
>> >
>
>