Kudos
Kudos for using a good picture of Miku~~~~
Japanese boffins say they have developed a process whereby computers can be taught to sing so well that they are indistinguishable from talented human performers. It was already possible to produce artificial singing, by inputting lyrics and musical scores into software packages such as Vocaloid. The results are easily …
"This was a triumph
I'm making a note here: huge success
It's hard to overstate my satisfaction
Aperture Science
We do what we must, because we can
For the good of all of us
Except the ones who are dead
But there's no point crying over every mistake
You just keep on trying til you run out of cake
And the science gets done
And you make a neat gun
For the people who are still alive....."
GLaDOS is a fantastic singer.
"The curves which produced the best results are then used as "parents" to create a new generation and the process is repeated as required in a survival-of-the-fittest process"
That description sounds like a genetic algorithm. Not knocking the idea - just curious if they even realize the approach they are taking has been in use in other industries for a long time. Might even be that that prior experience could optimize the process. nyeh who cares.
Their use of "evolutionary" suggests to me they have tried to explain genetic algorithms to someone they think incapable of understanding the idea.
The requirement for a sound engineer to judge each of the 8 recordings in each round of refinement still makes this process heavily human-dependent (and presumably costs a lot of the engineer's time). This doesn't sound like a great breakthrough in artificial artifice.
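For anyone who hasn't met the idea, here's a rough Python sketch of the loop as the quoted description reads: rate a batch of 8 candidates, keep the best as "parents", breed a new generation, repeat. This is only my guess at the general shape, not what the researchers actually run. The parameter lists, the names (evolve, stand_in_judge, TARGET) and the scoring function are all made up for illustration; in the real setup the judge would be the sound engineer listening to each recording.

```python
import random

def evolve(judge, n_params=4, pop_size=8, generations=20, keep=2, sigma=0.1, seed=0):
    """Genetic-algorithm sketch with a pluggable judge.

    judge(candidate) returns a score, higher is better. In the article's
    setup this role is played by a human rating each rendered recording.
    """
    rng = random.Random(seed)
    # Each candidate is a list of (hypothetical) control-curve parameters in [0, 1].
    population = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank this round's candidates by the judge's score.
        ranked = sorted(population, key=judge, reverse=True)
        parents = ranked[:keep]
        # Parents survive unchanged (elitism); the rest are bred from them.
        children = []
        while len(children) < pop_size - keep:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: take each parameter from one parent or the other.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Gaussian mutation, clamped back into [0, 1].
            child = [min(1.0, max(0.0, g + rng.gauss(0, sigma))) for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=judge)

# Stand-in for the human judge (an assumption, purely for demonstration):
# score a candidate by how close it is to an invented "ideal" curve.
TARGET = [0.2, 0.8, 0.5, 0.3]

def stand_in_judge(candidate):
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))

best = evolve(stand_in_judge)
```

With the defaults that is 20 rounds of 8 ratings, so 160 judgements per run, which is exactly where the engineer's time goes.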
You don't think a "recording artist" just turns up at a studio, sings a song and then someone shouts "That's a wrap! Cut it!" do you? Nope. Multiple "takes" are recorded then the engineer can spend days if not weeks tweaking, cut'n'pasting between "takes" etc to get that perfect recording.
This system not only does away with the cost of the "artist" but could even significantly reduce the engineer time by automating much of the process. Then there's the studio and musicians you no longer need to pay for.
This should be ideal for the iPod generation. Constant new pap produced, faster than ever. Likewise for the Big Labels. They can carry on charging 79p per track while massively cutting out production costs and all but eliminating royalties.
Excuse me while I go outside and shoot myself :-(
"The requirement for a sound engineer to judge each of the 8 recordings in each round of refinement still makes this process heavily human-dependant (and presumably costs a lot of the engineers time)."
Sure, for the moment - though as other posters point out, they're still reducing costs. But accumulate enough data from those trials and you can use supervised-learning techniques to train a hidden-variable model to duplicate the results. You'd have to do some analysis to select feature sets, but I'd think a maximum-entropy Markov Model would do the trick. This kind of stuff is done all the time in Natural Language Processing - the algorithms are well-understood.
It wouldn't always match what a human judge (i.e., the sound engineer) would choose, but then human judges aren't consistent either. As long as the probability is high enough, you could automate the whole thing with reasonable expectation of getting a product that most listeners would believe was produced by a human singer (or couldn't reliably distinguish from such).
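To make the "learn the judge from accumulated trials" idea a bit more concrete, here's a toy Python sketch. It is not a maximum-entropy Markov model; it's the much simpler non-sequential cousin, a plain maximum-entropy (logistic regression) classifier trained by gradient ascent. The two features (pitch error, vibrato depth) and the six training rows are invented placeholders, not real data from anyone's experiments.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_judge(features, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights by stochastic gradient ascent."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = y - p  # gradient of the log-likelihood w.r.t. the logit
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predict(model, x):
    """Estimated probability that the engineer would accept this take."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

# Invented toy data: [pitch_error, vibrato_depth] per take,
# label 1 = engineer accepted, 0 = engineer rejected.
X = [[0.10, 0.5], [0.20, 0.4], [0.15, 0.6],
     [0.80, 0.5], [0.90, 0.3], [0.70, 0.6]]
y = [1, 1, 1, 0, 0, 0]

model = train_judge(X, y)
```

Once something like predict(model, x) agrees with the engineer often enough, it could stand in as the scoring function in the selection loop, with the human only spot-checking.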
Only the Japanese could use a kawaii hooker as their promo image. Anyone else slightly worried about a country which says "We need cartoons of women! And they've got to have legs up to here, and short skirts, and bondage boots! And they've got to look like they're about 8 years old!"...? And this is just regular ads, never mind what they think is *really* strange.
Forget The Turing Test, we now present the musical equivalent - The Cowell Test.
Stage 1: Can a famous music producer tell the difference between a genuine human rendition of a song, and a computer-generated rendition?
Stage 2: Can a trained (pop) vocal coach do likewise?
Stage 3: Can a trained (opera?) singer do likewise?
Stage 4: Can a trained (opera?) vocal coach do likewise?
Next step: since there's already a computer program that can analyse music scores from different composers and compose an original piece in the style of a certain composer, develop software that analyses pop lyrics and writes pop songs.
Then: develop holographic software to the extent of being able to project a 3D CGI character onto stage.
Eventually: why bother with human performers when you can have computers write and perform pop songs from scratch? Who knows - they might be able to produce songs of higher calibre than the likes of Ark Music Factory, Jedward or (even worse) Jemini. Who knows - they might be able to produce virtual Eurovision candidates that perform better than our real-life entrants...
Why should the AI settle for controlling a vast starship capable of obliterating small moons...
...when it could control a starship the size of a moon capable of obliterating small planets? :)
No doubt while humming all the orchestral parts to a certain John Williams march in perfect pitch...
You'd also need something that generated tales of woe and triumph for these soft stars, to get the grannies emotionally involved enough that, come present-buying time, they could remember what they'd been told to buy.
Some kind of virtual rehab for them to head to when they can't generate publicity any other way would probably also help.
Since they have been putting images of dead actors in commercials for some time, I submit it has ALWAYS been the goal to have a complete construct.
Now that they can get a more human voice it is simply a matter of time before Sharon Apple is reality.
As has been stated before an artificial construct will be no more or less "real" than most entertainment creations of the last decade at least.
Just my opinion. I could be wrong.
Why hire real actors (particularly on radio dramas) when you can get computers to provide the characters' voices for a fraction of the cost? :)
Perhaps in about three decades, technology will have improved to the extent whereby off-the-shelf or freeware text-to-speech engines sound human.... even when dealing with tricky words such as Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch... without the need to provide them in IPA form :)