Here's why AI can't make a catchier tune than the worst pop song in the charts right now

Neural networks are neat at spotting and reproducing patterns in images and text – yet they still struggle when spitting out audio. There are numerous examples of artificially intelligent software improvising fairly realistic images of people, buildings, and other objects, from training material. However, when it comes to …

  1. Anonymous Coward

    It would be interesting to see the audio of a choir's arrangement analysed and split into original parts. All sorts of calculations needed to decide whether a frequency was actually in the source - or was the result of the mixing of two other frequencies.

    Has anyone ever done it?

    1. Christian Berger Silver badge

      Well mixing should not occur

      Unless you have distortion which you usually don't do (intentionally) for choirs. And the unintended distortions of modern music recording equipment are negligible.

      1. Anonymous Coward

        Re: Well mixing should not occur

        "Well mixing should not occur [...]"

        If there are two pure sound sources in air at two different frequencies - then the "beat" frequencies will also be heard if they are within the audible range. That's the basis of polyphonic singing.

        There is a performance trick where a group of four men sing in a precise harmony - and the audience also hears a female fifth voice.

        1. Christian Berger Silver badge

          Re: Well mixing should not occur

          "If there are two pure sound sources in air at two different frequencies - then the "beat" frequencies will also be heard if they are within the audible range. That's the basis of polyphonic singing."

          Well but that's strictly not "mixing". What you have here is your ear/brain-system interpreting 2 tones at close distances as one tone which is changing in intensity in a certain way. That's simply because that's essentially the same thing.

          The ear/brain-system interprets the signal in certain ways. It's a bit like that triangle pointing downwards in this picture:

          http://www.whatispsychology.biz/kanizsa-triangle-illusion-explanation

          You could argue that it's objectively not there. You could however also argue that it is there, because all 3 corners are plainly visible.

          The same can be said about beat frequencies. Yes, they objectively exist if you look at the envelope of the resulting signal, but no, if you look at the spectrum it does not exist.

          1. Jeffrey Nonken Silver badge

            Re: Well mixing should not occur

            Oh, trust me, it exists. Otherwise heterodyning wouldn't work.

            1. terrythetech

              Re: Well mixing should not occur

              It only works if the mixing is not additive/linear; it is usually achieved by multiplying f1 × f2 to produce the sum and difference frequencies f1+f2 and f1-f2. Additive, distortion-free mixing produces no new frequencies. Hearing of beats is entirely down to how our hearing works.
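The distinction drawn above between additive and multiplicative mixing can be checked numerically. What follows is a minimal sketch, not anything from the article or the paper: the tone frequencies (440 Hz and 444 Hz), the sample rate, and the helper names `tone` and `magnitude_at` are all assumptions chosen for illustration. It measures the energy in a single DFT bin for both kinds of mix.

```python
import math

SAMPLE_RATE = 8000   # assumed sample rate in Hz
N = 8000             # one second of audio, so integer frequencies land exactly on a bin
F1, F2 = 440, 444    # assumed pair of close tones


def tone(freq_hz, n_samples, sample_rate):
    """Pure sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(n_samples)]


def magnitude_at(signal, freq_hz, sample_rate):
    """Normalised magnitude of a single DFT bin at freq_hz."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / sample_rate) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / sample_rate) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


a = tone(F1, N, SAMPLE_RATE)
b = tone(F2, N, SAMPLE_RATE)
added = [x + y for x, y in zip(a, b)]        # linear (additive) mixing
multiplied = [x * y for x, y in zip(a, b)]   # non-linear mixing, as in a heterodyne mixer

for name, sig in (("added", added), ("multiplied", multiplied)):
    for f in (F2 - F1, F1, F2, F1 + F2):
        print(name, f, "Hz:", round(magnitude_at(sig, f, SAMPLE_RATE), 3))
```

Under these assumptions the added signal shows energy only at 440 Hz and 444 Hz and nothing at the 4 Hz difference, while the multiplied signal shows energy only at 4 Hz and 884 Hz. That is also what a spectrum analyser would be expected to show for a clean additive mix: the beat is visible in the envelope, not in the spectrum.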

          2. Anonymous Coward

            Re: Well mixing should not occur

            " Yes, they objectively exist if you look at the envelope of the resulting signal, but no, if you look at the spectrum it does not exist."

            A genuine question - what does a spectrum analyser show? Does it not show any energy at the beat frequencies?

          3. The Nazz Silver badge

            Re: Well mixing should not occur

            As an aside re the Kanizsa Triangle :

            FTA "For example, in the Kanizsa Triangle Illusion we readily perceive three black circles and two triangles, even though there are technically no circles or triangles in the image."

            I can see (no pun intended) what they're getting at but, if one can easily perceive two triangles, then I'm equally certain I can perceive at least six more (there may be others).

            If it was truly AI and not some Deep Mind "programmed" carry on, just let the thing listen to the Radio for 24 hours and then compose its own tunes.

            If it really was intelligent it would simply just turn Radio 1 off, perhaps even petition for its closure.

  2. Christian Berger Silver badge

    I'm actually surprised that it works on raw samples at all

    I mean neural networks are traditionally not known for being good at very graduated outputs. I've not heard the output yet, but if it's anything like music, it would be very surprising.

    1. Christian Berger Silver badge

      Re: I'm actually surprised that it works on raw samples at all

      OK I've listened to one of the examples now. It's certainly impressive for a Neural Network, but far from what you can do with simple Markov chains.
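For readers unfamiliar with the comparison, a first-order Markov chain picks each new note from the notes that followed the previous note in the training material. The sketch below is only an illustration of that idea: the `TRAINING` sequence is invented, the names `build_chain` and `generate` are hypothetical, and real systems would model rhythm and longer contexts as well as pitch.

```python
import random
from collections import defaultdict

# Hypothetical training melody (note names only, no rhythm).
TRAINING = ["C", "D", "E", "C", "E", "G", "E", "D", "C", "D", "E", "E", "D", "D", "C"]


def build_chain(notes):
    """Map each note to the list of notes that followed it in the training data."""
    chain = defaultdict(list)
    for current, following in zip(notes, notes[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng):
    """Walk the chain, choosing each next note at random from the observed successors."""
    melody = [start]
    while len(melody) < length:
        options = chain.get(melody[-1])
        if not options:  # dead end: this note never had a successor in training
            break
        melody.append(rng.choice(options))
    return melody


chain = build_chain(TRAINING)
melody = generate(chain, "C", 16, random.Random(0))
print(" ".join(melody))
```

Because successors are stored with repetition, frequent transitions in the training melody are proportionally more likely in the output. Every adjacent pair of notes in the result has been seen in training, which is also why such output can sound like fragments stitched together.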

      1. cosine

        Re: I'm actually surprised that it works on raw samples at all

        The second sample is a literal copy of a Mozart sonata (with some distortion).

        Know your classics!

        But this brings up a deeper point: how to check that the programme is not simply cutting and pasting fragments. Given that it will have been trained on 1000s of hours of music, the researchers won't know...

        1. mevets

          Re: I'm actually surprised that it works on raw samples at all

          “cutting and pasting” -- that is very much what those recordings sounded like to me; a collection of vaguely similar phrases dovetailed together. If you concentrated this process on a single composer of mono-genre music (eg. Webber, Boston, Monkees, Sex Pistols :) it might come up with something novel. Feeding a travesty generator Liszt to Brahms seems destined to fail...

    2. jmch Silver badge

      Re: I'm actually surprised that it works on raw samples at all

      "I've not heard the output yet, but if it's anything like music, it would be very surprising."

      The samples posted in the article actually do sound like music even though there are some jarring moments. However these are still approx 10-second clips, it might not be so pretty if extended to a few minutes, and even less so over the length of classical music pieces which typically can be anywhere from 20-100 minutes.

      I think that moving from using MIDI encodings to raw WAV files is a step in the right direction but I suspect that something is still missing... it probably requires simulating how our ears encode vibrations into neural patterns (which is, after all, a sort of Analogue-to-digital converter)

      1. Christian Berger Silver badge

        Re: I'm actually surprised that it works on raw samples at all

        I actually think that moving to raw samples is a rather bad thing to do. After all that greatly increases the complexity of the problem by adding lots of irrelevant information.

        I mean no animal on earth hears by samples, they all hear by intensity over frequency and perhaps phase differences between different bandpass filters. Musicians also don't output samples, but manipulations of instruments.

        1. DJSpuddyLizard

          Re: I'm actually surprised that it works on raw samples at all

          I actually think that moving to raw samples is a rather bad thing to do

          Exactly. They're already making it harder. I don't go around humming a pop song because I remember the Ronkhorn synth had extra reverb in it, I remember the song because of the melody.

          Perhaps if they'd started with the glaringly obvious idea of just using more source material they could do better.

          1. Charles 9 Silver badge

            Re: I'm actually surprised that it works on raw samples at all

            I thought they used MIDI because the files could be tagged so the machine could perceive what parts are refrains, stanzas, etc because the tags would tell them.

      2. JDX Gold badge

        Re: I'm actually surprised that it works on raw samples at all

        I was surprised they don't find using actual music score (or similar) works... it's the language of music encoded quite strictly. Training AI on "just the noise" is counterintuitive to me.

        What was the actual output format? Surely the AI didn't hear piano WAV and emit a WAV with individual sounds exactly like piano keys?!

        1. Evil Auditor Silver badge

          Re: I'm actually surprised that it works on raw samples at all

          I agree, also to me it seems counterintuitive using wave forms instead of score, e.g. midi. Especially, as I believe that it would be easier to detect/learn the structures of a piece of music from musical notations rather than from wave forms. (Having played and failed with artificial neural networks doing midi many moons ago, I might be a bit biased though.)

          On second thought, it is maybe not a bad idea. Musical notations are, after all, only a limited language to describe music. They are not music. Going back to the actual music, which is much closer represented by wave forms than notations, might be an interesting approach.

          Haven't listened to the samples yet. I'm curious how they sound, being first samples. And comparing to how far "AI midi" got (or didn't go) in all the years it's been around...

        2. Christian Berger Silver badge

          Re: I'm actually surprised that it works on raw samples at all

          "What was the actual output format? Surely the AI didn't hear piano WAV and emit a WAV with individual sounds exactly like piano keys?!"

          Apparently they did exactly that. This is the second paper I've seen which did that. It's not a smart thing to do, but given enough CPU power you can probably get away with it.

        3. Mark 85 Silver badge

          Re: I'm actually surprised that it works on raw samples at all

          I was surprised they don't find using actual music score (or similar) works... it's the language of music encoded quite strictly. Training AI on "just the noise" is counterintuitive to me.

          Probably because no one thought of that. It should be an easy solution and implementation as opposed to what they're doing now. But... this is research, and that means grants and degree of difficulty. Someone's just not thinking outside the box.

          1. Swarthy Silver badge

            Re: I'm actually surprised that it works on raw samples at all

            I would be tempted to try some sort of 2D (or 3D for the extra challenge) visual representation of known music (perhaps frequencies defining hue, amplitude defining saturation, and time being converted to physical space) to train one of the visual/image "AI" machines and see if converting their generated output would do better than the audio AI.

    3. Anonymous Coward

      Re: I'm actually surprised that it works on raw samples at all

      I mean neural networks are traditionally not known for being good at...

      ...ANYTHING.

      FTFY, and you owe me a beer.

  3. Mongrel

    "We expect tunes to sustain a structure over a matter of minutes, whereas computers end up flitting about between styles every few seconds."

    So, that's where Dubstep came from...

    1. Anonymous Coward

      Shhh, Dubstep didn't exist; it was a horrible nightmare about a brick in a washing machine.

  4. frank ly

    Composing

    It seems to be overkill to analyse and generate audio records. What is the problem with analysing and generating sheet music (musical notation)?

    1. Christian Berger Silver badge

      Re: Composing

      We are in a current hype where fast results seem to be more important than trying something productive. Simply using samples requires the least amount of work by the scientists. Digitizing sheet music, or writing a parser for MIDI-files is much harder.

  5. this

    Pastiche

    At best I would call the audio result a pastiche. It strikes me that a lot of AI manifestations are simply that.

  6. JimC

    speaking as an amateur musician

    The article makes it sound as if the main reason AIs are having trouble is because the people designing them don't have a great understanding of music. The text gives me the impression that the AIs are failing to work out basic song structure, which at the chart level is highly formulaic. Much the same can be said for intervals that make up basic melodies.

  7. Little Mouse

    They cracked this in the 70's

    ..with the Harrington 1200.

    You've probably never seen one, they are very expensive.

    1. Christian Berger Silver badge

      Re: They cracked this in the 70's

      I think they are nearly a thousand pounds, aren't they.

      However to be honest the musical numbers composed by it never quite got out of the UK. Even "Little Mouse" is barely known in Germany, for example.

  8. Nosher

    Looking through the wrong end of the telescope?

    Much like a comment above suggests, I don't really get this. Most conventional music has rules - those rules are fairly easy to quantify: start with any note, use that as your base key. Pick major or minor (or melodic, enharmonic, whatever) to give you a scale for that key (the notes to use are rule-based too). Pick 1st, 3rd and 5th of your current scale to give you a simple chord that works with it; add a 7th or occasional minor 3rds if you want to be bluesy, etc. Now and again, change your scale root to the 4th note, then the 5th of your base key - bingo, you've got a 12-bar-style progression - the basis of much rock and/or roll music. Noodle around with the notes in your key and generate some sort of melody. Add in some alternate progressions for variety or as a middle eight (again, there are established patterns for all of this). Does it sound OK? If not, start again and try other combinations. This is pretty much entirely how I learned to improvise Jazz/Blues piano.
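The rules listed above (pick a key, build a scale, take the 1st, 3rd and 5th for a chord, move the root to the 4th and 5th for a 12-bar progression) are mechanical enough to sketch in a few lines. This is a rough illustration only: the function names are hypothetical, notes are spelled with sharps regardless of key (a simplification), and the 12-bar layout used is one common variant rather than the only one.

```python
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]   # semitone steps of a major scale
MINOR_STEPS = [2, 1, 2, 2, 1, 2, 2]   # semitone steps of a natural minor scale


def scale(root, steps):
    """Seven notes of the scale starting at root, following the given step pattern."""
    idx = NOTES.index(root)
    out = [root]
    for step in steps[:-1]:   # the last step only returns to the octave
        idx = (idx + step) % 12
        out.append(NOTES[idx])
    return out


def triad(root, steps):
    """Simple chord: 1st, 3rd and 5th of the scale."""
    s = scale(root, steps)
    return [s[0], s[2], s[4]]


def twelve_bar(key):
    """Chord roots for one common 12-bar pattern built on the 1st, 4th and 5th."""
    s = scale(key, MAJOR_STEPS)
    one, four, five = s[0], s[3], s[4]
    return [one, one, one, one, four, four, one, one, five, four, one, one]


print(scale("C", MAJOR_STEPS))
print(triad("C", MAJOR_STEPS))
print(triad("A", MINOR_STEPS))
print(twelve_bar("C"))
```

What the sketch leaves out is the "Does it sound OK?" step, which is the part that relies on a listener's judgment rather than on a rule.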

    1. Anonymous Coward

      Re: Looking through the wrong end of the telescope?

      "[...] and generate some sort of melody."

      IIRC that is the most difficult bit to create. After that inspiration - the approved rules of composition, the chosen instrument, and fashion tend to determine what you will get as the final piece. A composer's work is often identifiable by their borrowing melodies from their own previous compositions.

      Many composers have borrowed their melody from another composer eg Vaughan Williams from Tallis; Procol Harum from J.S.Bach. Also from old folk tunes eg Dvořák; Mahler; Grainger.

    2. Anonymous Coward

      Re: Looking through the wrong end of the telescope?

      You are missing the most important part to popular music, other genres are different of course.

      https://www.youtube.com/watch?v=ckMvj1piK58&gl=GB&hl=en-GB

      Apologies in advance, unless you are from Bolton, in which case this is your fault.

    3. veti Silver badge

      Re: Looking through the wrong end of the telescope?

      The "Does it sound OK?" judgment is likely quite hard for an AI to answer.

      I suspect that your personal aesthetic judgment plays a larger part in the process than you consciously allow.

  9. Paul

    have a play with Jukedeck which allows you to make longer pieces and choose a style.

    shameless referrer whoring:

    https://www.jukedeck.com/make?invite=ccd596dc5f8a20be9546d12dea7d9716715cbd499227297fb815356c204caa14

    1. Brennan Young

      That's quite a nice front end for the job, but the chord progressions seem a bit bland. I chose "aggressive electro pop" (hoping for a bit of edge) and got a friendly little ditty. I prefer a bit more dissonance.

  10. spold Bronze badge

    I'm sure it will "evolve"

    I see a cover of "Are friends electric?" in the offing.

  11. Flywheel Silver badge

    Now That's What I call Music 101

    Well, that's sorted the tunes out for the next album of dross. Bring it on! Next stop, Britain's got "Talent"...

  12. Joerg

    All known public AI algorithms are just a trick.. that is why!

    All known public AI algorithms are just a trick.. that is why! All AI algorithms are nothing more than an endless complex mess of if-then-else statements ... just put in a different way to look different and most people think that is something magic and new and so on. It is not. At the core everything is still just an if-then-else thing. You can have as many BT, FSM, A*, UML, Neural Networks and other abstraction graph algorithm things as you want but at the core and deep down at the actual code running on silicon it is nothing more than a huge mess of if-then-else statements.

    Of course anything like this can't really simulate any real lifeform even remotely. Even the most complex stuff using all those algorithms won't look real and alive anyway nor really smart. Any "smart" thing happening is just the result of mathematical algorithms and statistics at work, the whole "AI" thing is just a facade, it really is nothing more than a mechanism used to better refine data results and adjust main math algorithm parameters. And that's it. Nothing magic. Nothing AI. Nothing alive. Period.

    1. Charles 9 Silver badge

      Re: All known public AI algorithms are just a trick.. that is why!

      But then, ask yourself, "In the final analysis, are we any different? Are we better simply in degree rather than also in kind?" For example, can we figure something from whole cloth without even experience to guide us? For example, can someone figure out right away how to throw and catch a ball if they've never seen one before?

      1. Teiwaz Silver badge

        Re: All known public AI algorithms are just a trick.. that is why!

        For example, can someone figure out right away how to throw and catch a ball if they've never seen one before?

        Catch, maybe not. Babies learn to chuck stuff out of the pram pretty quick. Aggression seems to be a pre-programmed learn target, it's the cooperative bit that has to be painstakingly learned in humans.

        1. Bruno de Florence

          Re: All known public AI algorithms are just a trick.. that is why!

          Very Freudian :-)

      2. mevets

        Re: All known public AI algorithms are just a trick.. that is why!

        Over the years thinkers have metaphorically and literally linked the human mind to the fantastic creation of the day. Flows, clocks, and steam engines have all served as models of the human mind by renowned thinkers like Hippocrates, Descartes, Freud. These devices were, at the time, the ultimate manifestations of human thought. That we think very fast matrix multipliers are the basis of thought is both a credit to the devices, and a dishonour to millions of years of evolutionary processes.

        If such a metaphor helps to simplify the outline of such a complex field, that is all well and good, but remember how long it took to discredit Freud's theory of repression, which was really just a ‘back port’ of steam engine thinking.....

  13. Snarf Junky

    AudioSkyNet

    You know it'll just end up producing a sound so dreadful that our heads will explode....and then it will obviously take over the world.

    1. Mark 85 Silver badge

      Re: AudioSkyNet

      Basic rule then.. "Look in the door window to the lab. IF staff heads have exploded, do not enter. Ever. Kill power to the room and seal the door." Not to be forgotten: "Be afraid. Be very afraid." or perhaps: "Abandon all hope ye who enter here."

  14. AstroNutter

    I get this completely

    Some of you have posted that you don't get why they switched from Midi to raw audio waves.

    I happen to think that is exactly the right thing to do. Unless you are going to teach the neural network to actually play the real instruments.

    Here's my thinking. There's a number of times when I have seen sheet music for something. If I play that piece of music "exactly" as written it will sound awful. This is effectively what the Midi based AI's are doing, they're following the patterns of things written in sheet music form.

    However, the thing that separates a good performance from a bad performance is the difference between what's written on the sheet of music - which is a guide, and the actual performance.

    A performance can come alive when the player modifies the tempo, changes the pitch, attack or vibrato on a given note. They may also play a given note slightly sharp or flat, or add dynamic effects.

    I mentioned the attack of a note. Henry Purcell's compositions for example require a trumpet to be played full with round notes. The pieces that I'm thinking of are a little pompous, and confident.

    Brahms' Lullaby however needs a soft approach. The individual notes should be rounded, giving a much less aggressive attack. The tempo can be adjusted more and there's also room to apply more dynamic range.

    A midi file simply won't contain all this detail, but by programming an AI with raw audio sounds, there is an opportunity for these parts to be learned.

    1. Christian Berger Silver badge

      Re: I get this completely

      Well yes, but no performer performs a piece by typing in sample values into a text editor and then converting that to a wav-file.

      Performers don't think in terms of actual samples, they think in tones, and while they may think of them in ways standard notation cannot convey, it's still tones to them, not samples. Just like painters don't think in pixels, but brush strokes and shapes and such.

    2. Nosher

      Re: I get this completely

      "A midi file simply won't contain all this detail, but by programming an AI with raw audio sounds, there is an opportunity for these parts to be learned."

      I agree, but I'm not thinking about MIDI (at least not in the play-by-numbers sense). What I mean is - and I say this as a semi-professional pianist - that it must surely be easier to teach AI to play something like a piano like a musician by reading and understanding actual music than it is to try and filter out the useful bits from the huge pile of data that is the raw audio stream of a performance. Even what seems like the difficult stuff, for instance interpretation, is something that most musicians have to learn: when I was nine, I didn't know any of this and just plonked everything out at the same volume, but I learned over time what sounded better.

  15. Brennan Young

    Not pop

    I can't understand the journalist going on about pop, when the training samples (and results) are obviously based on classical music.

    Frankly, it sounds a lot better than Schoenberg et al., but a long way from Debussy. Not a long way from Conlon Nancarrow's pre-mechanical works. Nice.

    1. Bruno de Florence

      Re: Not pop

      Both Western pop & Western classical music (aka the common practice) use the same structure, which was developed by fiddling about over a period of 400 years. Strictly speaking, there is no difference between a Mozart symphony and a Madonna song. Note the "Western" i added to "music".

      Which Schoenberg do you refer to? Schoenberg of the Gurre Lieder or the atonal Schoenberg? Schoenberg himself was very aware of the limitations of his atonal system, that is why, when using it, he composed very short pieces. His pupils, e.g. Alban Berg, were more audacious (Cf. Berg's Lulu).

      "sounds better": Western music sounds, it does not sound "better" or "worse". However, the effect it has on your psychical system is another matter.

      What those experimenters could have done is work with composers trained in the common practice, or read Schoenberg's Structural Functions of Harmony, or get acquainted with Schenkerian analysis.

  16. The Empress

    Pop music is more or less made by machines already. The humans, of which there are 6 or 7 in the entire industry, just press buttons on the MacBook to string the parts together. After all, with 9 words, 1 beat and 3 notes, it's not all that hard.

    1. Persona Bronze badge

      That basic concept goes back to 1792 and the dice waltz attributed (perhaps wrongly) to Mozart.

  17. Oh Homer

    The reason

    The reason machines can't produce music (as opposed to just sound) is because it's an expression of emotion, which machines lack. Emotion requires empathy, which in turn requires sentience. So in order for machines to compose music that is meaningful, they'd need to know what they are, know what they like (and dislike), then care about it enough to have an emotional response to it, before translating that response into sound.

    Good luck with that.

  18. Andy 97

    And this is where we discuss the fact that silicon has no soul and cannot appreciate the lyrics of Bob Dylan, the funky guitar of Nile Rodgers, the choral magnificence of Bach and the emotion of Mendelssohn. As for those who expect an algorithm can build a complex dance tune, I refer you to the previous examples. We all know how they sound and how they make us feel. Mathematics can only mimic human perception from years of experience in real life and real experiences.

    Even buffoons such as Like Mike and Dimitri Vegas still have experience listening to soul, rock, disco, classical and will (maybe) have some experience of emotion (before their minds were destroyed by EDM).

    Blimey, sounds like something from Blade Runner.

  19. Jeremy Allison

    The Ultimate Melody

    I'm glad they're failing in a way. Imagine if they used ML to create "The Ultimate Melody", and broadcast it on the Internet...

    http://avalonlibrary.net/ebooks/Arthur%20C%20Clarke%20-%20Tales%20From%20The%20White%20Hart.pdf

    :-).

  20. JetSetJim Silver badge

    Aiva

    Surprised no mention made of this "ArtIst" Aiva

    It's in the classical domain, but the example at the end of the article seems a rather pleasant auditory treat. It's also apparently officially recognised as a proper composer, somehow.

    I also seem to recall reading that the most prodigious composer of music is a chap who has a computer do it for him - he now holds somewhere in the region of 100k-200k musical copyrights from the computer generated music, but I can't find his name this late in the day and can't attest to their quality.

  21. IamStillIan

    You could just use the Harrington 1200

    https://www.youtube.com/watch?v=rqkUISJej2o

  22. Anonymous Coward

    Moving the goalposts

    If it can 'evolve' from something like classical to jazz, or 50's rock n roll to punk, I will be truly impressed

  23. The Unexpected Bill

    This all seems like a great deal of overkill.

    Some years ago, Korg produced a keyboard synthesizer with what was known as the KARMA system. As I remember it, the idea behind this was that you'd play some or all of a song on the keyboard, and the KARMA system would either take it from there and continue playing or function as a "backing" band while you continued to play.

    I also think it's worth mentioning the Fake Music Generator web site: https://www.fakemusicgenerator.com/ . It is based on something known as cgMusic and produces interesting if rather repetitive songs.

  24. Anonymous Coward

    Music is a creative thing and fundamentally comes from the creative side of the brain. AI and algorithmic understanding is logical and from the other side of the human brain.

    There are no logical steps for creating music (or art), and certainly nothing that could be modelled in a mathematical algorithm (aside from fractals). The most AI could ever achieve would be to take a created song and identify loops (as per existing DJ equipment).

    Not surprised it didn't work.

  25. DropBear Silver badge

    Lots of people here seem to confuse catchy, original music with unremarkable, derivative tunes (of which even humans - even quite talented ones - produce a lot more than of the former) and emotionally charged personal interpretations that resonate with the listener with flat, strictly by-the-book playback (a master composer's work may be a lot less palatable played in a mechanical fashion but it is still distinctly original and a masterpiece).

    I believe machines are going to need a good approximation of the human experience to produce the former kinds (for which strong AI is only a prerequisite), but I see no reason why machines couldn't produce the latter kind which is nothing but variations and mash-ups on existing material - it would still need to sound acceptable in a non-random sense and that's still a non-trivial problem, but would require exactly zero "creativity" - coincidentally about as much as mediocre human output contains.


Biting the hand that feeds IT © 1998–2019