back to article Mathematicians spark debate with 13 GB proof for Erdős problem

When Pierre de Fermat famously complained that he didn't have space to write the proof of his famous “Fermat's Last Theorem”, he only ran out of space of the margin of a book. Now, a pair of mathematicians at the University of Liverpool in the UK have produced a 13GB proof that's sparked a debate about how to test it. The …

COMMENTS

This topic is closed for new posts.
  1. attoman

    Test? No problem

    The entire proof is reduced to 1 nm depressions on the head of a pin which is dipped in hemlock and inserted in each author of the proof. If they live the proof is true if not well you get the picture.

    A truly old English test is best.

    1. phuzz Silver badge
      Boffin

      Re: Test? No problem

      The head of a pin is approx 1mm x 1mm. Assuming each bit is 1nm wide one can fit 10^12 bits on the head of a pin, so theoretically you can fit about 116 GB of data on the head of a pin. Each bit would be about 7 atoms across.

      Your test is technically possible, carry on.

      1. BorkedAgain

        Re: Test? No problem

        The head of a pin is pretty blunt. I can only assume you've no plans to pierce the authors' skin. Which orifice are you inserting it into?

      2. Midnight

        Re: Test? No problem

        Yes, but how many of those atoms could dance on the head of that pin?

    2. Martin Budden Bronze badge

      Re: Test? No problem

      There are two authors. If one lives and the other experiences the alternative outcome, is the proof true or not?

      1. RaidOne

        Re: Test? No problem

        If one lives, he gets the pin once more. If he lives again, he got it out 2 out of 3, so the proof is false. Or the other way around.

  2. Ketlan
    Facepalm

    Yes indeedy

    “Imagine a random, infinite sequence of numbers containing nothing but +1s and -1s. Erdős was fascinated by the extent to which such sequences contain internal patterns. One way to measure that is to cut the infinite sequence off at a certain point, and then create finite sub-sequences within that part of the sequence, such as considering only every third number or every fourth. Adding up the numbers in a sub-sequence gives a figure called the discrepancy, which acts as a measure of the structure of the sub-sequence and in turn the infinite sequence, as compared with a uniform ideal.”

    I'm sure I had some paracetamol around here somewhere...

    1. Lee D Silver badge

      Re: Yes indeedy

      I'm a mathematician, but this is based on a quick reading of this article alone so may be complete rubbish:

      Imagine an endless list of random pluses and minuses.

      Take any section of that list, or say every other symbol, or whatever kind of pattern you like from it. This gives you another list of pluses or minuses that you've plucked from the original list.

      Work out whether you have more pluses than minuses in that or the other way around (or maybe an even number of both?). The difference is called the "discrepancy". A discrepancy of zero means there's the same number of pluses and minuses.

      Using your (carefully-chosen) shorter list, and the discrepancy, you could then tell whether, for example, most of the pluses are in the beginning of your original list, or whether your list alternates between pluses and minuses, or whether it has a long run of pluses followed by a short run of minuses or whatever pattern you're looking for, just by looking at your short list extracted by a certain clever pattern. You can tell things about the infinite list just by carefully choosing the rule you use to extract the shorter list.

      To translate the sentence: "For any sequence, Paul Erdős believed, you could find a finite sub-sequence that summed to a number bigger than any than you could choose – but he couldn't prove it."

      What I think he's saying is, you can always find a smaller list inside that infinite list that - if you choose it carefully - has a discrepancy (i.e. more pluses or minuses) bigger than the original infinite list. So you could always "fudge" the numbers by misrepresenting the larger list with a carefully-chosen pattern.

      But, to be honest, it's not entirely clear and probably a LOT more complicated than even the article makes out.

      And I'd be hard-pushed to come up with something practical out of it (though I'm sure there would be - this is the sort of maths that sits behind things like coding theory and, thus, sending messages, compression, error-correction, RAID, etc.)

      1. Werner McGoole

        Re: Yes indeedy

        My understanding is that you can't pick any old pattern. It has to be every number, or every second number, or every third, etc. But that's just from Wikipedia.

        1. h4rm0ny

          Re: Yes indeedy

          >>"My understanding is that you can't pick any old pattern. It has to be every number, or every second number, or every third, etc. But that's just from Wikipedia."

          I think that's the classic version of the problem and is good because it is illustrative. But in theory the problem should apply to ANY pattern you come up with, so long as that pattern does not depend on foreknowledge of the numbers. I mean however complex your criteria for selection, there ought to be a subset that has greater discrepancy than it in an infinite sequence. No?

        2. John Savard Silver badge

          Re: Yes indeedy

          I think what's in Wikipedia is more likely to be correct, as it makes enough sense that I can understand exactly what it means. And it seems like such a deceptively simple problem that it's surprising a proof, not of the theorem, but only that the discrepancy must be bigger than 2, not any number you care to name, takes 13 gigabytes.

      2. Frank_M

        Re: Yes indeedy

        Suppose my finite sequence is the first 10 numbers in the list, and I pick 11.

        Dammit, now I'm going to have to read this silly paper or I won't get any sleep.

      3. Paul Hovnanian Silver badge

        Re: Yes indeedy

        Perhaps I don't understand it either.

        Given some finite subsequence, there is an upper bound on its sum. That is, if the sequence selected happened to be a run of all '+', the sum will be equal to the length of that subsequence. It can't be bigger.

        Now, given the original infinite subsequence, its sum is determined by probability. The most probable sum is zero, but there is some probability that it is greater (or less). And given most probability distributions (that I am familiar with), they approach but never reach a probability of 0 at a sum of +/- infinity. So there is a non-zero probability that there is a sequence sum larger than any bounded value that I can think of.

        Or there's something about infinite series and probabilities that I'm missing.

      4. This post has been deleted by its author

      5. Anonymous Coward
        Anonymous Coward

        Re: Yes indeedy

        Reading the Wikipedia definition of the problem seems to limit the scope a bit. The idea is for any sequence, there should be some values d, k and C, such that the summation of every dth element in the list from the first k elements in the list would yield a number (discrepancy) greater than C.

        Neither of the definitions provide any limits on the type of infinite +1/-1 sequence, which would make the problem trivially false: with a sequence of all -1s, it would never be possible to create a positive discrepancy at all. Alternatively, a infinite sequence of "+1, -1" would never yield a discrepancy greater than 1.

        As such, it's likely that the problem is more concerned with random sequences or at least sequences with an infinite number of +1s and -1s. Then the problem starts to look like a simple application of the statistical law of large numbers. For which we would conclude that, yes, for every sequence containing an infinite number of +1s, it must be possible to select every dth element from the first k, such that any given value C could be exceeded. More over, it must also be possible to find such a string given a fixed d, because any randomly chosen subsequence of a random sequence is itself a random sequence. Then all that's left is to find a k (> d*C) that sums to C+1, which would be a simple linear search through the infinite sequence. Of course, having invoked the law of large numbers, this can't be considered a formal proof. Trying to come up with a mathematical proof feels like looking at the proofs for infinite series in general, they're interesting, but all kinda weird.

        Alternatively, Wikipedia and the New Scientist may just suck at explaining all the details of the problem.

        1. h4rm0ny

          Re: Yes indeedy

          >>Neither of the definitions provide any limits on the type of infinite +1/-1 sequence, which would make the problem trivially false: with a sequence of all -1s, it would never be possible to create a positive discrepancy at all."

          That's not an arbitrary pattern. That's a case of looking at the data first and then constructing a pattern to fit. The pattern must be independent of the random sequence. So you choose, e.g. every third number, and then apply it to a random string. Not study the string and construct a pattern.

      6. Michael Wojcik Silver badge

        Re: Yes indeedy

        And I'd be hard-pushed to come up with something practical out of it (though I'm sure there would be - this is the sort of maths that sits behind things like coding theory and, thus, sending messages, compression, error-correction, RAID, etc.)

        My understanding is that discrepancy theory has applications similar to Ramsay theory - it tells you how much structure will inevitably arise in a random distribution (well, it really puts a lower bound on the amount of such "accidental" structure), so it's useful for things like deciding what threshold constitutes a probable signal (rather than noise that looks like a signal) and that sort of thing.

        I've seen Ramsay theory applied to some machine-learning problems to help filter out false correlations, for example.

  3. seven of five

    "Imagine a random, infinite sequence of numbers containing nothing but +1s and -1s."

    Yes, I do that quite often. Usually when SWMBO starts one of her ... characteristics.

  4. P_0

    So the answer isn't 42 afterall?

    1. Britt Johnston

      The answer isn't 42?

      This is the sort of corollary that comes out of the full proof.

      For every infinite string it is possible to define a finite substring with a discrepancy of any size.

      If the discrepancy is 42, you know you have found a substring which fits the problems of the Adams universe.

      I'd suggest, if it isn't, and you still have a problem, that you try a substring of different size if the discrepancy is positive, and use a different pattern if it goes in the wrong direction or is near-zero.

    2. Martin Budden Bronze badge

      Of course the answer is 42!

      Are you sure the question is the right one?

  5. John Smith 19 Gold badge

    Been an issue ever since the 4 colour theorem proof came out.

    Computer generated so too big to test by hand.

    Yes as a puzzle it sounds a bit pants but "satisfiability" and +1/-1 are the sort of phrases that crop up in minimal digital logic design and custom logic layout for minimum surface area (and hence cost) on a chip.

    That has very practical applications.

  6. Ian Emery Silver badge

    Fortunes to be made.....

    .... publishing this in book form next Christmas.

    In the mean time, hire a few Chinese students to read through it, or send it to Private Eye for a review.

  7. Anonymous Coward 101

    As the complexity of proposed mathematical proofs increases, the likelihood of humans being able to fully understand it falls. The chances of mistakes in the proof that are not picked up by the reviewers increases.

    Thus, mathematical proof isn't absolute proof, but merely probable truth.

    1. Anonymous Coward
      Anonymous Coward

      Proof by demonstration and proof by deduction are very different things I think.

  8. Destroy All Monsters Silver badge
    Facepalm

    I thought we were past all that!

    For a writeup of the discussion of "Mathematics by computer!! MY GOODNESS, WHAT IS THIS WORLD COMING TO", see Peter Woit blogpost and pointers therein.

  9. Anonymous Coward
    Anonymous Coward

    maths

    You write a story about UK mathematicians and then have the nerve to put "math" (whatever that is) in the tag line. No merkin nonsense here please

    1. phuzz Silver badge
      Headmaster

      Re: maths

      Exactly, it's not called "Mathmatic".

  10. frank ly

    Proof by induction

    This is effectively 'proof by induction', where you say (sort of): if it's true for this, it must be true for that, and I've just shown it for this, so it must be true for all that. It works if you dig deeper into it. Proof by induction can get messy because you have to show (sometimes laboriously) that it is true for a certain case or cases.

  11. an it guy

    wikipedia download too big?

    Um. What about zipping it up. lots of +1 and -1 sounds like it should zip (or rar, or bzip, or etc.) fairly well.

    Or, does wikipedia allow .torrent files? What about wikileaks? Private data that's not being made public? sounds like a sort of fit...

    Ways and means exist.

    1. lurker

      Re: wikipedia download too big?

      Actually truly random binary data would not compress well at all.

      1. Anonymous Coward
        Anonymous Coward

        @ lurker

        "Actually truly random binary data would not compress well at all."

        But this isn't "truly random data". What happened is that *someone's* pr0n collection accidentally got put into the archive.

    2. DropBear Silver badge

      Re: wikipedia download too big?

      "lots of +1 and -1 sounds like it should zip (or rar, or bzip, or etc.) fairly well" - really not sure where you got the idea; if you think of +1/-1 in terms of good old binary 0/1, you should be well aware that while some of it zips quite well, others such sequences (like JPEGs, MP4s etc.) don't compress practically at all. It really depends on how random your sequence already is...

  12. h4rm0ny
    Paris Hilton

    Where did they get the numbers?

    Serious question - where did they get (how did they generate) this sequence of +1's and -1's ?

    I.e. assuming that their proof works by empirical testing rather than induction (which raises its own questions), how did they prove that the sequence they analysed was random.

    If this little bit of coverage reaches any of the people involved, I'd love a quick reply clarifying.

    (icon because that's me right now)

    1. h4rm0ny
      WTF?

      Re: Where did they get the numbers?

      Wow - a downvote for a maths question? Some people really take discussions here personally, don't they? Someone who didn't like my comments elsewhere, presumably, or they really hate Paris Hilton. :D

      1. teebie

        Re: Where did they get the numbers?

        Empirical testing? As in run a test on a random selection of data and assume if it gives a decent p value then the hypothesis can be considered correct?

        That sort of thing doesn't cut it as mathemetical proof.

        It might be good enough for sciences less pure than maths (e.g. sociology, physics, all other sciences), but even a a statistical proof - the probability of this being wrong is 0 - isn't enough for a mathematician.

        1. h4rm0ny

          Re: Where did they get the numbers?

          >>"Empirical testing? As in run a test on a random selection of data and assume if it gives a decent p value then the hypothesis can be considered correct? That sort of thing doesn't cut it as mathemetical proof."

          Well I agree. But look up what a SAT attack is. Unless I've misunderstood the article, they ran a SAT algorithm solver on a sufficiently large data set and "proved" it statistically. Hence my question about the randomness of the source data. Even if one accepts a proof by demonstration rather than induction (which you do not and I am reluctant to do), that remains a potentially serious flaw in the proof.

          I am not a mathematician, hence my asking the question.

    2. John Savard Silver badge

      Re: Where did they get the numbers?

      That's not that serious a question, unless you haven't read the article. The theorem isn't about one particular sequence of -1s and +1s, it's about any and all possible such sequences. No matter how hard you try, you shouldn't be able to construct a series so cannily arranged that, if you're allowed to choose between the absolute values of the partial sums of the series itself, every 2nd, every 3rd, every 4th, and so on, you can't make that result as large as you wish.

  13. Anonymous Coward
    Anonymous Coward

    I smell snake oil .....

    When the lottery started, I knew a guy who was convinced he could predict the numbers after a few draws. He created a speadsheet with a list of draws lowest ball to highest. He then worked out the average for each column. He then worked out the deviation from the average for each ball, and then worked out an average of that. He then used those derived averages to generate a list of numbers which he then played.

    I wonder what became of him ....

    1. TheWeenie

      Re: I smell snake oil .....

      As we've not read about an individual winning the lottery five times over, I'd say he's probably still playing with his spreadsheet!

      1. h4rm0ny

        Re: I smell snake oil .....

        Nah, he's successfully worked out that the probability of his winning the lottery is a near certainty if he buys 5x10^12 lottery cards and is now just saving up to buy them.

    2. Ben Bonsall

      Re: I smell snake oil .....

      He's working for the NSA, predicting your next SSL key

  14. Steven Jones

    Not a new problem...

    Surely this is hardly new. When Kenneth Appel and Wolfgang Haken proved the four colour theorem in 1976, it was originally not accepted by mathematicians as the computer-assisted proof was far to large to be proof-read by human beings. it gained a sort of grudging acceptance.

    However, I always thought the objection was flawed in principle. Firstly, it's possible to consider a computer program (and the hardware on which it runs) as a mathematical entity which, itself, is subject to inspection and being proof-read. There have even been attempts to produce mathematically provable programs (although that is only possible in principle for some sorts of programs) and, even processors - anybody remember VIPER? Of course, any proof-reading of a program, especially one which cannot be mathematically verified, is open to human error. However, that is also true of human proof-reading of mathematical proofs. The possibility is always there of a human error, especially for very complex proofs.

    It's therefore true, in principle, that the body of mathematics is open to human errors, albeit not on the scale of complex computer programs. It is therefore a problem of scale, and not principle.

  15. Crisp Silver badge

    Imagine a random, infinite sequence of numbers containing nothing but +1s and -1s.

    Somewhere within that sequence is a subset of numbers that contains a picture of a cat that wants a cheeseburger.

    1. Brewster's Angle Grinder Silver badge

      Re: Imagine a random, infinite sequence of numbers containing nothing but +1s and -1s.

      Unfortunately, it only occurs "at infinity".

    2. jphb

      Re: Imagine a random, infinite sequence of numbers containing nothing but +1s and -1s.

      And somewhere in that sequence is every other picture you could possibly imagine and, indeed, every other finite sequence you could imagine including arbitrary long sequences of 1's and 0's, the complete text of Wikipedia, the answer to the ultimate problem of life, the universe and everything etc., etc. However I wonder whether you could specify a sub-sequence in the sense of the proof by specifying its content or properties rather than its position? Such content addressability is an everyday aspect of computing, I wonder how this would affect the proof and/or how such a strategy would be excluded?

      1. Michael Wojcik Silver badge

        Re: Imagine a random, infinite sequence of numbers containing nothing but +1s and -1s.

        And somewhere in that sequence is every other picture you could possibly imagine and, indeed, every other finite sequence you could imagine including arbitrary long sequences of 1's and 0's

        This is a popular claim, but it's incorrect, or more precisely the conditions you've specified are not sufficient to make it correct.

        Consider the infinite sequence defined by the following:

        1. A random finite number > 1 of +1s

        2. A -1

        3. Go to 1

        This is infinite and random: if you are looking at one element of the sequence, and it is a +1, you cannot predict the next element; and if it's a -1, you cannot predict the element after the next. But it never contains the subsequence {-1,-1} (or any longer subsequence of -1s).

        "Random and infinite sequence" does not imply "contains every possible subsequence".

        Now, insofar as this sequence has some (non-accidental) structure - subsequences of -1s are restricted in length - it is not purely random. But no one in this thread said "purely random". Further, no finite subsequence of this not-purely-random sequence can be proven to not be a subsequence of a purely-random infinite sequence: you can't tell by inspection that it has non-accidental structure, only by definition.

  16. This post has been deleted by its author

  17. Tom 7 Silver badge

    Empirical proofs are the best.

    Especially when Alexi Sayle proved to Jean Paul Sartre that he did really exist by head butting him in the throat.

    1. Destroy All Monsters Silver badge
      Headmaster

      Re: Empirical proofs are the best.

      I don't think that JPS ever was in doubt about that.

      "Existentialism" is not about whether you exist or not.

    2. JimmyPage Silver badge

      Jean Paul Sartre

      As a thinker there was no one keener, except he knew fuck all about the Cortina.

  18. Moosh
    Paris Hilton

    Good Grief

    Let us imagine just how MASSIVE the proof must be, were you to attempt to open it in a document reader.

    There are ~4,451,363 articles on Wikipedia as of this moment (of googling). This proof is 3GB larger than that. I'd say another 1.35 million pages or so.

    I'd rather read Proust 1813 times.

    There are over 2.6 billion words on wikipedia at the moment.

    The Average reading speed of an American adult is 250 - 300 WPM. Lets be generous and say 300 WPM. We will also assume they understand the theory better than any of us.

    Lets say that this person is super human and reads the proof at a constant 300 WPM 24/7.

    It would take him 16.53 years to read through every single word on Wikipedia. Going full throttle. every single minute of every single day for 16 and a half years. And this theorem is bigger

    1. Ken Hagan Gold badge

      Re: Good Grief

      And yet it is still smaller than a clean installation of Windows.

  19. Justin Stringfellow
    Coat

    Try fitting THAT in the margins, math wonks

    Actually, assuming a A4 sheet with a 1" margin, and a areal density of 500Gbpsi, you can fit 730GB into the margin.

    1. Michael Wojcik Silver badge

      Re: Try fitting THAT in the margins, math wonks

      I'll just fit the prover program and its initial state into the margin, since it implies the proof.

  20. itsallcrap

    This is the way it's going to go, isn't it?

    Affordable computers are now getting to have the sort of power where writing a program to try things exhaustively is easier than trying to derive a proof analytically.

    So who needs the actual proof any more? Makes more sense to peer-review the Python script that generated it...

    1. Destroy All Monsters Silver badge
      Thumb Down

      Re: This is the way it's going to go, isn't it?

      "easier than trying to derive a proof analytically"

      I think you are confused about how this whole proof business actually works. The "analytically" adjective doesn't fit in there either.

      1. h4rm0ny

        Re: This is the way it's going to go, isn't it?

        Actually, in this instance they are not confused. The mathematicians ran a SAT algorithm solver. Essentially "proving" it statistically. This is not a proof by induction. That's why the proof is so large.

        (n.b. I think by analytically, they meant a proof by deduction).

  21. TonyK

    You got it wrong, vultures

    The announced result is a proof that the discrepancy is at least 3. Even New Scientist got that right, so you really have no excuse.

    1. John Savard Silver badge

      Re: You got it wrong, vultures

      Wikipedia said it was 2, so even they got it wrong?

      1. TonyK

        Re: You got it wrong, vultures

        Wikipedia says it's "larger than 2", John. Which makes it at least 3.

    2. Michael Wojcik Silver badge

      Re: You got it wrong, vultures

      I didn't see anything in the article about specific values of C. Did I miss something, or was the article updated?

      Technically, the way the Erdõs Discrepancy Problem is phrased, at least according to the 'pedia, this work constitutes a proof for C=2.

      The problem is "for any ±1 sequence, there exists k, d, C such that |Σi=1kxi*d| > C. So contrary to the informal explanations in previous posts, it's "take a subsequence starting from the beginning of the sequence, and then take every dth element of that, and add them, and take the absolute value, and compare it to C".

      (Apologies for the formatting. Reg, can we have LaTeX? MathML? Probably some time after we get proper support for the HTML PRE element, I guess.)

      The shorter version: you start at the beginning and add every Nth element, and eventually you'll get arbitrarily far from zero, if you pick N correctly.

      In any case, though, it appears this latest work is a proof for C=2, since the problem says "greater than C".

  22. The Vociferous Time Waster

    But...

    Tl;dr

    But does it run Linux?

  23. Eclectic Man

    Computers and Proofs

    I think it was the English mathematician, Keith Devlin (fomerly of Lancaster Uni and now, I think, of Stanford) who pointed out that there is as likely to be a mistake in a 2 million line computer program as there is in a 2 million line (human created) mathermatical proof.

    The computer-based proof of the 4 colour map theorem works because it exhaustively demonstrated that all maps with certain features could be coloured using at most four colours, and that the list of those certain features was complete for planar maps.

    The assertion of a proof for the Erdos discrepancy problem seems to require somthing different, as clearly not all (or indeed any) infinte random binary sequence can have been fully analysed. Seems like I'm going to have to read the original papers. Although if Erdos couldn't prove it, it is going to be tricky.

    1. Michael Wojcik Silver badge

      Re: Computers and Proofs

      I think it was the English mathematician, Keith Devlin (fomerly of Lancaster Uni and now, I think, of Stanford) who pointed out that there is as likely to be a mistake in a 2 million line computer program as there is in a 2 million line (human created) mathermatical proof.

      Does he have a proof of this? Is "as likely" meant to be a precise statement of probability? Because that looks unlikely to me, for three reasons:

      1. Programming languages are heavily constrained by their syntactic and semantic rules - more so even than the formal prose of non-symbolic lines of human-written mathematical proofs. That means lower entropy per line, and thus a lower probability of an error (in a program that is successfully interpreted by the machine). This is equivalent to saying that a machine-readable proof will be longer than a human-readable proof of the same information entropy. (We probably could use Kolmogorov complexity theory and Chaitan's algorithmic information theory to set some bounds on that, but I'm only posing it as an objection.)

      2. Along similar lines: the machine implicitly checks a vast number of correlated values in the process of executing the proving program (by not wandering into undefined behavior and failing to produce a coherent result). This redundancy reduces the information entropy of the transformation of the static program into its dynamic execution, and so further reduces the probability of error.

      3. Human-constructed proofs are rarely entirely symbolic (the programs of Whitehead & Russell, Hilbert, etc notwithstanding), and so ask the interpretant to engage in rational deliberation of claims. That is the source par excellence of mathematical innovation, but it is also fraught with error; indeed, that is its strength.

      So I suspect an N-line1 computer program, if it executes successfully, in fact has a probability of error that is only a fairly small fraction of the probability of an N-line human-written proof, once N gets sufficiently large. Of course, that's still plenty of room for error. But also of course, the probability of error can be further reduced by many other mechanical techniques, and it's unclear (and per AIT probably unprovable) how far down that process can go.

      1It might further be objected that source-lines-of-code is a meaningless metric for the complexity of a program anyway. But by Kolmogorov equivalence, N lines in any one Turing-complete programming language is equivalent to N*C lines in any other, where C is a constant and usually relatively small. So SLOC is good enough for this line of argument.

  24. Frumious Bandersnatch Silver badge

    rephrased?

    If you look at the +1 and -1 values as steps in a random walk over the number line, then there must be some subsequence of steps that moves you arbitrarily far from the origin. It seems intuitively obvious: on the one hand, it's kind of like an infinite monkey argument or, say, that if you take all the digits of pi in some number base and squint at them in the right way (cherry picking some subsequence), you'll find some pattern that looks meaningful. Actually, Pi, the film uses this as its premise, now that I think of it.

    The other way it seems intuitively obvious is that if the sequence is actually random and you start slicing it up, you're bound to be able to find some sequence that has lower entropy than expected. It shouldn't matter, though, because the entropy of the entire sequence is fixed (so using Kolmogorov complexity to define the entropy, the fact that there are some signs of structure, it doesn't mean that this negentropy is useful for compressing the entire string). That's the way I see it, anyway. I just don't know what the point is of looking for structure in unstructured (random) strings.

    1. h4rm0ny

      Re: rephrased?

      It's not about whether structure exists. It's whether for any arbitrary pattern you can come up with (i.e. based not on the pattern of numbers itself), a subset exists with greater discrepancy. We know this intuitively, but how can we prove it?

  25. teebie

    nine gigabytes = 13 GB

    If you hide the corrections link then pedants will put corrections(*) in the comments.

    (*) If I have screwed up please read 'corrections' as 'requests for clarification'

    1. Anomalous Cowturd
      Facepalm

      @teebie Re: Corrections.

      It's up near the top of this page, to the right of the "Post your comment" grey box.

      They moved it a while ago, during the last re-vamp I believe.

      HTH.

  26. Stevie Silver badge

    Bah!

    Two things:

    a) Fermat was pulling the old "hence it is obvious from the diagram" stunt that was roundly rejected as sound reasoning or excuse-making by my Applied Maths teacher in '73.

    2) It is *NEVER* appropriate to point to a Wikipedia page to dodge explaining something mathematical to the non-hard-sums-numpties out there. If there was one field of study in which the Wikipedia is solidly demonstrated to be an encyclopedia for anyone BUT the people it is mathematics. Opening paragraphs reject the Britannica's technique of beginning with a simple explanation and then working up to a full-blown exposition in sumsspeek in favor of a wall of sumsblither with links here, there and everywhere (often running in big, unhelpful circles) which only compound the gibberish and waste an afternoon to no good effect.

    The math wonks who put together the Wikipedia math pages should take an afternoon to follow the linguistics pages and, chastened by yet another fine example of blither over helpfulness, return and recompose every damned page.

    1. Frumious Bandersnatch Silver badge

      Re: Bah!

      a) Fermat was pulling the old "hence it is obvious from the diagram" stunt that was roundly rejected as sound reasoning or excuse-making by my Applied Maths teacher in '73.

      Oh, I don't know. I think that Pythagoras' theorem is self-evident from this diagram

      2 [sic]) It is *NEVER* appropriate to point to a Wikipedia page to dodge explaining something mathematical to the non-hard-sums-numpties out there

      Oops. I appear to have linked to a wikipedia page. Oh noes!

      In all fairness, though, I do agree with you that Wikipedia pages on maths are generally bad. They're definitely failing in their attempts to explain most concepts to the general reader, and often even to those who aren't frightened by and/or have a modest level of maths ability.

      1. Ken Hagan Gold badge

        Re: Bah!

        "Oh, I don't know. I think that Pythagoras' theorem is self-evident from this diagram"

        Well it looks plausible for the values of a, b and c used in the diagram, but I'd need to print it out and measure it up with a ruler to make sure the artist hadn't cheated.

  27. DJO Silver badge
    Boffin

    What's Random Doc?

    Be interested to know how they generated random numbers because the random number generators used by computers are not really random at all, it would be better to call them "sequence generators" as with the same starting configurations they will generate the same numbers.

    The only truly random events are at the quantum level such as radioactive decay which can be tricky to incorporate into a PC, radio noise (cosmic background radiation) can be utilised to provide an adequate source of random numbers but the pseudo-random number generators in normal PCs are not random enough for this sort of thing.

  28. David_H

    Proof

    Err... "random sequence of 1 and -1 must have an offset" is the laymans simplification.

    Emperical evidence is looking at a badly wired Ethernet link (i.e. one that is truly floating). Here the +1 and -1 charges should cancel if they are equal in number. (A day of general Ethernet traffic could be considered as a random sequence.) But you will find that the charge will float in one direction or the other over time, indicating a bias for +1 or -1 in the signal.

    Or is this just too much of an over simplification?

    1. Michael Wojcik Silver badge

      Re: Proof

      Err... "random sequence of 1 and -1 must have an offset" is the laymans simplification.

      Too simple, as that's neither a restatement of the problem, nor true. But it's looking in the right direction.

      In the EDC:

      1. The sequence must be infinite, and need not be random.

      2. The sequence as a whole need not have a bias from zero.

      3. The EDC hypothesizes that you pick a subsequence starting from the beginning, and summing every Nth element in that range; and then it says if you pick N correctly and take a long-enough subset, you can get arbitrarily far from zero.

      So the EDC conjecture doesn't just say there's a bias; it says that if you take a large enough subsequence (starting at the beginning) and choose the correct stride, there's an arbitrarily large bias.

      This latest work claims to prove that this is always true for a bias of up to 3. No one's proven yet whether it's true for larger values, much less proven the general case.

  29. loneranger

    So glad I'm not a mathematician

    I've taken Calculus through Calc 2 plus Linear Algebra and Theory of Computation, and I have to tell you that I'm glad that someone else likes to do nothing but sit around all day and imagine new problems to solve in mathematics, because I personally never want to punish myself with exposure to it again. You have to be a special kind of crazy to like it.

    We need people who can do it well, so more power to them.

  30. Fruit and Nutcase Silver badge

    10GB

    "A complete Wikipedia download is only 10 GB."

    For comparison, what is the size of the El Reg commentard archive?

    1. h4rm0ny

      Re: 10GB

      >>"For comparison, what is the size of the El Reg commentard archive?"

      It is also 10GB. But in this case, 5GB of that are the collected downvotes for Eadon.

      1. Francis Boyle Silver badge

        Re: 10GB

        Nice try, but downvotes for Eadon are the definition of perfect compressibility.

        1. h4rm0ny
          Joke

          Re: 10GB

          >>"Nice try, but downvotes for Eadon are the definition of perfect compressibility."

          That is compressed. It's a 5GB number. ;)

          1. Francis Boyle Silver badge

            Touche

            Can't top that.

  31. John Savard Silver badge

    Counterexample

    Let's say you're trying to construct a series that would serve as a counterexample.

    +1 -1 +1 -1 ...

    stays within the limit of 1 for any subsequence with an odd increment. But obviously it goes to infinity if your step size is 2.

    So try

    +1 -1 -1 +1 +1 -1 -1 +1 ...

    That works well for 2, and for odd step sizes. But with a step size of 4, it goes to infinity. Hmm...

    Next try would be something that might have a chance of working for any power of 2, however large. So try a sequence of +1 and -1 that has a pattern based on the Gray code.

    That, of course, doesn't work either, which is the point at which one might start seriously thinking that Erdos was right.

    1. Michael Wojcik Silver badge

      Re: Counterexample

      Yes, that's the fun1 of recreational2 discrepancy theory - playing with constructed sequences to see how hard it is to extract certain kinds of structure.

      1Your fun may vary. No guarantee of entertainment is expressed or implied.

      2See note 1.

  32. Marshalltown

    Chat

    They probably should have just had a sit down and chat with the ghost of Georg Cantor.

  33. Anonymous Coward
    Anonymous Coward

    You could read the actual paper

    The ArXiv link is in the article.

  34. hapticz

    ???

    using a fixed 'pattern' to sample the bulk with, seems counter-intuitive as a method to prove this. as any proof that the bulk data is truly random is next to impossible in the first place. rather,does this exercise attempt to extract a 'proof of randomness' of the original bulk stream via this method?

  35. goodfox007

    Surveys can prove anything

    In my opinion , this proves that surveys containing yes(+1) and no(-1) type of questions for a finite number of people(which is always the case), CAN PROVE ANYTHING. so, conclusion: DON'T TRUST SURVEYS.

  36. Dr Patrick J R Harkin

    I have discovered a truly marvellous proof of this,..

    ...which the margin of this internet is too narrow to contain.

  37. Mussie (Ed)

    Speaking in stoopid

    Just to put this in terms i fully understand the question is:

    If we have an infinte number of pulses

    Each pulse randomly + or -

    And we had the ability to calculate the net result (of an infinte number)

    We would have a significant number of either + or - at the end

    Is that what we are trying to dettermine here ?

This topic is closed for new posts.

Biting the hand that feeds IT © 1998–2019