Aspera high speed file transfer: Let the cloud protocol wars begin

There is a problem with cloud storage that affects almost all of us, yet is something of which most of us remain blissfully unaware. The problem isn't the object stores underpinning cloud storage; used properly, object storage is great. Look instead to the bit shuffling data between end users and the cloud. It's not the …

  1. Anonymous Coward

    In two words: reliable multicast

    Cloudy replication, fail-over and transparent migration of workloads and associated data sets are bloody expensive to implement in terms of available network bandwidth. The standard answer of adding more bandwidth is not cheap and the underlying networking technology doesn't scale in line with CPU, RAM and disk storage. Demand for bandwidth looks set to outstrip our ability to provide it (at reasonable cost) for the foreseeable future.

    There is, in theory at least, a solution in sight: the use of rateless erasure codes over UDP multicast networks. Also known as fountain codes, these would allow data to be replicated to any number of servers with minimal (often around 3-5%, probabilistic) overheads. These systems improve on traditional forward error correction because they work at any network error rate (traditional FEC simply fails if the erasure rate is too high). These schemes make it much easier to eliminate or minimise the number of explicit ACK/NAK messages that need to be passed back to the sender, enabling them to scale out as well as scale up.
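    To make the idea concrete, here is a minimal toy sketch of an LT-style fountain code in Python. It is an illustration only, not any vendor's or patent holder's implementation. The names (`encode_symbol`, `PeelingDecoder`, `transfer`), the 4-byte block size and the 30% simulated loss rate are all made up for the example. It uses the textbook ideal soliton degree distribution, which is simple but inefficient, so it will not reproduce the 3-5% overhead figure; practical codes use tuned distributions. A real protocol would also send a seed in each packet rather than the block indices themselves.

```python
import random

def ideal_soliton(k):
    """Ideal soliton weights for degrees 1..k (they sum to 1)."""
    return [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_symbol(blocks, rng):
    """Return (set of source block indices, XOR of those blocks)."""
    k = len(blocks)
    degree = rng.choices(range(1, k + 1), weights=ideal_soliton(k))[0]
    indices = rng.sample(range(k), degree)
    payload = bytes(len(blocks[0]))  # all-zero starting value
    for i in indices:
        payload = xor_bytes(payload, blocks[i])
    return set(indices), payload

class PeelingDecoder:
    """Belief-propagation ("peeling") decoder for XOR-combined symbols."""

    def __init__(self, k):
        self.k = k
        self.decoded = {}   # block index -> recovered bytes
        self.pending = []   # [unknown indices, payload] pairs

    def add(self, indices, payload):
        self.pending.append([set(indices), payload])
        self._peel()

    def _peel(self):
        progress = True
        while progress:
            progress = False
            for sym in self.pending:
                idx, payload = sym
                # Subtract every block we already know from this symbol.
                for i in list(idx):
                    if i in self.decoded:
                        payload = xor_bytes(payload, self.decoded[i])
                        idx.discard(i)
                sym[1] = payload
                # A symbol covering exactly one unknown block reveals it.
                if len(idx) == 1:
                    (i,) = idx
                    self.decoded[i] = payload
                    idx.clear()
                    progress = True
            self.pending = [s for s in self.pending if s[0]]

    def done(self):
        return len(self.decoded) == self.k

def transfer(data, block_size=4, loss=0.3, seed=1):
    """Send symbols until the receiver can decode; no ACK/NAK per packet."""
    rng = random.Random(seed)
    padded = data + bytes(-len(data) % block_size)
    blocks = [padded[i:i + block_size] for i in range(0, len(padded), block_size)]
    decoder = PeelingDecoder(len(blocks))
    sent = 0
    while not decoder.done():
        indices, payload = encode_symbol(blocks, rng)
        sent += 1
        if rng.random() < loss:   # simulated erasure: packet never arrives
            continue
        decoder.add(indices, payload)
    out = b"".join(decoder.decoded[i] for i in range(len(blocks)))
    return out[:len(data)], sent, len(blocks)
```

    Calling `transfer(b"reliable multicast over UDP")` returns the original bytes along with the number of symbols sent and the number of source blocks. The sender never learns which packets were lost. It simply keeps emitting fresh symbols, which is why the same stream can serve any number of receivers with different loss patterns.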

    Unfortunately, the technology has its problems:

    * patent encumbrance (some patents relating to LT, Raptor and fountain codes in general are overly broad)

    * no hardware support (though this could change, for now everything has to be done in software)

    * latency issues

    Latency seems to be the biggest practical problem. All of the codes I've studied (with one exception: "Perpetual" codes) suffer from having to receive almost the full stream before you can decode it. That decoding step also involves lots of random accesses over the full stream, so you end up doing somewhere in the order of 10 random disk reads/writes per received block before you can start working on the data. That sort of latency is a death knell for many applications.
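    The "receive almost everything before you can decode" behaviour can be seen in a small simulation. This is a rough sketch under stated assumptions: it tracks only block indices (no payloads, no disk I/O), uses the textbook ideal soliton distribution, and the function names `soliton_degree` and `decode_progress` are invented for the example. It says nothing about the random-access cost on disk, only about how late in the stream blocks become recoverable.

```python
import random

def soliton_degree(k, rng):
    """Draw a symbol degree from the ideal soliton distribution."""
    weights = [1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]
    return rng.choices(range(1, k + 1), weights=weights)[0]

def decode_progress(k=200, seed=7):
    """Entry n is the number of source blocks recoverable
    after n + 1 encoded symbols have been received."""
    rng = random.Random(seed)
    known = set()    # indices of blocks recovered so far
    pending = []     # per-symbol sets of still-unknown block indices
    progress = []
    while len(known) < k:
        pending.append(set(rng.sample(range(k), soliton_degree(k, rng))))
        changed = True
        while changed:
            changed = False
            for s in pending:
                s -= known           # drop blocks we already have
                if len(s) == 1:      # exactly one unknown left: recover it
                    known |= s
                    s.clear()
                    changed = True
            pending = [s for s in pending if s]
        progress.append(len(known))
    return progress
```

    Plotting or printing the returned list shows how recovery is distributed over the stream. In codes of this family, most blocks tend to become recoverable in a late burst rather than steadily as symbols arrive, which is the latency problem described above.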

    That said, I'm still optimistic that fountain codes (and their ilk) are, one day, going to be improved to the point that something based on them will become the de facto method of synchronising data sets across large groups of peer machines. They're already almost there when it comes to distributed data stores where low latency isn't so much of a requirement. They also achieve huge space savings over traditional replicated stores without sacrificing data integrity. If we could get the latency issue licked, we'd have pretty much the perfect system to build all our distributed protocols on.

    1. CaitlinBestler

      Re: In two words: reliable multicast

      The bottom line is that there is very little that can be done to optimize WAN transfers that cannot be done with an improved TCP congestion algorithm.

      Within a datacenter, or in-house corporate intranet, multicast can indeed be a very useful solution.

      Not only do you get the obvious multiplier effect when replicating, multicast also enables dynamic load balancing when selecting targets.

  2. Anonymous Coward

    Errr - a Blatant Advert?

    Aspera is only one of a number of vendors that do this, so I'm surprised to see no mention of the others. Filecatalyst and Signiant do exactly the same thing (I don't work for either). And at the cloud provider I actually work for we do have clients with sustained 1Gbps throughput, using one of these other vendors.

  3. Daggerchild Silver badge

    UDP? Uh oh.

    If UDP clients don't come with SOCKSv5 support, things get *really* fun. Hands up who had to drill holes in their security so Flash RTMP worked?

    So many 'service innovators' these days believe old problems might have old solutions. UDP is new again.

    Besides, everyone knows file transfer over DNS is where it's at.

    1. Daggerchild Silver badge
      Headmaster

      Re: UDP? Uh oh.

      s/days believe/days don't believe/ *grumble*

  4. Anonymous Coward

    File Transfer

    You may want to consider something called Digital Fountain. Now a part of Qualcomm as far as I can tell.

    http://blog.notdot.net/2012/01/Damn-Cool-Algorithms-Fountain-Codes

    https://en.wikipedia.org/wiki/Luby_transform_code

    See Chap 50: http://www.cs.toronto.edu/~mackay/itprnn/book.pdf

    https://cseweb.ucsd.edu/classes/fa01/cse222/papers/byers-digital-fountain-sigcomm98.pdf

    http://blog.streamingmedia.com/2009/02/qualcomm-acquires-digital-fountain.html

  5. Tom Samplonius

    Oh no, the Register let Trevor write about the Internet again.

    "internet service providers are not going to build for peak traffic". Obviously this statement is false. I've worked at many ISPs, and they are designed for busy hour traffic. I'm not aware of a single ISP that isn't. Residential ISPs using cable (DOCSIS) often do let their busy hour traffic go to 100%, but DOCSIS is not an appropriate technology for any business use, especially for those using their connection for mission critical backups.

  6. jack01reacher

    nice article
