Docker and storage – solving the problem of data persistence

In June I was in Seattle, the home of Starbucks, Boeing and Microsoft, for DockerCon 2016. Compared to the normal events I attend, this one promised to be a more “casual” affair, so I sported polo shirts and vendor hoodies as my standard attire. One of the more interesting problems yet to be fully solved with container …

  1. Paul Smith

    I never quite got containers...

    I never quite got containers and this article has brought my problem nicely into focus. I was brought up with the idea that the whole (and only) point of software was to take some data from one place, transform it in a way that adds value and put it somewhere else. Containers could never do that, which explains why I never got the point of them.

    1. Platypus

      Re: I never quite got containers...

      Containers are pretty useful, but the idea that they should all be stateless has always been STUPID. Any non-trivial application has state that has to be stored somewhere. Making it "somebody else's problem" only creates a new problem of how to coordinate between the containers and whatever kind of persistent storage you're using. If one provisioning system with one view is responsible for both, subject to the constraint that the actual disks etc. physically exist in one place, then it actually does simplify quite a bit of code.

  2. batfastad

    Containers

    For me containers are for serving code, not for data. If you end up running database or document/file/object store instances in containers then you're doing it wrong. I still believe it's easier to have your data in VMs instead of containers.

    We have several applications that are pretty scalable running on ephemeral AWS nodes, created when that was the only option, and it's so much simpler operationally. Data backends are not as elastic as frontends, true, though we try to use object/file stores when possible so the scaling is not our problem. Patching and application upgrades have always been a case of just blowing away the VMs and deploying new ones, so you can phase the rollout. You also avoid all that legacy and cruft that people dump into directories and never clear up.

    I don't see the advantage of running 5x database containers in a VM vs 5x database VMs. Though if someone can explain that to me then I'm happy to reconsider.

    1. Anonymous Coward
      Anonymous Coward

      Re: Containers

      The advantage of having a container for the db engine (vs the data volume) is that the whole provisioning amounts to "docker run -v db:<whatever> postgres:8.57" or whatever version you want. I'm not sure if having the DB storage in external volumes is "best practice," but it works for me.
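
      Roughly, something like this (a minimal sketch assuming the official postgres image and a named volume called "pgdata"; the names are only illustrative):

        # Create a named volume that outlives any container using it
        docker volume create pgdata

        # Run the DB engine as a disposable container; the data lives in the
        # volume, not in the container's writable layer
        docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:9.5

        # Upgrading is then just removing the container and running a newer
        # image against the same volume
        docker rm -f db
        docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:9.6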

      1. batfastad

        Re: Containers

        But where's your postgres data? And how does a 64GB container compare to a 64GB VM?

        1. JMcL

          Re: Containers

          What works for me is to put the data onto an OpenStack data volume which can be attached to an instance, have the DB engine (Postgres, Mongo, whatever you're having yourself) run in a container on the instance, and mount the data directory from the volume as a container volume. This gives me a DB engine container which can be freely recreated at any time; the instance VM is also relatively transient, as it can be easily recreated and the volume containing the data reattached to the new instance.

          This is on OpenStack which just happens to be the environment I'm using, but similar approaches seem to be applicable on AWS, Azure etc.
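
          Roughly, the moving parts look like this (a sketch assuming the OpenStack CLI, the official postgres image and illustrative names; the device name of the attached volume will vary):

            # Create a data volume and attach it to the (transient) instance
            openstack volume create --size 64 db-data
            openstack server add volume db-host db-data

            # On the instance: format (first time only) and mount the attached
            # device, then hand the mount point to the container as its data dir
            sudo mkfs.ext4 /dev/vdb
            sudo mount /dev/vdb /mnt/db-data
            docker run -d --name pg -v /mnt/db-data:/var/lib/postgresql/data postgres:9.5

            # If the instance is rebuilt, reattach db-data to the new instance
            # and run the same container against the same mount point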

  3. Black Road Dude

    Scalability is key

    I think the persistence problem has already been solved.

    Containers mean it's simple to scale an application, be it the front end or the back end, and the database element is no different. If you are using a clustered DB it's easy to kill off a node or two and still have a fully replicated version of the DB on the other nodes. Whole server instances can be ephemeral as long as your DB cluster is large enough.

    The bit that does not fit the container paradigm is standalone DBs, which by their very nature cannot be easily destroyed, meaning they become pets again. Examples to use rather than traditional state storage tech such as MySQL are Mongo, Cassandra, RabbitMQ, Hadoop/HDFS or Kafka; if you still need an RDBMS, try Crate.
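
    As a rough sketch of what I mean (single-host for brevity, using the official cassandra image; in practice the nodes would be spread across instances):

      # Three-node Cassandra cluster, one node per container
      docker network create cass-net
      docker run -d --name cass1 --network cass-net cassandra:3
      docker run -d --name cass2 --network cass-net -e CASSANDRA_SEEDS=cass1 cassandra:3
      docker run -d --name cass3 --network cass-net -e CASSANDRA_SEEDS=cass1 cassandra:3

      # With a keyspace at replication factor 3, any single node is disposable:
      docker rm -f cass2
      docker run -d --name cass4 --network cass-net -e CASSANDRA_SEEDS=cass1 cassandra:3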

    I'm not saying this works for every use case, but I'm yet to find one that it could not help with or make simpler to deploy (even if it means more setup and config on your build/CI boxes to get there).

    Also, is anyone thinking about testing here? That's the real reason I started using Docker in the first place. Being able to spin up an entire application stack, including database, network and configuration, and run tests against that (where the tested images are the exact ones that would be deployed) on every build was a bit of a pipe dream before Docker. (For us anyway.)
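
    For us the per-build stack is not much more than this (names and images are illustrative; "myapp" and "myapp-tests" stand in for our own images, and a compose file does the same job declaratively):

      # Throwaway stack for one CI build
      docker network create ci-net
      docker run -d --name ci-db --network ci-net -e POSTGRES_PASSWORD=ci postgres:9.5
      docker run -d --name ci-app --network ci-net \
        -e DATABASE_URL=postgres://postgres:ci@ci-db/postgres \
        myapp:${BUILD_NUMBER}    # the exact image that would be deployed

      # Run the test suite against the live stack, then throw everything away
      docker run --rm --network ci-net myapp-tests:${BUILD_NUMBER}
      docker rm -f ci-app ci-db
      docker network rm ci-net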

    Obviously it's just a tool and people can use it in a million different ways, but for us it's been an enabler of more tested code, configuration management and continuous delivery.

  4. boatsman

    the perpetual problem: how do we keep data + meta-data available, and separate from the code

    that was an issue in 1960, and it still is. mainframes, minis, pcs, vms, containers, serverless code, the cloud:

    all presented as the philosopher's stone that was going to solve all your problems.

    all too good to be true, and that's exactly what they all are.

    all ignoring the same issue: how do we keep the data and metadata available, even when the engine blows a fuse... or even blows up....

  5. Paul Smith

    I get it

    I get the deployment benefits, but in my experience, deploying a solution is not the solution; it is merely one aspect of solution delivery, a thorny one admittedly, but no customer has ever paid me for the deployment. I get the scaling of the application, the ability to just run up additional instances on demand and drop them when they are no longer needed, which leads directly to what makes containerization so good for testing: the ability to blow away what was there and start again clean. However, this is exactly what makes containerization a problem for me: if you are blowing away an instance when you no longer need it, what exactly was the point of doing it in the first place? Stateless activities are (IMHO) meaningless unless they take place within a stateful context, and I don't think Docker et al. have grasped that.

  6. koensayr

    What about EMC {code}'s RexRay?

    This is a well-written piece about the issues around persistent applications and containers. We often forget that while we'd all like our applications to be stateless, 7 of the top 12 Docker containers on Docker Hub require some form of storage. However, I was disappointed to see that our project, RexRay, was omitted from the article. It's been around for over a year now, and aims to provide a solution for running containerized applications that require storage. Please take a look at github.com/emccode/rexray
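
    For the curious, the basic shape of it is a Docker volume driver (a sketch assuming the rexray service is already installed and configured on the host; see the repo for the real setup):

      # Volumes created through the plugin are backed by external storage,
      # so the DB container itself stays disposable
      docker volume create --driver rexray pgdata
      docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:9.5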

    1. regreeder

      Re: What about EMC {code}'s RexRay?

      > Storage Provider Support
      >
      > The following storage providers and platforms are supported by REX-Ray.
      >
      >   Provider            Storage Platform(s)
      >   EMC                 ScaleIO, Isilon
      >   Oracle VirtualBox   Virtual Media

      Oh dear. Hardly a solved problem then...
