Revealed: Inside super-soaraway Pinterest's virtual data centre

It's every startup's dream: to be growing faster than Facebook without having to build a Facebook-sized server farm. Pinterest is an online picture pinboard for organising your favourite snaps and sharing them. It was founded by Ben Silbermann, Paul Sciarra, and Evan Sharp in March 2010, and it's growing like crazy with just …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    Pin this

    Looks like the only thing Pinterest has done of any interest is be a case study for AWS.

  2. apleszko

    2 billion dollars deal?

    Probably the next Facebook target to be acquired... The price should be higher if they don't have any revenue at all...

    1. Heff
      Joke

      Re: 2 billion dollars deal?

      or Yahoo! AMIRITE. <3

  3. Alister
    WTF?

    what on earth...

    ...is a sharded database?

    1. Kevin McMurtrie Silver badge

      Re: what on earth...

      It's distributing rows of a standard relational database across many databases. You still have relational data, but the relationships have limited spans that always fit within one database. It's a solution when you have very complex customer data requiring transactions but at the same time have little customer-to-customer interaction.
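A minimal sketch of the idea described above, assuming hash-based routing on a user ID. The shard count, the function names, and the in-memory dicts standing in for databases are all hypothetical; nothing here is claimed to be Pinterest's actual scheme.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical; a real deployment sizes this to its data

# Each dict stands in for one independent relational database.
shards = [{"users": {}, "pins": []} for _ in range(NUM_SHARDS)]


def shard_for(user_id: int) -> int:
    """Map a user ID to a shard index with a stable hash."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def add_user(user_id: int, name: str) -> None:
    shards[shard_for(user_id)]["users"][user_id] = name


def add_pin(user_id: int, image_url: str) -> None:
    # A user's pins live on the same shard as the user row, so the
    # user-to-pin relationship never spans two databases.
    shards[shard_for(user_id)]["pins"].append((user_id, image_url))


def pins_of(user_id: int) -> list:
    """Answer a per-user query by touching exactly one shard."""
    shard = shards[shard_for(user_id)]
    return [url for owner, url in shard["pins"] if owner == user_id]


add_user(42, "alice")
add_pin(42, "cupcake.jpg")
add_pin(42, "kitten.jpg")
add_user(7, "bob")
add_pin(7, "dress.jpg")
print(pins_of(42))
```

Queries about one customer stay on one shard, which is why the approach suits data with little customer-to-customer interaction. A query that spans customers would have to visit every shard.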

      1. tpm (Written by Reg staff)

        Re: Re: what on earth...

        In the old days before these Web 2.5 guys took over the world, I guess it was called database partitioning.

        1. streaky
          Boffin

          Re: what on earth...

          Vertical/horizontal partitioning. But yes.

      2. Don Jefe
        Thumb Up

        Re: what on earth...

        Kevin's description of a 'sharded' database is possibly the most efficient I've ever heard. He took a stupid term (sharded) and still managed to make it make sense!

    2. Alister
      Thumb Up

      Re: what on earth...

      Thanks for the replies, never heard the term before.

      1. AndrueC Silver badge
        Trollface

        Re: what on earth...

        > Thanks for the replies, never heard the term before.

        The cloud in a nutshell.

        1. Aaron Em

          It's web scale.

  4. Anonymous Coward

    While this is a fascinating insight into the modern planning of a server farm, what exactly is the point of Pinterest?

    I've just had a look and you're right, cupcakes, pets, food and a few random celebs.

    Then again I don't understand the point of Twitter either, perhaps it's because I'm an introvert.

    1. Anonymous Coward

      The wives/daughters of the neighborhood caught the bug a few months back

      I guess... from seeing them use it and hearing them talk about it... well, damn, there's not really a "guy" explanation for it that I can think of. Maybe I could explain this way:

      Have you ever seen a teenage girl cut out pictures of dresses/outfits/whatever from a magazine and put them on a pin-board in her room as "inspiration" (I guess) of some sort? It's like that... I think.

      The website allows them to do this virtually, they can follow other people's boards, and - I think - aggregate pinnings allow them to present a "what's hot" feed.

    2. Ru

      You just don't understand web2.x

      I must confess I've no idea what the current point release of the "let's release products with no business model and no plans to ever generate revenue beyond advertising" school of business is, but it seems to work... witness Instagram.

      Trying to turn a profit from something that contains so much redistributed copyright material though? Bit of a legal minefield there. I assume Google managed it with YouTube, so maybe it isn't impossible.

      1. streaky
        Black Helicopters

        Re: You just don't understand web2.x

        No it's why there's gonna be an almighty crash.

        Just waiting for normal people to get heavily involved a la 2000 (see: Facebook) and it's all going down in flames, then we'll have another 5 years of don't invest in tech, ooh no bad, don't do that.

        When the point is some of these companies barely have revenues let alone profits. They're just money burning machines. Much as I love it some day the Twitter investment rounds will dry up because somebody will finally ask when they're getting their money back (it's effectively a pyramid scheme at this point).

      2. Ian Michael Gumby
        Devil

        @Ru Re: You just don't understand web2.x

        I don't think that there are any real copyright issues.

        You have fair use, ToS, and DMCA to shield the company.

        As to doing a startup w no b plan or way to make a profit?... It makes some sense in a perverse way.

        You have minimal infrastructure costs and you learn from your mistakes. Consider the venture a petri dish...

    3. MikeSM
      Happy

      I too am an introvert fwiw, and yes it seems Pinterest is nothing more than a solution looking for a problem.

      Just very recently however I have begun to find Twitter quite useful. I prefer to keep my Facebook very personal/professional and not cluttered with a bunch of Likes and subscriptions. Personal friends and professional contacts only. This allows me to reveal very little to the platform/advertisers about myself. Now with Twitter I can follow feeds I find interesting like the local council, comedians I enjoy, software developers, political figures, etc. Because I've never shared any personally identifiable information with Twitter it limits the ability for third parties to build profiles about me. And I can keep my interests private from my professional contacts on Facebook.

      Also I should add that I have never actually tweeted anything myself.

    4. This post has been deleted by its author

    5. Aaron Em

      The point of Pinterest is to make money for the people who built it.

  5. Sporkinum

    Money?

    These things mess with my mind. They get a bunch of guys to put up a ton of money based on speculation, with no source of income other than investors. Cool use of AWS though!

    $500,000 1 Jan 2010

    Jack Abraham, Michael Birch, Scott Belsky, Shana Fisher, Kevin Hartz, Jeremy Stoppelman, Brian Cohen, Fritz Lanman, Hank Vigil, FirstMark Capital.

    $10,000,000 7 May 2011

    Bessemer Venture Partners, Kevin Hartz, Max Levchin, Jack Abraham, Michael Birch, Ron Conway, FirstMark Capital.

    $27,000,000 7 Oct 2011

    Andreessen Horowitz, Bessemer Venture Partners, FirstMark Capital.

  6. Anonymous Coward

    Serious copyright issues

    http://mansurovs.com/pinterest-copyright-infringement-made-cool

    Many of these concerns from photographers could be solved though. Pinterest could implement an image “fingerprinting” scheme like YouTube does with video and sound. Once an image is taken down, nobody could pin the same image again even if it was resized or cropped.

    Possibly allow photographers to preemptively enter their photos’ fingerprints in the database before infringement occurs. Make a desktop or web application. Have it resize large photos before uploading to server. Server processes the photos and keeps their fingerprint in its database. Server throws away the photos and never uses them afterwards.

    Set the system up not to be extremely strict on matches and they’d still stop most of them. They could have a human review false positives before they showed on the site. It’s cheap to hire labor overseas to do tasks like that.
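The fingerprint-and-match idea above can be sketched with average hashing, a simple perceptual hash swapped in here purely for illustration; the comment does not say which technique YouTube or anyone else uses. Images are assumed to be grayscale grids (lists of rows of 0-255 values), and every function name and threshold is hypothetical. Only the fingerprint is kept, never the photo. Note that this simple hash tolerates resizing but is not robust to heavy cropping.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale image (list of rows) to size x size by block averaging.

    Assumes both dimensions are multiples of `size`, to keep the sketch short.
    """
    height, width = len(pixels), len(pixels[0])
    bh, bw = height // size, width // size
    small = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def fingerprint(pixels, size=8):
    """Return an integer of size*size bits: a bit is set where a cell is brighter than the mean."""
    cells = [v for row in shrink(pixels, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def distance(a, b):
    """Hamming distance between two fingerprints."""
    return bin(a ^ b).count("1")


def is_blocked(pixels, blocklist, max_distance=5):
    """Loose match: flag the image if it is close to any taken-down fingerprint."""
    fp = fingerprint(pixels)
    return any(distance(fp, known) <= max_distance for known in blocklist)


# Synthetic 16x16 pattern standing in for a taken-down photo.
original = [[(x * 16 + y * 3) % 256 for x in range(16)] for y in range(16)]
# The same picture resized to 32x32 by pixel doubling.
resized = [[original[y // 2][x // 2] for x in range(32)] for y in range(32)]
# A clearly different image: the photo's negative.
different = [[255 - v for v in row] for row in original]

blocklist = {fingerprint(original)}  # the server stores only fingerprints
print(is_blocked(resized, blocklist), is_blocked(different, blocklist))
```

Raising `max_distance` makes matching looser, which catches more altered copies at the price of more false positives for the human reviewers the comment mentions.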

    1. Ian Michael Gumby
      Devil

      Re: Serious copyright issues

      You have a couple of wrinkles...

      1 fair use doctrine

      2 DMCA puts the burden on the rights holder

      3 non commercial use of the photo,

      4 the jury is still out, wait until the YouTube lawsuits are out.

      5 no profits to go after until FB buys them for 2 Billion...

  7. Anonymous Coward

    And how much does 410TB of storage on S3 cost?

    Just over $40,000 per month I reckon. Plus the cost of the bandwidth when people download the files.

    The same amount or not much more will buy you 410 x 3TB drives (so you can store 3 copies of everything for redundancy), and it's a one-off cost.

    Admittedly they are getting the management of this taken care of - there's someone in the data centre going round swapping popped hard drives - but $40,000 per month buys you a lot of rack space plus quite a few staff. Basically, if Pinterest takes off to anything like the level of Facebook, they will have to ditch EC2/S3 pretty sharpish.

    Google "backblaze petabytes on a budget" for comparable figures from a storage-hungry but profitable business.
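For reference, a small sketch of what the figures in this comment imply when taken at face value. The $40,000 is the commenter's own estimate rather than a quoted AWS price, and the use of decimal terabytes (1TB = 1000GB) is an assumption.

```python
STORED_TB = 410            # from the comment
S3_MONTHLY_USD = 40_000    # the commenter's estimate, not a quoted AWS price
DRIVES = 410               # "410 x 3TB drives", from the comment
DRIVE_TB = 3
GB_PER_TB = 1000           # assumption: decimal terabytes

# Monthly storage rate implied by the estimate, in USD per GB.
implied_rate = S3_MONTHLY_USD / (STORED_TB * GB_PER_TB)

# Raw capacity of the proposed drive purchase and how many full copies it holds.
raw_tb = DRIVES * DRIVE_TB
copies = raw_tb / STORED_TB

# Per-drive budget if one month's S3 bill is spent on the drives instead.
implied_drive_budget = S3_MONTHLY_USD / DRIVES

# What the estimate adds up to over a year, before bandwidth charges.
yearly = S3_MONTHLY_USD * 12

print(f"implied S3 rate:      ${implied_rate:.4f} per GB-month")
print(f"raw drive capacity:   {raw_tb} TB ({copies:.0f} copies of {STORED_TB} TB)")
print(f"implied drive budget: ${implied_drive_budget:.2f} per 3TB drive")
print(f"S3 bill per year:     ${yearly:,}")
```

The drive budget covers bare drives only; enclosures, power, and staff are not part of this arithmetic.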

    1. Morg

      Re: And how much does 410TB of storage on S3 cost?

      Indeed that cloud sucks, but I think it's more about not giving a shit about infra and optimization since they just want to sell a concept.

    2. Anonymous Coward

      Re: And how much does 410TB of storage on S3 cost?

      "The same amount or not much more will buy you 410 x 3TB drives (so you can store 3 copies of everything for redundancy), and it's a one-off cost."

      Yes, but then you have to have something to put those drives in, so you have to buy a storage array. That's another "one-off" cost. Also, we'll assume you want some RAID protection built in, and SATA drives take forever to rebuild, so you'll want RAID 6, plus some hot spares, so figure you'll need to buy some extra drives. Most enterprise storage vendors, however, are going to charge you a premium for the disks, so the price of the disks will jump rather dramatically from a bog-standard SATA drive. Even assuming you go with a low-cost vendor, you're still going to have to power and cool the whole thing, and the vendor is going to want you to pay for ongoing support and maintenance. If you want performance to scale beyond the paltry capabilities of your 500 or so SATA drives, you'll actually need to buy some extra drives to wide-stripe the i/o, and/or some Flash based storage to act as a front end for the hot blocks, further driving up cost.

      Finding the price/performance sweet spot in terms of cost is difficult and made harder by the fact that most storage vendors are, ahem, not entirely forthcoming with the true amount of usable storage and performance that you actually get once you've provisioned it all. It sounds like Pinterest have taken capital expenditures out of the budget almost entirely and moved everything into operational expenses, and their operational expenses are spent on the actual capacity instead of trying to design, provision, and maintain a data center.

      Over time, I would expect that it would be cheaper to build out a data center and storage array, but storage remains a surprisingly tricky and expensive resource, especially as it scales, so I would want to crunch those numbers *very* carefully, with an eye towards deliberately ignoring the TCO calculations foisted off by incumbent server and storage vendors.

      1. Morg

        Re: And how much does 410TB of storage on S3 cost?

        Dude... they don't have that kind of performance or reliability on Amazon, and a box for 410TB is going to use dedupe heavily (either way going homegrown would mean ZFS with dedupe and flash cache). In the end it really is just another startup taking the easy risk-free road.

        You have to remember the Amazon EC2 cloud is weak and crappy, with 8 core instances lagging behind old 4 core Opterons, meh IO, etc.

        Overall their own infra would cost much less than Amazon for one refresh cycle, but it would drive them away from their core business so it's a no-no for a startup, just like optimization of anything not POC critical.

    3. ryanp

      Re: And how much does 410TB of storage on S3 cost?

      Don't forget that you have a lot more in costs than just the drives. Let's just say that you are going to wind up using 250 2TB drives by the time that you are done to have 410 TB, plus RAID and hot spare. Each one of these drives will run about 800 in my experience (of course it could be less in this kind of quantity, but humor me). So that is 200,000 right there. That doesn't include the trays and SAN headers. So that is probably about 21 trays for disks, at around 5,000 apiece, so a total of about another 100,000. For the SAN headers to handle this, I imagine that you are talking another 100-200K, let's go midway to 150,000. So far we are at 450,000. We will also be using at least 2 cabinets, very possibly 3 to store it. Let's go with 3, so that is a small amount of about 10,000. Then you have your UPSs, not sure on the total, but I am sure it is another 10,000 easy. Don't forget the SAN switches and HBAs that you will need, so that will be another 50,000. Now you need support contract costs, space to lease for storing the racks, someone to be able to manage it, installation costs. Then you have the option of having it replicated, in which case you can double everything. Don't forget that you need a system admin to manage this now, which will cost 100K annually.

      Anyways, I see it being at least around 1/2 a million in up-front costs for this and quite easily much more, and that is not counting any of the recurring costs. I don't know what kind of performance Amazon's storage offers, but if it is higher performance I can see the numbers rapidly climbing. Anyways, enterprise storage and consumer storage are two different things, so the cost is not as outrageous as it seems.
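The tally above, added up as a quick sketch. All line items are the commenter's own rough figures; the $40,000 monthly S3 estimate comes from the parent comment. The break-even figure is an assumption-laden lower bound, since the unpriced items (support contracts, leased space, power, installation, replication) are left out.

```python
# Up-front items as priced in the comment (currency units as given there).
upfront = {
    "drives (250 x 800)": 250 * 800,
    "disk trays (21 x 5,000, rounded to 100,000 in the comment)": 100_000,
    "SAN headers": 150_000,
    "cabinets": 10_000,
    "UPSs": 10_000,
    "SAN switches and HBAs": 50_000,
}
ADMIN_PER_YEAR = 100_000   # system admin, from the comment
S3_PER_MONTH = 40_000      # estimate from the parent comment

# The commenter's running total after drives, trays and SAN headers.
subtotal = sum(list(upfront.values())[:3])
total_upfront = sum(upfront.values())

# Months until the up-front spend is recovered, counting only the admin
# salary as a recurring in-house cost.
admin_per_month = ADMIN_PER_YEAR / 12
months_to_break_even = total_upfront / (S3_PER_MONTH - admin_per_month)

print(f"subtotal after SAN headers: {subtotal:,}")
print(f"total up front:             {total_upfront:,}")
print(f"break-even vs S3 estimate:  {months_to_break_even:.1f} months")
```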

  8. ZenCoder

    Women are drawn to Pinterest the way men are to pornography.

    I was told that Pinterest's appeal to women can only be likened to a male's interest in pornography.

    The flexibility these cloud services offer is amazing and is perfect for start-ups. You only pay for what you use as you are using it. If the site is successful you can instantly scale to meet the demand and then choose to invest in a more cost effective, but less flexible data center.

    What I would really like to see on those charts is Pinterest's advertising revenues. I have no clue if they are hemorrhaging money or raking it in hand over fist.

    I'm also curious if it is at all cost effective to maintain some on-demand capacity in the cloud or whether it's better to have a dedicated data center which is operating at a fraction of its potential most of the time.

    1. Morg

      Re: Women are drawn to Pinterest the way men are to pornography.

      It cannot be cost effective, Amazon's EC2 cloud and the like cost more than 10 times their physical counterparts over a three year cycle.

      The "fraction of potential" issue is inevitable and is part of Amazon's pricing - the only way you could go around that would be to have some heavy folding activities or w/e to fill the gaps - good luck finding someone to pay for it.

      Amazon's piece of crap cloud... If you take the cheapest compute platform (compute xlarge) you get worse than single 2500K performance for 17,870.40 bucks over three years... if we start talking servers, that's about 6 dual-socket SBE Xeons that will have way more RAM (full ECC 8GB slots) and also deliver 6*2*1.2 = 14.4x the speed (I don't have a conversion factor from E5-2630 to i5-2500K, but it should be close to 1.2), and also 16+ x the RAM.

      And you get to tweak your IOPS as you feel, etc. and pay for a rack and bandwidth of course, but when it's 15x cheaper, why not?
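The arithmetic in this comment can be checked with a short sketch. The three-year cost, the server count, and the 1.2 per-socket factor are all the commenter's own figures (the factor is explicitly his guess); the hourly rate is merely what the three-year total implies, assuming continuous use and no leap days.

```python
HOURS_3Y = 3 * 365 * 24        # hours in three years, ignoring leap days
CLOUD_COST_3Y = 17_870.40      # from the comment

# Hourly price implied by running one instance non-stop for three years.
implied_hourly = CLOUD_COST_3Y / HOURS_3Y

SERVERS = 6                    # dual-socket servers bought for the same money
SOCKETS_PER_SERVER = 2
PER_SOCKET_FACTOR = 1.2        # the commenter's guessed speed ratio per socket

# Claimed compute advantage of the owned servers over the single instance.
speedup = SERVERS * SOCKETS_PER_SERVER * PER_SOCKET_FACTOR

print(f"hours in three years: {HOURS_3Y:,}")
print(f"implied hourly rate:  ${implied_hourly:.2f}")
print(f"claimed speedup:      {speedup:.1f}x")
```

Roughly 14.4 times the compute for the same spend is where the "15x cheaper" figure comes from; rack space and bandwidth, which the commenter acknowledges, would narrow the gap.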

  9. Ben Norris
    Boffin

    cloud is vastly more expensive

    So what this article seems to gloss over in all its talk about autoscaling throughout the day is that it would be vastly cheaper to set up your own racks of servers and have them sitting doing nothing a lot of the time.

    The cloud is only cost effective for rapid change either in growth or temporary services. You should always aim to transition your base load away to old fashioned racks of servers if you want to be cost effective.

    The model is exactly the same as hiring or buying a car.
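A toy model of the hire-versus-buy argument, as a sketch only. The demand profile and both prices are invented for illustration (the 5x cloud premium is an assumption, not a measured figure): owned servers are paid for every hour whether busy or idle, cloud servers only for the hours used.

```python
OWNED_CENTS_PER_HOUR = 10   # hypothetical amortised cost of an owned server
CLOUD_CENTS_PER_HOUR = 50   # hypothetical on-demand cost of a cloud server

# Hypothetical servers needed in each hour of a day: night, daytime, evening peak.
demand = [20] * 8 + [60] * 12 + [100] * 4


def daily_cost(owned: int) -> int:
    """Cost in cents of owning `owned` servers and renting any shortfall."""
    fixed = owned * 24 * OWNED_CENTS_PER_HOUR
    burst_hours = sum(max(0, need - owned) for need in demand)
    return fixed + burst_hours * CLOUD_CENTS_PER_HOUR


all_cloud = daily_cost(0)
all_owned = daily_cost(max(demand))
best_owned = min(range(max(demand) + 1), key=daily_cost)

print(f"all cloud: ${all_cloud / 100:.2f} per day")
print(f"all owned: ${all_owned / 100:.2f} per day")
print(f"best mix:  own {best_owned}, ${daily_cost(best_owned) / 100:.2f} per day")
```

Under these made-up numbers, owning enough for the peak beats renting everything even though most servers idle at night, and owning the base load while renting only the short peak is cheaper still. Different prices or a spikier profile would shift the balance.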

  10. Don Jefe
    Meh

    Build vs Buy

    Reading these comments made me chuckle because it's obvious many of the commenters weren't around during the 90's. Sure the MBA can break down the cost per drive, racks, etc... but they never, ever count the time when your admin has a sick child or when a drunk crashed through a power pole and into your generator or the halon system misfired. Those things happen and they cost big money.

    Tech infrastructure has all been done before and regardless of what the newbies think, there's no financially viable way to argue buying and managing a bunch of kit if someone else (Amazon) already has it. Even at $40k monthly it is still a great deal.

