We don't want your crap databases, says Twitter: We've made OUR OWN

Twitter is growing up and, like an adult, is beginning to desire consistent guarantees about its data rather than instant availability. At least that's the emphasis placed by the company on its new "Manhattan" data management software, a bedrock storage system that was revealed in a blog post on Wednesday. What's not …

COMMENTS

This topic is closed for new posts.
  1. McHack

    Too many acronym repeats

    So far, Twitter offers LOCAL_CAS (strong consistency within a single data center) and GLOBAL_CAS (strong consistency across multiple facilities).

    First thought during the glance-through: Why are they messing with DRAM timing?

    1. diodesign (Written by Reg staff) Silver badge

      Re: Too many acronym repeats

      I know what you mean - but I'm quite cheered that we have readers and writers spanning systems hardware engineering to database software development.

      C.

  2. ckm5

    Since when are secondary indexes novel?

    Quite a few DBs already implement secondary indexes. RethinkDB, a NoSQL db I've been using lately, has secondary indexes. So do MongoDB and Riak.

    And, AFAIK, MySQL has had secondary indexes for a long time....

    1. lambda_beta

      Re: Since when are secondary indexes novel?

      What is a secondary index anyway?

      1. -tim

        Re: Since when are secondary indexes novel?

        I wonder if what they are calling a "secondary index" would be more like "create geolocation index of female teenagers who like music but hate the trending boy bands" or whatever odd things their advertisers are trying to find out.

    2. smartypants

      Re: Since when are secondary indexes novel?

      Secondary indexes aren't novel as an idea, but implementing them on a distributed, scalable, eventually-consistent high-performance database is not a trivial proposition. DynamoDB's LSIs and GSIs need to be used with care. Though that goes for any index really.

      I wish I had a penny for every MySQL user who blithely slaps 4 or 5 indexes on a table to 'speed up queries' but never stops to see what the query planner actually does, and just ends up slowing everything down and filling up the server faster, making every mutation slower for no query performance gain.
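      Checking what the planner actually does takes a couple of lines. A minimal sketch, assuming SQLite (swapped in here only because it ships with Python; MySQL has its own EXPLAIN statement) and a made-up 'messages' table with two hypothetical indexes:

```python
import sqlite3

# Hypothetical schema: a small table carrying two secondary indexes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, email TEXT, title TEXT, body TEXT)"
)
conn.execute("CREATE INDEX idx_email ON messages (email)")
conn.execute("CREATE INDEX idx_title ON messages (title)")


def plan(sql):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


# An equality lookup on an indexed column can use the index.
used = plan("SELECT * FROM messages WHERE email = 'ann@example.com'")

# A leading-wildcard LIKE cannot use idx_title, so that index only costs writes here.
unused = plan("SELECT * FROM messages WHERE title LIKE '%lunch%'")

print(used)
print(unused)
```

      The exact wording of the plan varies between SQLite versions, but the first plan should name idx_email and the second should not mention idx_title at all.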

  3. smartypants

    Story misunderstands DynamoDB secondary indexes

    For anyone who cares, DynamoDB local secondary indexes and global secondary indexes have nothing to do with single datacentre and multiple datacentre indexes.

    A DynamoDB table's primary key consists of a hashkey, which determines how the records or 'items' are distributed across the infrastructure's servers, and an optional rangekey which, if present, is associated with an index that is effectively local to the specific server* identified by the hashkey. (*That's the right way to think of it, even if there are multiple replicas.)

    A DynamoDB local secondary index (LSI) allows you to create another index on a table, but it only allows you to drill down to data which all has the same hashkey, as it runs only at the server level. So if I had a table 'peopleMessages' whose hashkey was email address and rangekey was the creation date of the message, I could use the default table index to sort by creation date only. If I add an LSI on attribute 'messageTitle', then I could also search for a message from a specific person by message title, but not for a specific title from any person (well, not with the schema above).

    A DynamoDB global secondary index (GSI) is basically just a wholly new table which you can use to search through data in the associated table no matter what the hashkeys. It's not that different from simply creating your own indexing table, other than that DynamoDB automatically sorts out the synchronisation of the original table and any other indexes for you, loosens certain constraints (e.g. in a GSI, there can be multiple items which have the same hashkey value), and helps you manage the costs by allowing you to provision the throughput and control which values are 'projected' into the GSI.

    In essence, local secondary indexes apply at the server level, only allow you to pick through data that shares the same hashkey, and use of them counts against the provisioned throughput of the table they belong to, whereas global secondary indexes are 'cross server', allowing you to pick through data in a table no matter what the hashkey, and, as behind the scenes they're just tables, the index has to have its own provisioned throughput.
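    The distinction can be sketched with a toy in-memory model. This is not the DynamoDB API: the class and method names below are hypothetical, the GSI is updated synchronously for simplicity, and throughput, projections and item replacement are ignored. It uses the 'peopleMessages' schema from above (hashkey email, rangekey creation date, extra attribute 'messageTitle').

```python
from collections import defaultdict


class ToyTable:
    """Toy stand-in for a hashkey/rangekey table; all names are hypothetical."""

    def __init__(self, hash_key, range_key):
        self.hash_key = hash_key
        self.range_key = range_key
        # One "partition" per hashkey value, standing in for a server.
        self.partitions = defaultdict(list)
        # attribute name -> {attribute value: [items]}
        self.gsis = {}

    def add_gsi(self, attr):
        # A GSI behaves like a separate table keyed by another attribute.
        index = defaultdict(list)
        for items in self.partitions.values():
            for item in items:
                if attr in item:
                    index[item[attr]].append(item)
        self.gsis[attr] = index

    def put(self, item):
        self.partitions[item[self.hash_key]].append(item)
        # Keep every GSI in step with the base table.
        for attr, index in self.gsis.items():
            if attr in item:
                index[item[attr]].append(item)

    def query(self, hash_value):
        # Default table index: one partition, ordered by the rangekey.
        items = self.partitions.get(hash_value, [])
        return sorted(items, key=lambda i: i[self.range_key])

    def query_lsi(self, hash_value, attr, value):
        # LSI: look up by another attribute, still confined to one hashkey.
        items = self.partitions.get(hash_value, [])
        return [i for i in items if i.get(attr) == value]

    def query_gsi(self, attr, value):
        # GSI: look up by another attribute across every hashkey.
        return list(self.gsis[attr].get(value, []))


table = ToyTable(hash_key="email", range_key="created")
table.add_gsi("messageTitle")
table.put({"email": "ann@example.com", "created": "2014-04-02", "messageTitle": "Lunch?"})
table.put({"email": "ann@example.com", "created": "2014-04-01", "messageTitle": "Hello"})
table.put({"email": "bob@example.com", "created": "2014-04-03", "messageTitle": "Hello"})

by_date = table.query("ann@example.com")
ann_hello = table.query_lsi("ann@example.com", "messageTitle", "Hello")
all_hello = table.query_gsi("messageTitle", "Hello")
```

    Here query_lsi can only find Ann's "Hello" because it is handed her hashkey, while query_gsi finds "Hello" from both Ann and Bob.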

    For those who haven't been bored to death already, it's all explained in detail here:

    Local Secondary Indexes:

    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html

    Global Secondary Indexes:

    http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html

  4. JeffyPoooh Silver badge

    A Comp Sci 201 project

    Yawn. This is really basic stuff.

    So, what did they do after lunch on that Monday? Rewrite the firmware inside the drive hardware?

  5. cschneid

    hashtag

    They're so cute when they think they've invented something.

Biting the hand that feeds IT © 1998–2019