Twitter is growing up and, like an adult, is beginning to desire consistent guarantees about its data rather than instant availability. At least that's the emphasis placed by the company on its new "Manhattan" data management software, a bedrock storage system that was revealed in a blog post on Wednesday. What's not …

COMMENTS

Friday 4th April 2014 21:54 GMT McHack

Too many acronym repeats

So far, Twitter offers LOCAL_CAS (strong consistency within a single data center) and GLOBAL_CAS (strong consistency across multiple facilities).

First thought during the glance-through: Why are they messing with DRAM timing?

0 0
1. Friday 4th April 2014 22:03 GMT diodesign
  
  Re: Too many acronym repeats
  
  I know what you mean - but I'm quite cheered that we have readers and writers spanning systems hardware engineering to database software development.
  
  C.
  
  0 0
Saturday 5th April 2014 00:58 GMT ckm5

Since when are secondary indexes novel?

Quite a few DBs already implement secondary indexes. RethinkDB, a NoSQL db I've been using lately, has secondary indexes. So do MongoDB and Riak.

And, AFAIK, MySQL has had secondary indexes for a long time....

0 0
1. Saturday 5th April 2014 02:25 GMT lambda_beta
  
  Re: Since when are secondary indexes novel?
  
  What is a secondary index anyway?
  
  1 0
  1. Saturday 5th April 2014 05:59 GMT -tim
    
    Re: Since when are secondary indexes novel?
    
    I wonder if what they are calling a "secondary index" would be more like "create geolocation index of female teenagers who like music but hate the trending boy bands" or whatever odd things their advertisers are trying to find out.
    
    0 1
2. Saturday 5th April 2014 09:44 GMT smartypants
  
  Re: Since when are secondary indexes novel?
  
  Secondary indexes aren't novel as an idea, but implementing them on a distributed, scalable, eventually-consistent high-performance database is not a trivial proposition. DynamoDB's LSIs and GSIs need to be used with care. Though that goes for any index really.
  
  I wish I had a penny for every MySQL user who blithely slaps 4 or 5 indexes on a table to 'speed up queries' but never stops to see what the query planner actually does, and just ends up slowing everything down and filling up the server faster, making every mutation slower for no query performance gain.
  
  3 0
Saturday 5th April 2014 09:24 GMT smartypants

Story misunderstands DynamoDB secondary indexes

For anyone who cares, DynamoDB local secondary indexes and global secondary indexes have nothing to do with single datacentre and multiple datacentre indexes.

A DynamoDB table's primary key consists of a hashkey, which determines how the records or 'items' are distributed across the infrastructure's servers, and an optional rangekey which, if present, is associated with an index that is effectively local to the specific server* identified by the hashkey. (*That's the right way to think of it, even if there are multiple replicas.)

A Dynamo local secondary index (LSI) allows you to create another index on a table, but it only allows you to drill down into data that all shares the same hashkey, as it runs only at the server level. So if I had a table 'peopleMessages' whose hashkey was email address and rangekey was the creation date of the message, I could use the default table index to sort by creation date only. If I add an LSI on attribute 'messageTitle', then I could also search for a message from a specific person by message title, but not for a specific title from any person (well, not with the schema above).
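
To make the LSI restriction concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the DynamoDB API: the class `ToyPartitionedTable` and its methods are hypothetical names invented for illustration, and a "server" is simply a per-hashkey bucket.

```python
# Toy in-memory model of a hashkey + rangekey table with one local
# secondary index (LSI). This is NOT the DynamoDB API; every name here
# is hypothetical and exists only to illustrate the scoping rule.
from collections import defaultdict


class ToyPartitionedTable:
    def __init__(self, hash_key, range_key, lsi_attr):
        self.hash_key = hash_key
        self.range_key = range_key
        self.lsi_attr = lsi_attr
        # One bucket of items per hashkey value, standing in for "a server".
        self.partitions = defaultdict(list)
        # The LSI lives inside each partition: hashkey -> attribute value -> items.
        self.lsi = defaultdict(lambda: defaultdict(list))

    def put(self, item):
        h = item[self.hash_key]
        self.partitions[h].append(item)
        self.lsi[h][item[self.lsi_attr]].append(item)

    def query(self, hash_value):
        """Default table index: one hashkey, sorted by the rangekey."""
        items = self.partitions.get(hash_value, [])
        return sorted(items, key=lambda i: i[self.range_key])

    def query_lsi(self, hash_value, attr_value):
        """LSI lookup: a hashkey is still required, so the search never
        leaves the partition that the hashkey identifies."""
        return list(self.lsi.get(hash_value, {}).get(attr_value, []))


table = ToyPartitionedTable("email", "created", "messageTitle")
table.put({"email": "a@example.com", "created": "2014-04-02", "messageTitle": "Hello"})
table.put({"email": "a@example.com", "created": "2014-04-01", "messageTitle": "Lunch"})
table.put({"email": "b@example.com", "created": "2014-04-03", "messageTitle": "Hello"})

by_date = table.query("a@example.com")
hello_from_a = table.query_lsi("a@example.com", "Hello")
```

Note that `query_lsi` has no form that omits the hashkey: finding every 'Hello' message regardless of sender would mean visiting every partition, which is the gap a global secondary index fills.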

A Dynamo global secondary index (GSI) is basically just a wholly new table which you can use to search through data in the associated table no matter what the hashkeys. It's not that different from simply creating your own indexing table, except that Dynamo automatically sorts out the synchronisation of the original table and any other indexes for you, loosens certain constraints (e.g. in a GSI, there can be multiple items which have the same hashkey value), and helps you manage the costs by allowing you to provision the throughput and control which values are 'projected' into the GSI.
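
The "it's basically another table kept in sync for you" idea can be sketched the same way. Again this is a toy Python model under stated assumptions, not the DynamoDB API: `ToyTableWithGSI` and its methods are hypothetical, and provisioned throughput is not modelled at all.

```python
# Toy in-memory model of a base table plus one global secondary index (GSI).
# NOT the DynamoDB API; all names are hypothetical. The GSI is a second
# structure keyed on a different attribute, updated on every write.
from collections import defaultdict


class ToyTableWithGSI:
    def __init__(self, hash_key, range_key, gsi_key, projected=()):
        self.hash_key = hash_key
        self.range_key = range_key
        self.gsi_key = gsi_key
        # Table keys and the index key are always copied into the index;
        # anything else must be listed in `projected`.
        self.index_attrs = (hash_key, range_key, gsi_key, *projected)
        self.base = {}                 # (hashkey, rangekey) -> full item
        self.gsi = defaultdict(list)   # gsi_key value -> projected copies

    def _primary(self, item):
        return (item[self.hash_key], item[self.range_key])

    def put(self, item):
        pk = self._primary(item)
        old = self.base.get(pk)
        if old is not None:
            # Overwrite: drop the stale index entry so the index stays in sync.
            bucket = self.gsi[old[self.gsi_key]]
            bucket[:] = [e for e in bucket if self._primary(e) != pk]
        self.base[pk] = item
        entry = {k: item[k] for k in self.index_attrs if k in item}
        # Many items may share one index key value, unlike the base table's key.
        self.gsi[item[self.gsi_key]].append(entry)

    def query_gsi(self, value):
        """Look up by the index key across all hashkeys of the base table."""
        return list(self.gsi.get(value, []))


t = ToyTableWithGSI("email", "created", "messageTitle")
t.put({"email": "a@example.com", "created": "2014-04-02", "messageTitle": "Hello", "body": "hi"})
t.put({"email": "b@example.com", "created": "2014-04-03", "messageTitle": "Hello", "body": "yo"})

hello_everyone = t.query_gsi("Hello")
senders = sorted(e["email"] for e in hello_everyone)

# Overwriting an item under a new title moves its index entry.
t.put({"email": "a@example.com", "created": "2014-04-02", "messageTitle": "Bye", "body": "hi"})
hello_after = t.query_gsi("Hello")
```

Because `body` was not projected, the index entries omit it; in this sketch a caller wanting the full item would follow the entry's table key back into `t.base`.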

In essence, local secondary indexes apply at the server level, only allow you to pick through data that shares the same hashkey, and their use counts against the provisioned throughput of the table they belong to, whereas global secondary indexes are 'cross server', allowing you to pick through data in a table no matter what the hashkey, and, as behind the scenes they're just tables, each index has to have its own provisioned throughput.

For those who haven't been bored to death already, it's all explained in detail here:

Local Secondary Indexes:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html

Global Secondary Indexes:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html

3 0
Saturday 5th April 2014 12:37 GMT JeffyPoooh

A Comp Sci 201 project

Yawn. This is really basic stuff.

So, what did they do after lunch on that Monday? Rewrite the firmware inside the drive hardware?

1 7
Saturday 5th April 2014 14:06 GMT cschneid

hashtag

They're so cute when they think they've invented something.

3 1