We don't want your crap databases, says Twitter: We've made OUR OWN

Twitter is growing up and, like an adult, is beginning to desire consistent guarantees about its data rather than instant availability. At least that's the emphasis placed by the company on its new "Manhattan" data management software, a bedrock storage system that was revealed in a blog post on Wednesday. What's not mentioned …

COMMENTS

This topic is closed for new posts.

Too many acronym repeats

So far, Twitter offers LOCAL_CAS (strong consistency within a single data center) and GLOBAL_CAS (strong consistency across multiple facilities).

First thought during the glance-through: Why are they messing with DRAM timing?

0
0
(Written by Reg staff) Silver badge

Re: Too many acronym repeats

I know what you mean - but I'm quite cheered that we have readers and writers spanning systems hardware engineering to database software development.

C.

0
0

Since when are secondary indexes novel?

Quite a few DBs already implement secondary indexes. RethinkDB, a NoSQL db I've been using lately, has secondary indexes. So do MongoDB and Riak.

And, AFAIK, MySQL has had secondary indexes for a long time....

0
0

Re: Since when are secondary indexes novel?

What is a secondary index anyway?

1
0
Coat

Re: Since when are secondary indexes novel?

I wonder if what they are calling a "secondary index" would be more like "create geolocation index of female teenagers who like music but hate the trending boy bands" or whatever odd things their advertisers are trying to find out.

0
1
Bronze badge

Re: Since when are secondary indexes novel?

Secondary indexes aren't novel as an idea, but implementing them on a distributed, scalable, eventually-consistent high-performance database is not a trivial proposition. DynamoDB's LSIs and GSIs need to be used with care. Though that goes for any index really.

I wish I had a penny for every MySQL user who blithely slaps 4 or 5 indexes on a table to 'speed up queries' but never stops to see what the query planner actually does, and just ends up slowing everything down and filling up the server faster, making every mutation slower for no query performance gain.
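As a rough illustration of checking what the planner actually does, here is a minimal sketch. It uses SQLite rather than MySQL, purely because SQLite ships with Python (in MySQL the comparable statement is EXPLAIN). The table and index names are made up for the example.

```python
import sqlite3

# Hypothetical table with two indexes "slapped on" to speed things up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (email TEXT, created TEXT, title TEXT)")
conn.execute("CREATE INDEX idx_email ON messages (email)")
conn.execute("CREATE INDEX idx_title ON messages (title)")

# Ask the planner how it would run the query instead of guessing.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM messages WHERE email = ?",
    ("alice@example.com",),
).fetchall()

# The last column of each plan row is the human-readable detail.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

For this query only idx_email appears in the plan; idx_title does nothing for it, yet it still has to be updated on every insert, update and delete.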

3
0
Bronze badge

Story misunderstands DynamoDB secondary indexes

For anyone who cares, DynamoDB local secondary indexes and global secondary indexes have nothing to do with single datacentre and multiple datacentre indexes.

A DynamoDB table primary key consists of a hashkey, which determines how the records or 'items' are distributed across the infrastructure's servers, and an optional rangekey which, if present, is associated with an index that is effectively local to a specific server* identified by the hashkey. (*That's the right way to think of it, even if there are multiple replicas.)

A Dynamo local secondary index (LSI) allows you to create another index on a table, but it only allows you to drill down to data which all has the same hashkey, as it runs only at the server level. So if I had a table 'peopleMessages' whose hashkey was email address and rangekey was the creation date of the message, I could use the default table index to sort by creation date only. If I add an LSI on attribute 'messageTitle', then I could also search for a message from a specific person by message title, but not for a specific title from any person (well, not with the schema above).
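To make that concrete, here is a toy in-memory model in Python. It is not the DynamoDB API, just a sketch of the idea; the class ToyTable and its methods are invented for illustration, while the attribute names follow the 'peopleMessages' example.

```python
from collections import defaultdict

class ToyTable:
    """Toy model: items are grouped by hashkey, one 'partition' per key."""

    def __init__(self, hash_key, range_key, lsi_attr):
        self.hash_key = hash_key
        self.range_key = range_key
        self.lsi_attr = lsi_attr
        self.partitions = defaultdict(list)

    def put(self, item):
        # The hashkey alone decides which partition an item lands in.
        self.partitions[item[self.hash_key]].append(item)

    def query(self, hash_value):
        # Default table index: one hashkey, sorted by rangekey.
        items = self.partitions.get(hash_value, [])
        return sorted(items, key=lambda i: i[self.range_key])

    def query_lsi(self, hash_value, lsi_value):
        # LSI: still confined to one hashkey, but keyed on another attribute.
        items = self.partitions.get(hash_value, [])
        return [i for i in items if i[self.lsi_attr] == lsi_value]

table = ToyTable("email", "created", "messageTitle")
table.put({"email": "alice@example.com", "created": "2014-04-03", "messageTitle": "Lunch"})
table.put({"email": "alice@example.com", "created": "2014-04-01", "messageTitle": "Budget"})
table.put({"email": "bob@example.com", "created": "2014-04-02", "messageTitle": "Lunch"})

by_date = table.query("alice@example.com")
alice_lunch = table.query_lsi("alice@example.com", "Lunch")
```

Note that query_lsi always needs a hashkey value: there is no way to ask this model for every message titled 'Lunch' regardless of sender.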

A Dynamo global secondary index (GSI) is basically just a wholly new table which you can use to search through data in the associated table no matter what the hashkeys. It's not that different from simply creating your own indexing table, except that Dynamo automatically sorts out the synchronisation of the original table and any other indexes for you, loosens certain constraints (e.g. in a GSI, there can be multiple items which have the same hashkey value), and helps you manage the costs by allowing you to provision the throughput and control which values are 'projected' into the GSI.
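A similarly hedged toy sketch of the GSI idea in Python: a second table keyed on a different attribute, kept in sync on every write, holding only the 'projected' attributes. Again this is not the DynamoDB API; ToyTableWithGSI and its method names are invented for illustration, and throughput provisioning is not modelled.

```python
from collections import defaultdict

class ToyTableWithGSI:
    """Toy model: a base table plus one 'global' index stored as a second table."""

    def __init__(self, hash_key, gsi_key, projected):
        self.hash_key = hash_key
        self.gsi_key = gsi_key
        self.projected = projected
        self.base = defaultdict(list)  # keyed by the table's hashkey
        self.gsi = defaultdict(list)   # keyed by the index's own hashkey

    def put(self, item):
        self.base[item[self.hash_key]].append(item)
        # Sync happens on every write; only projected attributes are copied.
        # Several items may share the same index key value.
        self.gsi[item[self.gsi_key]].append({k: item[k] for k in self.projected})

    def query_gsi(self, value):
        # Finds matches regardless of which base-table hashkey they sit under.
        return list(self.gsi.get(value, []))

messages = ToyTableWithGSI("email", "messageTitle", ("email", "messageTitle"))
messages.put({"email": "alice@example.com", "messageTitle": "Lunch", "body": "Noon?"})
messages.put({"email": "bob@example.com", "messageTitle": "Lunch", "body": "Sure"})
messages.put({"email": "bob@example.com", "messageTitle": "Budget", "body": "Q2"})

lunch = messages.query_gsi("Lunch")
```

Here query_gsi("Lunch") returns entries from both senders, and the unprojected 'body' attribute is absent from the index copies.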

In essence, local secondary indexes apply at the server level, only allow you to pick through data that shares the same hashkey, and their use counts against the provisioned throughput of the table they belong to, whereas global secondary indexes are 'cross server', allowing you to pick through data in a table no matter what the hashkey, and, as behind the scenes they're just tables, each index has to have its own provisioned throughput.

For those who haven't been bored to death already, it's all explained in detail here:

Local Secondary Indexes:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/LSI.html

Global Secondary Indexes:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html

3
0
Silver badge
Pint

A Comp Sci 201 project

Yawn. This is really basic stuff.

So, what did they do after lunch on that Monday? Rewrite the firmware inside the drive hardware?

1
7

hashtag

They're so cute when they think they've invented something.

3
1