Yes sir, no sir, 3 bags NoSQL sir: It's a whizz-bang benchmark ... but WTF does it signify?

Sometimes fast just isn’t fast enough, and in the fast-moving world of NoSQL databases, what was considered blindingly fast yesterday can be seen as slow today. For instance, Cassandra has always been thought of as a fast solution for ingesting data into a database cluster, but today upcoming systems such as Aerospike and Scylla …

  1. Charlie Clark Silver badge

    Relational databases just can’t keep up with scalable NoSQL systems…

    Fast at doing what exactly? If you disable constraint checking I think you'll find that most relational systems are very fast. But if you disable relational integrity you're asking for trouble.
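
    The trade-off can be seen in miniature with SQLite, which makes foreign-key enforcement a per-connection switch. This is only a sketch under assumptions: the `customer` and `orders` tables and the `insert_orphan` helper are made up for illustration, and it shows what gets silently accepted once checking is off rather than measuring any speed difference.

```python
import sqlite3


def insert_orphan(enforce: bool) -> str:
    """Try to insert an order for a customer that does not exist."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is ON.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customer(id))"
    )
    try:
        # Customer 42 was never created, so this row is an orphan.
        conn.execute("INSERT INTO orders (customer_id) VALUES (42)")
        conn.commit()
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"
    finally:
        conn.close()


print(insert_orphan(enforce=False))
print(insert_orphan(enforce=True))
```

    With enforcement off the orphan row goes in without complaint; with it on, the same insert is refused.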

    NoSQL is for niches such as logging and other areas of high volatility where you can afford to lose data. Map reduce is ridiculously inefficient for repetitive queries.

    1. PatientOne

      Re: Relational databases just can’t keep up with scalable NoSQL systems…

      NoSQL is blindingly fast compared to MySQL, both for writing and reading. Apparently this was the grounds for claiming NoSQL was a SQL killer. Not that MySQL is that much of a benchmark.

      One thing I might have missed, though: has NoSQL cracked transactional writes yet? It used to be that it couldn't cope with transactions (and some were saying it never would). Plus, if it's scalable, how well does it handle millions of records? As in, how quickly can I find all records relating to X out of 12 million records stored?
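
      Whether a "find everything relating to X" lookup is quick depends mostly on whether the store keeps a secondary index on that field or has to scan every record. A rough in-memory sketch of the difference, assuming a made-up `customer` field and hypothetical helper names (it is not how any particular NoSQL product implements indexing):

```python
from collections import defaultdict


def build_index(records, field):
    """Map each value of `field` to the positions of the records holding it."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[field]].append(pos)
    return index


def find_by_scan(records, field, value):
    """Full scan: touches every record, cost grows with the data set."""
    return [rec for rec in records if rec[field] == value]


def find_by_index(records, index, value):
    """Indexed lookup: touches only the matching records."""
    return [records[pos] for pos in index.get(value, [])]


# Illustrative data set: 100,000 records spread over 1,000 customers.
records = [{"id": i, "customer": f"cust-{i % 1000}"} for i in range(100_000)]
index = build_index(records, "customer")
hits = find_by_index(records, index, "cust-7")
print(len(hits))
```

      The index costs extra work on every write, which is the same tension the benchmark arguments above are about.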

  2. Korev Silver badge

    Sharding to /dev/null is quicker...

    In case you haven't seen it

    1. Bronek Kozicki

      Re: Sharding to /dev/null is quicker...

      There is a good reason why "NoSQL" is called the way it is. It is meant for non-structured data. If there are no relationships between tables (i.e. your database is at most 1st normal form) then it makes no sense to pay a performance penalty in the database to maintain these non-existent relationships. Of course you have good reason to wonder "why would anyone want this", but as it turns out, sometimes it is useful.

      For example, it is useful when chucking large amounts of data into a datastore that needs to ingest it with maximum efficiency, when only some of it will be processed later (and the rest simply thrown away or archived and forgotten). Clearly there are no literal transactions taking place here.

      Just as SQL is not a silver bullet for all data storage needs, neither is NoSQL. However, SQL had its period of juvenile growth decades ago, while NoSQL is only entering it now; hence, you hear about it more.

      1. Charlie Clark Silver badge

        Re: Sharding to /dev/null is quicker...

        It is meant for non-structured data.

        Technically, non-structured data is an oxymoron as structure is what gives the data meaning…

  3. kozmo

    Always check for yourself, however

    Disclosure: I'm a Scylla co-founder.

    While it's always good to test things yourself, especially with the different models you can find in relational and non-relational databases, we try our best to conduct tests that are as fair as possible. If there is a case where Scylla isn't hundreds of percent better than C*, we define it as a bug (we are still ironing out a few and will get to zero soon).

    It's always better to check with users: http://www.scylladb.com/tech-talk/save-latency-money-scylla/

    There are plenty of other good ones.

    As for us, we try to highlight an array of very detailed tests, with small to large machines. Sometimes we get better results than 10x, sometimes lower:

    http://www.scylladb.com/product/benchmarks/

  4. Keshav

    Improving NoSQL Benchmark.

    [Full Disclosure: I work for Couchbase].

    Yes. YCSB was created a long time ago and has done well at what it was designed for.

    There are a lot of common NoSQL use cases (especially on document databases with query support, like MongoDB, Couchbase, etc.) that are not tested by the standard YCSB workloads.

    Hence, I've proposed extending it to test with a newer data model based on JSON and adding additional operations (queries) and workloads. It's pending review.

    Here are the details: https://github.com/brianfrankcooper/YCSB/issues/1050

  5. Anonymous Coward

    Benchmarking NoSQL databases with TPCx-IoT

    TPC has a new benchmark, TPCx-IoT:

    http://www.datacenterdynamics.com/content-tracks/core-edge/benchmarking-your-iot-gateway-systems/99024.fullarticle

    "The TPCx-IoT benchmark workload has broad applicability – it can be used to assess systems that require high data inject and concurrent queries, evaluating NoSQL databases, evaluating optimal platform for hosting NoSQL databases and more in a vendor-neutral manner."

    While this can be used for IoT systems, it is still an industry-standard benchmark: published results are audited for accuracy, as is the system config. However, the main use of this benchmark is for:

    a) Identifying the ingestion rate when the system is under load (running queries in parallel)

    b) Data fully persisted

    c) Sustained rate over 30 minutes

    Karthik Kulkarni
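
    To make point (c) concrete, here is a rough sketch of what "sustained rate" means when measured per window instead of as one overall average. It is not the TPCx-IoT kit: `measure_ingest`, `sustained_rate`, the window size, and the `write` callback are all hypothetical, and the sketch leaves out the parallel queries and the persistence check from (a) and (b).

```python
import time


def measure_ingest(write, duration_s=1800.0, window_s=60.0, clock=time.monotonic):
    """Call write(seq) in a loop for duration_s; return records/sec per window.

    `write` is a caller-supplied function that ingests one record.
    A trailing partial window is discarded.
    """
    rates = []
    start = clock()
    window_start = start
    count = 0
    seq = 0
    while True:
        now = clock()
        if now - window_start >= window_s:
            # Close the current window and record its throughput.
            rates.append(count / (now - window_start))
            window_start, count = now, 0
        if now - start >= duration_s:
            break
        write(seq)
        seq += 1
        count += 1
    return rates


def sustained_rate(rates):
    """The rate the system held in its worst window, not its best or average."""
    return min(rates) if rates else 0.0


# Short demo run against an in-memory list standing in for a database.
sink = []
demo_rates = measure_ingest(sink.append, duration_s=0.2, window_s=0.05)
print(sustained_rate(demo_rates))
```

    Reporting the minimum over the windows is one way to keep a fast first minute from hiding a slowdown later in the run; the default of 1800 seconds mirrors the 30 minutes mentioned above.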
