"The relational database era is passing…"
Yes, because mathematically demonstrable advantages are no longer important!
This should be a red flag telling anyone not to touch this company with a barge pole.
MongoDB hopes to rake in as much as $220.8m when it finally goes public – a move expected later today. The NoSQL database company, which started life as 10Gen in 2007, has set its share price at $24, up from previous market estimates of around $21. It is putting 8 million Class A shares on the market, with trading due to …
@Korev it depends an awful lot on what you want: time series, documents, etc. NoSQL stuff should really be thought of as some form of denormalised data store that is well suited to particular use cases or queries, where the flexibility of the relational model matters less and you have very large (> 100 GB) datasets you want to play with. There are specialised DBs for things like time series (logs can get big quickly), but HDFS (Hadoop) with Apache Spark is a reasonable place to start to get a feel for the area.
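To make the denormalisation point concrete, here's a minimal Python sketch (the record shapes and field names are invented for illustration) contrasting a normalised relational layout with the kind of embedded document a NoSQL store would hold:

```python
# Normalised, relational-style layout: two "tables" linked by user_id.
users = {1: {"name": "Alice"}}
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 40.0},
]

# Denormalised, document-style layout: the user document embeds its
# orders, so a common read pattern needs no join at query time.
user_doc = {
    "user_id": 1,
    "name": "Alice",
    "orders": [
        {"order_id": 10, "total": 25.0},
        {"order_id": 11, "total": 40.0},
    ],
}

# Relational read: filter/join across the two tables.
joined_total = sum(o["total"] for o in orders if o["user_id"] == 1)

# Document read: everything is already in one place.
doc_total = sum(o["total"] for o in user_doc["orders"])

assert joined_total == doc_total == 65.0
```

The trade-off, of course, is that the denormalised copy has to be kept consistent by the application rather than by the database.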
But it's also worth pointing out how useful something like binary JSON (JSONB) for PostgreSQL is as an add-on to an existing project.
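As a rough sketch of the same idea, here's SQLite's JSON functions from the Python standard library standing in for PostgreSQL's JSONB (which needs a running server): you keep schemaless documents inside an otherwise relational table and still query into them.

```python
import json
import sqlite3

# In-memory SQLite database standing in for PostgreSQL; JSONB itself is
# Postgres-specific, but SQLite's json_extract() illustrates the idea.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

# Store a free-form JSON document alongside ordinary relational columns.
doc = {"user": "alice", "action": "login", "meta": {"ip": "10.0.0.1"}}
conn.execute("INSERT INTO events (payload) VALUES (?)", (json.dumps(doc),))

# Query inside the document with a JSON path, much like
# payload->'meta'->>'ip' in PostgreSQL.
row = conn.execute(
    "SELECT json_extract(payload, '$.meta.ip') FROM events"
).fetchone()
assert row[0] == "10.0.0.1"
```

That gets you document-store flexibility for the messy parts of the schema without giving up the relational machinery for the rest.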
Spark still offers parallelisation for free, and that can be useful if your language of choice doesn't support parallelisation very well (e.g. Python). However, it is best suited to workloads where you need to scale out to multiple machines.
Regarding MongoDB, I found it a pain to set up and maintain. AWS DynamoDB has its own problems, but it is a lot easier to set up and maintain and has better security by default. In-house I generally use PostgreSQL or Hadoop HDFS, depending on the problem I am trying to solve. More importantly for this IPO, though, the companies I have worked for that use MongoDB are generally very happy to run the community edition without a support contract. That probably explains the large losses.
I've had a play with Spark and I like it a lot. I do have the problem that most data that we play with happily fits in RAM on a single machine* though!
Slightly confused by this, because as long as your data fits into RAM you shouldn't have any problems. Both Spark SQL and Google's BigQuery have acknowledged that, despite its many problems, there are huge advantages to a standard query language. I guess the main difference between NoSQL and relational is whether you want or need persistent indices. NoSQL kind of implies that your data is transitory, so it's hardly worth indexing.
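On the persistent-indices point, a small sketch using SQLite from the Python standard library (table and column names invented for illustration) shows the trade: an index costs write time and storage, which only pays off if the data sticks around long enough to be queried repeatedly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s%d" % (i % 100), float(i)) for i in range(1000)],
)

# Without an index, a lookup by sensor scans the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM readings WHERE sensor = 's7'"
).fetchone()
assert "SCAN" in plan[-1]

# A persistent index turns the same query into an index search,
# at the cost of maintaining the index on every write.
conn.execute("CREATE INDEX idx_sensor ON readings (sensor)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM readings WHERE sensor = 's7'"
).fetchone()
assert "USING INDEX" in plan[-1]
```

For genuinely transitory data that's written once and scanned once, the scan is the cheaper deal, which is roughly the NoSQL bet.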