Uber: Why we use MySQL

Uber infrastructure engineer Evan Klitzke has blogged this week about why the ride app maker switched from Postgres to MySQL. Typically, Postgres is seen as the hip and trendy RDBMS and an obvious choice over MySQL, but Klitzke says Postgres hit several brick walls: Postgres served us well in the early days of Uber, but we ran …

  1. Gordan

    That article seems to conveniently omit pointing out that InnoDB also uses a WAL (InnoDB log) with similar effect on write amplification, and that MySQL's replication relies on a separate, additional log (as opposed to sending the WAL directly). This goes a long way toward levelling the field, and the omission of even a brief discussion of it makes the article come across as a bit shilly.

    Initializing slaves also requires a state transfer from the master in some way or another regardless of the database used, and the most efficient way is via a snapshot state transfer. Depending on the underlying file system used, this can be very efficient (e.g. ZFS snapshots take milliseconds to create and can then be sent asynchronously). And since I mentioned ZFS, it can also be used to address the inefficiency of double-caching that PostgreSQL suffers from (where the same data is kept both in the shared buffers within PostgreSQL and in the OS page cache): crank up the shared buffers to a size similar to what is recommended with MySQL, and set the FS to cache only the file metadata (primarycache=metadata).

    MySQL has also had releases that were buggy and caused on-disk data corruption.

    While the initial explanation of direct index pointers vs. indirect pointers (to PK rather than on-disk location) is good and establishes some credibility, it is worth pointing out that direct pointers mean one index dive before the data can be fetched, while indirect pointers require two sequential index dives for the same operation. If all the data is cached in memory (shared buffers / buffer pool), that potentially makes PostgreSQL's direct pointers twice as fast at retrieving the data. This also applies to UPDATEs/DELETEs, and will offset the extra cost of rewriting the node pointing at the affected row in each index (vs. only in the indexes affected by the data change).
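
    To make the one-dive vs. two-dive point concrete, here is a toy sketch. It is not how either engine is implemented: `SortedIndex` is a hypothetical stand-in for a B-tree that merely counts lookups, the table and its columns are made up, and caching, page layout and everything else are ignored.

```python
from bisect import bisect_left


class SortedIndex:
    """Hypothetical stand-in for a B-tree: sorted keys plus a dive counter."""

    def __init__(self, entries):
        pairs = sorted(entries, key=lambda kv: kv[0])
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]
        self.dives = 0  # number of index descents performed so far

    def lookup(self, key):
        self.dives += 1
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        raise KeyError(key)


# Made-up table: (id, email, name)
rows = [
    (1, "ann@example.com", "Ann"),
    (2, "bob@example.com", "Bob"),
    (3, "cy@example.com", "Cy"),
]

# Direct pointers: the secondary index stores the row's position in the heap.
heap = list(rows)
direct_email_index = SortedIndex((r[1], pos) for pos, r in enumerate(heap))


def fetch_direct(email):
    # One index dive, then a direct fetch from the heap.
    return heap[direct_email_index.lookup(email)]


# Indirect pointers: the secondary index stores the primary key, and the
# row itself lives in a second index keyed by that primary key.
clustered_index = SortedIndex((r[0], r) for r in rows)
indirect_email_index = SortedIndex((r[1], r[0]) for r in rows)


def fetch_indirect(email):
    # Two sequential dives: secondary index first, then the PK index.
    pk = indirect_email_index.lookup(email)
    return clustered_index.lookup(pk)


row_a = fetch_direct("bob@example.com")
row_b = fetch_indirect("bob@example.com")
direct_dives = direct_email_index.dives
indirect_dives = indirect_email_index.dives + clustered_index.dives
print(direct_dives, indirect_dives)  # 1 2
```

    Both calls return the same row; the only difference in this model is the number of index descents needed to reach it.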

    Finally, this sentence brings the credibility of the author into question: "This may cause data to be missing or invalid, but it won’t cause a database outage." If the data is corrupted, that is a pretty dire situation, and while they don't mention experiencing bugs like this with MySQL, I have, and it's not pretty. It is in fact a bug like this that made them decide to migrate away from PostgreSQL.

    PostgreSQL and MySQL both have advantages in different uses, but shilling one over the other without laying out the complete truth isn't helpful; it just makes it sound like an attempt at retroactively justifying the cost and effort of a migration. I'm not saying it wasn't justified, merely that omitting critical parts and a quantifiable comparison undermines credibility.

    1. streaky

      MySQL's replication isn't the only available replication in the MySQL ecosystem though. So, erm, oops.

      1. Gordan

        With MySQL there is the built-in replication, Galera (you'd better know what you are doing and really mean it) and Tungsten (just don't go there).

        With PostgreSQL there are too many to list off the top of my head, all with slightly different advantages and disadvantages, with some being very similar in bandwidth requirements and/or performance to MySQL's native replication.

  2. Deltics

    Was I the only one that was confused by the "Shemales layer"?

    (A more careful re-reading of the parenthetical cleared up my confusion. Eventually.)
