Uber: Why we use MySQL

Wednesday 27th July 2016 18:38 GMT Gordan

That article conveniently omits that InnoDB also uses a WAL (the InnoDB log), with a similar effect on write amplification, and that MySQL's replication relies on a separate, additional log (as opposed to sending the WAL directly). This goes a long way toward levelling the field, and the omission of even a brief discussion of it makes the article come across as a bit shilly.

Initializing slaves also requires a state transfer from the master in one way or another, regardless of the database used, and the most efficient way is a snapshot state transfer. Depending on the underlying file system, this can be very efficient (e.g. ZFS snapshots take milliseconds to create and can then be sent asynchronously). And since I mentioned ZFS, it can also be used to address the inefficiency of double-caching that PostgreSQL suffers from (where the same data is kept both in the shared buffers within PostgreSQL and in the OS page cache): crank up the shared buffers to a size similar to what is recommended with MySQL, and set the FS to cache only the file metadata (primarycache=metadata).
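For anyone who wants to see what that looks like in practice, here is a rough sketch in Python that only builds the command strings (it does not run anything). The dataset, snapshot and host names are made up for illustration; the zfs subcommands (snapshot, send, receive, set) and the primarycache property are standard ZFS, but treat the exact invocation as an assumption to adapt to your own setup.

```python
import shlex


def snapshot_transfer_commands(dataset, snapshot, replica_host, replica_dataset):
    """Build (but do not run) the commands for a ZFS snapshot state transfer.

    All names passed in are hypothetical placeholders.
    """
    snap = f"{dataset}@{snapshot}"
    # Step 1: take a near-instant snapshot of the database dataset on the master.
    take = f"zfs snapshot {shlex.quote(snap)}"
    # Step 2: stream the snapshot to the new slave; this can happen later,
    # while the master carries on serving traffic.
    ship = (
        f"zfs send {shlex.quote(snap)} | "
        f"ssh {shlex.quote(replica_host)} zfs receive {shlex.quote(replica_dataset)}"
    )
    return [take, ship]


def metadata_only_cache_command(dataset):
    """Build the command that tells the ZFS cache to keep only metadata for a dataset."""
    return f"zfs set primarycache=metadata {shlex.quote(dataset)}"


if __name__ == "__main__":
    for cmd in snapshot_transfer_commands("tank/pgdata", "seed1", "replica1", "tank/pgdata"):
        print(cmd)
    print(metadata_only_cache_command("tank/pgdata"))
```

The point of keeping the two steps separate is that the snapshot is the cheap, instant part, and the send/receive can be scheduled whenever bandwidth allows.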

MySQL has also had releases that were buggy and caused on-disk data corruption.

While the initial explanation of direct index pointers vs. indirect pointers (to the PK rather than the on-disk location) is good and establishes some credibility, it is worth pointing out that direct pointers mean one index dive before the data can be fetched, while indirect pointers require two sequential index dives for the same operation. If all the data is cached in memory (shared buffers / buffer pool), that potentially makes PostgreSQL's direct pointers twice as fast at retrieving the data. This also applies to UPDATEs/DELETEs, and will offset the extra cost of rewriting the node pointing at the affected row in each index (vs. only in the indexes affected by the data change).
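To put numbers on the trade-off, here is a toy model in Python. It is only a sketch of the argument above, not of either engine's internals: plain dicts stand in for B-trees, every function and variable name is invented for illustration, and the update cost uses the simplified assumption that direct pointers touch every index while indirect pointers touch only the indexes on changed columns.

```python
def fetch_direct(secondary_index, heap, key):
    """Direct pointers: the secondary index stores the row's physical location."""
    location = secondary_index[key]   # dive 1: key -> on-disk location
    return heap[location], 1          # row is read straight from that location


def fetch_indirect(secondary_index, clustered_index, key):
    """Indirect pointers: the secondary index stores the primary key."""
    pk = secondary_index[key]         # dive 1: key -> primary key
    row = clustered_index[pk]         # dive 2: primary key -> row
    return row, 2


def index_writes_on_update(total_indexes, indexes_on_changed_columns, direct):
    """Index entries rewritten by one UPDATE, under the simplified model above."""
    return total_indexes if direct else indexes_on_changed_columns


if __name__ == "__main__":
    row = {"id": 7, "email": "a@example.com"}
    heap = {"page3:slot2": row}                       # hypothetical physical location
    by_email_direct = {"a@example.com": "page3:slot2"}
    clustered = {7: row}
    by_email_indirect = {"a@example.com": 7}

    print(fetch_direct(by_email_direct, heap, "a@example.com"))
    print(fetch_indirect(by_email_indirect, clustered, "a@example.com"))
    print(index_writes_on_update(5, 1, direct=True))
    print(index_writes_on_update(5, 1, direct=False))
```

Both lookups return the same row; the difference is one dive against two on every read, versus more index entries rewritten on writes that move the row. Which side wins depends on the read/write mix, which is exactly the kind of comparison the article should have quantified.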

Finally, this sentence brings the credibility of the author into question: "This may cause data to be missing or invalid, but it won’t cause a database outage." If the data is corrupted, that is a pretty dire situation, and while they don't mention experiencing bugs like this with MySQL, I have, and it's not pretty. It is in fact a bug like this that made them decide to migrate away from PostgreSQL.

PostgreSQL and MySQL both have advantages in different uses, but shilling one over the other without laying out the complete truth isn't helpful; it just makes the article sound like an attempt at retroactively justifying the cost and effort of a migration. I'm not saying it wasn't justified, merely that omitting critical parts and a quantifiable comparison undermines credibility.
