Reply to post: @StewartWhite

Chap asks Facebook for data on his web activity, Facebook says no, now watchdog's on the case

Anonymous Coward

The argument that they have so much data, and that it's not indexed properly in Hive, is in fact false.

FB does in fact have the ability to index and access the data quickly. However, they don't want to do it.

Yes, I am posting anon because I am familiar with their environment and am also a 'Big Data' expert. They could easily afford the cost of adding indexing, as well as of converting the data that they store in Hive. Actually, the truth is that Hive is the SQL-like query layer used to query data stored in files on HDFS, which could be raw log files but are really parsed files stored in Parquet. They could use HBase (which they have) to hold secondary indexes, and then join those against the primary or base table. (The underlying storage mechanism is abstracted, so you can have one table in Parquet, another in Hive's native ^A, ^B delimited format, another comma delimited, or even one in HBase.)
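
To make the secondary-index idea concrete, here is a toy sketch in plain Python. It is not FB's code, and it touches neither Hive nor HBase: the table contents, the column names (row_id, user_id, url) and the helper functions are all made up for illustration. A dict stands in for the HBase index table and a list of dicts stands in for the Parquet-backed base table. The point is only that a lookup by user goes through the index and then fetches the matching base rows, instead of scanning everything.

```python
from collections import defaultdict

# Hypothetical base table: parsed log rows, standing in for files on HDFS.
BASE_TABLE = [
    {"row_id": 0, "user_id": "u1", "url": "example.com/a"},
    {"row_id": 1, "user_id": "u2", "url": "example.com/b"},
    {"row_id": 2, "user_id": "u1", "url": "example.org/c"},
]


def build_secondary_index(rows, key):
    """Map each value of `key` to the row_ids of the rows that contain it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row["row_id"])
    return dict(index)


def lookup(rows_by_id, index, value):
    """Join the index against the base table: fetch only the matching rows."""
    return [rows_by_id[row_id] for row_id in index.get(value, [])]


# Primary-key access to the base table (a stand-in for a keyed store).
ROWS_BY_ID = {row["row_id"]: row for row in BASE_TABLE}

# Secondary index on user_id (a stand-in for an index table in HBase).
USER_INDEX = build_secondary_index(BASE_TABLE, "user_id")

if __name__ == "__main__":
    for row in lookup(ROWS_BY_ID, USER_INDEX, "u1"):
        print(row["url"])
```

Building the index costs one full pass over the data, which is the "cost of adding indexing" above; after that, each per-user request only touches the rows the index points at.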

The whole section of the article on the 'Hive mind' is pure spin by FB, and it falls flat. While it's true that they don't have the capability to run these queries in a timely fashion, that is due more to a lack of CPU than to a lack of technology or money. They could expand, and probably already have expanded, into a compute / storage model, and using Kubernetes they can spin up compute clusters that run their 'hive' or SparkSQL queries against the data.

So I call BS on FB.

Biting the hand that feeds IT © 1998–2019