Having read my last Big Data piece, I fear that some of you will try to blag your way out of the declining Oracle/Java/VB market without the legs to support what’s on your CV. This article is not for you: it’s for the poor souls who have to catch you out whilst trying to get in someone who’s at least mildly competent. There are …
Can someone tell me what big data is?
I'm guessing it's when you've got an Oracle or SQL database(s) of a sizeable size, requiring a room full of servers maybe, or taking x amount of queries per minute?
Where's the line?
Re: whats big?
Anyone who gives an answer to that question would clearly be a faker.
And I'd also kick out anyone who mentioned the 3 Vs.
Re: whats big?
Good question. By and large it seems to be a bit of the emperor's new clothes. Or in many cases a kind of excuse for not bothering to think about what you want and organise yourselves.
We started looking at this but in the end we found that our existing data warehouse could easily pull in the data we needed, cost a lot less and gave more meaningful answers.
I suppose Big Data is a bit like searching through a disorganised attic hoping against hope that you'll find gold. The reality is that the data can be ordered without too much work and that most attics haven't got hidden gold!
Re: whats big?
Don't try going for a job in big data coz you have no idea what it is.
As always, Google can enlighten you.
Re: whats big?
And on that, I'll agree with AC 14:28 - Emperor's New Clothes.
Big Data, (adj):
1) A cute, supposedly cut-and-dry New Word for old problems with, supposedly, new solutions.
2) An appellation applied by technologists in order to appear expert in an amorphous field where growth has been continual, yet specific job descriptions applying to said field have been lagging.
3) A premise for a web article, which fails to specify the exact skill sets required to tackle the issue but instead mentions nonspecific and generalized industry catchphrases to appeal to one's sense of self worth.
Re: whats big?
Usually when people say Big Data they are talking about alternatives to traditional RDBMS designed to handle the problems of massive scaling. Even if their particular implementation fits on a keychain drive.
Re: whats big? :: Dominic, the writer of the piece, responds
Size doesn't matter...
Except when it does.
A lot of what people call working on BD is actually bringing together multiple data sources to get some insight into what's going on.
Or it can be low density data, i.e. bulky crap like hit patterns on your website or till receipts, where you have to scan a lot of records to produce a small, but hopefully useful, result.
Or it can be the point at which one server, no matter how powerful, just isn't up to the job and, to get the throughput you need, the work has to be spread over many boxes.
Maybe you could call it Big Data when simply backing up all this stuff is a major project in itself.
Or all of the above in some macabre combinations.
One would assume...
That this model would work, if these three conditions return true...
your contractor has no bias / hangups about who gets the position and judges purely upon merits AND
big data should be considered a philosophical 'sales term' since... if it is distributed it is, somewhat, modular, which forms part of a whole 'architecture' AND
the company is willing to pump in the resources people require to achieve this aim (this includes trusting your team to do their job properly)...
Some developers, I think, are better at coding than others. Results will vary depending upon coding style, and you do get incompatibilities between styles, so knowing what resources you have already and what is missing might, perhaps, be best left to a philosopher or even the youth of today (since youth by nature are best at pointing out anomalies)... the downside with that can be that it comes with some ego?
> the declining Oracle/Java/VB market
The fact that VB is even still being listed is proof enough that these markets just don't "decline" at all, or at least not very quickly.
The declining Oracle/VB/Java markets
It can be slow. For about 20 years Oracle was a decently paid safe option, and since a firm changing database is on the same scale as an organ transplant, its decline will be slow.
However, the jobs market is more sensitive than the level of the installed base, for two main reasons.
Firstly, most IT pros are paid to change things. As things mature they require fewer changes, so there are fewer jobs. A very large % of everyone who will ever use Oracle now has Oracle, many systems mostly do what they need to do, and that % will increase.
So one big (possibly the biggest) predictor of jobs is not the number of users but the rate of change of that number.
The second big factor is bog standard supply and demand. Oracle and Java have been going a long time: I wrote my first article about how Java was a good thing to get on your CV in (about) 1993, and a vast number of people have done just that. The problem is that the important issue for your career isn't so much how many people do Java (or Oracle or VB) for a living, but how many people are competing for the job you want.
You may be smarter than the average person, but please don't tell me (a headhunter who's done decades working as a developer) that recruitment processes will always spot that and anyway most people are nearer the average.
20 years from now there will be many people doing Java on top of an Oracle DB, but they will be fewer and paid less than today.
The question for you is when to jump off the train ?
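To make the "rate of change" point above concrete, here is a minimal sketch. The installed-base figures are invented purely for illustration (they are not real market data), and `yearly_change` is a hypothetical helper. It shows an installed base that keeps growing while the year-on-year change, the thing argued above to drive jobs, shrinks.

```python
# Hypothetical installed-base figures (sites per year); NOT real market data.
installed_base = {2008: 900, 2009: 960, 2010: 990, 2011: 1000, 2012: 1005}


def yearly_change(base):
    """Return {year: change in installed base versus the previous year}."""
    years = sorted(base)
    return {year: base[year] - base[prev] for prev, year in zip(years, years[1:])}


changes = yearly_change(installed_base)

# The base column keeps rising; the change column keeps falling.
for year, delta in changes.items():
    print(year, installed_base[year], delta)
```

On these made-up numbers the installed base never falls, yet the amount of new work arriving each year drops steadily, which is the gap between "lots of users" and "lots of jobs" described above.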
On a completely unrelated note...
Shurley a picture of Lore would've been more appropriate for a Data faker?
Re: On a completely unrelated note...
Took me a second to reference that one. Classic!
Re: On a completely unrelated note...
Oh VERY good !
The bigger problem..
Even though I use an alias, I still have to post this anon...
The article is an utter fail, written by someone who himself doesn't know what big data is or is not.
How do I know?
Because I've been working in the big data space for a while.
I also know because I interview people on occasion and within a period of 15 min, I know whether they are legit or are posers.
The key is that you can look at a person's resume and ask them questions off the resume and based on their responses you will know if they are telling you the truth or not.
There are websites out there where they track interview questions and give answers to those questions. Fortunately for guys like the author, most of the answers are wrong. Just like many of the answers on sites that have questions and answers to help cheat certain vendors' certification exams...
Because there is an extreme lack of deep skills to meet the demand, if you want to break into big data, be honest. Know your Java, and your core skills.
If you are a strong developer, you can pick up Hadoop and the other relevant skills in short order. At least well enough to be dangerous.
Bottom line, be honest, don't lie and actually try to learn the stuff on your own. Buy Tom White's book and rather than spend your money at the pub down the street, set up an AWS account.
Re: The bigger problem..
I can see why you post with double anonymity, Java as the critical skill ?
What colour is the sky on your planet ?
Is the weather nicer there ?
Re: The bigger problem..
I didn't create Hadoop, so I didn't decide on which languages to use to create the ecosystem.
The fact is that while you may not like Java, it's the language which is going to be the most efficient in terms of running jobs on Hadoop. However, if you don't like Java, you could look at other alternatives. Scalding? Oh, and you can run streaming jobs in other languages, but as I said, you're going to pay a penalty in terms of performance.
Of course you don't have to write your own Map/Reduce jobs. You could use Hive and Pig which then generate the M/R job for you.
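For anyone who hasn't seen what a streaming-style job in another language looks like, here is a rough sketch in Python. It assumes the usual streaming convention of a mapper that emits tab-separated key/value lines and a reducer that receives those lines sorted by key. The names `mapper`, `reducer` and `run_locally` are made up for this example, and `run_locally` only imitates the sort step on one machine; it is not how a cluster actually runs the job.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; the input must already be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)


def run_locally(lines):
    # sorted() stands in for the sort/shuffle done between the map and reduce steps.
    return dict(reducer(sorted(mapper(lines))))


sample = ["big data is big", "data is data"]
counts = run_locally(sample)
print(counts)
```

In a real streaming job the mapper and reducer would be separate scripts reading stdin and writing stdout, which is where the performance penalty mentioned above comes from.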
The point is that if you want to be successful in this space, knowing Java is going to be a major advantage.
To my original post, the fact that you don't know the relevance and importance of Java to the ecosystem further demonstrates your inability to spot a phony.
Of course, there's R, and other languages, but you're the one who focused on the Oracle-ites who probably don't have the advanced analytic languages like R.
Been there, done that, burned the t-shirt.
My problem with your post is that I do not consider either Java or other computer-related skills the core competency with respect to the problem domain of Big Data. Long before I started professionally in a dozen engineering disciplines including every one of those that are IT related, I already had a solid background in mathematics, logics (not quite the same thing), computer science as well as statistics and probability theory. I had been teaching at the university as well as professionally consulting. At that time, mainframes were it for computing and I was frankly bored by the whole thing. After a solid stint in engineering, now boring, I gravitated into econo-/sociometrics, statistical, numerical, and scientific computing, as well as experimental design. Across every department they had. So I had returned to the fold, or more properly, the fold had become interesting again.
I'd have to spend a depressingly long time to come up with a list of all the computer languages I've used over my life and I'm only 52. Languages mean less than nothing once you have one decent one well understood. Being able to identify which may be suitable to the problems at hand, useful. Understanding architectures is actually more important in my not so humble opinion. A thorough grounding in algorithms and data-structures right up there. Statistical techniques, especially those in related fields (financials, econometrics, modern physics, &c. ad nauseam) equally important. Basically, while you'll want subject matter experts around that have perfect depth in, you'll also need people that interface between, the three groups.
Big Data is an awfully relative term. A rather useful book I read back in the '80s was "Large Problems, Small Machines." The techniques described in it are just as apt today as they were back then, and they really haven't changed. Somehow saying billions rather than tens of thousands, which was huge back then, seems to cause a sort of deer-in-the-headlights mental lock. None of the predictive analytic models I've developed as an adjunct to my actual jobs in very diverse fields (every college of study save the arts) were small. The challenge was to take existing, or technologically within near reach, hardware and address a problem.
I do have one problem with the article. Sometimes actually identifying what is causal and what is simple (probabilistic) correlation is either impossible or meaningless. My first mentor, at the tender age of 14, made sure that I well understood that a predictive model can be strongly correlated and be predictively successful, but the actual causal mechanism be poorly explained or completely unexplainable. Yes, you should keep an eye on the predictions but don't lose an awful lot of sleep about it so long as it remains successful.
If I should return to the university again, highly possible since I've been retired for a while now, I'd like to bring more rigor to the medical (yes, medical) and social sciences. There's an awful lot of theory out there that doesn't bear strict scrutiny. Theory is fine. As I learned in engineering: "The Real World is the Real World and it loves biting you in the ass."
Re: Been there, done that, burned the t-shirt.
Yes, using stats alone to determine causality is a fool's game; the point of the stats is to point things out that would be interesting to look at further or to estimate the effect of a change.
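As a small illustration of the correlation point, here is a sketch with entirely made-up numbers: two series (hypothetical ice cream sales and sunburn cases) that are both driven by a third variable (temperature). The `pearson` helper is a plain textbook Pearson correlation written out by hand; the data and variable names are assumptions for the example, not anything from the article.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: temperature drives both series; neither causes the other.
temperature = [10, 12, 15, 18, 22, 25, 28, 30]
wobble_a = [1, -1, 2, 0, -2, 1, 0, -1]   # small fixed disturbances
wobble_b = [0, 2, -1, 1, 0, -2, 1, -1]
ice_cream_sales = [5 * t + a for t, a in zip(temperature, wobble_a)]
sunburn_cases = [2 * t + b for t, b in zip(temperature, wobble_b)]

r = pearson(ice_cream_sales, sunburn_cases)
print(round(r, 3))  # close to 1, despite no causal link between the two
```

Ice cream sales would predict sunburn cases rather well here, which is fine if prediction is all you need; it just tells you nothing about what would happen if you banned ice cream.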