Hey, software snobs: Hardware love can set your code free

In computing there are many, many different ways to run down other people’s work, not the least of which is: “OK, so they removed the bottleneck, but only by throwing faster hardware at it.” The implication is that tackling an issue just with software is intrinsically better. When did you ever hear anyone say: “OK, so they …

COMMENTS

This topic is closed for new posts.
  1. Michael H.F. Wilkinson Silver badge
    Boffin

    There is a reason for Software Smugness

    You haven't heard the reverse boast "only by throwing software at it" for a very simple reason: if I can get more performance out of the same hardware by designing an O(N) algorithm to replace an O(N^2) one, I am being smart. Throwing more hardware at a problem when a better algorithmic solution exists is stupid.

    I have seen people use weeks of wall-clock time on a 512 core segment of a big machine, simply because their code was bad. My colleague coded the thing properly in C++ and had the code running on his desktop and finishing in a few minutes (O(2^N) vs O(N log N) if I recall correctly). Only throwing hardware at a problem is often wrong. Thinking about better algorithms is never a bad idea.
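
    To make the gap concrete, here is a toy sketch (my illustration, not the actual code from either machine): the same question answered in O(N^2) and in O(N log N).

    ```cpp
    // Toy illustration only: "does this list contain a duplicate?"
    // answered the slow way and the smart way.
    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // O(N^2): compare every pair of elements.
    bool has_duplicate_naive(const std::vector<int>& v) {
        for (std::size_t i = 0; i < v.size(); ++i)
            for (std::size_t j = i + 1; j < v.size(); ++j)
                if (v[i] == v[j]) return true;
        return false;
    }

    // O(N log N): sort a copy; any duplicates end up adjacent.
    bool has_duplicate_sorted(std::vector<int> v) {
        std::sort(v.begin(), v.end());
        return std::adjacent_find(v.begin(), v.end()) != v.end();
    }
    ```

    On a million elements the first does on the order of 10^12 comparisons, the second roughly 2 x 10^7 operations. That is the entire argument in two functions.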

    Once you have really thought about the algorithmics, then you can start throwing more hardware at it (and once you do that, you must rethink the algorithmics again, especially when doing parallel stuff). So in our massive image processing stuff (Gpixel and Tpixel), we first minimize communication and disk-access overhead, and then move to SSD or Fusion-IO stuff.

    1. TeeCee Gold badge

      Re: There is a reason for Software Smugness

      You are correct. One other thing to consider is that I have never seen a "big win" from chucking hardware at a problem, but more than a few from tweaking the software.

      For a start, TPTB don't perceive anything that arrived with a monumental price tag attached as a "big win". Also, in the current economic climate, anyone not wringing every last bit of grunt out of the hardware they have is asking for trouble.

    2. Robert Carnegie Silver badge

      Re: There is a reason for Software Smugness

      It's a question of whether it's cheaper to pay for a smart programmer or pay for some hardware.

      It's also usually a mistake to pay for someone to reinvent the wheel.

      And a well-placed psychopath in our organisation has forbidden the use of standard data import methods on our SQL Servers. This leaves us doing things like struggling with obsolete decades-old alternatives that they forgot to ban, or making Excel spreadsheets where a duplicated cell formula constructs a series of INSERT statements. I'm quite proud of how I handled date values but it SHOULDN'T BE HAPPENING.

      The problem is we aren't allowed firearms in this office.

    3. the spectacularly refined chap

      Re: There is a reason for Software Smugness

      Thinking about better algorithms is never a bad idea.

      Of course it is. What is really dumb is presenting that kind of moronic statement as an axiom when in reality it is merely justification for one's own vanity. Smart people use dumb methods on occasion, simply because they appreciate real-world factors and have the common sense to employ some form of cost-benefit analysis. How many chunks of code are only ever used a small number of times, possibly even once? Think about "code" in the broadest sense of the word: it could be some one-off data manipulation job or even a simple for loop at the shell prompt.

      Consider a job that we know in advance is a one-off. You use your "better" algorithms and take two hours to devise a solution that does the task in two seconds. I spend two minutes knocking something up that does the same job in another two minutes. Your solution may be "better" purely in a vanity sense but is that really the best use of resources? Remember your assertion: "Thinking about better algorithms is never a bad idea."

      1. Steve Knox

        Re: There is a reason for Software Smugness

        Consider a job that we know in advance is a one-off.

        There's no such thing. There are a lot of jobs where we believe the given task will not have to be repeated and have no foreknowledge of the applicability of the individual solution to other tasks, but that's a very different situation (it's like the difference between "I know that there is not a god" and "I do not know if there is a god").

        Furthermore, your argument implies that time is the key factor, that the thinking about the better algorithm must happen in series with the task itself, and that no other work can be done while the thinking is taking place. None of those is necessarily the case.

        Thinking about better algorithms is never a bad idea. However, it's important to carefully plan how much time you devote to said thinking, and, often most important, when you think about those algorithms.

      2. Frumious Bandersnatch

        Re: There is a reason for Software Smugness

        RE: Remember your assertion: "Thinking about better algorithms is never a bad idea."

        While you should probably "never say never", I'm siding with the original poster here. There is a balancing act in how much time you can spend looking for a better solution, and Knuth's "premature optimisation is the root of all evil" is worth bearing in mind, but there is still often a good case for seeking a reasonably efficient algorithm right from the start.

        I don't think that anyone is saying that we should go to excess in looking for the best solution, but for what we're talking about here (processing big data sets), we should really be aware of how expensive and time consuming each possible solution might be. It's the mindset that's important: do you just write the most basic SQL query, or do you take care to minimise expensive join operations or defer them to be operated over a reduced data set, for example? Also, experienced coders will of course realise that there's no point in blindly trying to optimise every single aspect of the code. They'll use a profiler (or equivalent) to identify where their efforts stand to reap the most benefit.

        Speaking of programmer effort, I think that in many cases, it can be false economy to use inefficient algorithms. If your algorithm is bad enough, you can end up spending more time waiting for results when you're coding and testing the thing (on real data, as opposed to just a small amount of test data) than you would if you'd just thought about the problem a bit more from the outset. Granted, you can multitask and do other stuff while you're waiting, but it's not ideal to have too many context switches or your productivity will suffer. Plus, what happens when you finally realise (or have to be told) that the solution isn't good enough? Most often, you have to go back to the drawing board and do what you should have done in the first place and implement a half-way decent algorithm.

      3. Michael H.F. Wilkinson Silver badge

        Re: There is a reason for Software Smugness

        @ the spectacularly refined chap

        My statement is indeed a bit broad (and to generalize is to be an idiot). I was talking about when there is a bottleneck. If there is no bottleneck, there is no need to throw resources at it, hardware or software. The examples I gave are cases with severe bottlenecks.

        In your example of the one-off job you are actually also thinking in terms of algorithmics: which one is simpler to implement (and therefore easier to get right). That is why for one-off jobs or experiments I like to use scripting languages (MatLab most of all for my work). Only when heavy lifting is needed (and we have established firmly what we want to compute) do I go for C(++).

        You are of course right that there is always a balance to be struck between implementation time and total CPU time used. In the very old days CPU time was costly, while software development and maintenance time was comparatively cheap because code was comparatively compact. Now CPU time is cheap as chips, but code development and maintenance is not, what with the dramatically increased complexity and interconnectedness.

  2. Justin Stringfellow

    "A farmer with a heavy cart can only increase the size of a single horse up to a point before resorting to multiple horses."

    Analogy fail. Horses can't be magically grown larger and stronger on the spot.

    Actually what happens is the farmer sells his non-shape-shifting horse to a French abattoir and buys a tractor.

    1. Flawless101
      Stop

      Would the cart then not need reinforcing, since the farmer might try to load more onto it, on top of the extra force from the tractor pulling it?

    2. Michael H.F. Wilkinson Silver badge
      Joke

      Nowadays, the farmer first glues on some horns and sells it to the abattoir as a cow.

  3. Tom Wood

    What a lot of waffle

    that just boils down to:

    1. Faster hardware can make stuff faster, to a point.

    2. You might need to think about the algorithms you use.

    Well, thanks for that wonderful insight.

    As someone more or less said above, there's no point writing a load of code to parallelise a really inefficient algorithm and then chucking lots of hardware at it if you could replace it with a non-parallel but much more efficient algorithm.

    1. CarrBigDog

      Re: It's about the solution

      I am one of the dinosaurs that came from the early days of business computing. My first boss wrote a software routine that multiplied two numbers faster than the attached hardware device on the IBM 1401. When we moved to the IBM 360, we threw it out (and rewrote the software that depended on it). It all comes down to doing what is necessary to provide the solution that satisfies the requirements (business, technical, etc) at the best cost with the most room for growth. I have seen examples that moved from hardware dependency to software dependency to network and back again over the years, as business requirements grew beyond what the current solution provided. If you are seeking something else, then it appears to me that you are doing research, not real computing.

  4. Grahame 2
    Coat

    "The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry. "

    — Henry Petroski

    1. Destroy All Monsters Silver badge

      That's just as retarded as saying that the damn planes are always full of fracking people, even if we biggen them up.

      Maybe it was sarcasm though.

  5. Anonymous Coward
    Anonymous Coward

    Crap examples

    Summing a bunch of numbers or finding the top ten are equally parallelisable.

    The "local top tens" can be merged into "not quite so local top tens" until you have the final result. (Although you wouldn't actually do this because the final merge step is trivial.)

    What is the point of the second example?

  6. BlueGreen

    I'm going to take issue with this bit

    > So far, so parallel. However, at some point these "local Top 10s" must be brought together as a single task to decide the actual Top 10. This final operation cannot be done in parallel and whether you have 10 nodes or 4,000 you will be using a single node out of your cluster to do this final reduce step.

    I've never seen this as a problem in the real world. In theory, sure, but you don't need to parallelise the final step because it'll run quickly enough on one node (even if that final node were an 8086 or somesuch). It's never been a problem in my experience.

    Further, once you get past a certain point you can start relying on 'bulk' rather than 'molecular' properties of data. You want the top 10 sales report, OK, but does it need to be utterly correct? If you're dealing with small data in a small company, quite probably - but then there's little data to worry about. If you're megamart and you've got 50,000 products then it'll make little difference if you get a little probabilistic and your report reflects 98% of sales correctly (and if you're wrong, it's because #9 and #11 got switched, and they're so close it doesn't matter).
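
    To show the sort of corner-cutting I mean, a crude sketch of my own (the 1% sampling rate and the function name are invented for illustration): count only a random sample of the sale records and rank products on that.

    ```cpp
    // Crude sketch of a probabilistic top 10: count a ~1% sample of
    // the sale records and rank products on the sample. Near-tied
    // products (the #9 vs #11 case) may swap places; the big sellers
    // will come out right.
    #include <algorithm>
    #include <random>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    std::vector<std::pair<int, long>>
    sampled_top10(const std::vector<int>& sale_product_ids,
                  double sample_rate = 0.01, unsigned seed = 42) {
        std::mt19937 rng(seed);
        std::bernoulli_distribution keep(sample_rate);

        // Count only the records the coin flip keeps.
        std::unordered_map<int, long> counts;
        for (int id : sale_product_ids)
            if (keep(rng)) ++counts[id];

        // Rank the sampled counts and keep the top 10.
        std::vector<std::pair<int, long>> ranked(counts.begin(), counts.end());
        std::sort(ranked.begin(), ranked.end(),
                  [](const std::pair<int, long>& x,
                     const std::pair<int, long>& y) { return x.second > y.second; });
        if (ranked.size() > 10) ranked.resize(10);
        return ranked;  // (product id, sampled count), biggest first
    }
    ```

    In a real system you'd sample at the source rather than scan everything, but the point stands: most of the counting work disappears and the report is still roughly right.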

    Another example, you mentioned collaborative filtering (thanks for the link, didn't know, will have a deeper read) for recommendations. So you get 5% of recommendations slightly sub-optimal if you cut big corners. So what. You saved a lot of hardware and (maybe) lost a few sales. Net gain.

    Theoretically perfect decisions aren't much, if any, better than 'close will do' in the real world, certainly not if you have to invest in racks of hardware and scads of admin/devs/mathematicians to rule them.

    Probabilistic is very much FTW when you start getting big. I'm sure the author knows what READ UNCOMMITTED is and its tradeoffs.

  7. Destroy All Monsters Silver badge
    Paris Hilton

    Just a minute

    "This may seem like an obvious choice and, yes, we are talking about NoSQL solutions like (but absolutely not exclusively) Hadoop and MapReduce."

    Hold on! NoSQL is for people who want fast, possibly distributed databases and either don't need a proper relational schema or can absorb a temporary relaxation of consistency in exchange for decentralization. Hadoop and MapReduce are about distributing computational tasks over a large matrix of machines. Not the same thing at all. Or what??

  8. Neoc

    Stupid first option.

    Throwing faster, more powerful hardware at a problem should not be the first solution out of the box (exceptions exist, but they're just that: exceptions).

    The first thing should be asking the question "why is this so effing slow?" and looking to see whether the code is doing something stupid. Why? Let's just look at the simplest reason: where I work, we program on large systems; the cost and time of replacing said hardware with a "faster" one would far outweigh what it would take to make changes to the code (most times).

    Yes, there are times when you simply need to upgrade, but based on my 25+ years of experience, the majority of bottlenecks arise from bad coding or bad assumptions: the code that was written is simply not up to the task for one reason or another.

    This sort of thinking is the reason one of my friends' sons came home angry a few years ago, having been docked a couple of points by his university lecturer for optimising his code. The reason the lecturer gave: "there's no need to do it, the hardware's fast enough". And that's the lesson being taught: be lazy when you program, the hardware will take up the slack.

  9. Thesheep
    IT Angle

    The depressing state of coding...

    ...is that, apparently, none of you expert coders has ever managed a piece of code that couldn't have been improved if you tried. At least that's what I'm taking from your comments. Unless you mean it's just other people who write useless code?

    So if you assume the code is efficient, and the problem still can't run in a reasonable time, what other option do you have than 'throwing' hardware at it?

    I remember doing data mining on a 386 and waiting hours. Along came a mighty 486, and a whole range of things that were unachievable could now be done...
