But good code...
...is an artform. In fact I'm trying to get the local Revenue Office over in Ireland to recognise me as an artist so I'll qualify for the Artist Tax Exemption scheme...
There’s a kind of cognitive dissonance in most people who’ve moved from the academic study of computer science to a job as a real-world software developer. The conflict lies in the fact that, whereas nearly every sample program in every textbook is a perfect and well-thought-out specimen, virtually no software out in the wild is …
My first role was coding C and C++ on a legacy mainframe and I remember the joy of looking at the flow and symmetry of a carefully crafted and well commented code module.
Other people's of course. Mine were more like a toddler's scribble than a renaissance masterpiece!
I've maintained both types (and pretty much everything in between) and I find in some cases the elegant, flowing almost Zen code is harder to understand and debug. I've also noted that my coding style tends to match the original. Odd, that.
Beer because sometimes, it helped.
In the field of Electrical Engineering, the best textbook ever written is the 1152 page "The Art of Electronics" by Horowitz and Hill. And that's despite the last version being 23 years old. The reason it's the best, even for the sections dealing with digital circuits, is that every single example in the book is a real world example rather than an idealized example, and that if you build any of the examples, they work (or don't, as the book also has plenty of what they call "bad circuit designs").
Despite its age, the textbook is still in print, and I still use it when I teach an EE circuits course.
Maybe someone should write a similar book for programming.
Right off the top of my head: "Advanced Programming in the UNIX Environment" by Stevens.
Being about Unix, it's obviously in C. It's carefully crafted(!) real-life code which takes into account
an awful lot of real-life special cases.
But it shows the problem. It's just one language in `one' environment.
The number of languages * number of environments tends to explode.
More general books on programming (most) can only deal with general topics.
My first project after college was an MS BASIC based finance data collection tool, running on CP/M and MS-DOS. The problem was, it was written by Fortran programmers and had been maintained by COBOL programmers... None of them had heard of For...Next or While...Wend loops, it was all add 1 to count and IF count THEN GO TO statements!
There were hundreds of lines of old code that had been commented out, but removing them stopped the program working! There were computed GO TOs littered in the code, which jumped into the middle of sections of commented out code!
Still, I managed to optimise it in the end. The collection and transmission routine used to take around 4 hours to run. After a couple of weeks of fettling with the code, I got it down to under 20 minutes!
"The technology people employed at these companies are considered to be the very best, if only because the pay tends to be so good."
If you believe this, then you will never see good code and good coders. Those who graduated into banking roles were mediocre programmers with mediocre ideas who enjoyed the idea of making a far from mediocre pay check. I don't imagine you will see much focus on code quality in a startup either.
Find a grown up engineering company which lives and dies on their reputation for quality software engineering, with managers who have been around long enough to understand the need for investment in a code base throughout its life cycle, and you will see a different picture.
Eastgate Systems: maintaining Storyspace which is something like 25 years old, and developing Tinderbox which has been around since Mac system 7? The main man is still refactoring and rewriting his code base.
Right, it's like saying an airline pilot is a better pilot than a military one because he gets higher pay. In these areas the best people are attracted more by challenges than money. Many programmers know that working in such companies means very little freedom and almost no tech challenges (unless you work on some high-end trading systems).
Of course that doesn't mean only banks write bad code - but they are not the best programming environment either.
There's a large dose of "temperament" here. Quite a lot of programmers will avoid jobs in a squillion dollar operation that works on a "must work perfectly yesterday or else we all die". That's not because they aren't good enough. It's because they'd rather do something else. Part of the pay at these operations is for talent, but the majority is in compensation for stress.
"The technology people employed at these companies are considered to be the very best, if only because the pay tends to be so good."
And only, too
I agree - Investment Banking, especially the front-office sections - is generally not a hot-bed of talent for computer programmers.
Having worked at some of the same companies (even the same teams) as the OP, other considerations generally get in the way of writing 100% correct/stylish code.
Investment Banks are not software houses. A lot of the code will be written by people with no CS or CEng background so it is not fair to expect them to write beautiful or perfect code. Time considerations generally mean that you only get 1 shot at writing most of the code, and you very rarely get the time (aka money) to refactor the code at a later stage.
However, the OP seems to miss the point about teaching programming. You don't teach people how to program badly, they just have to learn that for themselves.
It's not always 'the programmer's' fault. Maybe not even most of the time. I think management have to shoulder a fair chunk of blame. When was the last time anyone got approval for a refactoring project? And don't tell me we're the only ones that get defects deferred '..to be fixed in a future project'.
The best code I've written has been generic and flexible. One project (a suite of data recovery tools) started out in 1992 as an MSDOS suite, was ported to Win16 a few years later then Win32. It had some ugly code occasionally but there was little to no management so we could decide to spend a month refactoring and did. That code lasted for 15 years and would still be good today except that new owners didn't want it. I went from a C++ novice to C++ guru and as my skills evolved I was able to make time to evolve the code to take advantage. No chance in hell of that happening in most places (sadly not much where I am now).
The reason a lot of code is sloppy and fugly is because the developer knows that their manager is paying them to get it out the door ASAP and is willing to defer fixes until the next project. I don't get paid for writing good code. I get paid for writing an application that sells.
"I don't get paid for writing good code. I get paid for writing an application that sells."
I'm in my third decade as a developer, and this has been the case everywhere I have worked.
I absolutely recognise this. I took an incomplete project and had to get it to the finish line. The management requirements changed so much that the original design was well off the mark as was the implementation. Add the awful code which ate memory and took a while to process a simple database fetch and it had disaster after disaster with serious bugs that couldn't be tracked down in code that shouldn't even exist. Put simply it wasn't fit for purpose in any respect.
I maintained this system for a few years as management demanded new features even if it added instability. The worst part was how proud of their system they were. It kinda did what they wanted with lots of quick fixes or direct intervention in the database and there was a collection of tools I wrote separately to fix problems when they occurred.
I did all I could for years to explain how eventually the limits of the system would be reached and one day there will be no hope of extending the system or even maintaining it. I asked to redevelop sections of the system to at least bring stability to some of the users in house but the only time this was truly considered was when new IT staff of dubious capability were brought in to manage such a development. I made a very quick exit as I was being asked very simple questions about how a database works and the standard programming languages we used by the new IT staff placed in charge (brought in for their skills in these areas!).
I actually find myself in your position. These days I'm doing a mixture of technologies for a small business, I have always counted myself lucky and your statement has reaffirmed it.
When I started my code wasn't great but my ethic was. My techniques have improved a lot over the period of employment and crucially I have been allowed the flexibility to bring old code I have developed up to my current standard on many occasions.
I think it allows for a great improvement to personal skills and comprehension, when looking over and improving old code.
and it was ever so....
One thing that works for me is to allocate some time for "while you're in there" refactoring when estimating development times. Then there is a bit of buffer when changes need to be made to some particularly manky old piece of code - by the time one has understood it it's not that much extra work to refactor it.
The biggest problems are that (a) I'm an optimist and (b) managers always try to compress the development schedule, so there isn't much in the way of buffer.
It's very tempting to do "while you're in there" refactoring, and it's very often a bad idea. Any change to working software introduces risk, and this risk is magnified by well-meaning attempts to clean up bad and incomprehensible code. This is also a good reason why quick-and-dirty patches aren't revisited.
I'm not sure I understand this. Refactoring _saves_ time. When you're chasing a bug, or have to add functionality and find yourself searching for where on Earth to apply the change, you can spend half an hour thinking about that (and probably getting it wrong, triggering an extensive debugging session) or you can spend that half hour whacking that piece of the code back in shape and then spend 30 seconds applying the change (and probably getting it right on the first try).
You have to practice, a lot. You have to have good tests (and run them!) so you get instant feedback on each step. But then you will be faster, not slower, with refactoring.
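For anyone wondering what "good tests" means in practice here, a minimal sketch in Python. The function names and the pricing rule are invented purely for illustration; the idea is to pin down what the old code does before you reshape it, then check the new version against it at every step.

```python
def legacy_price(quantity, unit_price):
    # Hypothetical "manky" original: it works, but it is hard to follow.
    t = 0
    i = 0
    while i < quantity:
        t = t + unit_price
        i = i + 1
    if quantity >= 10:
        t = t - t * 0.1
    return t


def refactored_price(quantity, unit_price):
    # Hypothetical cleaned-up version that must behave identically.
    total = quantity * unit_price
    bulk_discount = 0.1 if quantity >= 10 else 0.0
    return total - total * bulk_discount


def check_same_behaviour():
    # Characterization test: compare old and new over a grid of inputs.
    for quantity in range(0, 25):
        for unit_price in (0, 1, 5, 20):
            old = legacy_price(quantity, unit_price)
            new = refactored_price(quantity, unit_price)
            assert old == new, (quantity, unit_price, old, new)
    return True
```

Run check_same_behaviour() after every small change. If it fails you know exactly which step broke things, which is the instant feedback part.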
Now, if you're brought in on a project that's already three miles down the hole, then half an hour isn't going to cut it, obviously. Although I think that if the powers-that-schedule are honest and count debugging time, schedule slippage and so on, you'll still find the refactoring bonanza is faster than trying to grin and bear it. But, that is admitting up front that you made a mess and a lot of people/teams/managers find that very hard to do. So, social reasons not to refactor, I get. But valid timing or technical reasons? I very much doubt it.
"it's very often a bad idea. Any change to working software introduces risk, and this risk is magnified by well-meaning attempts to clean up bad and incomprehensible code. This is also a good reason why quick-and-dirty patches aren't revisited."
^^ This. Over and over and over.
As a very simple example, I needed to add some additional checks to a very old bash script. Now this script was written as a quick hacked together job by someone who used to work for the company. He left many years before I came along, and the "code" has been modified by so many people, with such varying backgrounds and styles, and copied to do similar jobs slightly differently... Well, as you can imagine it's more of a mess than a spaghetti factory after an explosion.
Anyway, I thought it would be a good idea to tidy it up "while I'm in there". This was a bad idea. So many obscure utilities were being used, exploiting "undocumented features" in them, that we very nearly lost a week's worth of data (luckily I had added the additional checks first, which caught the mistake).
As for "time for refactoring"... I have been pushing for this for 5 years now. According to my bosses it's not necessary. When I tried to make time in between jobs to do it, I got a bollocking for wasting time. These scripts fail every few months and take a few hours to clean up after and get going again, delaying other departments in the process, but spending a day rewriting them is a "waste of time". I've given up, and managed to push the "clean up" responsibilities onto someone else (sucker).
And this is just a tiny set of bash scripts.
I only refactor code that is already being changed. If I did get a refactoring project then fine anything is up for grabs but changing code that wasn't otherwise going to change is asking for trouble.
Developers always want to refactor code. It's actually a developer's nature to move code about and fiddle with it - it helps to understand it and know your way around it. However, to a QA guy, code that's been tested has been tested. Any changes to that code erode the money that's been spent on testing that code, which then must be tested again (so you're costing QA twice). Managers are trying to keep devs happy with new and exciting shiny things and also suits happy with new features. Managers also have to perpetually answer questions like "why are you spending time rewriting the thing we've shipped instead of adding new IP and value to the company?".
You could argue that refactoring code saves the company money by creating a new and "better" bedrock for future development. I've never known that to be the case in reality. Feature requests and road maps change so fast that you either generify your product out of existence or you refactor yourself down a dark alley.
Ultimately, apart from "but the developer likes it" there is no justification from any angle to introduce change for change's sake.
Even as a developer, it's a right bastard finding that someone moved your code around and introduced a cut and paste bug.
Absolutely totally agree with this. If you're dicking about with a piece of code anyway, I as a manager trust your skills as a competent engineer to leave that code better than you found it. If we have to test it again anyway then fine. But start changing code that we've tested and is a known quantity and you're entering a world of pain. It's just wasteful, and wouldn't you rather be off doing new shinies than having a manager scowling at you for introducing risk?
@redpola: Errr...Exactly what is the scenario we're going through here? I honestly don't understand. There's apparently some code that's already QAd, so you don't want me to change it --makes perfect sense--, but you _do_ want me to add a feature to it? If that is the situation, why aren't the QA people kicking you?
Or if it's not a feature I'm doing, but a bugfix, then by definition that code wasn't yet fully QAd. (If we're well down the QA process I do understand you want minimal intervention and I'll try my hardest to do just that, but I can't fix bugs without touching the code.)
The worst part about QA, is how extensively have they QAd it? I've run across loads of code that wouldn't have passed a proper desk-check, let alone unit testing. That's the code that needs to be touched.
redpola : Ultimately, apart from "but the developer likes it" there is no justification from any angle to introduce change for change's sake.
One compelling reason to refactor code is to make it testable by introducing seams or decoupling dependencies. People tend to exaggerate in these forums.
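A small sketch of what a seam can look like, in Python. The function names are made up for the example and this is only one of several ways to do it: the first version reaches straight for the system clock, so its result depends on the day you run the test; the second lets the caller pass the date in.

```python
import datetime


def is_weekend_untestable():
    # Hard-wired dependency on the real clock: the answer changes with the calendar.
    return datetime.date.today().weekday() >= 5


def is_weekend(today=None):
    # Seam: tests pass a fixed date, production callers pass nothing.
    if today is None:
        today = datetime.date.today()
    return today.weekday() >= 5
```

A test can now call is_weekend(datetime.date(2024, 1, 6)) and get the same answer every day of the week, without touching the rest of the logic.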
"It's not always 'the programmer's' fault."
That was part of the message of the article, even if it was a secondary one.
But I think the problem is fundamentally this:
No matter how much design you do in advance, all programming is essentially "design by prototype" as there isn't a single programmer I know who can order his thoughts and visualise all the possible side-effects of a large codebase before he hits his IDE.
Therefore it follows that the output of any coding process is a prototype. That's what we need to start teaching to managers: "it's a prototype, it demonstrates the principles and design decisions required for making a production model, but it's not a production model." If you need to, say it's a matchstick model of the structure you want to build: it's identical aside from being a bit weak.
I agree with the idea, but reality can be different. There is a time in every developer's life, known as the 'death of sanity,' where you are assigned, typically by yourself, to a project that has scope creep and a screwed up code-base. The problem, of course, is that when you accepted the project, you knew nothing of the problems -> you only understood the depth of the problems several months into it. The code-base itself, being so large, probably would take at least a year to refactor, but the project was due 6 months ago. You end up in the scenario where management over-promised to the client, and thus, spends every day asking you how much longer it will take to complete the project. Deadlines are perpetually missed, and you can't clear your head outside of work, instead spending all of your time staring at the ceiling instead of sleeping. Again, this is the 'death of sanity,' and you can't get far enough away from the problem so as to see it correctly.
I've been there several times, and have slowly acquired the skills necessary to spot it a long way off (well, some of the skills). When someone assigns you a project, do they know what the code-base is like? If not, they get a non-committal response to the project -> I'll look at the code to figure out where it is, but looking at it is not accepting the project. Code-base is a pig? I will see what I can do, with healthy buffers around my timelines. This is the reality of code -> someone wrote something, and though they may be intelligent, they may also have been under orders to write it that way. See the book Catch-22 for all applicable scenarios.
The only side-effects are that to some people, I appear lazy, and to others, I appear intelligent. To the people who want the project to succeed, I appear intelligent, asking all these gotcha questions, which tells them I am slowly crafting a plan of action. To the people who think success is achieved by working the whip as quickly as possible, I appear lazy, asking all these gotcha questions, instead of just 'diving in.' I prefer the more intelligent people, as they tend to be more realistic, but I do not always get to choose.
Just look at why Microsoft dropped Pinball from Vista. It's because the code was undocumented, hard to understand, poorly commented, and basically crap. They couldn't even work out how the collision detection mechanism worked and when it was ported to 64-bit the detection didn't work. Shows that bit of code made too many assumptions about word size for a starter.
Written in 16-bit code, ported to 32-bit code and then (almost) ported to 64-bit code? I'm surprised that it even ran.
Been there, done that (DOS to Win16 to Win32) over 15 years. Suite ran just fine thank you and many people benefited from it. Oh and we did that without a QA department and little to no management. And it was agile before we knew what it was called because we could get called off to other work at any time.
Sure there is plenty of badly written code about in the wild, but let's not forget that the type of code you'll find in academia only has to meet nice clean invented requirements; not real world ones. A lot of the complexity of real-world code comes from meeting real world users' needs and from working round flaws in existing real-world code.
I suspect conflicting requirements in many cases. Trying to balance the demands of managers, clients, and ultimate users' needs is probably not even possible in many cases.
That reminds me of the NHS Integrated Software Disaster, where it seemed no one actually listened to the techies, whether tech users or tech developers.
More generally, sometimes you don't know what the requirements really are until a prototype has been working for some time. Eh! Prototypes? Bl**dy waste of money, any manager will tell you.
There's a strong analogy here with genetics (Read Dawkins' "Climbing Mount Improbable")
By walking upright with a back designed to walk on all-fours; we risk spine degeneration, pain, and disability. But from a genetic perspective it was infeasible to go back to 'no spine' and build one from scratch; just to be able to stand a little bit taller.
The same applies to code, you get to a point where to add a new function, the 'right' solution is to start from scratch; But that might mean re-writing a £multi-million piece of software, JUST to add one new function. So the only feasible option is to hack it on in a sub-optimal way.
The problem with that attitude is that few of the people paying the bills appreciate exactly how much it costs to maintain a codebase that resembles the carcass of a whale that's died from gas gangrene.
I've seen a big software porting effort which took 30 skilled (and hence expensive) people working for six months on something that could have been done by half the people in half the time if the folk doing the porting were familiar with the code base which was frankly cancerous in its disorganised and complex sprawl.
Following that porting effort, those 30 people were reassigned, having not had the time to clean up the awful codebase, and a new group of people were assigned to add new functionality. Again, 30 people, 6 months, most of which was spent fighting with the incredibly brittle codebase which was so full of hacks that it was practically impossible to do any refactoring without breaking everything.
I wouldn't be surprised to learn that a 7 figure sum was ultimately expended on this exercise; a figure that could have been reduced by an order of magnitude if the codebase wasn't such a mass of tumours. But at each stage, project management and funding just looked at what needed to be done to accomplish the next goal in the cheapest way possible. Cost cutting every step of the way, and the end result was a multi-million pound unmaintainable codebase. Good times.
JeffUK, not sure what you mean. I've had a couple of bosses who were perfectly able to walk upright. But they definitely had no spine at all.
Is also dependent on the programmer actually knowing what the operational parameters are...
Plenty of bad Code originates in the fact that all too often you have to deliver code to Do Stuff, even though the underlying assumptions/corporate structure/actual hardware are still "under negotiation by management". Which usually ends up with an ETA of actual programming parameters about two months after the project is *supposed* to have ended, at which time the local intern is tasked to "fix stuff. Oh, and could you add routines for these bits of management info we desperately need after our committee meeting yesterday showed....."
Writing any code in "an open managerial environment where ad-hoc flexibility is emphasised" amounts to cruel and unusual punishment. Writing *good* code in such an environment would require time travel (and preferably a Special Order cattleprod).
> Include some comments!
That deserves a reply of its own. I hate comments. Even if they start out accurate they get out of date because there's nothing to enforce change in them as the code changes. Far better is to write code that is self documenting. It isn't even difficult. Every time you think of writing a comment, don't! Call a function or method or instantiate an object and put the comment in the name of the identifier.
Identifiers should not be short, succinct and cryptic. Modern editors mean you don't have to type the whole thing out so get creative.
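A rough before-and-after of the "put the comment in the identifier" idea, sketched in Python. The order fields and function names are invented for the example.

```python
def total_before(orders):
    total = 0
    for order in orders:
        # skip cancelled orders and orders with no items
        if order["status"] != "cancelled" and len(order["items"]) > 0:
            total += order["amount"]
    return total


def is_billable(order):
    # The name now says what the comment above used to say.
    return order["status"] != "cancelled" and len(order["items"]) > 0


def total_after(orders):
    return sum(order["amount"] for order in orders if is_billable(order))
```

Both versions compute the same total; the second has no comment left to go stale, although nothing stops the body of is_billable drifting away from its name.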
"Even if they start out accurate they get out of date because there's nothing to enforce change in them as the code changes."
Yes there is, it's called a "Code Review". All code should be reviewed by the team, and preferably by someone not involved in the actual project. Of course, time needs to be allowed for this and it seldom is.
"Every time you think of writing a comment, don't! Call a function or method or instantiate an object and put the comment in the name of the identifier."
That's good up to a point, and well named object/methods/params/members are a big help; but nothing, nothing beats proper docs and comments. It also means the APIs can be spat out and handed over to other teams fully doc'd, with behaviours, expected responses etc all done. No need to keep secondary documentation in sync. One line of code docs/comments per line of code (ish, depends on the language).
If you change the code, change the comments. Simple.
> If you change the code, change the comments. Simple.
In theory, yes. In practice not so much. Comments are only read by humans so they can be overlooked and neglected. Source code is read by the compiler and although it won't enforce identifier names it can enforce structure especially if you encode your logic in object constructors and well defined classes.
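One way to read "encode your logic in object constructors", sketched in Python. The Percentage class and apply_discount function are hypothetical, and Python only checks this at run time, so it is an approximation of what a compiled, statically typed language would enforce for you.

```python
class Percentage:
    """Hypothetical value type: a number guaranteed to lie in 0..100."""

    def __init__(self, value):
        if not 0 <= value <= 100:
            raise ValueError(f"percentage out of range: {value}")
        self.value = value


def apply_discount(price, discount):
    # No "discount must be 0-100" comment needed here:
    # a Percentage object cannot exist with any other value.
    if not isinstance(discount, Percentage):
        raise TypeError("discount must be a Percentage")
    return price * (100 - discount.value) / 100
```

The rule lives in one constructor rather than in a comment repeated at every call site, so it cannot be quietly ignored.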
"....I hate comments....." Sorry, but I probably wouldn't hire you. Managing programmers is worse than herding cats, and I tend to lay it on thick when it comes to the "this is how you will write your code" introductory talk, and I have a whole chapter in my standards doc on EXACTLY how often you would be required to add comments, markers and other structure, and when they will be reviewed and updated. Just to be a real pain in the posterior, I have it written into the acceptance criteria for contractors that work for us too. I have seen people try and auto-comment and it usually doesn't work. You could be the Albert Einstein of coders but you're fudge-all use to me if I can't take your code and give it to what are definitely non-Einstein support monkeys.
One of the biggest problems with many business projects is so-called "agility" - everyone in business thinks they want it, they all say we need it, but they don't realise it often comes at the cost of stability. The business analysts will spot some new market segment, the business architects will come up with the business re-alignment to take advantage of the opportunity, then we have to sit down and work out the systems to make the new business processes work. All too often the business analysts will be saying we need to be up-and-running in two months to beat the opposition, but the coders are having to say they need six months to write the code properly. The result is a desperate rush to get code out the door with the minimum agreed functionality and the vague intention to add features and tidy up later, a recipe for ongoing firefighting. By the time you release the first phase the analysts want to add or change something, the goalposts move, and you're going to spend the next few years chasing those ever-moving targets.
And function bodies change too! I've seen countless functions where the name is misleadingly (and dangerously) inaccurate because the body has been changed. While comments won't prevent this, they will help explain that the function doesn't do what its name suggests.
Anyway, good comments include the "why", not just the "what".
It's so long ago that I can't give credit where it is due, but the best approach to commenting that I've found is:
* If you need to comment a line of code, that code is obscure - do it differently. (not always possible, as it may be a toolset restriction) - but line-by-line comments should be exceptional and are there to say "here be dragons"
* every module and routine should have a comment header that explains its purposes, inputs outputs and side-effects, and an easy-to-read outline of the steps that occur in processing
* write the header comments before writing any code
The rationale is that line-by-line comments generally make code harder to read because one can't see the totality of a routine on-screen. On the other hand the header comments are an expression of the programmer's understanding of the requirements and allow one to clarify how the problem is going to be solved before getting into the nuts-and-bolts of language and toolset specifics.
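As a sketch of what such a header might look like, here it is as a Python docstring. The routine, its parameters and the "send" transport are all made up for the example; the header was written first and the body filled in underneath it.

```python
def transmit_daily_totals(records, send):
    """Summarise the day's records and hand the summary to a sender.

    Purpose:      collapse raw records into one total per account.
    Inputs:       records - iterable of (account, amount) pairs
                  send    - callable taking one dict (hypothetical transport)
    Output:       the dict of totals that was sent
    Side effects: calls send() exactly once
    Steps:        1. accumulate amounts per account
                  2. pass the totals to send()
                  3. return the totals
    """
    totals = {}
    for account, amount in records:
        totals[account] = totals.get(account, 0) + amount
    send(totals)
    return totals
```

The body needs no line-by-line comments because the outline in the header already says what each step is for.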
BTW the old rule on function size probably still holds - if it is less than 20 lines, it's probably too small, more than 60 definitely too big (yes I know OO results in lots of dinky methods... but too much of that results in lousy readability as one has to skip from one routine to another to follow the logic).
> there to say "here be dragons"
You have a point there, the per-line comments should be to tell you what is not obvious from the code, usually it means something done for efficiency or a possible future "gotcha" if anyone changes it carelessly.
> every module and routine should have a comment header that explains its purposes, inputs outputs and side-effects, and an easy-to-read outline of the steps that occur in processing
Amen to that!
> if it is less than 20 lines, it's probably too small
I would have to disagree with that. Sometimes it makes a lot of sense to have a small function to group some process (or probably partial process) in a way that makes it more logical and readable, for example a small block of stuff that appears in two loops in a bigger function. Modern compilers' automatic inlining means a lot of this can actually increase efficiency!
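A tiny illustration of the "small block that appears in two loops" case, in Python with invented names. This only shows the readability side; it makes no claim about inlining, which depends on the language and compiler.

```python
def normalise(name):
    # Tiny helper: collapse runs of whitespace and lower-case the result.
    return " ".join(name.split()).lower()


def report(customers, suppliers):
    lines = []
    for customer in customers:
        lines.append("C " + normalise(customer))
    for supplier in suppliers:
        lines.append("S " + normalise(supplier))
    return lines
```

The helper is far shorter than 20 lines, yet both loops read more clearly than they would with the expression pasted in twice.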
You obviously never write code that implements standards or interacts with the outside world.
Of course it shouldn't be necessary to write a comment to explain what a line of code is going to do, but I often find it necessary to write comments explaining why it is doing it. I may need to refer to a standards document or another source that I used when writing the code. Sometimes it's one sentence in an entire standard or reference manual that's important - what should I do, hope the future me remembers or that someone else is psychic enough to find that reference?
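An example of the kind of "why" comment I mean, sketched in Python. The function name is made up; the rule it cites (HTTP field names are case-insensitive) comes from RFC 9110.

```python
def find_header(headers, wanted):
    # WHY: HTTP field names are case-insensitive (RFC 9110, section 5.1),
    # so "content-type" and "Content-Type" must match the same header.
    wanted = wanted.lower()
    for name, value in headers:
        if name.lower() == wanted:
            return value
    return None
```

Without the comment, the calls to lower() look like an arbitrary choice that a later maintainer might "simplify" away.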
Far better is to write code that is self documenting.
Even better is to write tests for that code.
...- if it is less than 20 lines, it's probably too small, more than 60 definitely too big...
And this assumes no compiler bugs.
20 years ago, I managed to write a program that crashed the Pascal compiler on a dual DPS-90 running CP-6. The functions were so short that they could be compiled into object code faster than the compiler's internal object-naming routine could generate internal names. This caused two functions to have the same name, which would cause the linker to crash.
Identifiers should be succinct andNotTediouselyLongWinded. Presumably you are using a language with namespaces and so the enclosing namespace brings context that need not be repeated in a choice of variable name.
On comments, I too hate _bad_ comments. Learn how to write good comments because sometimes they are necessary. For example, after you have got a good implementation that gives the right result, you may need to alter it for speed or memory optimizations. These optimizations may not allow the luxury of being in their own function and are going to be obtuse.
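A small, well-known example of an obtuse optimisation that earns its comment, written in Python. The function name is just for illustration.

```python
def is_power_of_two(n):
    # Optimised form of "keep halving until you reach 1".
    # A power of two has exactly one bit set, so n - 1 clears that bit and
    # sets every lower bit; ANDing the two gives 0 only in that case.
    # The n > 0 check excludes zero, which would otherwise pass the bit test.
    return n > 0 and (n & (n - 1)) == 0
```

The one-liner is correct but not obvious, and the comment is what stops the next person from "fixing" it.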