Ghost of DEC Alpha is why Windows is rubbish at file compression

Microsoft's made an interesting confession: Windows file compression is rubbish because the operating system once supported Digital Equipment Corporation's (DEC's) Alpha CPU. Alpha was a 64-bit RISC architecture that DEC developed as the successor to its VAX platform. DEC spent much of the late 1990s touting Alpha as a step …

  1. FF22

    Obvious bull

    " Which is a fine way to match compression to a machine's capabilities, but a lousy way to make data portable because if a system only has access to algorithm Y, data created on an algorithm-X-using machine won't be readable. Which could make it impossible to move drives between machines."

    Right. Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS, and you couldn't have possibly used an extra bit or byte in the volume descriptor or in the directory entries of files to signal which particular method was used to compress them.
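
    Roughly like this, say - purely a sketch with made-up names (CM_ALGO_X, decompress_x and friends are placeholders, nothing to do with the real NTFS on-disk format): one spare byte per directory entry records which codec wrote the data, and the driver dispatches on it when reading.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical per-file tag - NOT the real NTFS layout. */
    enum compress_method { CM_NONE = 0, CM_ALGO_X = 1, CM_ALGO_Y = 2 };

    struct dir_entry {
        /* ...name, size, timestamps... */
        unsigned char method;          /* one extra byte per file */
    };

    /* Placeholder codec entry points, one per algorithm shipped with the OS. */
    int decompress_x(const unsigned char *in, size_t n, unsigned char *out, size_t cap);
    int decompress_y(const unsigned char *in, size_t n, unsigned char *out, size_t cap);

    static int read_block(const struct dir_entry *e,
                          const unsigned char *in, size_t n,
                          unsigned char *out, size_t cap)
    {
        switch (e->method) {
        case CM_NONE:   if (n > cap) return -1; memcpy(out, in, n); return (int)n;
        case CM_ALGO_X: return decompress_x(in, n, out, cap);
        case CM_ALGO_Y: return decompress_y(in, n, out, cap);
        default:        return -1;     /* unknown codec: refuse, don't guess */
        }
    }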

    1. Ole Juul

      Re: Obvious bull

      And that effort, writes Microsoftie Raymond Chen, is why Windows file compression remains feeble.

      Perhaps he should have added "for given values of why".

    2. Kernel

      Re: Obvious bull

      "Right. Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS"

      No, wrong - because nowhere has it been stated that there would only be two versions of the algorithm required - it should be quite obvious to most people that X and Y are examples to keep things simple and that in reality each CPU architecture would require its own algorithm - and at the time there were more than two architectures in play.

      1. Adam 1

        Re: Obvious bull

        Let's not confuse algorithm and file format. The language used seems very loose to me. The algorithms are simply the methodology taken to transform one byte stream to another. It stands to reason that different architectures will be better at some algorithms than others because of the various sizes of caches and buses involved. Some lend themselves to larger dictionaries and better parallelism than others. There's no reason other than priorities as to why they haven't switched to something more suited to x86 in newer versions.

    3. Ken Hagan Gold badge

      Re: Obvious bull

      "Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS"

      Right, and if MS had delivered disc compression that meant that discs written on one system would totally suck on performance when plugged into another system, no-one on these forums would have written long rants on how this epitomised MS's cluelessness about "portability".

    4. richardcox13

      Re: Obvious bull

      > Right. Because you couldn't have possibly included (de)compression

      This is covered, but it assumes you know that compressed files are updatable (this is another restriction on the compression algorithm: you need to be able to change parts of the file without re-writing and re-compressing the whole thing).

      So one scenario is a file created on an x86 box and then updated on an Alpha box, so the Alpha system has to be able to compress in the same way, while still meeting the performance criteria.

      1. Charlie Clark Silver badge

        Re: Obvious bull

        So one scenario is a file created on an x86 box and then updated on an Alpha box, so the Alpha system has to be able to compress in the same way, while still meeting the performance criteria.

        Why would that matter? If the OS is doing the compression then it will present the compressed file to any external programs as uncompressed. Another box accessing the file over the network would see it as uncompressed (network compression technology is different). And this still doesn't really back up the fairly flimsy assertion that Alphas weren't up to the job. I certainly don't remember this being an issue when the architectures were being compared back in the day. It's certainly not a RISC/CISC issue. At least not if the algorithm is being correctly implemented. But I seem to remember that many of the performance "improvements" (display, networking, printing) in NT 4 were done especially for x86 chips, which suck at context-switching. Coincidentally, NT 4 was seen as MS abandoning any pretence of writing a single OS for many different architectures.

        I'm not a whizz at compression but I think the common approach now is to use a container approach as opposed to trying to manage compression on individual files in place.

    5. Mage Silver badge

      Re: Obvious bull

      Yes, given that there was other CPU-specific code and a HAL.

      Actually, too, the original NT was for 32-bit Alpha. The 64-bit NT 4.0 was for Alpha 64 only and came out later, after the regular Alpha 32-bit NT 4.0. I never saw the 64-bit Alpha version of Win2K; though it may have existed, it's not in my MSDN collection.

      1. joeldillon

        Re: Obvious bull

        Well, uh. The Alpha (like the Itanium) was a pure 64 bit chip. There was no such thing as a 32 bit Alpha. A 32 bit build of NT for the Alpha doing the equivalent of the x32 ABI, maybe...

      2. Nate Amsden

        Re: Obvious bull

        Maybe I am wrong, but I recall the original NT being for... i98x CPUs or something like that (not x86 and not Alpha). x86, Alpha, MIPS and PPC were added later.

        I used NT 3.51 and 4 on my desktop(x86) for a few years before switching to linux in 1998.

        The article doesn't seem to mention who actually uses NTFS compression. I've only seen it used in cases of emergency where you need some quick disk space. I seem to recall patch rollbacks are stored compressed too (I always tell Explorer to show compressed files in a different color).

        Even back when I had NT4 servers 15 years ago, never used compression on em. Saw a spam recently for Diskeeper, brought back some memories.

        1. patrickstar

          Re: Obvious bull

          i960, code-name "N-10"

          Hence Windows NT as in 'N-Ten' (or rather NT OS/2 as in 'OS/2 for N-10' but that's another story...).

    6. Rob Moir

      Re: Obvious bull

      Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS, and you couldn't have possibly used an extra bit or byte in the volume descriptor or in the directory entries of files to signal which particular method was used to compress them.

      And get more complaints about "Windows bloat"? Especially when there's more than one CPU instruction set involved. Windows NT supported x86, PowerPC and Alpha around the NT 3.51/NT4 days iirc, and developments of that system subsequently went on to Itanium, x86-64 and ARM (Windows RT).

    7. Shaha Alam

      Re: Obvious bull

      hard disks were a lot smaller back then so some sacrifices had to be made for the sake of practicality.

      there were already complaints about bloat in windows.

    8. Planty Bronze badge
      WTF?

      Re: Obvious bull

      I wonder what Microsoft's excuse is for their abysmal Explorer zip support?

      I mean why does Windows Explorer take 30 minutes to unzip a file that 7Zip can manage in 30 seconds??

      Perhaps they have some lame excuse for that prepared?

      1. smot

        Re: Obvious bull

        "I mean why does Windows Explorer take 30 minutes to unzip a file that 7Zip can manage in 30 seconds??"

        And why does it unzip first into a temp folder then copy into the destination? What's wrong with putting it straight into the target? Saves space and time and avoids yet more crud in the temp folder.

    9. TheVogon

      Re: Obvious bull

      File system compression usually isn't the way to go these days as disk is relatively cheap and because it can have a high overhead.

      At the SME / enterprise level where storage savings can stack up, thin provisioning and maybe deduplication are often more what you need...

      1. Eddy Ito

        Re: Obvious bull

        I think that's a point of confusion here. Chen was talking about "file system compression" but the article repeatedly says "file compression" which is a very different beast. As written, it makes it sound as if LZMA may only work on some platforms which is silly.

      2. Robert Carnegie Silver badge

        Re: What's cheap

        Spinning rust is cheap, network bandwidth may not be, SSD certainly isn't cheap.

        One recent Windows clever idea is to supply the entire operating system pre-compressed. Much space saved.

        Your monthly patches, however, aren't compressed. So the disk fills up with operating system files anyway.

        Maybe they will get around that by reinstalling the entire operating system from time to time, but calling it an update. Or maybe they already have.

        1. Ian 55

          Re: What's cheap

          "One recent Windows clever idea is to supply the entire operating system pre-compressed. Much space saved."

          You mean like Linux 'live CDs' for well over a decade?

          1. Anonymous Coward
            Anonymous Coward

            Re: entire operating system pre-compressed. Much space saved

            And in 1999, you could get a compressed OS, drivers, GUI, and browser. On a 1.44MB floppy.

            http://toastytech.com/guis/qnxdemo.html

            Try telling that to the young people of today [etc].

      3. AndrueC Silver badge
        Boffin

        Re: Obvious bull

        File system compression usually isn't the way to go these days as disk is relatively cheap and because it can have a high overhead.

        Ah, but in some scenarios file compression can improve performance. If you are disk-bound rather than CPU-bound you can trade spare CPU cycles for fewer I/O operations. I do that with my source code because Visual Studio is usually held up by the disk more than the CPU. And the NTFS compression algorithm is actually byte-based, not nibble-based. It's almost ideally suited to source code.
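
        For anyone who wants to flip the same switch programmatically rather than ticking the box in Explorer, this is roughly what it looks like via the Win32 API - a minimal sketch with a placeholder path and most error handling skipped; it sets the same per-file compression attribute that Explorer and compact.exe use.

        #include <windows.h>
        #include <winioctl.h>
        #include <stdio.h>

        int main(void)
        {
            /* Placeholder path - any file on an NTFS volume will do. */
            HANDLE h = CreateFileA("C:\\src\\foo.cpp", GENERIC_READ | GENERIC_WRITE,
                                   FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
            if (h == INVALID_HANDLE_VALUE) {
                fprintf(stderr, "open failed: %lu\n", GetLastError());
                return 1;
            }

            USHORT fmt = COMPRESSION_FORMAT_DEFAULT;  /* the volume's default, LZNT1 on NTFS */
            DWORD  ret = 0;
            if (!DeviceIoControl(h, FSCTL_SET_COMPRESSION, &fmt, sizeof fmt,
                                 NULL, 0, &ret, NULL))
                fprintf(stderr, "FSCTL_SET_COMPRESSION failed: %lu\n", GetLastError());

            CloseHandle(h);
            return 0;
        }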

        1. Anonymous Coward
          Anonymous Coward

          "file compression can improve performance"

          But then you would do it at the application level, because the application knows the internal file format and can use the best compression strategy - e.g. a database engine which compresses its pages.

          A generic compression strategy probably would not help much except for sequential access - like reading a text file for display, and rewriting it fully for any change. If you need more random and granular access, you need to move the compression into the application.

    10. Anonymous Coward
      Anonymous Coward

      Re: Obvious bull

      Right. Because you couldn't have possibly included (de)compression code for both algorithms in all versions of the OS, and you couldn't have possibly used an extra bit or byte in the volume descriptor or in the directory entries of files to signal which particular method was used to compress them.

      Yes, that must be why I can't use Zip or Rar files on a different PC than the one that created them.

      1. Anonymous Coward
        Anonymous Coward

        Re: Obvious bull

        Read the first article of that series (there's a link in the Chen post). On-the-fly file/filesystem compression needs to work differently from an "archive" format like zip. You have time constraints, and you need to be able to read/write blocks of the file without the need to decompress and then re-compress it fully. Try random access to a zip (or rar) compressed file...
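
        A crude sketch of the difference (decompress_chunk() is a stand-in for whichever codec you like, and this is nothing like NTFS's actual on-disk layout): compress in fixed-size chunks and keep a per-chunk offset table, and a read - or a rewrite of one chunk - only has to touch that chunk. A single deflate stream, zip-style, gives you nowhere to jump in.

        #include <stddef.h>
        #include <stdint.h>
        #include <string.h>

        #define CHUNK_SIZE (64u * 1024u)  /* e.g. one compression unit */

        /* Hypothetical codec entry point: inflate one chunk, return bytes produced. */
        size_t decompress_chunk(const uint8_t *in, size_t in_len,
                                uint8_t *out, size_t out_cap);

        struct compressed_file {
            const uint8_t  *cdata;      /* the whole compressed image           */
            const uint64_t *chunk_off;  /* nchunks + 1 entries; [i] = start of
                                           compressed chunk i, [nchunks] = end  */
            size_t nchunks;
        };

        /* Random access: fetch 'len' bytes at logical offset 'pos' by
           decompressing only the chunk that contains them. */
        size_t read_at(const struct compressed_file *f, uint64_t pos,
                       uint8_t *dst, size_t len)
        {
            size_t chunk = (size_t)(pos / CHUNK_SIZE);
            if (chunk >= f->nchunks) return 0;

            uint8_t plain[CHUNK_SIZE];  /* fine for a sketch, not for a kernel */
            size_t in_len = (size_t)(f->chunk_off[chunk + 1] - f->chunk_off[chunk]);
            size_t got = decompress_chunk(f->cdata + f->chunk_off[chunk], in_len,
                                          plain, sizeof plain);

            size_t at = (size_t)(pos % CHUNK_SIZE);
            if (at >= got) return 0;
            if (len > got - at) len = got - at;
            memcpy(dst, plain + at, len);
            return len;
        }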

      2. toughluck

        Re: Obvious bull

        Yes, that must be why I can't use Zip or Rar files on a different PC than the one that created them.

        Stop using floppy disks.

    11. patrickstar

      Re: Obvious bull

      Funny how none of the assume-everything-MS-does-is-crap crowd has actually read the article. Where it is clearly explained that, yes, it supports multiple formats, but also that certain performance requirements had to be met by all of them on all architectures.

      The reason has more to do with bounds on total time spent in the kernel when reading/writing data than with perceived performance. This is not a design decision you can reasonably comment on without having a full view of the actual, intended, and possible uses of NT on various architectures in various configurations at that time - which you don't, and MS presumably did.

      1. toughluck

        Re: Obvious bull

        @patrickstar:

        Funny how none of the assume-everything-MS-does-is-crap crowd has actually read the article.(...)

        This is not a design decision you can reasonably comment on without having a full view of the actual, intended, and possible uses of NT on various architectures in various configurations at that time - which you don't, and MS presumably did.

        Well, assume everything MS does is crap and you arrive at the conclusion that they also didn't have that full view.

  2. Anonymous Coward
    Anonymous Coward

    The biggest problem with the alpha chip was yield

    I have worked with production engineers who worked on the Alpha chip. 35% on a good day is not cost effective.

    1. Arthur the cat Silver badge

      Re: The biggest problem with the alpha chip was yield

      As I recall, the first Alpha chips also had a bottleneck in the memory access pathways. Once the data and code had been loaded into cache it went blindingly fast (for the time), but filling the caches or writing dirty caches back ran like an arthritic three legged donkey.

      1. Anonymous Coward
        Anonymous Coward

        Re: bottleneck in the memory access pathways.

        "the first Alpha chips also had a bottleneck in the memory access pathways. Once the data and code had been loaded into cache it went blindingly fast (for the time),"

        Citation welcome. See e.g. McCalpin streams memory performance benchmark(s) from the era.

        Bear in mind also that in general, early Alpha system designs (with the exception of 21066/21068, see below) could be based on a 64-bit-wide path to memory, or a 128-bit one. 128-bit was generally faster. The presence or absence of ECC in the main memory design could also affect memory performance.

        You *may* be thinking of the 21066/21068 chips. These were almost what would now be called a "system on chip" - a 21064 first-generation Alpha core (as you say, blindingly fast for its time), and pretty much everything else needed for a PC of the era ("northbridge","southbridge", junk IO), all on one passively-coolable 166MHz or 233MHz chip. Just add DRAM. Even included on-chip VGA. In 1994. I'll say that again: in 1994.

        Unfortunately it had a seriously bandwidth-constrained DRAM interface, which was a shame.

        The 21066/21068 were used in a couple of models of DEC VMEbus boards and the DEC Multia ultra-small Windows NT desktop, which was later sold as the "universal desktop box", because it could run NT (supported), OpenVMS (worked but unsupported), or Linux (customer chooses supported or not). The 21066/21068 weren't used in any close-to-mainstream Alpha systems, not least because of the performance issues.

        Alternatively, someone may be (mis?)remembering that early Alpha chips also didn't directly support byte write instructions and the associated memory interface logic, which meant that modifying one byte in a word was a read/modify/rewrite operation. I wonder if this is confusing the picture here.

        The compilers knew how to hide this byte-size operation, but in code that used a lot of byte writes, the impact was sometimes visible, which was why hardware support was added quite quickly to the next generation of Alpha designs, and DEC's own compilers were changed to make the hardware support accessible.

        As for Mr Chen's original Alpha-related comments: (a) it's hearsay (b) it's somewhat short of logic (at least re hardware constraints), and even shorter on hardware facts.

        References include:

        Alpha Architecture Handbook (generic architecture handbook, freely downloadable), see e.g.

        https://www.cs.arizona.edu/projects/alto/Doc/local/alphahb2.pdf

        Alpha Architecture Reference Manuals (2nd edition or later, if you want the byte-oriented stuff)

        Digital Technical Journal article on the 21066:

        http://www.hpl.hp.com/hpjournal/dtj/vol6num1/vol6num1art5.pdf

    2. cageordie

      Re: The biggest problem with the alpha chip was yield

      Take a look at the early days of any technology and yield is poor. Right now I work on a next-generation Flash architecture, and almost nothing about it works. It's incredibly poor compared to current Flash. But next year it will be in the stores. That's just how development goes at the cutting edge, and that's what the Alpha once was.

      1. toughluck

        Re: The biggest problem with the alpha chip was yield

        @cageordie: It depends. These days everyone sort of expects a new node to just work and to get 20%+ yields even from the first wafers. And 35% isn't bad as far as yields go.

        Plus, even cutting edge process technology doesn't always mean starting with nothing, although it depends on the specific design.

        Compare AMD's 4770 and Evergreen series to Nvidia's Fermi.

        AMD first did a midrange chip to learn and understand TSMC's 40 nm process node and got good yields before going for larger chips.

        Nvidia decided to go for a large chip as their first 40 nm design and got lousy yield, basically all of their GF100 chips effectively yielded 0% because they needed to fuse off portions of each manufactured chip. Starting with a large chip meant it was harder for them to understand the process, and the problem repeated when working on smaller GF104/106/108 chips. It still didn't bring them down at all.

      2. Anonymous Coward
        Anonymous Coward

        Re: The biggest problem with the alpha chip was yield

        "That's just how development goes at the cutting edge, and that's what the Alpha once was."

        Correct. As I mentioned earlier, the 21066/21068 "Low Cost Alpha" chips had disappointing memory interface performance. One of the reasons for that was packaging. Chip and board designers hadn't got round to surface-mount chips with high pin density (this was in the early Pentium era), and to keep costs down the package had to be small, which kept the number of pins unrealistically low.

        Today if someone were to try a similar trick with SMD packaging, there'd be no problem. Heck, every smartphone for years has used SMD packaging on its SoC. But back then it wasn't an option.

  3. martinusher Silver badge

    This explanation explains everything

    There's only one way to describe this explanation about software design -- the programmers are crap; they haven't a clue how to layer software design. But then we more or less guessed this might be the case.

    Incidentally, the x86 isn't a particularly efficient architecture, it's just very well developed, so it can 'brute force' its way to performance. A wide machine word is also meaningless because, unlike more efficient architectures, x86 systems are fundamentally 8-bit (16 if you want to be generous) and so support misaligned / cross-boundary code and data accesses.

    1. Brewster's Angle Grinder Silver badge

      Re: This explanation explains everything

      These days, cross boundary access is a performance boost: it means you can pack data structures tighter so the data is more likely to be in the cache.
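
      A quick illustration of the tighter-packing point (the sizes are what a typical x86-64 ABI gives; #pragma pack is the MSVC/GCC/Clang way to drop the padding, at the price of unaligned accesses):

      #include <stdio.h>

      /* Natural alignment pads this out to 24 bytes on a typical x86-64 ABI... */
      struct padded    { char tag; double value; char flag; };

      /* ...reordering the members shrinks it to 16, so more fit per cache line... */
      struct reordered { double value; char tag; char flag; };

      /* ...and packing squeezes it to 10, relying on cheap unaligned access. */
      #pragma pack(push, 1)
      struct packed    { char tag; double value; char flag; };
      #pragma pack(pop)

      int main(void)
      {
          printf("%zu %zu %zu\n", sizeof(struct padded),
                 sizeof(struct reordered), sizeof(struct packed));
          return 0;
      }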

    2. patrickstar

      Re: This explanation explains everything

      There is only one explanation for your comment - you didn't read the actual, linked, article before posting it.

      Also that you have no experience with the design or implementation of the NT (Windows) kernel, subsystems, drivers or filesystem. As they are all very well layered in precisely the way you just claimed the authors were unable to.

      1. AndrueC Silver badge
        Boffin

        Re: This explanation explains everything

        At first sight maybe almost too well layered. When I first read Inside Windows NT many years ago I was amazed how modular things were and how object oriented the kernel is. In fact as a young software developer in the early 00s I kind of thought that was why NT performance was a bit poor. It felt to me like too much overhead and baggage for an OS.

        A common discussion around that time was whether C++ sacrificed performance compared to C because of its OOP features. And here was an operating system that used encapsulation, had methods and even ACLs for internal data structures.

        Quite an eye opener for someone used to working with MSDOS :)

  4. Nicko

    Thanks for the memories...

    Still got an AXP 433 NT4 workstation in my workshop loft somewhere - in its day, it was an absolute beast. ISTR that MS had a whole building on DEC's campus (or was it the other way round), just to make sure that the Windows builds could be processed quickly - and forget not that Dave Cutler was one of the (lead) progenitors of NT and of OpenVMS (or just plain "VMS" [previously "Starlet"] as it was at the start), and was heavily involved in the Prism project, one of several that eventually led to the development of the AXP. He also drove the NT port to AXP and, later, to AMD's x86-64 architecture.

    He got around a bit. The only downside was the truly abysmal DEC C compiler, which he co-authored. I had to use that for a while and it was a dog - Cutler admitted none of them had written a compiler before, and it showed.

    Anyone want a lightly-used AXP-433 - was working when last used in about 1999...

    1. Anonymous Coward
      Anonymous Coward

      Re: Thanks for the memories...

      We had a dual-boot box (VMS and NT) and I still have it (wrapped in a nice plastic bag). Having lost the password to the latter and having no media, I overwrote it with NetBSD. One day, I will reactivate my Hobbyist licence for OpenVMS...

  5. Anonymous Coward
    Anonymous Coward

    So why not create a new v2 compression scheme?

    How often are you going to remove an NTFS drive from a Windows machine and try to install it in an older Windows machine? Push out patches for all supported versions of Windows (7+, Server 2008+) to understand the v2 compression, provide an option to force usage of v1 if you want to be sure you can remove the drive and put it in an outdated Windows machine and problem solved.

    If Microsoft thought this way with everything they'd still be defaulting to the original 8.3 FAT filesystem...

    1. Adam 1

      Re: So why not create a new v2 compression scheme?

      When you burn a CD you get to choose whether to support multiple sessions on the disc to allow subsequent changes or whether to burn as a single finalised session for compatibility.

      Very good compression algorithms with ultra-low CPU overhead exist. The only reason I can see for wanting to avoid it would be for more efficient deduping.

    2. Ken Hagan Gold badge

      Re: So why not create a new v2 compression scheme?

      The "v2" question is addressed in the blog and the answer is quite simple: who gives a flying fuck about disc compression these days? You queue up the I/O and wait for DMA to deliver the data and then re-schedule the thread. Meanwhile, there's 101 other things the CPU can be doing.

      Back in the 80s and 90s it probably meant something because: (i) there weren't 1001 other threads in the waiting queue, and (ii) the disc access probably wasn't a simple case of "send a few bytes to the controller and wait". Both factors meant that the CPU was probably kicking its heels whilst the I/O happened, so reading less data and burning the wasted cycles on decompression was a win.

      These days, file-system compression is just making work for yourself (the compression) so that you can force yourself to do other work (the decompression) later.

      1. Roger Varley

        Re: So why not create a new v2 compression scheme?

        and not to forget the eye watering cost of hard drives in those days as well

        1. Wensleydale Cheese

          Re: So why not create a new v2 compression scheme?

          "and not to forget the eye watering cost of hard drives in those days as well"

          Let's not also forget that the other main argument for disk compression back in the 90s was that less physical I/O was required to access compressed files.

          This was a pretty powerful argument for those of us who weren't particularly short of disk space.

          A by-product on NTFS was reduced fragmentation.

      2. Adam 52 Silver badge

        Re: So why not create a new v2 compression scheme?

        "You queue up the I/O and wait for DMA to deliver the data and then re-schedule the thread. Meanwhile, there's 101 other things the CPU can be doing"

        For an interactive system "wait" means poor user experience.

        Processor performance has improved much faster than disc IO. A modern system spends huge amounts of time hammering the disc - that's why we all upgrade to SSDs whenever we get the chance.

        1. Anonymous Coward
          Anonymous Coward

          For an interactive system "wait" means poor user experience.

          Yes, when the main thread blocks waiting for I/O to complete. That's why async I/O, queues, completion ports, and the like are welcome - you tell the OS "read/write this data, and notify me when you're done", and meanwhile you can still interact with the user, for a good experience.
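
          On Windows that's overlapped I/O; a bare-bones, event-based sketch (completion ports are the scaled-up version of the same idea), with a placeholder file name:

          #include <windows.h>
          #include <stdio.h>

          int main(void)
          {
              /* Placeholder file name; FILE_FLAG_OVERLAPPED enables async reads. */
              HANDLE h = CreateFileA("data.bin", GENERIC_READ, FILE_SHARE_READ, NULL,
                                     OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
              if (h == INVALID_HANDLE_VALUE) return 1;

              char buf[4096];
              OVERLAPPED ov = {0};
              ov.hEvent = CreateEventA(NULL, TRUE, FALSE, NULL);

              /* Kick off the read; typically it returns straight away with ERROR_IO_PENDING. */
              if (!ReadFile(h, buf, sizeof buf, NULL, &ov) &&
                  GetLastError() != ERROR_IO_PENDING) return 1;

              /* ...the UI / main thread keeps doing other work here... */

              DWORD got = 0;
              GetOverlappedResult(h, &ov, &got, TRUE);  /* TRUE = now wait for it */
              printf("read %lu bytes\n", (unsigned long)got);

              CloseHandle(ov.hEvent);
              CloseHandle(h);
              return 0;
          }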

      3. Loud Speaker

        Re: So why not create a new v2 compression scheme?

        Both factors meant that the CPU was probably kicking its heels whilst the I/O happened, so reading less data and burning the wasted cycles on decompression was a win.

        This was, of course, true of Windows up to at least XP.

        It was almost certainly not the case with VMS (I used it, but was not familiar with the code). However, Unix was around from before 1978 (when I first met it) and most definitely could do proper DMA transfers and multi-threading of tasks where the hardware permitted. I know: I wrote tape and disk drivers for it (in the 80's).

        And I am bloody sure that Alpha assembler could do bit twiddling with little more pain than x86. I have written assembler for both. C compilers might have been less good at it.

        The Alpha architecture allows you to write and load microcode routines at run time - so you could implement the bit twiddling instructions yourself and load them when your program loaded. Great for implementing database joins, etc. Of course, you have to know what you are doing to write microcode. This might be the real problem.

        1. Wensleydale Cheese

          Re: So why not create a new v2 compression scheme?

          "And I am bloody sure that Alpha assembler could do bit twiddling with little more pain that x86. I have written assembler for both. C compilers might have been less good at it."

          IIRC with the move from VAX to Alpha, the Macro assembler became a compiler rather than a simple assembler.

          Data alignment was also important on Alpha: MSDN article "Alignment Issues on Alpha".

      4. Brewster's Angle Grinder Silver badge

        Re: So why not create a new v2 compression scheme?

        "...the disc access probably wasn't a simple case of "send a few bytes to the controller and wait"..."

        Ah, yes, the joys of "Programmed IO" (PIO) -- "rep insb", "rep insw", "rep outsb" and "rep outsw"; your single threaded processor would be tied up shunting sectors through a port, byte by byte.

    3. Mage Silver badge

      Re: So why not create a new v2 compression scheme?

      Indeed. NTFS itself already has different versions. I think NT3.51 can't read an XP / Win 2K format. I forget when the major change was.

      1. Hans 1
        Windows

        Re: So why not create a new v2 compression scheme?

        >Indeed. NTFS itself already has different versions. I think NT3.51 can't read an XP / Win 2K format. I forget when the major change was.

        NT4 SP4, iirc ... but NTFS changed again in 2k, then again in XP, maybe again in Fista/7, and certainly once, twice or thrice more in 8, 8.1, 10 ... little, incremental, changes ... NT4 SP4 was a BIG change, though ... I would not use NT4 SP6 to read XP formatted NTFS drive, though .... no trust ... a bit like Office ... NT 3.x cannot, NT4 SP3 or lower cannot ...

    4. Anonymous Coward
      Anonymous Coward

      Re: So why not create a new v2 compression scheme?

      How often are you going to remove an NTFS drive from a Windows machine and try to install it in an older Windows machine?

      Isn't Windows riddled with DRM 'tilt bits' to stop this sort of 'own your own data' nonsense from ever happening?

      1. Anonymous Coward
        Anonymous Coward

        Re: So why not create a new v2 compression scheme?

        "Isn't Windows riddled with DRM 'tilt bits' to stop this sort of 'own your own data' nonsense from ever happening?"

        Something along those lines. It's what MS mean by "Trustworthy Computing": high value content must be protected everywhere on the path between the content rights owner and the viewer/listener.

  6. jonnycando
    Mushroom

    Hmmm

    Windows 2.11 anyone? It seems that's where entropy Microsoft style is going to take us.

  7. HCV

    One Step Beyond?

    "...touting Alpha as a step beyond anything else on the market at the time. It was mostly right: x86-64 didn't arrive until the year 2000."

    Er, what? Even if you define "the market" as being "the market for processors desperately hoping to split the 'Wintel' atom", that still isn't true.

    1. foxyshadis

      Re: One Step Beyond?

      The first date in the Wikipedia article is 2000, let's go with that.

      Oh, the Opteron didn't come out until 2003? Eh, shrug.

    2. allthecoolshortnamesweretaken

      Re: One Step Beyond?

      One Step Beyond!

  8. Oh Homer
    Linux

    Yet another thing Microsoft sucks at

    Even the humble Amiga had transparent (using magic-number style datatype handlers) and system-agnostic (multiple Amiga OS and CPU versions) compression in the form of the XPK, XFD and XAD libraries, with a plugin architecture that supported multiple compression and even encryption algorithms (including arch specific variants).

    Speaking of the Amiga, did you know that Microsoft nicked their CAB compression tech from LZX, a once popular replacement for the LHA format so ubiquitous in the Amiga world?

    This is the main problem with Microsoft. In the words of Arno Edelmann, Microsoft's then European business security product manager; "Usually Microsoft doesn't develop products, we buy products".

    What Edelmann failed to mention, but which we can easily deduce from decades of experience, is that Microsoft consequently has no idea what to do with these assimilated products, and just sticks them together with duct tape, spray paints a Microsoft logo on them, then crosses its fingers.

    To really grasp why Microsoft is so technically inept, you need to understand that it isn't actually a software development company, it's just a sales company, and salesmen make lousy software engineers.

    1. itzman
      IT Angle

      Re: Yet another thing Microsoft sucks at

      To really grasp why Microsoft is so technically inept, you need to understand that it isn't actually a software development company, it's just a sales company, and salesmen make lousy software engineers.

      Or, as we used to say 'Designed to sell, not to work'

      1. sabroni Silver badge

        Re: it's just a sales company

        As opposed to all the companies that don't sell anything like.....?

    2. Nick Ryan Silver badge

      Re: Yet another thing Microsoft sucks at

      Microsoft also had code supporting specific features of the Amiga hardware in rather earlier versions of their OS. Annoyingly I couldn't find it the last time I looked for it (I found it by accident originally) but it was definitely there somewhere buried in the depths of the graphics handling code.

      1. Anonymous Coward
        Anonymous Coward

        Re: Yet another thing Microsoft sucks at

        As I recall, it had to do with the differences between the Amiga's bit-plane orientation and the god-forsaken IBM-derived way of handling graphics. Theoretically, I still have a copy in storage, if the discs haven't bit-rotted away. I stumbled across all sorts of tidbits concerning other platforms' algorithms hidden in the MSDN discs. Probably not intentionally hidden - it just required trying again and again with their sucky search.
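
        For anyone who never met bitplanes: the Amiga kept bit N of every pixel in its own separate plane, while the PC world settled on packed ("chunky") pixels, one whole pixel per byte - and converting between the two is exactly the sort of bit-shuffling in question. A toy conversion, roughly:

        #include <stdint.h>
        #include <stddef.h>

        /* Planar-to-chunky: 'planes[p]' holds width/8 bytes of bitplane p,
         * leftmost pixel in the MSB (as on the Amiga); 'chunky' gets one
         * byte per pixel with bit p taken from plane p. */
        void planar_to_chunky(const uint8_t *planes[], size_t nplanes,
                              size_t width, uint8_t *chunky)
        {
            for (size_t x = 0; x < width; x++) {
                uint8_t pixel = 0;
                for (size_t p = 0; p < nplanes; p++) {
                    uint8_t bit = (planes[p][x / 8] >> (7 - (x % 8))) & 1;
                    pixel |= (uint8_t)(bit << p);
                }
                chunky[x] = pixel;
            }
        }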

  9. Alfie Noakes
    FAIL

    "Chen says one of his “now-retired colleagues worked on real-time compression, and he told me that the Alpha AXP processor was very weak on bit-twiddling instructions".

    I guess that Chen doesn't know what the "R" in "RISC" means!

    mb

    1. Ken Hagan Gold badge

      Really? I'd say that Chen has a pretty good idea of what it means and explains it clearly in the blog. In this case, it means that the processor was very weak on bit-twiddling instructions.

      "Reduced" implies that you've taken a broad view of what's important and prioritised that stuff, so there will inevitably be niches that you aren't serving with dedicated instructions. Here, apparently, is the niche that the Alpha's designers chose not to serve. There's no blame or moral opprobrium attached to it -- it was simply an engineering trade-off.

    2. Peter Gathercole Silver badge

      byte, word and longword addressing

      The earlier 'classic' Alpha processors (before EV56) did not support byte- or word-sized reads and writes to main memory. In order to read just a byte, it was necessary to read the entire long-word (32 bits), and then mask and shift the relevant bits from the long-word to get the individual byte. This can make the equivalent of a single load instruction on other architectures a sequence of a load, followed by a logical AND, followed by a shift operation, with some additional crap to determine the mask and the number of bits to shift.

      But you have to remember that in the space of a single instruction on an x86 processor, an Alpha could probably be performing 4-6 instructions (just a guess, but most Alpha instructions executed in 1 or 2 clock cycles compared to 4 or more on x86, and they were clocked significantly faster than the Intel processors of the time - RISC vs. CISC).

      Writing individual bytes was somewhat more complicated!
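
      In C terms, the compiler had to turn a simple byte load or store into something like the following (the real sequences were LDQ_U / EXTBL / INSBL / MSKBL-style quadword ops rather than 32-bit ones, if memory serves, but the shape is the same; little-endian assumed):

      #include <stddef.h>
      #include <stdint.h>

      /* Emulated byte access on a machine with only aligned 32-bit loads/stores.
       * 'mem' is the word-addressed view of memory; 'i' is a byte index. */
      static uint8_t load_byte(const uint32_t *mem, size_t i)
      {
          uint32_t word = mem[i / 4];                    /* aligned longword load */
          return (uint8_t)(word >> ((i % 4) * 8));       /* shift, implicit mask  */
      }

      static void store_byte(uint32_t *mem, size_t i, uint8_t v)
      {
          uint32_t *word  = &mem[i / 4];                 /* read...               */
          unsigned  shift = (unsigned)(i % 4) * 8;
          *word = (*word & ~(UINT32_C(0xFF) << shift))   /* ...modify...          */
                | ((uint32_t)v << shift);                /* ...write back         */
      }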

      I was told that this also seriously hampered the way that X11 had to be ported, because many of the algorithms to manipulate pixmaps relied on reading and writing individual bytes on low colour depth pixmaps.

      1. Anonymous Coward
        Anonymous Coward

        Re: byte, word and longword addressing

        Byte handling was much improved in the later Alpha processors - another case of a big company getting it wrong first time around.

  10. foxyshadis

    I'm not sure that calling it rubbish is even all that accurate -- it's not bad for what it is, and competing modern options like LZO and LZ4 aren't much better; they're mostly just faster. It's annoying that they didn't include both a fast and a slow compressor, like they did with cabinets and WIM, but I understand that they solved the 90% problem and going beyond that would just mean new UI work and lots more testing.

    What's rubbish is the fact that it hasn't been used by default on all installations since 2004 or so, by which point the disparity between CPU overhead and reading from disk had become completely absurd and file compression was rock-solid. Every OS since XP SP2 should have made it mandatory; it basically halves the overhead of OS and program installs, and is like a little extra space for everything else.

    1. psyq

      Actually, most of the data stored on the disc is probably already compressed (multimedia, etc.), and the OS would simply waste CPU cycles with another layer of compression that is, in that case, pointless - or even counterproductive (as in: more data needed to store what the OS sees as a random sequence of bytes).

      File-system level compression might have had its day when a) the prices of storage were much higher and b) most files stored on the medium were compressible. Those were the days of DoubleSpace / Stacker and similar products from the 1990s.

      Now, it could still be useful for compressing the OS / application executable binaries, but the gains to be had from that are simply diminishing (say you get 30% compression - how much does that help, as opposed to the increased CPU load / latency due to on-the-fly compression/decompression?)

      Having file system compression as a feature can definitely be useful but having it ON by default? Definitely not. For most typical usage patterns where most of the I/O would be actual application data, not binaries, it would just waste CPU cycles trying to compress something already compressed.

      1. foxyshadis

        Back in 2003, when I first made the attempt to offline compress the OS, it was an absolute night-and-day performance difference in startup and daily use, thanks to how crappy hard drives of the day were. I didn't say on the full disk, I just said to use it by default; Microsoft could have improved almost everyone's experience for little effort, even if it was only for the Windows and Program Files folders.

        Now I have an SSD, and only enable compression on disk images and the OS to fit a little more until I can upgrade it. Performance difference is pretty much zero, when I've benchmarked, because the overhead of compression was designed to be low for 20-year-old CPUs -- it's undetectable now. (Unless you force LZX mode, which I'm too lazy to.) Sure, the SSD itself would compress for performance purposes, but it won't actually give you back any of that extra space.

        For the external mass-storage disk, of course, there's little point in bothering.

        The days of resource constraints that can be relieved by workarounds aren't behind us for everyone just yet.

  11. Oh Homer
    Headmaster

    Re: "chose not to serve"

    The whole point of CISC was to save memory. It was never anything but a kludge, and these days a wholly unnecessary one. In retrospect it also seems like a crutch for lazy programmers. Mostly it's just a circular-dependent legacy we're stuck with, like the Windows ecosystem.

    1. Peter Gathercole Silver badge

      Re: "chose not to serve" @Oh Homer

      CISC processors predated the adoption of the terms CISC and RISC. While you could say that, for example, a 6502 microprocessor was an early RISC processor, it was not really the case. The first processor that was really called a RISC processor was probably the Berkeley RISC project (or maybe the Stanford MIPS project), which pretty much branded all previous processors as CISC, a term invented to allow differentiation.

      As a result, you can't really claim any sort of design ethos for a CISC processor. Saving memory was a factor, but I don't really think that it was important, otherwise they would not have included 4 bit aligned BCD arithmetic instructions, because these wasted 3/8ths of the storage when storing decimal numbers.

      You can say the converse. RISC processors, especially 64 bit processors often sacrificed memory efficiency to allow them to be clocked faster.

      1. Anonymous Coward
        Anonymous Coward

        "they would not have included 4 bit aligned BCD arithmetic instructions"

        IIRC "packed" BCD instructions had performance tradefoffs and limitations compared to the unpacked ones - i.e. multiplication and divisionis supported only for the latter type.

      2. Oh Homer

        Re: "chose not to serve" @Oh Homer

        I was referring to its evolution rather than its inception, an evolution that mainly Intel took to perverse extremes.

    2. Loud Speaker

      Re: "chose not to serve"

      The whole point of CISC was to save memory. No, No, a thousand times NO

      The whole point of CISC is to save memory bandwidth

      The first real CISC was the PDP11 - and you can see it clearly in the instruction set - the object is to fetch as few bytes of instruction from memory as possible and then do as much as possible before you have to do it again! It doesn't matter a toss how fast your instruction cycles are if your CPU is sitting there waiting for the loads and stores to complete so it can fetch another instruction.

      Sure you can make the problem harder to identify with gloriously complicated caching and pipelining, but the physics remains the same.

      CISC was the winner because main memory is DRAM (slow as hell) while the CPU's internals are static RAM - massively faster - and the communications between them are a serious bottleneck. L1 and L2 cache help - a bit - but "ye cannae break the laws of physics, Captain".

      1. Peter Gathercole Silver badge

        Re: "chose not to serve" @Loud Speaker

        That's a very interesting point, one I had not thought about, but the term CISC actually refers to a Complex Instruction Set Computer, and is defined by the number of instructions in the set, and the number of addressing modes that the instructions can use. I would say that the memory bandwidth savings were secondary, especially as most early computers' processor and memory were synchronous.

        I'm not sure that I totally agree with the definition of a PDP11 as a CISC (although it was certainly several generations before RISC was adopted), but the instruction set was quite small, and the number of addressing modes was not massive and they were exceptionally orthogonal, so it does not really fit into the large-instruction-set, many-addressing-modes definition of a CISC processor.

        What made the PDP11 instruction set so small was the fact that the specialist instructions for accessing such things as the stack pointer and the program counter were actually just instances of the general register instructions, so were really just aliases for the other instructions (you actually did not get to appreciate this unless you started to look at the generated machine code). In addition, a number of the instructions only used 8 bits of the 16 bit word, which allowed the other 8 bits to be used as a byte offset to some of the indexing instructions (contributing to your point about reducing memory bandwidth).

        One other feature that was often quoted, but was not true of most early RISC processors was that they execute a majority of their instructions in a single clock cycle. This is/was not actually part of the definition (unless you were from IBM who tried to redefine RISC as Reduced Instruction-cycle Set Computer or some similar nonsense), although it was an aspiration for the early RISC chip designers. Of course, now they are super-scalar, and overlap instructions in a single clock cycle and execution unit, that is irrelevant.

        Nowadays, it's ironic that IBM POWER, one of the few remaining RISC processors on the market actually has a HUGE instruction set, and more addressing modes than you can shake a stick at, and also that the Intel "CISC" processors have RISC cores that are heavily microcoded!

  12. Anonymous Coward
    Anonymous Coward

    Alpha was - is - a 64-bit RISC architecture

    You'd be amazed at how many DEC Alpha machines still run critical elements of the UK's infrastructure.

    Anonymous as I probably shouldn't say that out loud...

    1. Flocke Kroes Silver badge

      Diversity not completely dead

      6 Alphas show up on Debian Popcon. That puts Alpha ahead of M68k and S390, but behind prehistoric versions of ARM and MIPS.

      1. Anonymous Coward
        Anonymous Coward

        Re: Diversity not completely dead

        Seeing as Debian dropped support for Alpha, are you really that surprised?

        I imagine most are running Tru64, OpenVMS, or NetBSD.

  13. Anonymous Coward
    Windows

    Obvious bull, redux

    I always imagine Microsoft's development approach to be akin to a herd of cows kept in the same field for years who all learn to shit under the same tree.

    1. Anonymous Coward
      Anonymous Coward

      Horseshit!

      "I always imaging Microsoft's development approach to be akin to a herd of cows kept in the same field for years who all learn to shit under the same tree."

      Correction from a rural correspondent:

      Horses will pick a spot in a field to shit in and use that. Cows don't care where they shit.

  14. Anonymous Coward
    Anonymous Coward

    Did they track the various revisions to the Alpha instruction set as newer processors were released, or was the whole thing a just-in-case marketing exercise to make Windows look more attractive to cross-platform developers while giving MS another string to their bow when negotiating with Intel?

    1. foxyshadis

      There were never any big instruction set changes to the Alpha; once it was done it was done, and later revisions just sped up the chips. DEC/Compaq fronted most of the money and half the engineering to make it happen, because their customers wanted it. It was far more than a marketing ploy, but once Compaq threw in the towel, there was no way Microsoft was going to shoulder all of the burden.

      The speed challenges were always more about the crappy compiler, anyway; Microsoft's Alpha C compiler was worse than UNIX ones, and much worse than its x86 compiler. (Which if you've used VC6, is saying quite a bit!)

  15. allthecoolshortnamesweretaken

    "But if nothing else it's nice to know that DEC Alpha lives on, albeit as an irritant, deep within the bowels of Windows."

    Can't be the only one.

  16. Anonymous Coward
    Anonymous Coward

    More proof the recent "complete re-write" was a lie

    Windows is bug-ridden old code with a new frock every few years.

    1. patrickstar

      Re: More proof the recent "complete re-write" was a lie

      There has been no claim of a complete rewrite of Windows, ever. Or reason to - at the kernel level it's excellent (userland not so much, though...). The biggest changes were going from the 2003/XP kernel to Vista/7, but this was mainly about improving scalability on large numbers of CPU cores.

  17. bed

    HAL

    I could be wrong, but I seem to remember that up to and including NT 3.51 there was a Hardware Abstraction Layer between the OS and the CPU which meant the same code ran, slowly, on a variety of hardware. NT4 removed that, moved the GUI layer into the kernel, and only ran on x86 and Alpha - no longer MIPS. This was quite a long time ago, when we had a couple of 433 workstations and an Alpha server. Eventually ran Linux on one of the 433 boxes - went like stink!

    1. Anonymous Coward
      Anonymous Coward

      Re: HAL

      No. The HAL is not a virtual machine which runs an intermediate code identical for each platform. HAL, kernel and user code are all compiled to the native CPU code of the target processor. The scope of the HAL is to present a coherent API to the kernel for hardware access (e.g. physical memory management), so that the kernel becomes more "hardware agnostic" and the OS doesn't require a specific kernel for each supported platform, only a specific HAL.

      The HAL is still present in Windows 10, AFAIK. Even on x86, for example, Windows used different HALs for single- and multi-processor/core systems (the former omitted some of the code that synchronises access to shared resources).

      The fact that most of the graphics routine code was moved from user space to the kernel (the kernel, not the HAL) has nothing to do with this. It was done because getting to and from privilege rings 3 and 0 is costly (due to the protection checks, etc.), and that slowed down performance. The HAL is part of the kernel and runs at ring 0.

      1. patrickstar

        Re: HAL

        The HAL basically encapsulates the stuff that's machine specific, not architecture specific. The kernel itself is still very much aware of the architecture, so it can do things like set up page tables, handle interrupts, and various other non-portable stuff.

        Each component of the Windows kernel (Executive, Memory Manager, etc) has its own directory in the kernel source tree, and then for those that need it there are sub-directories for each architecture (i386, amd64, etc). Plus a small sprinkling of #ifdef's in the common code for special cases, though they are mostly about 32 vs 64 bit.

        For x86 systems, the HAL is somewhat of a leftover from the days before the whole "PC" thing was decently standardized and there were lots of vendor-specific hardware in the box. In more recent times the big use-case was to support both ACPI/non-ACPI and APIC/PIC systems.

        See https://support.microsoft.com/en-us/kb/309283

        The kernel itself is also aware of uni- vs multi-processor systems, by the way; see https://en.wikipedia.org/wiki/Ntoskrnl.exe . The non-relevant locking is simply #ifdef'd out when compiling a uniprocessor kernel.

  18. Bruce Hoult

    got his wires crossed somewhere

    The article is ridiculous. The Alpha (even the original 21064) was perfectly good at bit-bashing and implementing any compression format you wanted.

    Thanks to Anton Ertl, here are some timings of decompressing a 3 MB deflated file to 10 MB (i.e. gzip, zlib etc) using zcat:

    0.351s 280m clocks 800MHz Alpha 21264B (EV68) 2001

    0.258s 310m clocks 1200MHz ARM Cortex A9 (OMAP4) 2011

    0.224s 240m clocks 1066MHz PPC7447A 2003

    0.116s 230m clocks 2000MHz Opteron 246 2003

    The fastest uses 25% fewer clock cycles than the slowest, and the slowest is not the Alpha.

    Yes, the Alpha is the slowest in absolute time, but it's also the oldest and has the lowest clock speed. The Alpha always had a considerably higher clock speed than its rivals made at the same time, at least until the Pentium 4 (which had a high clock speed but was not fast per clock).

    What probably happened is Microsoft took some generic compression CODE (not format) that used byte reads and writes and simply re-compiled it for the Alpha, which didn't have byte instructions. It's not hard to rewrite the code to read 32 or 64 bits at a time and then use the (very good!) bit-banging instructions to split them into smaller parts. But that would take someone a week to do. You'd think Microsoft would have the skill and manpower to do that work, but apparently not.
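
    Something along these lines - a sketch of the word-at-a-time rewrite being described, not anyone's actual shipping code: keep a 64-bit bit buffer, refill it a longword at a time via memcpy, and let the decoder pull arbitrary bit counts out of it without ever issuing a byte load.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Word-oriented bit reader: refills 32 bits at a time instead of reading
     * the compressed stream byte by byte. Little-endian bit order assumed. */
    struct bitreader {
        const uint8_t *p, *end;
        uint64_t buf;        /* buffered bits, LSB first */
        unsigned count;      /* valid bits in 'buf'      */
    };

    static void br_refill(struct bitreader *br)
    {
        while (br->count <= 32 && (size_t)(br->end - br->p) >= 4) {
            uint32_t w;
            memcpy(&w, br->p, 4);               /* one longword load, no byte ops */
            br->buf   |= (uint64_t)w << br->count;
            br->count += 32;
            br->p     += 4;
        }
        /* The final <4 tail bytes of the stream are left as an exercise. */
    }

    static uint32_t br_get(struct bitreader *br, unsigned n)   /* 1 <= n <= 24 */
    {
        br_refill(br);
        uint32_t v = (uint32_t)br->buf & ((1u << n) - 1u);
        br->buf  >>= n;
        br->count -= n;
        return v;
    }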

    The resulting code, by the way, would run faster on pretty much every other CPU too.

    1. alysdexia

      Re: got his wires crossed somewhere

      fast -> swift

      hard -> tough

      would -> should

  19. John Styles

    This does seem unusually pitiful in terms of The Register sensationalising Mr Chen's fine blog post then lots of blowhards wibbling away ignoring the point.

  20. P0l0nium

    The DEC Alpha lives on in China ... apparently.

    https://en.wikipedia.org/wiki/Sunway

    Microsoft aren't alone in coding for the worst CPU on the planet :-)

  21. MSmith

    " But that didn't stop DEC faltering and being acquired by Compaq in 1998."

    I thought DEC took Intel to court for stealing the Alpha math co-processor and using it to upgrade the 486 to the Pentium. I thought they allegedly stole the plans when Microsoft gave it to them while they consulted on how to port NT 4.0 to the Alpha. Rather than force Intel to give DEC a few fabs to pay the penalties and royalties on the millions of Pentium, Pentium II, Pentium III's, etc, the judge allowed Intel to buy at least a large stake in DEC. As the new major shareholder of DEC, Intel settled the suit against itself, and then was required to sell off DEC piecemeal to avoid the antitrust implications to Intel. I heard that is what killed DEC.

    1. Anonymous Coward
      Anonymous Coward

      Re: that is what killed DEC.

      Lots of things killed DEC. One day someone will write a worthwhile book on the subject (a relatively well known one already exists, but the things I read in it do not entirely tally with my experience of the company either as an outsider or an insider).

      One of the biggest things responsible for the death of DEC went by the name of Robert "GQ Bob" Palmer. You won't find much written on the subject, but this snippet may be illuminating:

      https://partners.nytimes.com/library/tech/98/09/biztech/articles/10digital.html

      And when he wasn't selling (or giving away) the company's future to competitors in software, he and his successors were doing it with DEC's hardware.

      What the chip and system engineers knew:

      http://www.cs.trinity.edu/~mlewis/CSCI3294-F01/Papers/alpha_ia64.pdf

      What Palmer was doing at the same time was agreeing a ten year deal as part of a court case settlement with Intel [1] to abandon Alpha in favour of the future "industry standard 64bit" architecture, IA64.

      Water under the bridge, sadly.

      [1] https://www.cnet.com/uk/news/intel-digital-settle-suit/

      1. alysdexia

        Re: that is what killed DEC.

        will -> shall

  22. alysdexia

    It's not nescient to know; you are.
