At least Windows never killed my BIOS...
Of course the Linux apologists will say it's an Intel problem.
Canonical has halted downloads of Ubuntu Linux 17.10, aka Artful Aardvark, from its website after punters complained installing the open-source OS on laptops knackered the machines. Specifically, the desktop flavor of Artful Aardvark, released in October, has been temporarily pulled – the server builds and other editions …
These machines are not really permanently borked. It is possible to reflash them, which restores normal BIOS functionality. The difficulty is that Lenovo only supplies reflashing tools that work under Windows, and for these to work you need to boot Windows. That is tricky if the only OS on the disk is Linux and you cannot boot anything else from USB.
Some affected users managed to attach a CD-ROM drive via USB and proceed from there. Ideally Lenovo should provide a BIOS reflashing tool which works under Linux :-(
I've used an Intel BIOS tool, some time in the past, that booted and updated the BIOS from a USB flash drive. So it was OS-independent.
THAT is the kind of BIOS tool that is needed - not something that REQUIRES WINDOWS to run. Have it boot its OWN operating system or not even bother with an OS. Even DOS would work for this kind of thing.
"I've used an Intel BIOS tool, some time in the past, that booted and updated the BIOS from a USB flash drive. So it was OS-independent.
THAT is the kind of BIOS tool that is needed - not something that REQUIRES WINDOWS to run."
From what I have read on that Lenovo forum, they do offer this... but without the ability to boot from a USB device and with no internal optical drive, it doesn't do much good.
"...but without the ability to boot from a USB device and with no internal optical drive, it doesn't do much good."
BIOS updates like this do not boot from a USB device, they read the BIOS image to be flashed from a USB device, so being unable to boot from USB does not indicate that it will be unable to read an image file from a USB device.
"BIOS updates like this do not boot from a USB device, they read the BIOS image to be flashed from a USB device, so being unable to boot from USB does not indicate that it will be unable to read an image file from a USB device."
I don't actually have such a laptop in front of me now, but the impression I got from what I read is that the .iso image from Lenovo boots into some type of runtime environment and executes a flash utility, and that obviously will depend on the ability to boot from a flash drive.
What you describe sounds like an emergency recovery mode. If the BIOS image in NVRAM is defective and cannot be executed or if you hold a certain key when turning the PC on, it may look for a file of a certain name in the root directory of the first USB device found (you'd generally only have one installed then), but that may or may not be possible with the UEFI borked as it is.
From what I understand, the reason the device can't boot from USB is that the system can't update its device table; whatever was installed at the time the bad Ubuntu update ran is forever what it thinks is installed, so if no USB devices were plugged in at that time, it will always think none are plugged in now. That will probably also affect the emergency recovery, if the firmware doesn't think that the USB device exists.
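To make the last two posts concrete, here is a toy model of the behaviour being described. It is an assumption-laden sketch, not how any real Lenovo/Insyde firmware works: the class, the recovery file name `RECOVERY.BIN` and the idea that the device table sits in NVRAM that has become write-protected are all hypothetical. It shows why a stale device table would hide a USB stick from both the boot menu and a recovery-file search.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical file name; real firmware uses vendor-specific recovery names.
RECOVERY_FILENAME = "RECOVERY.BIN"


@dataclass
class ToyFirmware:
    """Toy model: a device table cached in NVRAM, refreshed on each scan
    unless the NVRAM has become write-protected."""
    nvram_write_protected: bool = False
    cached_devices: dict = field(default_factory=dict)  # name -> root files

    def rescan(self, attached: dict) -> None:
        # If NVRAM is write-protected the new scan cannot be stored,
        # so the table from the last successful write survives.
        if not self.nvram_write_protected:
            self.cached_devices = dict(attached)

    def boot_menu(self) -> list:
        # Only devices in the cached table are offered as boot targets.
        return sorted(self.cached_devices)

    def find_recovery_image(self) -> Optional[str]:
        # Recovery mode searches only the devices the firmware believes exist.
        for name, files in self.cached_devices.items():
            if RECOVERY_FILENAME in files:
                return name
        return None
```

Usage: call `rescan({})` (no stick attached when the bad update ran), set `nvram_write_protected = True`, then `rescan({"usb0": {"RECOVERY.BIN"}})`. The boot menu stays empty and `find_recovery_image()` returns `None`, which is the trap the poster describes.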
None of the posts on the Lenovo forums about this glitch described the possibility of using a recovery mode as such. Several people went so far as to remove and replace the UEFI chip from the motherboard, which seems a little bit premature to me.
"Better still, one that doesn't require booting from anything that involves USB (because it's borked on affected machines - read the article!) - or CD or DVD (because that's no longer available on most new laptops.)"
If the affected laptops support it, booting over the network is a possible way out.
Back in the day, floppies would have provided an alternative. Please let's not go back there.
Network booting is all well and good, but many newer systems also do away with wired network adapters; you only get wireless. And as someone who has got too many systems going via network boot, I have never got one going over wireless yet. EFI (Everything's fu&^ed, innit?).
Wifi PXE? Hmm, that would mean battling a shoddy PXE stack (surprisingly common), WPA issues or setting up an open network, then crossing your fingers and praying that the wifi drivers are adequate.
"Back in the day, floppies would have provided an alternative. Please let's not go back there."
Why? There's really nothing wrong with trying to install Slackware from floppies, only to find that disk 15 of 18 is no longer readable and crashes the whole install.
(Just as well my friend had internet access at work and could re-download the floppy images.)
"Better still, one that can self-boot and doesn't need any installed OS."
I have a clutch of Lenovo desktops and laptops. For all of them, BIOS flashing tools are available both as Windows software and as self-booting images. Of course that's not much use if a rogue OS has prevented USB boot ...
If you did not know, built into all modern Intel-based platforms is a small, low-power computer subsystem called the Intel Management Engine (ME). It performs various tasks while the system is in sleep mode, during the boot process, and also when your system is running.
Architecturally, the ME varies from model to model, and over the past decade it has been growing in complexity. In general, it consists of one or more processor cores, memory, system clock, internal bus, and reserved protected memory used as part of its own cryptography engine. It has its own operating system and suite of programs, and it has access to the main system's memory, as well as access to the network through the Intel Gigabit Ethernet Controller. If you had control over the ME, then it would be a powerful subsystem that could be used for security and administration of your device.
The ME firmware runs various proprietary programs created by Intel for the platform, including its infamous Active Management Technology (AMT), Intel's Boot Guard, and an audio and video Digital Restrictions Management system specifically for ultra-high definition media called "Intel Insider." While some of this technology is marketed to provide you with convenience and protection, what it requires from you, the user, is to give up control over your computer. This control benefits Intel, their business partners, and large media companies. Intel is effectively leasing out to third parties the right to control how, if, and when you can access certain data and software on your machine.
Leah Rowe of GNU Libreboot states that the "Intel Management Engine with its proprietary firmware has complete access to and control over the PC: it can power on or shut down the PC, read all open files, examine all running applications, track all keys pressed and mouse movements, and even capture or display images on the screen. And it has a network interface that is demonstrably insecure, which can allow an attacker on the network to inject rootkits that completely compromise the PC and can report to the attacker all activities performed on the PC. It is a threat to freedom, security, and privacy that can't be ignored."
At this time, developing free replacement firmware for the ME is basically impossible. The only entity capable of replacing the ME firmware is Intel and its OEM partners. And, since the ME is a control hub for your machine, you can no longer simply disable the ME like you could on earlier models, such as the Libreboot X200 laptop.
This means that if in the future we want more hardware that can achieve Respects Your Freedom certification, we will need to make it a "High-Priority" to support the work of those who are getting GNU Libreboot and 100% free system distributions running on other architectures, such as ARM, MIPS, and POWER8.
"Ideally Lenovo should provide a BIOS reflashing tool which works under Linux :-("
And why?
A post further down states Ubuntu isn't supported by Lenovo, why should they expend resources? They support Windows and supply a BIOS flash tool for Windows - Sorted.
Physician Heal Thyself
you caught it, you fix it.
If it turns out X number of laptops ARE permanently borked, what happens then? Can anyone be sued, or is it teeth gnashing time?
Just like MS and every other software purveyor, the software is released "as-is" [...] [without] fitness for any particular purpose.
If your BIOS is already affected by this blunder, you may have to replace the firmware's flash memory chip – or the whole motherboard – if resetting the BIOS or this suggested workaround, or some other remedy, do not resolve the matter.
That is when you appreciate the Gigabyte dual Bios boards ... unless you switch bios and reboot into Artfully Awkward again, of course ... ;-)
> At least Windows never killed my BIOS...
Not your machine perhaps, but there have been plenty of reports of Windows also doing it in recent years.
That said, I don't think this is really either a Windows or a Linux issue. I think the blame rests squarely with bloated UEFI design and in particular lazy implementations by many hardware manufacturers. It's plain that the design can't be very robust if software bugs can so easily upset the boot firmware.
"Not your machine perhaps, but there have been plenty of reports of Windows also doing it in recent years.
That said, I don't think this is really either a Windows or a Linux issue. I think the blame rests squarely with bloated UEFI design and in particular lazy implementations by many hardware manufacturers. It's plain that the design can't be very robust if software bugs can so easily upset the boot firmware"
^ I agree with this and anything that is so fragile and so easily corrupted (irrespective of OS) is not fit for purpose. It's about time that the Unified EFI Forum started work on a competent, and above all robust, replacement for UEFI.
I don't recall good ol' BIOS ever having this many issues, or issues this severe.
Totally agree. Poor design. UEFI is a dreadful pile of dingo's kidneys. There's no way an OS should be farting about with BIOS code/settings and able to fubar your machine. Modern OSs are big and generally full of bugs and have no business messing about in the BIOS. I'm surprised this sort of fuckup doesn't happen more often. All mobos should have dual BIOSs to get you out of these situations and BIOS flashing/setting should only be possible from within the BIOS itself.
"That said, I don't think this is really either a Windows or a Linux issue. I think the blame rests squarely with bloated UEFI design"
This isn't actually about UEFI at all. It's a level lower than that. This is a mechanism Intel designed to allow modification of the firmware from the OS *regardless what that firmware is* - it could be a UEFI firmware or some entirely different type of firmware. This SPI mechanism isn't part of the UEFI spec, nor (AIUI) can the implementer of the firmware *itself* really affect anything the SPI mechanism can do.
There's an explanation of SPI in the documentation for it in the kernel: https://github.com/torvalds/linux/blob/master/Documentation/mtd/intel-spi.txt
The funniest thing of all is this.
IBM "open sourced" the PC specs (not the BIOS, granted; that was Compaq) and anyone can spec out a PC motherboard layout if they have the skills. Windows people banging on about how FOSS and open source are shite are running their favourite, beloved O/S on open-source hardware!
> "What, just because Intel designed, wrote, and released the driver that's causing the problem? Never!"
There's a blackbox warning on the code. The fault ultimately lies with Canonical for taking something with that sort of warning and enabling it in their default configuration.
I'm sure if Windows had ever killed your BIOS, you would have been rushing to blame Intel or Lenovo .....
This has shades of an issue from several years ago, with some read-only optical drives taking liberties with the standards. They used the "write" instruction to initiate a firmware upgrade. Some Linux distributions used a hardware detection tool that attempted to determine whether an optical drive was read-only or write-capable by attempting a write operation. Nothing would actually be written to the disc, since the first block of data would contain a deliberate error. A writable drive should respond "OK, begin sending data", accept the data and then bomb out with a checksum error. A read-only drive should respond to the "write" instruction with "command not recognised" ..... Unless it was falsely interpreting the "write" instruction to mean "new firmware coming up" and responding "OK, begin sending data" ..... then overwriting the beginning of its own firmware with the test data .....
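The probe described above can be simulated in a few lines. This is a toy model: the response strings and the one-byte checksum are invented for illustration and are not real SCSI/MMC commands or sense codes. It shows how the same deliberately corrupted test block is harmless to a genuinely writable drive or a well-behaved read-only drive, but overwrites the firmware of a drive that misreads WRITE as "new firmware coming up".

```python
def make_block(payload: bytes, corrupt: bool) -> bytes:
    """Payload plus a one-byte checksum; corrupt=True makes it deliberately wrong."""
    checksum = sum(payload) % 256
    if corrupt:
        checksum = (checksum + 1) % 256
    return payload + bytes([checksum])


def checksum_ok(block: bytes) -> bool:
    return sum(block[:-1]) % 256 == block[-1]


class WritableDrive:
    def handle_write(self, block: bytes) -> str:
        # Accepts the data, then rejects it because the checksum is bad.
        return "OK" if checksum_ok(block) else "CHECKSUM_ERROR"


class ReadOnlyDrive:
    def handle_write(self, block: bytes) -> str:
        # A standards-following read-only drive refuses the command outright.
        return "COMMAND_NOT_RECOGNISED"


class MisbehavingDrive:
    def __init__(self) -> None:
        self.firmware = bytearray(b"GOODFIRMWARE")

    def handle_write(self, block: bytes) -> str:
        # Treats WRITE as "firmware update" and overwrites its own firmware.
        self.firmware[: len(block)] = block
        return "OK"


def probe_is_writable(drive) -> bool:
    """Hardware-detection probe: anything other than 'not recognised' means writable."""
    test_block = make_block(b"PROBE", corrupt=True)
    return drive.handle_write(test_block) != "COMMAND_NOT_RECOGNISED"
```

Running `probe_is_writable` against a `MisbehavingDrive` reports it as writable and leaves its firmware starting with the test bytes, which is exactly the failure the poster describes: the detection tool did nothing wrong by the letter of the standard.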
It could happen with any Operating System -- even Mac OS, since even Apple can't always control all their upstream suppliers. All it takes is for two people to interpret the wording of a standard differently, or one to ignore it completely .....
Somebody should have to take responsibility for this; but everybody has a good case for pointing the finger at somebody else, and it's the users who end up suffering.
"To be fair, Windows doesn't have bugs, Windows IS a bug."
"To be fair" - on these openly belligerent penguinista infested forums
that'll be a first
Be nice to see a post about an OS that isn't Linux not defaced by them (I appreciate this was about Linux but as ye sow so shall ye reap). Don't like something guys - here's a thought - don't tell the rest of us because we don't care.
One of the nastiest Windoze virus infections - prevalent some years ago - was called CIH. It would actually fry the BIOS on some machines, and render most machines unbootable by screwing up the BIOS settings. It wasn't (usually) detected by the usual "anti-virus" snake-oil, so it would infect plenty of other machines (mostly via infected program files) before triggering its BIOS-wrecking payload.
Remember - it's only M$-based machines that suffer mass virus infections!
> Why is this [a kernel driver for the SPI flash] even a thing?
Imagine that you wanted to write a Linux utility to reflash the BIOS. This would require some way for a user-mode program to access the BIOS flash. A kernel driver to do that is the obvious method.
See posts above for why a Linux utility to reflash the BIOS is desirable...
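On the user-space side, the Linux MTD subsystem lists the flash devices it exposes in `/proc/mtd`: a device name, the size and erase-block size in hex, then a quoted name. The sketch below parses that listing. The sample text is illustrative, modelled on the kind of output the kernel document linked above shows; the name and size on a real machine will differ. Actually reading or writing the flash would then go through `/dev/mtdN` with tools such as mtd-utils, which is exactly the power this thread is arguing about.

```python
import re

# One entry per line, e.g.:  mtd0: 00800000 00001000 "BIOS"
MTD_LINE = re.compile(r'^(mtd\d+):\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+"(.*)"$')


def parse_proc_mtd(text: str) -> list:
    """Parse /proc/mtd text into dicts; the header line is skipped."""
    devices = []
    for raw in text.splitlines():
        m = MTD_LINE.match(raw.strip())
        if m:
            dev, size, erasesize, name = m.groups()
            devices.append({
                "dev": dev,
                "size": int(size, 16),            # bytes
                "erasesize": int(erasesize, 16),  # bytes per erase block
                "name": name,
            })
    return devices


SAMPLE = 'dev:    size   erasesize  name\nmtd0: 00800000 00001000 "BIOS"\n'
```

On a live Linux system you would pass in `open("/proc/mtd").read()` instead of `SAMPLE`; an empty result simply means no MTD flash device is currently exposed.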
> Why is this [a kernel driver for the SPI flash] even a thing?
Actually, I meant "why is SPI flash a thing."
At a minimum, SPI should be an option that is disabled by default. But preferably (IMHO) it shouldn't even exist. The BIOS' flash storage should be read-only outside of the BIOS' own configuration screens.
Otherwise some random software cock-up could brick your shiny new laptop (Q.E.D.)
At the upstream kernel level, I don't think it *is* enabled 'by default' (though exactly what 'by default' means can be a complex question when it comes to kernel compilation). It was Ubuntu's decision to build the module and include it in their kernel package. At least some other distros don't (e.g. Fedora, I just checked).
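One way to check what your own distro decided is to look at the kernel build config it ships, usually `/boot/config-$(uname -r)`, where each option appears as `CONFIG_NAME=y` (built in), `CONFIG_NAME=m` (module) or `# CONFIG_NAME is not set`. The symbol names below (`SPI_INTEL_SPI`, `SPI_INTEL_SPI_PLATFORM`, `SPI_INTEL_SPI_PCI`) are my assumption for the Intel SPI flash driver in kernels of that era; check the Kconfig of your kernel version before relying on them.

```python
import os
import platform

# Assumed Kconfig symbols for the Intel SPI flash driver; verify for your kernel.
SYMBOLS = ("SPI_INTEL_SPI", "SPI_INTEL_SPI_PLATFORM", "SPI_INTEL_SPI_PCI")


def kconfig_state(config_text: str, symbol: str) -> str:
    """Return the value assigned to CONFIG_<symbol> ('y', 'm', ...),
    or 'n' if it is explicitly unset or absent."""
    key = f"CONFIG_{symbol}"
    for raw in config_text.splitlines():
        line = raw.strip()
        # Match "KEY=" exactly so SPI_INTEL_SPI doesn't match SPI_INTEL_SPI_PCI.
        if line.startswith(key + "="):
            return line.split("=", 1)[1]
        if line == f"# {key} is not set":
            return "n"
    return "n"


if __name__ == "__main__":
    # platform.release() gives the same string as `uname -r` on Linux.
    path = f"/boot/config-{platform.release()}"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
        for sym in SYMBOLS:
            print(sym, kconfig_state(text, sym))
    else:
        print("no config file at", path)
```

A result of `m` means the distro builds the driver as a module and ships it; `n` means it isn't built at all, as the poster found for Fedora.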
So, this sounds more like Intel's fault than Canonical's or did the whole point of that article go whooshing past the top of my head?
So, I read the article again. Was the driver released to Canonical prematurely or was it released to them as a test firmware version? I don't see where it says either.
From what I gather, this is a badly written Intel BIOS driver that's meant for OEMs to use and that Canonical shouldn't have included in the kernel build. Which explains why only Ubuntu is affected (I literally just did a yum update on my Dell laptop before reading this and gulped, but it's Fedora).
If blame is to be assigned, it's probably shared between Intel and Canonical.
"If blame is to be assigned, it's probably shared between Intel and Canonical."
Very possible, but it's also possible that Lenovo (or Insyde, the maker of the firmware) screwed up the UEFI implementation in a way that made a driver that complies with the standards cause trouble anyway. Screwed-up UEFIs that don't meet the standards are the norm, not the exception, so this seems quite likely, given especially that the PCs that are getting screwed up are such a small slice of the total. It's only some of the many Lenovo laptops that have the issue, and those in turn are just a slice of the total PCs...
The driver is in the upstream kernel tree, which means it's up to distributions to decide whether or not to build it in their kernel package build, and if they decide to build it as a module, how to ship that module (in the main kernel package, in some sort of optional package, etc).
A large part of the job of distro kernel maintainers is keeping track of all the things like this that are in the kernel and deciding how to handle them.
Well, as far as new functionality (new drivers etc) in Linux kernel are concerned ... sometimes you do end up with beta-quality for some time after the release. I know it may sound like heresy to some, but there is a reason why RHEL is running old kernels (with a very long list of in-house maintained patches).
However, since I like living on the bleeding edge, I use fresh upstream kernels, which is how I also know that bugs are quickly fixed. Usually within days (or short weeks) from the first report.
As for malware writers - a kernel module can do a lot, and damaging hardware has been possible for a very long time. But in order to load a new module, you need to root the OS first. So, nothing new really.
OMG welcome to the '90s when moving jumpers on the MB* was an absolutely almighty b******g p-in-the-a!!**
But - agreed. Sometimes the "analog" way of doing things is simpler, quicker and more effective.
*and on disk drives and possibly other items which my memory has self-erased to maintain sanity.
**Remember back in those primitive times, if you could not find the manual, it was NOT available online and you spent all night trying different permutations of the jumpers. I am going to stop now before my hands start shaking.
> OMG welcome to the '90s when moving jumpers on the MB* was an absolutely almighty b******g p-in-the-a!!**
You did not have to do it very often, and after you got the settings right, you could be sure no mere software could reach out with its clammy fingers to move the jumpers!
"You did not have to do it very often, and after you got the settings right, you could be sure no mere software could reach out with its clammy fingers to move the jumpers!"
And then hardware manufacturers figured out that you could silkscreen the meaning of the jumpers on the PCB right next to them, and it was heavenly.
"...could not find the manual it was NOT available online and you spent all night trying different permutations of the jumpers."
Yep, especially when you needed to drop a new SCSI drive onto a chain that might have 5 devices on it already! Ha ha! I was so glad when they started printing the jumper pin-outs on the drive label stickers; that must have saved hours for people like us!
Course, to search online you have to know what the card is. I had to give up on a VESA Local Bus I/O card after losing the documentation. Couldn't find the manual online, no markings on the card (with PCI there will always be a certification ID that identifies the hardware, plus the PCI ID itself), gave up and bought a new (better) one. (This is for retro gaming, VLB is commonly needed for a lot of 486 systems, and I already had the hardware from decades back).
Course eventually I found a photocopied sheet of the jumpers lying around the house, months later.
"Unfortunately, the write protect switch on an SD card doesn't connect to the circuitry inside it. It's just something the card slot detects. So the slot has the option to override it."
Yup. In fact, I have custom firmware for my Canon digital camera that uses the SD card write protect "switch" as a toggle. Unlocked = normal firmware, Locked = custom firmware (which then overrides the lock to allow writing.)
Or in the case of the DIP switches on dot-matrix printers, spent hours changing them and printing the full character set and some lines of text ending with various combinations of [LF], [CR] and [CR] [LF] only to find that none of them seemed to make a blind bit of difference ..... took me a good while to twig that the machine only bothers to read the DIP switches when it's first switched on, and then proceeds studiously to ignore the living daylights out of them .....
In fairness to the manufacturer of the printer concerned, this fact might have been mentioned in the manual which I did not have, which was why I was pulling this crazy stunt in the first place .....
When I worked at Mitsubishi / Apricot I had no idea they were clairvoyant...
Certain motherboards had dual BIOS flash devices, physically enabled by a DIP switch. The flashing process involved 'backing up' the current code to the secondary device. Corrupted BIOS - no problem, just flick the switch...
Apricot had quite a few features back in the mid 90s (particularly regarding device security) which the rest of the world still hasn't quite caught up with or are at least rare.
Strangely, there is almost nothing on the Internet about any of it with the happy exception of what seems to be most of their manuals and software ( http://insight.actapricot.org/insight/default.htm )
If you clicked through the "I accept the license" section on any software, you missed the disclaimer, guarantee and possible remedies. The disclaimer includes something along the lines of "even if not fit for the purpose for which it was sold". At best, the guarantee promises to "work broadly in line with the [non-existent] printed instructions" and the remedy is limited to replacement of the physical media on which the software was supplied (presumably the language is still there now that software is often downloaded from an app store). Consequential damages are always excluded.
The joke is that free software comes with an automatic money-back guarantee. When was the last time you got service like that from a commercial software reseller?
In the UK at least, you can't sign away negligence, and paying for it is irrelevant. I wouldn't want to be in Ubuntu's shoes in court.
However, the fact that it's not part of the default configuration and that it warns you explicitly not to do it is a defence, which seems to be what has happened.
"However, the fact that it's not part of the default configuration and that it warns you explicitly not to do it is a defence, which seems to be what has happened."
That's written in reference to *Ubuntu*, not the user, AIUI. It was Ubuntu's choice to build the module and ship it in their kernel package. Once the module is built and included, it will be automatically loaded on hardware it supports. The user would have to explicitly remove or blacklist the module to prevent this, and there's no reason why any typical Ubuntu user would have done that.
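A quick way to see whether such a module was actually auto-loaded is to read `/proc/modules`, where the first field of each line is the module name (with underscores). The sketch below filters that list by a name prefix and produces `blacklist` lines in the `modprobe.d` format. The prefix `intel_spi` is an assumption about how the relevant modules were named. Note that `blacklist` only stops alias-based automatic loading, not an explicit `modprobe`, and if the module is included in the initramfs, that image may need regenerating too.

```python
import os


def loaded_modules(proc_modules_text: str) -> set:
    """Module names from /proc/modules text (first whitespace-separated field)."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def matching_modules(loaded: set, prefix: str = "intel_spi") -> list:
    # The default prefix is an assumption; adjust it for your kernel.
    return sorted(name for name in loaded if name.startswith(prefix))


def blacklist_config(modules: list) -> str:
    """Text for a file such as /etc/modprobe.d/<something>.conf."""
    return "".join(f"blacklist {name}\n" for name in modules)


if __name__ == "__main__":
    if os.path.exists("/proc/modules"):
        with open("/proc/modules") as f:
            found = matching_modules(loaded_modules(f.read()))
        print(blacklist_config(found) or "no matching modules loaded")
```

Writing the output to a file under `/etc/modprobe.d/` is the manual step the poster says no typical Ubuntu user would have had any reason to take.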
I thought Ubuntu was based on Debian Testing and then given some grinding and polishing by Canonical. Given that Testing is only about six months into the usual two year development cycle then Sid might be a close approximation. This is why I wait for the Stable/LTS releases.
This is why I use Linux Mint. They base it on an LTS kernel/distro from Canonical. And, if you change kernel version, their upgrade won't change the kernel version you're using. And you have a good program to control updates for anything on your system.
Any OS, you can treat it like Windows and blindly accept all updates, and you would probably get results of this sort from time to time.
Because Mint + MATE + the TraditionalOK theme works really well on new or old hardware, and can be used without any difficulty by Windows users familiar with anything from Win9x/NT4 to Windows 8. It possibly even works with more legacy applications than 64-bit Win10. It's a pity MS killed Skype 4.3 lately, though; the replacement isn't in the Software Manager and is poorer than 4.3 on both the Windows desktop and Linux.
"If you know what Ubuntu is based on [Debian Sid], there's really no surprise..."
It's difficult to know what you intended to achieve with such a fatuous remark, but anyway, Debian Sid is a rolling release that's been around since Debian 1.1 'Buzz' in 1996. It is in a constant state of change so saying that something is based upon 'Sid' without also saying when your particular snapshot was taken just means something from between 1996 and the present, which is meaningless and explains nothing.
Surely building a computer where software can perma-screw it up is the problem?
I mean, we _could_ blame Canonical for not testing it (or possibly using code not ready for release), or Intel for writing that code in the first place, but I can't help thinking that having the ability to re-write firmware *WITHOUT* any method to restore said firmware back to factory default/known-good-state is... well... shit.
I think if anyone should be sued, it should be Lenovo (and any other affected manufacturers), and they in turn may sue Intel because it's probably Intel's fault. Somehow.
Lenovo has no culpability. They do not sell laptops that support Ubuntu installs. They do not make that claim. Anyone who buys a Lenovo laptop to use with Ubuntu is removing the OEM installed OS and installing something else.
If you want Ubuntu laptops there is System76.
"Anyone who buys a Lenovo laptop to use with Ubuntu is removing the OEM installed OS and installing something else.
If you want Ubuntu laptops there is System76."
You try telling that to some users. Some live in parts of the world where only big brand names are available (apparently), and if it isn't that, then it's that their [Linux OEM] machines are more expensive or underpowered compared to those of much larger companies with better economies of scale.
After buying another machine and replacing Windows, next time 'round they still complain there isn't a good choice of companies selling Linux pre-installed.
"Canonical works closely with Lenovo to certify Ubuntu on a range of their hardware."
That clearly says it is Canonical's responsibility and not Lenovo's, thus Wayne is right that Lenovo have no culpability. Also currently, Lenovo UK list no laptop that can be purchased with Linux pre-installed.
"That clearly says it is Canonical's responsibility and not Lenovo's, thus Wayne is right that Lenovo have no culpability"
If it's Lenovo's crappy implementation of the UEFI standard that triggers this issue, then they certainly should be liable for it.
Canonical merely included a driver that Intel wrote. One would hope that testing by either Intel or Canonical would reveal the issue, but not even Intel has the resources to test every model of laptop out there in the world. At some point, it has to come down to whether Lenovo and Insyde flubbed the UEFI standard to make an otherwise decent driver into a disaster, or whether the issue was inside the driver itself. Intel will almost certainly release a driver version that works around this issue, but that in itself doesn't mean the problem was theirs from the start. Drivers contain shims and workarounds for other vendors' problems all the time.
"If it's Lenovo's crappy implementation of the UEFI standard that triggers this issue, then they certainly should be liable for it."
AIUI this has nothing to do with UEFI or any particular implementation of UEFI. This is about SPI, which is an Intel mechanism for messing around with the firmware on Intel hardware, regardless of what that firmware actually *is* (it doesn't matter who wrote the firmware, or what the contents of the firmware are, or what firmware standards the firmware does or does not implement). IMBW, but that's what I got from reading the docs.
Intel created this SPI thing, and Intel wrote the kernel driver for it. Canonical then chose to build and ship that driver in Ubuntu; at least some other distros did not choose to do that (I haven't checked any except Fedora, which does *not* include it).
> UK keyboard
In the UK, there is Entroware, who sell Linux laptops, especially Ubuntu-based ones. Herself has one, and is very happy with it. I'll probably go for one when my old Thinkpad, currently held together with epoxy putty, gives out. I think they are quite a small outfit, but all went well with the purchase of my wife's machine.
"In the UK, there is Entroware who sell Linux, especially, Ubuntu-based, laptops."
Thanks for the signposting.
What does herself think of the keyboard? Does it bend when typing?
I, too, will be thinking about a newer lighter machine in the new year.
Icon: toss up between the pint in thanks and the coat in regard to bare legs in December.
"Nimbusoft seem to be all sold out, but I'm looking at Entroware."
Worth looking at PCSpecialists too. OS free so you'd need to install yourself - I've got a 4 year old i7 which installed OpenSUSE without a prob. ( I'm typing on it now). They don't support Linux officially but if you select 'No OS' they'll ask if you want Linux and point you to their Linux forum.
@Baldricck In answer to your question: care to get them to supply me with a UK keyboard layout? I usually use this sort of thing: https://www.amazon.co.uk/English-Transparent-BLACK-Stickers-Letters/dp/B015HL95ZM/ref=sr_1_1?ie=UTF8&qid=1513955608&sr=8-1&keywords=laptop+keyboard+stickers+uk
"I think if anyone should be sued, it should be Lenovo (and any other affected manufacturers), and they in turn may sue Intel because it's probably Intel's fault. Somehow."
FFS - If this was Windows you'd be blaming Microsoft alllllll the way down the line - there would have been no way Intel would catch even the slightest hint of blame in these forums.
Actually no. No I wouldn't blame Microsoft.
I would, and have, assigned blame where it belongs. I have whined when blame has been assigned wrongly. NO MATTER THE OS.
It's just that usually when there is a problem with a Linux install, the blame is assigned to Linux by ignorant people. At which point I try to explain to them that the blame belongs elsewhere. But they do not learn/understand. Sometimes ignorance can not be cured.
I see your point that hardware that can be killed by software is something you'd intuitively expect not to happen, but unfortunately this has been happening for a very, very long time.
CPUs have moved beyond the short period of time where they couldn't cope with incompetent installation (old AMD CPUs with no heatsink=bang). GPUs require active management and have for years. The firmware appears to be better these days in terms of power management and throttling, but there's a lot of hardware that has to be tweaked in a very specific way in order to work well and at the highest performance.
It just isn't safe to blindly poke ports to 'see what happens'
I have had several Dell branded laptops unexpectedly become totally unresponsive (apart from the power button lighting up) while installing the Windows 10 Creators or Windows 10 Creators Fall updates, right after the new drivers were downloaded from Microsoft.
A couple were able to be salvaged by leaving the system battery unplugged and the CMOS battery unplugged for an extended period of time (a bit hard when the batteries are deep inside the units), others never recovered at all.
So it's not just a Linux thing, I'm pretty sure the same buggy drivers were present in Windows 10, but got removed quickly and without any public mention.
" I'm pretty sure the same buggy drivers were present in Windows 10, but got removed quickly and without any public mention"
Considering the size of the Windows user base and the fact that ElReg would have jumped all over it, I doubt that very much.
Intel Drivers for Linux are just that, they are dedicated...
"A couple were able to be salvaged by leaving the system battery unplugged and the CMOS battery unplugged for an extended period of time (a bit hard when the batteries are deep inside the units)"
At least one poster on one of the Lenovo forums said that his CMOS battery was soldered to the motherboard. More planned obsolescence.... that REALLY angers me.
Before anyone asks when I last had to replace such a battery in hardware that wasn't so obsolete it was useless anyway: I'm writing this on a laptop whose internal battery died over a year ago after years of service. Fortunately, it was a very common coin-cell battery (the same as in all my desktop motherboards... 2032?) in the usual type of holder. Once I got the top cover/palmrest off, replacing the battery was easy. It would have been even better if the battery holder on the motherboard were accessible from the hardware access cover on the bottom of the unit, like the CPU, GPU, and RAM (the wifi card and HDD have their own separate cover, and the laptop's main battery is externally removable), but nothing's perfect.
>I have had several Dell-branded laptops unexpectedly become totally unresponsive (apart from the power button lighting up) while installing the Windows 10 Creators Update or Fall Creators Update
Agree there are problems which seem to be associated with UEFI.
WRT the Windows 10 Fall Creators Update, what surprised me with a bunch of HP laptops was that the one a user updated purely via the MS WUP service was bricked (the issue was reported in the HP forums at the time and got some press exposure). However, on the others I first ran an HP maintenance update (using the preinstalled HP tool) and did as instructed, namely reinstalled the BIOS that had been installed some months previously, doing a full overwrite/reset; the subsequent MS updates then worked without problem.
My hypothesis was that an update (via MS?) did something to the UEFI memory that was causing the Fall update problems; by performing a full BIOS refresh, I was setting the BIOS and the relevant UEFI memory area to a known state.
"Seemingly, a problem with this code causes the OS to flip the wrong configuration bit in a hardware register, and write protect the firmware's data, triggering further failures."
So is this a write-only write-protect bit? Once flipped, a few lines of C code can't simply unflip it?!
If that is the case, I could swallow a few grams of silicon and *puke up* a better design.
OK, not fully an amateur, since I've been supporting computer users as part of my job for decades, but I'm not a comp-sci-trained pro. My main role was always to think like a pro should, but understand like a user would. And in neither role can I understand these aspects.
1.) How does an untested driver ever get put in a considered release of anything. Is no one in charge of the project?
2.) How can an OS (or any other significant s/w) update get into release without it having been tried on a good range of commonly sold machines?
3.) How can any piece of software be written and distributed to normal end-users that is allowed to make non-reversible changes to a layer fundamentally lower, be it s/w changing the OS or OS changing a BIOS.
4.) How can a piece of software (for general release) even be written such that it makes non-reversible changes to a setting, i.e. not have an option to undo anything it can do, even if that has to be in a special mode?
1) Because exhaustive testing is impractical. In the commercial world, drivers get updated only for a limited time, which pushes people to replace the hardware every 2-5 years. In the open source world, drivers can easily live for a decade. Linux support for outdated hardware is outstanding, but utterly impractical to test in-house, because that would require a warehouse full of kit and someone walking around to check that it behaves as required during tests.
2) Covering a good range of commonly sold machines is more difficult than you expect. OEMs frequently change the bill of materials without changing the product name. Do you really want to buy a new machine of each type from each manufacturer every month and run lspci and lsusb to see if there has been a change? Rest assured the OEM will not tell you about changes and may not be able to get you a specific configuration on request. How do you expect to fund this massive regular hardware purchase from free software?
3&4) Gross professional incompetence springs to mind but management diverting resources from incomplete software to put out some other fire is very common.
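To make point 2 a bit more concrete, here is a rough sketch of the lspci comparison it describes. It assumes you have saved `lspci -nn` output from two units sold under the same product name; each line minus its bus address is treated as a device fingerprint, and the two sets are diffed. The function names and the sample output are made up for illustration.

```python
def device_fingerprints(lspci_output: str) -> set:
    """Turn `lspci -nn` text into a set of device descriptions.

    The leading bus address (e.g. "00:1f.3") is dropped so only the
    device class, vendor/device IDs and revision are compared.
    Identical devices collapse into one entry because this is a set.
    """
    fingerprints = set()
    for line in lspci_output.splitlines():
        line = line.strip()
        if not line:
            continue
        _address, _sep, description = line.partition(" ")
        fingerprints.add(description)
    return fingerprints


def bom_changes(old_output: str, new_output: str):
    """Return (added, removed) devices between two lspci snapshots."""
    old = device_fingerprints(old_output)
    new = device_fingerprints(new_output)
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    # Made-up snapshots of two units sold under the same model name.
    unit_a = """00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:5916] (rev 02)
02:00.0 Network controller [0280]: Intel Corporation Device [8086:24fd] (rev 78)"""
    unit_b = """00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:5916] (rev 02)
02:00.0 Network controller [0280]: Realtek Semiconductor Co., Ltd. Device [10ec:b822]"""
    added, removed = bom_changes(unit_a, unit_b)
    print("added:", added)
    print("removed:", removed)
```

Which is exactly the problem: the script is trivial, but you still have to buy both machines to get the two snapshots.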
Long ago, the boot sequence was an area where the competent thought before they bought, and made sure they had access to unbricking tools before doing anything interesting. These days, the competent are so vastly outnumbered by the clueless that UEFI exists without catastrophic loss of sales to the OEMs inflicting it on us.
I can speak to one of your questions with some authority, this one:
"2.) How can an OS (or any other significant s/w) update get into release without it having been tried on a good range of commonly sold machines?"
as I'm more or less in charge of QA for an OS (Fedora).
So, there are two key points to make in answering the question. One: what exactly is "a good range of commonly sold machines"?
The number of bits of PC hardware out there is probably literally uncountable. It's difficult to overstate just *how much goddamn hardware* is out there. Even if you say "well, Lenovo laptops seem like a reasonable subset to cover, how many of those can there be?", the answer is *still* tens of thousands, counting all the variants on all the base models they've sold in all geos over the years. So even just this part is...incredibly difficult. Especially for a relatively small OS like Ubuntu or Fedora, both of which release rapidly (we both release every six months). It is, practically speaking, an inescapable truth for a small QA team on a rapidly-releasing OS that there will be quite popular hardware you don't test on. There's just too damn much PC hardware out there.
I dunno the specifics for Ubuntu, but for Fedora, for example, we have about 10 full-time paid QA staff, plus community volunteers. At a very rough guess, I'd say each Fedora release has probably been run on a few thousand different systems before it's released. But after it's released, it probably gets run on something like 100x or 1000x more hardware than it was tested on. There's just no realistic way to 'solve' this. It's always going to be a problem unless you make like Apple and say "we are only going to support a very small number of systems that we engineered ourselves".
Bigger, slower-moving OSes like Windows and RHEL can do *better* here, but they still can't be anywhere close to perfect.
Secondly, there's the specific failure mode here: what happens, AIUI, is that the system firmware effectively gets placed in a read-only mode, notably meaning EFI variables like the boot order can't be changed. But that's not a particularly *obvious* failure mode. If your test is "install the OS, check it works", then your test would *pass* on an affected system. You're only going to catch this failure if your test is "install the OS, check it works, then try and change the system's firmware settings in some way".
Which sounds like a reasonable test, and sure, it *is*. But there are thousands of potential 'reasonable tests' you could perform on any given bit of hardware - work in QA for six months and you personally will have a list far longer than you'll ever have time to run. And again, an OS releasing on a six month cycle with a QA team that doesn't number in the hundreds of thousands *absolutely does not have the ability* to perform all those 'reasonable tests' on all the hardware the OS might get used on. Or even a reasonable subset of the tests, on a reasonable subset of the hardware.
Frankly, trying to QA an operating system at all is like trying to bottle the ocean in an eye-dropper. Trying to QA one that releases every six months with a small team is more or less an exercise in futility. I have an awful lot of sympathy for my counterparts at Ubuntu on this one. We (OSes) have all had fuckups like this before - Ubuntu, other Linux distros, Windows. I remember the kernel code that could brick some CD/DVD drives, thanks to a bad interaction between some perfectly reasonable code in the driver and an appalling choice made by the firmware implementation for those drives, for instance. To be entirely frank, I'm really only astonished this stuff doesn't happen *more often*. (I have a theory that the longer you work in QA the more surprised you are that anything works at all...)
"Bigger, slower-moving OSes like Windows and RHEL can do *better* here, but they still can't be anywhere close to perfect."
Windows isn't slower moving anymore. They release every six months too now, only they have no experience with having a rapid release schedule, and they try to force consumers to install the update as soon as possible rather than having any pretense of allowing users to make their own decisions. Microsoft already demonstrated that soft-bricking a customer's PC to further their own agenda is not out of bounds, so the only difference here is that at least Canonical feels bad about it. It would seem to me to just be dumb luck that prevented MS from wrecking hardware in this fashion, if that is indeed what happened.
Then perhaps you should start taking the Apple route, at least partially, and say only such and such hardware can be certain to work and the rest are just CE-YOYO. If you can't test everything, say so and say so VERY CLEARLY so they either get it or get classed as Darwin Award candidates.
"Then perhaps you should start taking the Apple route, at least partially, and say only such and such hardware can be certain to work and the rest are just CE-YOYO. If you can't test everything, say so and say so VERY CLEARLY so they either get it or get classed as Darwin Award candidates."
Well, this sort of is what happens, at some levels: mainly the levels where people pay distributors money :) The major enterprise distros like RHEL and SLES do have hardware certification programs, and those certainly are backed by testing: if a RHEL or SLES release is certified on some piece of hardware, that isn't just an empty sticker, it means RH or SUSE really did do some pretty extensive testing on that hardware (and of course there are very real, contract-based consequences if stuff goes wrong very badly on it).
At the level of a distro like Fedora, though, things aren't really that clear. There's in fact an idea to take a modest step in this direction; there's a group of folks who would like to nominate some particular laptop model (or series) and commit to doing much more extensive testing on that specific model, then publicize this effort and say "if you want a better chance of everything working really well, buy this laptop". But I don't think (personal opinion, here) distros like Fedora can ultimately go too far down this path. It's just not what people seem to really _want_ from a typical community Linux distro; it seems like people really do want us to ship them all the bits relatively quickly, do whatever level of testing we can fit in, and then run it on absolutely any old bit of hardware they have lying around. You've got to adjust your approach to what it is people actually seem to want of you. And of course it's not practical for a community Linux distro to go *full* Apple and actually build and ship hardware, or even tie up very tightly with a hardware manufacturer, so you're always going to be somewhat at the mercy of a party that doesn't have much interest in you, trying this approach; what if you go all-in supporting some specific hardware model line, and then the manufacturer discontinues it?
I - again, personal opinion - generally agree with you in terms of being clear about stuff like this...which is why I write comments like this. I generally want to level with people, including when we just flat out screw stuff up, or when a task is just too big to be realistically possible. On a personal level, I hope that if you check what I've written around the net before, you'll find I'm pretty open about things like this.
Two things: 17.10 is a release, but a test release; users who want to feel safe should install an Ubuntu LTS version like 16.04. Furthermore, a Raspberry Pi and a cheap cable/chip clamp can be cobbled together to form an in-system flash writing tool; I guess if you want to live on the bleeding edge with your Linux distribution, you might be expected to resort to such extreme measures once in a blue moon ;-)
(typed from 17.10. On a MacBook)
I always run Fedora and thought that by having it free I WAS one of the testers. It doesn't stop me from whinging when something fails, but it does lead me to believe that this is tough luck.
Many Linux distros are test beds for the commercial versions.
My Fedora 27, that worked perfectly, now no longer plays videos in VLC on the normal Gnome desktop. I have to log into Gnome Orig to do that, but in this desktop Netflix drops frames and is unwatchable. Even in the working desktop VLC fails and crashes if I use Chrome browser. At present I am switching between 2 different desktops to do different things. First time in about 20 years of Linux use that I've had such a pain in the bum problem.
It just reminded me that I am the tester, and that for 20 years I've been very lucky that serious bugs haven't affected me before.
Fedora has gotten increasingly buggy / hard to install / hard to get updates for several months now.
It has gotten where I keep thinking they are going to make some awesome new release, but really it is that they are focused on commercial customers- aka "paying" customers.
That's a good thing - that Linux continues to grow - but yeah, sometimes if you're not a paying customer with Red Hat then Fedora is a close cousin.
"Fedora has gotten increasingly buggy / hard to install / hard to get updates for several months now.
It has gotten where I keep thinking they are going to make some awesome new release, but really it is that they are focused on commercial customers- aka "paying" customers."
I'm sorry if you've been having trouble, but I can tell you your diagnosis of the cause is not accurate. There hasn't been any significant reduction in the resources RH devotes to Fedora for the last "several months" (or in fact, years); the number of folks working on Fedora is actually growing modestly. It's a fairly small team, but it's certainly not worse than it used to be.
So far as "hard to get updates" goes, there *has* been a change here which may be what you're seeing: we have been implementing a policy called 'batched updates', which is basically aimed at reducing the frequency with which non-critical updates appear, so you're not constantly being prompted to install updates. In practical terms what's changed is that the 'default' flow for package maintainers is now that when you think an update is ready to go, you "submit it to batched", which means it goes into a sort of queue; once a week (I think it's once a week) all the updates in the "batched" queue get sent out together. Previously, as a package maintainer, when you thought an update was ready to go out, you would "submit it to stable", meaning it'd be sent out with the very next update refresh (these usually happen once a day).
Maintainers *can* still submit updates directly to stable, and security updates and other critical fixes are still sent out this way. But we're sort of 'nudged' to submit to batched by default in the tooling.
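A toy model of the flow just described, to make the timing difference visible. It assumes one update push per day and a batched queue flushed on one designated day of the week; the class, method and package names are invented for illustration and are not the real Fedora tooling.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UpdateQueues:
    """Toy model: 'batched' updates wait for the weekly push,
    'stable' submissions go out with the next daily push."""
    batched: List[str] = field(default_factory=list)
    stable: List[str] = field(default_factory=list)

    def submit(self, update: str, direct_to_stable: bool = False) -> None:
        # Batched is the default; security and other critical fixes
        # are submitted straight to stable.
        (self.stable if direct_to_stable else self.batched).append(update)

    def daily_push(self, batch_day: bool) -> List[str]:
        """Return the updates released in today's push."""
        released = self.stable[:]
        self.stable.clear()
        if batch_day:
            released.extend(self.batched)
            self.batched.clear()
        return released


if __name__ == "__main__":
    queues = UpdateQueues()
    queues.submit("foo-1.2")                           # ordinary update
    queues.submit("openssl-fix", direct_to_stable=True)
    for day in range(7):
        # Assumption: the batch is flushed on the last day of the week.
        print(day, queues.daily_push(batch_day=(day == 6)))
```

The ordinary update sits in the queue for most of the week, while the security fix goes out in the very next push.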
Beyond that, if you like, please do feel free to drop me a line (awilliam at the domain you'd expect for a Red Hat employee...) about whatever bugs you're running into, and I'll see if I can help out with any of them. Sorry again for any trouble you're having.
Am I getting too old for this, or has nobody got the slightest idea what stability is these days?
Don't kid yourself that LTS is stable either. I have experienced several incidents of Ubuntu LTS breaking the kernel/initrd for the next boot, or failing to load suitable SATA drivers into the initrd and breaking the next boot.
It also fails to keep a copy of the last known good kernel and initrd, as any change in initramfs-tools, a kernel module or the boot loader gubbins makes it rebuild the lot.
The BIOS settings are stored in on-board flash memory, which is accessed serially over the Serial Peripheral Interface (SPI) bus. This is a simple communications bus that uses four wires, and is commonly used to interconnect peripherals and devices in computers and, latterly, IoT systems.
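To give a feel for how simple the protocol is: the four wires are usually a clock (SCLK), data out (MOSI), data in (MISO) and chip select (CS#). Most SPI NOR flash parts accept the common Read Data command, opcode 0x03 followed by a 24-bit big-endian address, after which the chip streams data back on MISO while CS# is held low. The sketch below only builds that command frame; it does not talk to any hardware, and it ignores the 4-byte addressing that parts larger than 16 MiB need.

```python
READ_DATA = 0x03          # common SPI NOR "Read Data" opcode
MAX_3BYTE_ADDR = 1 << 24  # 3-byte addressing covers 16 MiB


def read_command(address: int) -> bytes:
    """Build the bytes clocked out on MOSI to start a read at `address`."""
    if not 0 <= address < MAX_3BYTE_ADDR:
        raise ValueError("address is negative or needs 4-byte addressing")
    # Opcode first, then the address, most-significant byte first.
    return bytes([READ_DATA]) + address.to_bytes(3, "big")


if __name__ == "__main__":
    print(read_command(0x123456).hex())
```

Writes look much the same (a write-enable command, then a program command with an address and data), which is why a driver that pokes the wrong bits can do real damage.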
The point is, when the kernel module writes back to the SPI flash, it writes corrupted data under certain circumstances. With no built-in CD-ROM, or floppy disk (remember those?), if USB booting has been wiped out you can't boot any OS, or run any tool, to correct it.
The answer:- have an unwritable flash memory with a small recovery program. Except that, when you make the chip, you have to write to it in the first place.
"The point is, when the kernel module writes back to the SPI flash, it writes corrupted data under certain circumstances."
The device's firmware clear function should wipe that and restore it to a factory-fresh condition, should it not? If it's just the stored data and not the firmware code itself that became corrupted, why is this even an issue?
"The answer:- have an unwritable flash memory with a small recovery program. Except that, when you make the chip, you have to write to it in the first place."
Some laptops already have this function. Just yesterday I read a review of an HP Elitebook 840 G1 that has this very feature. So does my main PC's motherboard (though, sadly, my laptop's does not). It should be the industry standard in all PCs, IMO, especially since the risk clearly isn't just "what if something happens during a firmware update attempt." It is well known that flashing firmware is a risk, and if it's happening as a matter of course during everyday operations of a UEFI PC, it is that much more important that the system be fail-safe.
Unfortunately, the trend is clearly in the other direction, with the whole industry attempting to engineer more planned obsolescence into their machines than ever before. Why would Lenovo want to put in a UEFI fail-safe that almost none of their potential buyers understand well enough to make it a selling feature? They want their PCs to fail (after enough time has passed to make the people think they've gotten their money's worth out of it) so people buy new ones! The people who keep an internal tally of such things and resolve to avoid vendors guilty of such things are very much in the minority.
Why would Lenovo want to put in a UEFI fail-safe that almost none of their potential buyers understand well enough to make it a selling feature?
They just have to put something into the marketing material to say that their new PCs are resistant to problems with firmware corruption with their NEW [SUPER FEATURE NAME]
That should be enough.
Yes, and they'd better start selling for a hundred bucks a pop or users will simply drop off the net at that point, much like with home Internet or cellular data costs. Where I live the price of all of these keeps rising while wages remain stagnant. I make well above the mean/median/average income in my area and find it a stretch to lay out for a new laptop. I pay for a couple of cell phones but no data; here it's too expensive.
tinfoil hat time :)
Makes me wonder if it's not all part of a bigger plan to destroy our ability to access reliable information. Murdoch, Disney and the Kochs are buying up media. So, make access to other media too expensive and voilà, we only know what those groups want to tell us, which is worse than nothing.
hahaha, oh how I go off topic and rant :)
Depends on whether the kernel includes the intel-spi driver (either built in or as a module). The config item is called 'SPI_INTEL_SPI_PCI'; you can check the kernel config for that.
Ubuntu built it as a module and shipped it in their kernel package. Fedora does not enable it on any branch at present. I dunno about other distros.
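A small sketch of that check, roughly equivalent to grepping the config file. In the kernel's .config the option appears with a CONFIG_ prefix, and disabled options show up as "# CONFIG_... is not set". It assumes the distro installs its config as /boot/config-<uname -r>, which Ubuntu and Fedora both do; kernels that only expose /proc/config.gz are not handled here.

```python
import os
import platform
from typing import Optional

OPTION = "CONFIG_SPI_INTEL_SPI_PCI"


def option_state(config_text: str, option: str = OPTION) -> Optional[str]:
    """Return 'y' (built in), 'm' (module), or None if unset or absent."""
    prefix = option + "="
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines never match the prefix, so they
        # fall through and the function reports None.
        if line.startswith(prefix):
            return line[len(prefix):]
    return None


if __name__ == "__main__":
    # Assumption: config lives at /boot/config-<kernel release>.
    path = f"/boot/config-{platform.release()}"
    if os.path.exists(path):
        with open(path) as handle:
            print(OPTION, "=", option_state(handle.read()))
    else:
        print("no config found at", path)
```

A result of 'm' only means the driver was built; whether it is actually loaded is a separate question (lsmod will tell you).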
Another reason BIOS should only be stored in read-only ROM chips.
Most BIOSes go through multiple revisions over the life of the machine -- sometimes to fix bugs or security holes, but often also to enable using larger memory modules or newer processors. I don't really want to have to dig out a screwdriver and have them mail me a new chip every time I need an update. This would also cripple interesting projects like coreboot.
This is not a new issue, incidentally -- the "Kickstart" BIOS for the original Amiga 1000 was so half-baked, they put it on a floppy and had the machine load it on boot. Later machines got it burned into ROM, though.
In many devices it is impossible or extremely impractical to open them, and many would not have a jumper or DIP switch to change anyway, and that is NOT a design flaw. It is done on purpose to save pennies, maybe a couple of dollars, per device. I think that will be a continuing trend, although not one I like. I like to get into the innards on occasion and, as previously stated by others, it certainly helps with planned obsolescence if you cannot. As for the ME issue, I have no doubt that the big manufacturers are in deep with the three-letter agencies. I personally haven't heard much about that since the story (or stories?) on here.
It's also due to a demand by enterprise high-ups to be able to administer machines no matter how far away they are, without having to get someone to actually go out there and physically work on them (which costs both time and potentially lots of money, plus there's a trust issue). They want it, they expect it, they paid for it; they'll either get it or find someone else to do it for them.
Reading between the lines a bit it looks like: (1) The problem affects devices with particular BIOS implementations; (2) The problem affects particular hardware implementations only; (3) The problem affects particular Linux kernel releases that include particular versions of the Intel SPI driver; (4) The problem affects particular Linux distributions that perform operations that use the SPI driver in a way that upsets a BIOS that doesn't like a particular set of hardware design decisions. There's also a distinct possibility that individual choices of setting within the BIOS affects one or another of these layers of conditionality. There are also other reports of other non-Linux OSs that happen to perform similar operations over the SPI bus having the same problem on the same hardware platforms.
Canonical seem to be behaving responsibly and reacting quickly once they were identified as a possible cause even if they are not the only contributor to the total chain of "gotchas" that lead to end-user problems. I'm not so sure the same could be said of other players.
How much regression testing can we expect, and from whom?
Lenovo can't be expected to test every single OS, especially those that haven't been written yet.
Intel can't be expected to test every single hardware platform.
Canonical can't be expected to test every single hardware platform either.
The Linux community probably tests the widest variety of hardware platforms, but only by trying it and having occasional problems (like this one!).
You can expect BIOS implementations to test correct operation on correctly built hardware.
You can expect hardware designers to use reliable BIOS suppliers.
You should be able to expect hardware designers to build hardware that correctly connects up the chips they use.
But even that testing won't be 100% in practice, even though it should be.
In my opinion, if a machine won't allow the BIOS settings to be corrected, or if it allows the BIOS settings to be set to an invalid state, the machine builder is responsible, even if only for their choice of BIOS supplier. They have no responsibility for preserving the correct operation of a non-supported OS, but do have a responsibility for ensuring it is possible to re-install a supported OS.
(And before anyone says so: I don't think measures that allow a device to be deliberately "bricked" if stolen should be easily circumventable, but I do think such facilities should be hard to activate, so that triggering them by accident is very unlikely.)
"It's very serious, since our machines do not have a CDROM"
I am guessing perhaps the term CD-ROM has become a generic term for optical disc drive, since I think it's probably been quite a while since you could buy new machines with CD-ROM drives; I don't think manufacturers have made them for several years. So if your laptop has an optical drive, it is more likely to be a DVD-ROM or DVD-RW drive than a CD-ROM.
Oh Grub. The bootloader that under Ubuntu at least comes split across several packages to increase the chances of it screwing up.
Do you think it asked me before it decided to change the display mode to something the monitor attached to the server could not handle? This may have been partly Ubuntu's fault. Again, 12 or 14 LTS.
Turns out the bug was initially found in kernel 4.11, in June 2017.
Also, the bug does not actually "brick" the computer. Looking at the fix, it appears the problem is in the module initialisation code. This section of code gets hit on every start, and it borks the BIOS anew on every start. However, following the original kernel thread, it appears that as soon as the module is updated not to flip the "writeable" flag in the BIOS on startup, the machine is back to normal.
Well, I'm not sure that's the full story. It sounds something like the affected hardware has an issue where once the flag is flipped to non-writeable, *it can't be flipped back*. I'm not 100% sure on this, but that's what it sounds like from the description.
So these two issues combine in a rather...unfortunate way on the affected systems.
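A generic sketch of why "a few lines of C can't simply unflip it" is plausible, and how that combines with code that re-runs on every boot. Chipset protection settings are commonly guarded by lock bits that software can set but not clear until the next platform reset. The class below models only that pattern; the names and reset defaults are hypothetical, it is not the real Intel register layout, and it does not claim to reproduce the actual Lenovo failure.

```python
class LockableControl:
    """Hypothetical control register guarded by a write-once lock bit."""

    def __init__(self) -> None:
        self.write_enable = True
        self.locked = False

    def lock(self) -> None:
        # Write-once: software can set the lock, but no method clears it.
        self.locked = True

    def set_write_enable(self, value: bool) -> bool:
        """Try to change the setting; returns False if the write was ignored."""
        if self.locked:
            return False
        self.write_enable = value
        return True

    def platform_reset(self) -> None:
        # Only a reset clears the lock (assumed here to restore defaults).
        self.locked = False
        self.write_enable = True


def boot(reg: LockableControl, buggy_driver_loaded: bool) -> bool:
    """Simulate one boot; returns whether the settings end up writable."""
    reg.platform_reset()
    if buggy_driver_loaded:
        reg.set_write_enable(False)  # the "wrong bit" gets flipped
        reg.lock()                   # ...and then locked down
    return reg.write_enable


if __name__ == "__main__":
    reg = LockableControl()
    print(boot(reg, buggy_driver_loaded=True))   # broken again this boot
    print(reg.set_write_enable(True))            # software can't undo it
    print(boot(reg, buggy_driver_loaded=False))  # fixed driver: fine again
```

In this model, a reset undoes the damage but the unfixed module redoes it on every start, which matches the "borks it anew on every start" description above.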
I must admit I have downvoted your opinions on many occasions but in this case you are spot on. Somehow after a few decades dealing with technology one gets an instinctive "feel" for things that are good ideas and things that are a botched mess from day one; from the beginning UEFI seemed to me to be the latter. Daily experience of it soon confirmed that feeling; it's just a clumsy, sprawling mess.
Well you can rationalize that "feel".
Essentially the biggest problem in IT is complexity. Complexity means high costs when you develop something, high costs to maintain it, and lots of bugs, security-critical or not. Therefore it is wise to avoid it.
On the other hand, changes can mean new features. A wise person would weigh those features against the cost of complexity and choose accordingly. Dumb persons ignore the cost of complexity.
UEFI is a typical example: it adds orders of magnitude of complexity while essentially doing exactly the same as the old BIOS. There is very little on the "feature" side, but a lot on the complexity side.
UNIX has lots of features which are the opposite. Features like piping are of only minor complexity cost, but allow the functionality to explode. Just add a tiny bit of code to your software, code you often need anyway, and it can be easily combined with other programs.
"trying for things like built-in OS-agnostic drivers"
That's a feature in "Open Firmware" which typically was around 20-30 kilobytes of Forth code. It's not hard to do that. BTW even the old BIOS provided that in some limited form. It was good enough to have a GUI installer with access to the graphics mode of graphics cards.
On my son's machine some time early on in its life.
But there was an auto-recovery system whereby, using a magic system of key presses at startup, the system auto-copied a backup copy of the BIOS back into the booting BIOS.
Booted fine afterward.
HP Envy TouchSmart 15 something from about 3 years ago. The BIOS name might be Insyde; that's what system info says, but I don't wanna reboot to check.
I recommend Mageia.
It came from Mandriva, which came from Mandrake.
Mageia is polished and nifty, but it also needs lots of help e.g. testers, designers. Its roadmap often gets delayed due to a lack of manpower.
I did consider it, however Ubuntu, along with its bastard child Mint, is one of the dominant distros, so most instructions and packages on the interwebs for Linux software are written for Ubuntu (and quite often Fedora). Ubuntu also holds your hand a lot. Need samba file sharing? Easy. I have not had such an experience with other distros such as Fedora.
Mint is a good alternative if you like more Windows like desktop environments and safer, but less cutting edge, update cycles.
Even if this was 100% Canonical's fault (or the blame could be 100% attributed to Linux and/or open source/free software) and I bricked an expensive laptop, I would still use Linux and would still fully support open source.
I'm pretty sure someone who knows how to use a search engine could find a huge number of reports of Windows effectively bricking hardware - at least to the point where "expert intervention" is required. I can see in my "your topics" list a link to a thread on an article about printers being knocked out, just for a start.
Let me preface this by saying I have worked in QA roles for the last 30 years for all of the above-mentioned companies in some capacity, and still do today.
The reason for SPI read/write access is that newer systems don't keep their settings in battery-backed CMOS (it is now only used for the RTC). Instead, BIOS settings are stored in a mapped section of the firmware device (now known as the Integrated Firmware Image, or IFWI), which contains the UEFI BIOS, the Management Engine (ME), firmware settings, boot order, Secure Boot keys, etc. Firmware updates now update the ME and its FRU/SRU data (hardware-specific settings for the ME) along with the BIOS. On some systems, the BIOS update is stored in a temporary location and flashed in by the ME on reboot.
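The point that one flash chip now holds several very different things can be illustrated with a simple region lookup: where a write lands decides what it can break. The region names follow the descriptor/ME/BIOS split described above, but the offsets are invented for a hypothetical 16 MiB chip; on real boards the layout comes from the flash descriptor and differs between models.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    name: str
    start: int  # first byte offset in the flash chip
    end: int    # one past the last byte


# Hypothetical 16 MiB layout, for illustration only.
LAYOUT: List[Region] = [
    Region("descriptor", 0x000000, 0x001000),
    Region("me", 0x001000, 0x700000),
    Region("bios", 0x700000, 0x1000000),
]


def region_at(offset: int, layout: List[Region] = LAYOUT) -> Optional[str]:
    """Name of the region that owns `offset`, or None if unmapped."""
    for region in layout:
        if region.start <= offset < region.end:
            return region.name
    return None


if __name__ == "__main__":
    for offset in (0x0, 0x200000, 0x7F0000):
        print(hex(offset), "->", region_at(offset))
```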
As to the OS needing access to this, it really depends on the manufacturer's implementation. Windows has separate drivers for every brand of every device that has to be separately loaded to configure the OS. Linux uses a more generic approach, with some device configurations determined by the driver when it loads (hence the aforementioned CD firmware issue).
So, who to blame? Sadly, it really ends up being the vast majority of consumers, driving demand for newer, more powerful systems, and manufacturers trying to meet the needs of the many. As also stated, no one manufacturer can fully test every possible scenario. When I worked in Intel's Processor Validation (1996-2004), one of our general managers told the press that even with the hundreds of thousands of test hours run pre-silicon and post-silicon across a vast mix of system configurations, it is nearly impossible to retest more than ~20% of all possible combinations, and that was just the CPU. Ubuntu has it much harder, as they are tasked with trying to release a combination of tested code bases (think of all of the different programs that make up a distribution release) on a vast combination of old and new hardware, with different architectures, vendors, combinations, etc. The best any of us QA folks can do is rely on the wider picture of testing that goes on globally by other users, manufacturers, and even competitors.
I sell Lenovo laptops running Linux that is customized for companies like Home Depot and hospitals.
They work great with Linux.
We customized our own Linux distro named Beaux O/S, and it looks just like Mac OS X except that it runs on Lenovo machines (and all Dell on the really high-end machines).
Since we originally started out with an Ubuntu derivative I can tell you that it runs on Lenovo laptops- we use the older ones.
Biting the hand that feeds IT © 1998–2019