* Posts by Nate Amsden

2437 publicly visible posts • joined 19 Jun 2007

VMware teases replacement for so-insecure-it-was-retired P2V migration tool

Nate Amsden

Re: Im surprised they are bothering with this...

Just curious what are the specs of those 30 physical servers? 30 physical servers could literally host thousands of VMs, so the cost really isn't that bad at all for Enterprise+, and at least at the moment the cost of Enterprise+ hasn't changed in at least the past 10 years(~$5k/socket, doesn't take into account inflation even). Although of course now they are limiting the license to 32 cores, so if you have a 64 core CPU then you of course need 2 licenses. But still that's a damn good value I think.

Now of course if you are running small systems, then there is less value. But if you're running at least 30+ cores and 300GB+ memory (I have had this config for going on 8 years now originally with dual Opteron 6176(24 cores)->6276(32 cores), newer systems will be 64 cores and 768GB memory at least) it's a good value.

Now if you add in the other shit beyond the basic hypervisor, that's where I lose interest. vRealize, NSX, and the ever massively increasing list of addons(checked VMware's site and was overwhelmed by the number of products they have that I don't care about) that I have no interest in(and so don't know the cost). I do remember at one point pricing vRealize because my senior director was interested (I was not); for our servers at the time it was going to be $250k I think(don't recall the # of servers at that point in time, it was less than 30 though). I said I'd rather buy another half dozen VMware hosts(at ~$30k+/pop with licensing) than get that product. His main want was something that could predict future capacity needs, and he heard that product could do that. I don't know if it can, but I wouldn't trust it or anything else to predict that given our custom application stack; in my experience such stacks can change capacity requirements in an instant with a new version of code, so you really need solid performance testing, not some magical tool that extrapolates past performance and predicts the future, because the app is ever changing.

LogicMonitor is by far my favorite tool for vSphere monitoring, I even have it able to report real time vCPU:pCPU ratios, and CPU Mhz for everything(otherwise not available out of the box), and tons of cool custom dynamic dashboards(and it's super easy to use).

Obviously the hypervisor market has matured a lot in the time since but your comment reminds me of a situation I was in at a company back in 2008. We were a very basic VMware customer, no vmotion, no vCenter, just the most basic licensing, back when you had to buy licenses in pairs because VMware didn't support single CPU systems (and their licensing didn't really take into account multi cores). ANYWAY, my director at the time hated paying for VMware (we had licenses for maybe half a dozen 2 socket systems, it really wasn't much). We were a CentOS shop mostly, and some Fedora as well at the time. He wanted to use Xen because it was free. He hated the VMware tax. I disagreed, and we got into this mini argument on the floor (open floor plan office). I'll never forget this because it was just so weird. He said something along the lines of he didn't think I wanted to run Xen because I was a pussy (used that word exactly). I didn't know how to respond(and don't recall how I did). But anyway I left the company a few months later I think(it was on its way to going out of business anyway).

Right after I left he directed the rest of my team to ditch VMware and get on Xen. Ok so they did, well they tried. After a month of trying they gave up and went back to VMware. They had an issue with Xen(on CentOS) and running 32-bit CentOS VMs. It didn't work. Don't recall the problem they had but no matter what they tried it didn't work. We leveraged some 32-bit systems at the time just for lower memory usage. I suppose they could have ditched all 32-bit and gone everything 64-bit but for whatever reason they didn't and instead dropped their Xen project and went back to VMware.

I didn't like that director for a long time after; we got into another big argument over Oracle latch contention(where I was again proven right). But we made up over email several years later. He apologized to me, and we are friends now(though not really in close contact).

But the hypervisor is core, it's the most important bit, has to be good quality, stable, fast, etc. I think VMware still owns that pretty well. Granted if you stay on the bleeding edge(vSphere 7 was a shitshow I heard) you may have issues; I don't stay on the bleeding edge(still ESXi 6.5 in production baby, re-installing to 7.0U3 soon though, not going to risk an "upgrade"). Also support is shit, that is true, though for me it doesn't matter too much; my configuration is quite conservative and as a result I have hardly ever needed support over the past decade. Really blows my mind how well it works.

Been using VMware since 1999 when it was a linux-only desktop product.

Twitter datacenter melted down in Labor Day heat

Nate Amsden

Atlanta

As a QTS Atlanta customer I heard on the grapevine years ago(perhaps as much as 8+ years ago) that Twitter has/had a large presence in QTS Atlanta, which is the facility I have used for a decade. If they are lucky they are still in that facility. It's split up into many sections; I never saw their equipment.

Fantastic facility and staff. I don't think there's much chance of it ever going down. It's relatively new too, there was still a lot of construction going on when I first got there, I recall walking through a huge open area with tons of construction equipment to get to the data center, and it was several years later that they built out the official front entrance area in another part of the building. And then on top of that they added another 250k sq foot data center on the same site? So I think ~750k sq feet raised floor between the two. Just massive.

Am assuming the Twitter Sacramento facility was perhaps dedicated to them, since I haven't noticed(not that I have looked hard) any reports or articles indicating an outage affecting other companies in that area(I live 1 hour from Sacramento).

Retbleed slugs VM performance by up to 70 percent in kernel 5.19

Nate Amsden

I was surprised at the hit

Not for this but for the original Spectre/Meltdown performance hits on Linux. For both work and personal use I have been actively avoiding the fixes for years, whether it be firmware updates or microcode updates(and using "spectre_v2=off nopti" as kernel boot options), and forcing my ESXi hosts to use an older microcode package. I have always felt these are super over hyped scenarios especially if you are running your own infrastructure(I have confidence in what I do, having managed internet facing infrastructure for the past 25 years). Of course I can't avoid the updates forever, especially since newer systems come with newer firmware at a minimum. Really wish/hope there can be UEFI/BIOS settings to disable these fixes in firmware while maintaining any other fixes that would be useful.

Anyway, for personal use I was given a couple of Lenovo T450s laptops(I think from 2016 time frame) a few months ago and I put Linux Mint 20 on them. They both had 12G of ram and SSDs, but both felt so slow they were practically unusable for me(I didn't use them for anything yet other than just installing Linux). For a while I was suspecting the SSDs, which are Intel; things like software installs were taking a long time, and reviews of these particular Intel SSDs said they were quite slow compared to the competition. So I was ready to buy new SSDs when I realized I hadn't put those "spectre_v2=off nopti" settings in the kernel yet.
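
(For reference, a rough sketch of doing that on Mint/Ubuntu style systems - the existing cmdline options will differ per machine, and newer kernels also accept the broader "mitigations=off":)

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash spectre_v2=off nopti"

    # then regenerate the grub config and reboot
    sudo update-grub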

I put them in and it was amazing the improvement in performance. Didn't need new SSDs after all. From what I've read the performance overhead for these fixes can vary quite a bit depending on what the CPU is doing, so it's not likely a generic CPU benchmark would register the full effects(or maybe new CPU benchmarks can/do I am not sure).

I really couldn't believe the improvement in performance for just basic desktop tasks, it blows my mind. I'm unsure of a good way(in Linux, even though I've been using Linux on the desktop since 1998) to accurately measure such performance differences(using a stopwatch manually isn't sufficient). I recall back in the 90s on the Windows side I played around a lot with the Ziff Davis benchmarks which were widely used, one of them at least was for desktop apps.

VMware has clouded the SmartNIC market, not created it

Nate Amsden

$20,000 server

My vmware servers were ~$30,000 (with enterprise+ licensing for 2 sockets) back in 2011, and newer servers kept to about that same price point up until the last big round of purchases maybe 5 years ago(~44 real cores, 384GB memory with 4x10G and 2x8G FC). I'd assume a good vmware server today with plenty of cores would be at least $50,000 with the new license fees etc (64+ real cores, 768GB of memory).

In fact I just priced out a "PowerEdge R7525 Rack Server" (normally I do HP but Dell's site is easier to quote with), and the cost without VMware licensing came to $41,557 then add in $10k for vSphere and you're above $50k. Wow that was a better guess than I expected.

And that doesn't even take into account NSX licensing, which I assume the customer would require if they wanted to use DPUs. Have never priced that but believe it's a non trivial expense.

Even now it seems on VMware's site a simple enterprise+ license with 1 year of production support is $5k/socket still(about what it was 10 years ago?).

DPUs sound like a more natural fit for vSAN (for dedupe and compression etc) than networking, it seems the overhead of storage for HCI is pretty huge. Though it seems VMware hasn't figured out how to do that yet. vSAN customers are(I am not one) certainly much more plentiful than customers driving so much network traffic that they need/want DPUs.

Me, I'm just now getting around to upgrading(very carefully, especially with firmware and driver version matching) to vSphere 7, hoping most of the bugs are worked out, ESXi 6.5 has been rock solid for years not a single issue. I see nothing in vSphere 8 that looks remotely interesting to me personally. DPUs sound cool on paper, it will probably take a while for them to get the bugs out.

Intel finally takes the hint on software optimization

Nate Amsden

Re: Lack of vision since 1999

losing out? x86 has killed almost every traditional RISC server/workstation processor out there regardless of how optimized the software was. Alpha/MIPS/SPARC/PA-RISC all dead or walking dead. Power(PowerPC?) not far behind. Itanium dead too of course. MIPS still has customers in the embedded space I'm sure, but their glory days back on SGI big iron etc are of course long gone.

Even modern ARM multi core server CPUs show they can only get good performance if they too are consuming 100-200+ watts of power per socket.

I think the argument of x86 being inefficient died about 15 years ago, about when we started seeing quad core processors(also we were well into 64-bit x86 at that point). Also I think even x86 has been mostly RISC with a translation layer since 686 days or something? At the end of the day RISC or not RISC it doesn't matter, it's an obsolete argument.

Don't blame Intel for the current state of x86; if you hate x86 so much you should hate AMD. If it wasn't for the AMD64 instruction set x86 would probably be buried by now, as Intel wanted to kill it, but they were "forced" to adopt AMD64 and go from there. We were fortunate that happened, Itanium seemed to be a far worse solution.

(I don't have any issue with x86 myself)

VMware offers cloudy upgrade lifeline to legacy vCenter users

Nate Amsden

Re: Dream on.

yeah I can't imagine there being any demand for this sort of ability. If something is that old typically you leave it alone until you are ready to completely replace it. vCenter is almost always a VM, and really the only reason to be running an old vCenter is because you have old ESXi hosts to manage. So there's even less point in moving such a vCenter to another location away from the hosts.

Oh I didn't notice they say you can shift old ESXi to cloud. That seems kind of pointless too as the hardware requirements are what they are, really old hardware. Would be messy to try to run it on something much newer. ESX 6.0 for example is certified to run on HPE Gen7 systems, the first of which I got 11 years ago(was running ESX 4.1 originally then ESXi 5.5 then 6.0).

pointless

Patch Tuesday: Yet another Microsoft RCE bug under active exploit

Nate Amsden

PPP

I had to check the CVEs to confirm that Point to Point is actually PPP, and the CVEs state that Windows RAS is vulnerable. I personally haven't seen a Windows RAS in probably 20 years now. I know some folks use Windows server to terminate VPN connections on, am unsure what protocols those use. I'm guessing 99.9% of windows installs out there aren't running the RAS service.

But even back in early 2002 the small company I was at was using Cisco 3000 VPN Concentrator appliances for Windows users(had no mac users), and, probably a poor security solution, an open source product called vpnd I think at the time for the limited Linux users who wanted remote access(maybe a half dozen including me).

Too little, too late: Intel's legacy is eroding

Nate Amsden

Intel can certainly afford this

They've sort of been here before at least related to AMD(early Opteron era etc), don't forget the years spent pitching Itanium and that funky RDRAM(unrelated to Itanium). Then Intel turned things around(around Xeon 5500 I think?) and not long after AMD made some huge mistakes and they went downhill(not long after Opteron 6000).

Now Intel has made some huge mistakes and AMD has turned things around. Though I think even under the best conditions AMD lacks the fab capacity(yes I know they don't have their own fabs anymore) to capture a large part of the market, they just can't make enough chips to do it. Not that they can't make a bunch of money doing what they are doing already.

Intel has tons of money, and tons of resources and huge market and mind share. Seems a big mistake to dream they are out for good, just another cycle.

Fortinet's latest hyperscale kit packs 2.4Tbit/sec of firewall into a 4U chassis

Nate Amsden

impressive but scary

Pretty amazing specs on paper at least. Though the idea of having so much traffic being routed through a complex next generation firewall as a single point of failure (referring to software failure not hardware) is scary.

I've read (from Fortinet fans) that Fortinet has a history of questionable firmware versions that can cause big problems(so find a good version and stick to it is the suggestion). They aren't alone here for sure, Cisco has a really bad reputation for Firepower. Sonicwall has a pretty terrible reputation among network folks as well. I'm sure there are others too.

I personally have used Sonicwall for the past decade without much issue but all my firewalls are basically layer 4. I assume most of the pain with Sonicwall may be the layer 7 stuff. I recall one stupid mistake on Sonicwall's part earlier this year I think where they pushed a bad signature update out to their Gen7 firewalls and made them go into a crash reboot loop. One of my office edge firewalls was hit by that, and what was even more strange to me is that firewall had no layer 7 licensing, so why the hell was it bothering to download a signature update that it didn't have a license to use. Stupid.

Load balancers have a solid history of being able to do Layer 7 well at high speeds, but they too are far less complex than a next generation firewall.

Point being, firewalling at layer 4 is pretty well fleshed out at this point, the systems are simple and reliable probably 98-99% of the time. Layer 7 firewalls and deep packet inspection, SSL inspection reliability seems to be far less (and such reliability doesn't seem to have improved much in recent years as complexity grows ever greater). Having so much complexity at a single point for massive traffic just scares me(probably anything over say 50Gbps).

I'm less concerned about something getting through the firewall (as in firewall not detecting a threat, since no way any firewall can block everything so some stuff will get through regardless) than I am the firewall outright crashing, dropping packets for unknown reasons or otherwise blocking valid traffic because of bug(s).

Maybe I'm wrong though.

Linux may soon lose support for the DECnet protocol

Nate Amsden

nostalgic about VIM

seems like a strange statement to say about a tool that is still used by many every day. I've never personally had experience with VMS or OS/390, and only a tiny bit of AIX at a Unix software development company 20 years ago. But vim I use daily as do many many others..

As for tape, there is more capacity of tape being sold pretty much every year. I pushed for tape at my org a couple of years ago for the IT team to back up with Veeam. Offline backups(not in the tape drive) can't be hit by ransomware or any other online attack.

VMware patches critical 'make me admin' auth bypass bug, plus nine other flaws

Nate Amsden

worried for a second

Always a bit worried when I see a vmware security thing hitting the news here. Then most of the time it turns out it's not vCenter/ESXi and I feel relieved since those are the only two products that matter to me. (well workstation too but level of risk there is tiny)

Why the end of Optane is bad news for all IT

Nate Amsden

as cheap?

"Optane kit is as big and as cheap as disk drives."

Seems to be way off. A quick search indicates 128GB in 2019 was about $695 and 512GB was $7000 - that works out to roughly $5-14/GB, a couple of orders of magnitude above spinning disk.

If Optane was as cheap as drives it would have sold a lot more and Intel wouldn't be killing it. Augmenting a few hundred GB in a system obviously won't revolutionize storage in the way the article implies. If the cost was cheap then all the storage could be replaced and moved to the "new" model of accessing storage.

There is a path to replace TCP in the datacenter

Nate Amsden

What interesting datacenter applications?

The article concludes with the person who wrote this new thing claiming their new protocol will allow interesting data center applications to use networking better. What sorts of applications? How much better and in what scenarios/speeds?

Please don't tell me these applications have anything to do with Web3, or blockchain BS.

Things seem to be working just fine with TCP, and in cases where that may be too much overhead we have UDP too.

One area where TCP may not do so well is with tons of tiny connections(I used to manage an app that would sustain about 3000 HTTP requests/sec and it overflowed the # of ports on the linux systems until we got the load balancer to pipeline the requests, which dramatically lowered overhead and improved performance), but in that case the solution is to keep a pool of connections open and send requests over them rather than open/close in rapid succession.
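
(A minimal sketch of the connection-reuse idea, Python standard library, hypothetical host and path; with HTTP/1.1 keep-alive the single TCP connection gets reused instead of burning an ephemeral port per request:)

    # keep one TCP connection open and send many requests over it,
    # instead of a connect/close cycle per request
    from http.client import HTTPConnection

    conn = HTTPConnection("app.internal", 8080)  # hypothetical backend
    try:
        for _ in range(1000):
            conn.request("GET", "/status")
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
    finally:
        conn.close()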

Nate Amsden

Re: ???

I don't think TCP/IP blurs the OSI model. TCP/IP is two different layers; you are referring to them as one. TCP (layer 4) and IP (layer 3).

applications usually care about layer 7 only.

Upgrading what might be the world's oldest running Linux install

Nate Amsden

mailman upgrade

There doesn't appear to be a way to upgrade mailman, as far as I could tell there was no real migration path from mailman 2(python 2) to mailman 3(python 3). Mailman 3 looked to be an insanely complex beast(compared to 2). I posted a few times to the mailman mailing list looking for advice and saw others in far more serious situations than my personal mail server that sends maybe 50-100 mailman messages a year. One person said they had been down for weeks with big mailing lists offline, hitting lots of big ugly errors trying to get mailman 3 working.

Mailman 3 looked entirely too complex and not ready for prime time in my opinion(exception being perhaps super large lists with dedicated staff to manage them). So my solution in the meantime was to build a dedicated VM with an older Devuan release with mailman 2 and just have that do the mailman processing. My main system can remain whatever version I want and then I just have postfix route the mailing lists to that dedicated system for processing. Works fine.
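
(The postfix side of that is just a transport map; a rough sketch with made-up names - the list domain and the mailman VM address are placeholders, and you can key on individual list addresses instead of a whole domain:)

    # /etc/postfix/main.cf on the main mail server
    transport_maps = hash:/etc/postfix/transport

    # /etc/postfix/transport - hand the list domain to the mailman 2 VM
    lists.example.com    smtp:[192.0.2.10]

    # rebuild the lookup table and reload postfix
    postmap /etc/postfix/transport
    postfix reload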

I feel I have a good knack for quickly determining if something is more complex than it needs to be, and so I usually try to avoid such solutions where possible.

My oldest personal system (upgraded from Debian to Devuan since) looks to date back to about 2010, about when I switched to ESXi for my personal public servers. I think all of my servers at work have been rebuilt at least once over the years(first systems fired up early 2012). Running Ubuntu at work, and when 20.04 came out I radically restructured how the VMs were configured and rebuilt everything that could run on 20.04(some things still stuck on 16.04).

Was running Debian since 1998 with the release of 2.0. Switched to Devuan in the past couple of years.

I estimate I spent over 100 hours of work dealing with systemd related issues at work going from Ubuntu 12.04 to 16.04 several years ago but for the most part got it all figured out there.

Your job was probably outsourced for exactly the reason you suspected

Nate Amsden

9:1 for me

A company I worked at a long time ago had a tier 1 ops team outsourced to HCL in India. My first and only experience working in an organization that had something like that. At one point the company thought HCL was too expensive so decided to build their own ops team over there(tier 1 only again). They hired away some of the HCL people I think, and a manager to build the office/team. I recall one of their new candidates actually never showed up for the position, and another candidate was caught doing stuff with malware on the network. Eventually they built a team that was somewhat stable from what I recall.

Fast forward a year or so and I leave for unrelated reasons. Fast forward another year or two and my former manager said(to a mutual friend) they hired 9 people to fill my role (I assume all overseas??). Apparently 9 wasn't enough(since they still struggled). Company went under a couple years after that(went under because they slowly lost all their customers except for their largest, then their largest cut ties as part of a larger change in strategy and they were done then).

(I haven't been tier 1 ops except for a brief stay at a .com back in the year 2000)

Tesla jettisons 75% of Bitcoin holdings, boosting cash balance by $936m

Nate Amsden

Elon loved the bay area lock downs..

I'm sure he's thrilled with China's zero covid policy and the lock downs there.

Google, Oracle clouds still affected by UK heatwave

Nate Amsden

Most server components are designed to operate up to 40C / 104F and have been for a very long time(10-15+ years). Some components can go well beyond that(and some others can't run at 40C).

I think Amazon did a test back probably before 2008 running a rack of HP hardware literally outside under a tree or something for some period of time, just to see how it handled the temperatures/humidity and even dust etc, and from what I recall it worked fine.

Microsoft had an incredibly innovative data center pod design(IT PAC) many years ago (I don't know if they ever used it at any sort of scale)

https://www.youtube.com/watch?v=S3jd3qrhh8U

Many hyper scalers at least at one point(probably still do in some cases) like(d) to run their stuff at 30C+ / 90F to reduce their cooling costs since the hardware can handle it, but of course gives less margin of error when there is an issue. Certainly would suck to work in such a facility.

Nate Amsden

doesn't make sense

Article says

"Hard disks are seldom rated to run at more than 50 degrees Celsius, and as the mercury topped 40 degrees in London yesterday it’s not hard to imagine that temperatures became so hot that mechanical disks in a densely packed device faced an extreme environment and suffered more than solid state components."

It's very hard to imagine in fact; it's not as if these systems are running under a big tree outside. There is air moving in the facility, and at the very least air moving in the chassis.

Add to that most modern spinning drives (at least the ones I have from Western Digital which are "Enterprise" SATA) are rated for 60C / 140F AMBIENT. My own personal drives I have run as hot as 91.5F ambient and the drive temps ranged from 107-120F. Not sure how hot the drives would be running 50F hotter ambient than that.

Add to that such systems must have thermal shutdown features to protect the systems from damage regardless.

Seems like many failures here.

Judge approves Twitter's request to hurry along Musk trial to October

Nate Amsden

seems so simple

Elon is asking for more time for due diligence(find spam info), yet he apparently signed a merger agreement waiving that right months ago. Seems like they could just decide on the spot.

Google, Oracle cloud servers wilt in UK heatwave, take down websites

Nate Amsden

Re: cooling failure?

If you have N+1 cooling and you're driving your cooling pretty hard, bringing up the 3rd cooling unit(the redundant one) should dramatically reduce the load on the other two units (for example). Less chance of failure. But if they can't handle a cooling failure then that says bad facility design to me regardless of outside temps.

Certainly such a design can be intentional and accepted by the customer as a risk for a lower cost setup. The problem is most lower tier customers have no idea such compromises are made and are caught off guard by the outages(so in the end it's a customer education issue, but one that the "cloud" players do a lot to downplay in their operations).

Nate Amsden

Re: cooling failure?

Having extra cooling should certainly help. It's not as if there were widespread reports of data center outages across the UK or even Europe. Even if they were not reported, such outages would show up in services going down across the board and that was not the case at all.

So this seems to be a fairly isolated incident likely a single facility that has both google and Oracle in a co-location.

Nate Amsden

Re: cooling failure?

yes in cloud terms using a different zone is supposed to be a different data center even if it is nearby(which goes to my comment about planning for facility failure). Though history has shown in several cases that cloud issues can(certainly not always) impact multiple zones in a region.

Nate Amsden

cooling failure?

That's why there is a backup cooling system, right? Oh wait, maybe not, because Google and Oracle(and other big IaaS players) cut corners on their systems(they do this intentionally to reduce their costs). In this case perhaps Google and Oracle share a common co-location or something. (obviously not all co-locations are created equal, most are pretty crappy but the biggest players generally have good setups at least in situations where they built themselves rather than acquired some smaller player's assets).

Just keep that in mind for folks that think these cloud providers choose top tier setups for their data centers (some techies(surprisingly few IMO) will already know this is not the case and if you use these providers you should plan for facility failure).

This situation reminds me of one of my earliest jobs 22 years ago, I had a 10 rack server room with a 2 ton HVAC and also a 4 or 6 ton HVAC. I lived about 2-3 miles from the office. I had what I thought was a great power setup, lots of big UPSs, lots of runtime(60+ minutes with battery expansion packs). I went through great pains to set up a combination of APC PowerChute for Unix/Windows as well as Network UPS Tools(NUT). I was so proud. It ran great.

One Sunday morning(still in bed) I get a text on my phone saying the UPSs are switching to battery. I was happy everything was working the way I intended. Until about 30 seconds later, when I realized the cooling system had no power. So I rushed to the office to perform manual shutdowns on stuff. Nothing lost, nothing damaged. But I learned a good lesson that day. That was my only position where I had an on site server room I was responsible for; everything since has been co-location.

Windows Network File System flaw results in arbitrary code execution as SYSTEM

Nate Amsden

Windows NFS was so terrible

Maybe it's better now but am not holding my breath. Several years ago I tried to deploy a pair of "appliances" from HPE that ran Windows 2012 Storage Server specifically for NFSv3 for Linux clients (and maybe 1% SMB). I chose this because it was supported by HPE, it could leverage existing back end SAN for storage, and it was highly available. Add to that I only needed maybe at most 3TB of space, so buying a full blown enterprise NAS was super overkill as nobody made such small systems. SMB NAS units(Synology etc) I didn't deem to offer acceptable levels of high availability.

I figured my use case is super simple(file storage, not transactional storage), my data set is very small, my I/O workload is pretty tiny, it's fully supported so how hard could it be for Windows to fill this role?

Anyway, aside from the 9 hour phone support call I had to sit on while HPE support tried to get through their own quickstart guide (I was expecting a ~10 minute self setup process) due to bugs or whatever(they ended up doing manual registry entries to get the two nodes to see each other which were connected directly via ethernet cable), the system had endless problems from a software perspective. 99% of the software was Microsoft, the only HPE stuff was just some basic utilities which for the most part I don't think I even used outside of the initial setup.

I filed so many support issues, at one point HPE got Microsoft to make a custom patch for me for one of the issues. I never deployed the patch purely out of fear that I was the only one in the world to have the issue in question(that and it wasn't my most pressing concern of all the problems I was having). I should have returned the systems right away but I was confident I could work through the problems given time. I had no idea what I was in for when I made that decision. I was also in periodic contact with HPE back end engineering for this product, though their resources were limited as the software was all Microsoft.

The systems never made it past testing, maybe 6 months of usage with many problems and support cases and workarounds and blah blah.. I designed the file system layout similar to our past NAS offerings from Nexenta(VM based), and FreeNAS (also VM based). There were 5 different drive letters, one for production, one for staging, one for nonproduction, one for backups(with dedupe enabled) and one for admin/operations data. The idea is if there is a problem on one of those it doesn't impact the others.

The nail in the coffin for this solution for me was at one point the backups volume (which had dedupe enabled) gave an error saying it was out of space(when it had plenty of space according to the drive letter and according to the SAN). When this happened the entire cluster went down(including the other 4 drive letters!). The system tried to fail over but the same condition existed on the other node. I had expected just that one drive letter/volume to fail; that is fine, let the others continue to operate. But that's not what really happened, all data went offline. I worked around the issue by expanding the volume on the SAN and it cured it for a while, until it happened again, all volumes down because one got full. WTF.

I tried to figure out a way to configure the cluster so that it would continue to operate the other 4 drive letters while this one was down. Could not figure it out(and didn't want to/couldn't wait hours for support to try to figure it out). So I nuked the cluster. Went single node, and from that point on if that drive letter failed(if I recall right anyway, this was years ago) the other drives remained online. I assume the source of the "out of space" error was some bug or integration issue with thin provisioning on the SAN, but was never able to figure it out. I do recall getting a similar error from our FreeNAS from the same array, there was plenty of space but FreeNAS said there was not(filesystem was 50% full). This issue ONLY appeared on volumes with dedupe enabled (Windows side dedupe, and FreeNAS side dedupe). I've never seen that error before or since. It was an old array so maybe the array's fault. I don't mind that particular volume going offline (it only stored backups), but the final straw was that volume being offline should not have taken the rest of the system down(in the case of the Windows cluster) with it.

But I had so many other annoying issues with NFS on Windows, I'm a Linux person so I didn't have high expectations for awesome NFS from Windows but again my use case was really light and trivial(biggest requirement was high availability).

In the end I migrated off of that Windows NFS back to FreeNAS again(had replication but no HA, so I never applied any OS updates to the system while it was in use, fortunately no failures either), before later migrating to Isilon. Makes me feel strange having 8U of Isilon equipment for ~12TB of written data(probably be closer to 7TB after Isilon overhead, more data than original because years have passed and we have since grown). But I was unable(at the time) to find a viable supportable HA NAS head unit offering to leverage shared back end storage. I was planning on going NetApp V-series before I realized the cost of Isilon was a lot less than I expected(and much less than V-Series for me at the time especially if you consider NetApp licensing for both NFS and SMB).

(When I first started out we used Nexenta's VM based system, and they officially supported high availability running on VMs. Initially this worked fine, however in production it was a disaster as the Nexenta systems went split brain on several occasions corrupting data in ZFS(support's only response was "restore from backups"), things were fine after destroying the cluster and going single node but of course no HA anymore)

Smart thermostat swarms are straining the US grid

Nate Amsden

fully manual

I learned the central AC in my apartment should only be run fully manually. Allowing the thermostat to keep temperature increases the chances the AC will fail. A few years ago I had a lot of problems with it and maintenance couldn't find the cause(they replaced some parts to no avail, I gave up trying to get it fixed). The AC compressor would fail to kick in(not an expert just going by the lack of noise), and instead ice would build up inside. It would then take many hours(sometimes waited 24h) for the ice to melt before the AC would work again. I'd estimate there is a 5% chance of this failure happening on any given startup. Turning it on manually every time I can tell when this failure happens(if the compressor doesn't kick on within 20 seconds), then I turn it off, wait a bit and try again. When the temp gets to a level I'm comfortable with(I have many temperature sensors in different places) I turn the AC off again. I never use the heat function. I think this year I have caught it fail to start 3-5 times.

But to these smart thermostats, I suspect it's likely if there were no defaults the customers would set them similar to what they are set to now. It's clear the $$ savings isn't worth it to them, otherwise they'd change it themselves. Some power providers have incentive based electricity plans where the price can vary depending on time of day or whatever but I think most people just have the more basic plan that charges based on overall usage (probably in tiers) for the billing period.

Older AMD, Intel chips vulnerable to data-leaking 'Retbleed' Spectre variant

Nate Amsden

worry more about the fixes than the problems

The risk involved with any of these side channel attacks is so tiny for 98% of the systems out there. I suppose the one place where one might need to be more concerned is if you are a service provider with multiple customers on the same systems. Otherwise if you have control of your workloads there really isn't much to worry about; there are far bigger threats out there than side channel and will be forever, and there will always be some new side channel attack about to be discovered because security folks want to be famous regardless of how limited in scope the issue is. Meanwhile the fixes for these problems cause their own problems whether it's performance or stability issues.

I would like it if there was a simple bios setting to disable these side channel fixes so you could install new microcode for OTHER fixes but keep the side channel stuff disabled. I run all my linux systems with "spectre_v2=off nopti" kernel settings(which may or may not be enough), and most of my systems are quite old at this point(Xeon E5-2699 v4 are my newest) and I have intentionally not updated firmware in many cases to avoid these fixes. Have read too many horror stories about them. I also have gone the extra mile (so far anyway) to exclude microcode updates from vSphere 6.5 (yes still running that) updates.

It's nice to have the fixes for people who are super paranoid and really want them, but also nice to have easy to use options for folks to opt out of them if they desire.

Broadcom takeover deal for VMware faces no rival bids

Nate Amsden

Re: Requiem for a once-great

Would expect this to be a standard practice. Though hard for me to think of a reason for them not to approve it, Broadcom isn't a competitor to VMware in any way that I'm aware of. Situation could be worse for sure. (I'd prefer it if VMware could remain independent too).

Lots of doom and gloom from many customers I'm sort of neutral on it myself being a customer since 1999. The only products I care about really are ESXi, vCenter and to a lesser extent workstation. Don't care about NSX, vSAN, vRealize, Tanzu, VDI or any of their other stuff. Hell I haven't even thought about upgrading past vCenter 6.7 / ESXi 6.5 yet(in the past I have gone 6-9 months past EOL before upgrading). So Broadcom "shifting focus" to what VMware's top customers want seems like a good thing (though I would assume VMware was doing that already). All my VMware purchases over the last decade have been through HPE, I have opened a half dozen support cases in those 10 years, and usually the support wasn't good, so I'm happy I have a setup that is simple & reliable with a super conservative configuration.

Moving to subscriptions and jacking up the cost doesn't sound great(perhaps they backtrack on that some, remember the vRAM tax?), but they'll still have to remain competitive. I saw some folks on reddit freaking out about their community VMUG offering(I don't think anything's been changed there yet), another thing I have never used from VMware (the free esxi license has been fine for my personal use, and modern vCenter is far too bloated memory wise for my personal use). I use workstation daily, though am still on version 15 (have a license for 16 but don't need it at this time). I have been hosting my own personal web/DNS/email/etc on top of vSphere for going on maybe 12+ years now, and before that I was using VMware GSX(aka VMware Server). Been using ESX professionally since 3.5, and was using GSX in mission critical roles prior to ESX going back to 2004.

I keep saying(haha) last product that VMware released that I was super excited about was vSphere 4.1. Everything since has just been meh, supports newer hardware that is nice, few things here and there most of which I don't care about. I do miss the .NET thick vmware client though(I say that as someone who has run Linux on their desktop since 1998), the HTML client doesn't hold a candle to it(my delay in upgrading means exposure to the Flash client was very minimal, that was intentional on my part, though I still need the flash client to rebuild vCenter HA(rare) as HTML client gives a stupid error). Also miss the thick ESX (vs ESXi).

Old-school editor Vim hits version 9 with faster scripting language

Nate Amsden

never knew

Been using vi since early 90s not sure when I switched to vim assuming ~20 years ago for the most part. Certainly not an expert with it though I use it daily(am mostly comfortable with regular vim too). Never knew it had a scripting language. Unsure as to the purpose, I browsed the linked man page for vim 9 which talks about the scripting language but no indication as to what the purpose of it is. Searching for the text "why" on the manual page indicates perhaps the scripting language may be used for vim plugins? (am unsure if I use any, don't know what might be a plugin vs built in I don't use anything but the defaults). Curious if anyone else can clarify, is it just for plugins or something else too.

On that note want to mention how much I hate the newer vim mouse interfaces, first encountered for me on Ubuntu 20 (which has vim 8), not sure if it is just new to vim 8 or perhaps older, but to disable that crap I always have to put ":set mouse-=a" in my ~/.vimrc (which is the first time I can recall ever using ~/.vimrc). Those new things were driving me insane before I figured out how to turn them off.
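
(For reference that's the entire fix in my case, just the one line - mouse-=a removes the "a" flag rather than clearing the whole mouse option:)

    " ~/.vimrc - turn off mouse integration in all modes
    set mouse-=a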

Running DOS on 64-bit Windows and Linux: Just because you can

Nate Amsden

support was removed by AMD

I believe I recall(perhaps incorrectly) that the AMD64(?) instruction set disabled 16-bit support when operating in 64-bit mode(if that makes sense). So for example if you wanted to run 16-bit natively you could, but your OS would have to be 32-bit. Perhaps it had something to do with making the system with fewer registers or something?

Just doing a quick web search turns up one comment(#13) from 2004 which claims this, though I'm confident I recall seeing more official source(s) over the years:

https://forums.anandtech.com/threads/amd64-and-16-bit-compatibility.1292075/

Since Intel licensed the instructions from AMD (I think I recall that?) then the same would be true for 16-bit code running on 64-bit Intel x86-64 chips.

Adobe apologizes for repeated outages of its Creative Cloud video collaboration service

Nate Amsden

Good example

Just because you are hosted in cloud doesn't mean you can leverage it if your app sucks(which it seems so many people don't understand, and probably never will). I can't count the number of times over the past 20 years of managing web applications where the performance limits were in the app and adding more servers/cpu/whatever wasn't going to do anything. Add to that most places don't properly performance test(I've actually only seen one valid performance test in 20 years and that was a very unique situation that really can't be replicated with another type of application). I have seen countless ATTEMPTS at performance testing all of which fell far from reality.

The org I work for did tons of performance tests(I wasn't involved in any of them, but my co-worker and manager were) before we launched our app stack in public cloud in late 2011, only to have all of those numbers tossed out within weeks and the knobs turned to 11 because the tests did not do a good enough job at simulating production workloads and cloud costs skyrocketed as a result. Of course moving out of public cloud months later (early 2012) helped a huge amount, and every day since just better performance, latency and availability across the board, and saved $10-15M in the process (over the past decade) for a small org.

I'll always remember a quote from a QA director probably 17 years ago; he had a whole room of server equipment they used to do performance tests on for that company (the only company I've worked at that had dedicated hardware for performance testing). His words were "if I had to sign off on performance for any given release we wouldn't release anything". At that company there were a handful of occasions, immediately following a massive software update, where we had to literally double the server capacity in production for the same level of traffic vs the day before. I ordered a lot(for us anyway) of HP DL360s back then, shipped overnight, to get the capacity in place on many occasions.

Another company I was at(the one with the good performance test) had the fastest running app I've ever seen, over 3,000 requests per second per 1U server sustained, made possible by no external dependencies; everything the app needed was on local disk (app ran in Tomcat). One particular release we started noticing brownouts in our facilities from hitting traffic limits that we should not have been hitting. We hadn't run a performance test in a while and when we did we saw app throughput dropped by 30% vs an earlier release. Developer investigation determined that the new code introduced new required serialization stuff which reduced the performance; they suspected they could get back some of that decrease but far from all of it.

Then there's the DB contention issues, Oracle latch contention at a couple different jobs, and massive MySQL row lock times (60-120+ seconds at times) at other places due to bad app design. Another quote I'll forever remember from that company 17 years ago during a massive Oracle outage due to latch contention "Guys, is there anything I can buy to make this problem go away?" (I wasn't responsible for the DBs, but the people that were told him no).

OVHcloud datacenter fire last year possibly due to water leak

Nate Amsden

Re: Ironic

There are no smart ones here. The smart ones would never have been a customer of OVH to begin with. Only reason I can see to use a provider like OVH is because you really, really don't care about just about anything (other than perhaps cost).

The big IaaS clouds are really not much better though. They too design for facility failure and expect the customer to account for that(as we have seen, many customers do not account for that, or if they do they do a poor job of it). A lot of people still believe that big names like Amazon and Microsoft have super redundancy built into their stuff; they of course do not, because that costs $$$, and they would rather shift that cost onto customers.

Meanwhile in my ~20 years of using co-location I have witnessed one facility failure(power outage due to poor maintenance), and we moved out of that facility shortly after(company was hosted there before I started in 2006). That facility suffered a fire about 2-3 years later. Customers had plenty of warnings (3+ power failures in prior years to the fire) to leave. There are a TON of facilities I'd never host critical stuff in(probably 60-75% of them), even the facility I use to host my own personal gear in the Bay Area (which has had more power outages than my apartment over the past 7 years but for my personal stuff given the cost it's not a huge deal).

My favorite facility at the moment is QTS Metro in Atlanta(look up the specs on the facility, it's just insane the scale). Been there over 10 years, not a single technical issue(not even a small blip), and the staff is, I don't have words for how great the staff is there. Maybe partially an artifact of being "in the south" and perhaps more friendly but they are just amazing. Outstanding data center design as well, 400-500k+ sq ft of raised floor in the facility, N+1 on everything, and nice and clean. I put our gear in there while it was still somewhat under construction.

By contrast my most hated facility was Telecity AMS5 in Amsterdam (now owned by Equinix). I hated the facility so much(had to put "booties" over your shoes before walking on the raised floor WTF), and I hated the staff even more(endless stupid, pointless policies that I've never seen anywhere else). Fortunately we moved out of that place years ago (before Equinix acquired them).

Splunk dabbles in edgy hardware, lowers data ingestion

Nate Amsden

maybe a new feature...

But I have been filtering and dropping data before it gets into Splunk for many years by sending it to the nullQueue via regular expressions in transforms.conf, a very well documented ability of Splunk.
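
(For anyone who hasn't done it, a rough sketch of the pattern - the sourcetype, stanza name and regex here are made up, and this lives on the indexers or a heavy forwarder since that's where the parsing/queue routing happens:)

    # props.conf
    [haproxy:http]
    TRANSFORMS-drop_noise = drop_healthchecks

    # transforms.conf
    [drop_healthchecks]
    REGEX = "GET /healthcheck HTTP/1\.\d" 200
    DEST_KEY = queue
    FORMAT = nullQueue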

Not only did it reduce the license costs but it also dramatically cut down on the amount of sheer crap that was going into the indexes and introducing a lot more noise, making it harder to find things. Simply removing the HTTP logs from our load balancer health checks that returned success, per my notes back in 2018, saved nearly 7 million events per day. Overall at that time I removed roughly 22 million events per day for our small indexers that at the time were licensed for 100GB/day. Included in that was 1.5 million useless windows event logs(these were more painful to write expressions for, one of which is almost 1,500 bytes). We had only a handful of windows systems, so absurd they generated so many events! 95%+ linux shop.

The developers for our main app stack also liked to log raw SQL to the log files which got picked up by Splunk. I killed that right away of course(with the same method) when they introduced that feature. I also documented in the config the exact log entries that matched the expressions to make it useful for future maintenance.

Don't get me wrong it wasn't a fast process it took many hours of work and spending time with regex101.com to work out the right expressions. Would be nice (maybe fixed now not holding my breath) to be able to make Splunk config changes to the config files and not have to restart splunk (instead perhaps just tell it to reload the configuration).

VMware esxi syslogs are the worst though, I have 59 regexes for those, which match 200+ different kinds of esxi events, at least with ESXi 6 the amount of noise in the log files is probably in excess of 90%. vCenter has a few useful events though I'd guesstimate noise ratio there at least 60-75%.

I had been using nullQueue for the past decade but really ramped up usage in 2017/2018.

Citrix research: Bosses and workers don't see eye to eye over hybrid work

Nate Amsden

Re: Really ?

At my first paying job back in the 90s they had installed I think it was Internet Manager by Elron software just before I started(I looked it up again recently on archive.org to confirm the name). My friend who was in the IT dept got me the position in a new "startup" within the company. The parent company was a 24/7 manufacturing shop. One night someone caught the "shop floor" employees browsing porn in the middle of the night, so they decided to install this software to block porn mostly. I guess it was a transparent proxy of sorts, it routed all internet traffic somehow through my friend's desktop computer (or perhaps just the monitoring aspect), and he could see in real time what every url people were looking at, and it would flag stuff for him to block etc.

The #1 offender (by far) was the VP/brother of the owner of the company. He may have been a co-owner to some degree, I'm not sure; he was also the head of HR for a while after the HR person left. It also generated a list of top users of internet bandwidth, and I was #1 pretty much every time, I believe in large part because I used a screensaver (wow, can't believe I remember the name now) called PointCast(?), a really cool (to me anyway) news ticker thing that pulled in tons of data. They would give me shit for being the top user (by a big margin) every month.

He made a point to browse mostly non english porn sites which the monitoring software had trouble flagging automatically. But he would sit in his office and just browse away and my friend would block the sites in real time on some days. I so wanted to pick up the phone and call him and say something like "Oh wow that's a great site don't you think? I'll save that for myself for later.."

This of course was well before the days of HTTPS being common so everything was clear text.

Every computer I've ever used at a company in my 25 year career has been setup by myself. The last company I was at where I did not have some control of IT systems(despite not officially being in IT since 2002) was about 2006ish. I specifically recall the IT admin guy getting frustrated with my computer, which was Windows XP but I had replaced the Explorer shell with LiteStep (similar to AfterStep which I liked at the time). He couldn't figure out how to do things so he would ask me to do stuff like open control panel for him so he could do something(rare occasion). I don't recall the reason(s) why he would want/need to do something to that computer maybe I asked him, not sure.

Lenovo halves its ThinkPad workstation range

Nate Amsden

Can the new one use ECC RAM?

I still use a P50, and I regret not getting it with the Xeon so I could have ECC memory. Currently with 48GB (max 64GB). Not that I have (m)any issues without ECC just would like it with this much memory. I don't know what I was thinking when I opted for the regular i7.

IMO anything with more than say 4GB should be on ECC where possible, and IMO again beyond say 64GB regular ECC isn't adequate anymore (feel free to do web searches on HP Advanced ECC, IBM ChipKill, and Intel Lockstep mode(introduced with Xeon 5500)), though I like HP's implementation the best by far(as it offers better than regular ECC protection and has zero memory overhead).

Note that HP's Advanced ECC was first introduced in 1996, it's not new technology. Shocking that so many are still relying on regular ECC these days.

SmartNICs power the cloud, are enterprise datacenters next?

Nate Amsden

another issue..

is these things will bring more bugs with them. Take for example iSCSI offload HBAs. They're quite common, perhaps almost universal on storage arrays, but on servers themselves they are rarely used(even if the capability is present, as it often is on recent systems). While my systems primarily run on fibre channel storage(so I don't have a whole lot of recent iSCSI experience), I have read almost universally over the years that iSCSI offload HBAs are bug ridden and the general suggestion when using iSCSI is to use a software initiator(which by contrast gives far better results in most cases).

I remember my first experience with hardware iSCSI on a 3PAR E200 storage array in 2006. The NIC on the array (Qlogic dual port 1Gbps) had an issue where it would lock up under high load. I managed to mitigate the problem for a long time by manually balancing paths with MPIO (this was before ESX had round robin). Then maybe a month before I quit that job I rebooted the ESX hosts for software updates, and forgot to re balance the paths again. Major issues after a couple of weeks. I remember my last day at the company I had to hand the support case off to a co-worker as the issue was not yet resolved(and was pretty critical impacting production). A couple of weeks later that company replaced all the iSCSI with fibre channel to solve the issue(a patch ended up being made available a few weeks after that). Felt bad to leave them hanging but my next job started pretty quick I couldn't stick around.

I have read several complaints over the years about network cards with TCP offload enabled causing other unexpected issues as well, and in many cases the suggestion is to disable the offload. It also makes it more difficult to diagnose things when you run a packet capture on the VM or on the host and the data differs from what is actually going over the wire, because the NIC is doing stuff to the packets before they go out.
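
(At least checking and toggling the offloads on Linux is easy to experiment with; a sketch, interface name assumed:)

    # show the current offload settings
    ethtool -k eth0

    # turn off the common TCP-related offloads (capital -K changes settings)
    ethtool -K eth0 tso off gso off gro off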

So beyond cost, these Smart NIC people need to be sure their stuff is really robust for enterprises to consider adopting them. Hyperscalers have the staff and knowledge to deal with that extra complexity. Given history I am not really holding my breath that these vendors will be up to the task. But they could find use cases in tightly vertically integrated solutions that are sold to enterprises, rather than a generic component that you can put in your average server.

Elliott Management to WDC board: Spin out or sell flash biz

Nate Amsden

Re: no growth in flash (for WD anyway)?

yeah but the end result for WD is they don't have flash anymore and they get the same money back they spent to get into flash in the first place?

Nate Amsden

certainly the disk drive half. Flash can be sourced from many places. Wasn't aware hybrid drives were still around, I used the early Seagate 2.5" hybrids for several years. Doing a search for hybrid on Western Digital's store indicates they have no hybrid flash/hard disks for sale. Not sure if they ever had one. They do have hybrid storage systems for sale.

Meanwhile (after checking), it seems Seagate sells a Firecuda drive that is hybrid, though I don't see any mention on the data sheet of how much flash the drives have (at least for the 2.5" models; the 3.5" ones apparently have 8GB of flash).

Nate Amsden

no growth in flash (for WD anyway)?

The article states WD bought Sandisk 6 years ago for $19 billion and then states the flash business now is believed to be worth $17-20 billion (assuming that is what "enterprise value" means?). The article says the transaction was transformative, but the investor seemingly wants to just cancel it and get their money back (in a sense assuming they get ~$19 billion if it's sold off?).

I don't have an opinion on whether it is a good idea or a bad idea either way (don't care), though I specifically recall Chris Mellor here on El Reg saying something along the lines of Seagate being doomed because they didn't do something similar to WD (and I think he said so many times). Now here we are years later and there's a possibility that WD undoes all that.

Just seems like an interesting situation. What's most surprising to me I guess is that the valuation of the WD flash unit apparently has gone nowhere in 6 years (despite the massive increase in flash usage during that time). Maybe WD paid a super huge premium at the time, I don't know.

VMware walks back ban on booting vSphere from SD cards or thumb drives

Nate Amsden

Re: what is vsphere.next

interesting ok, was fearing it was perhaps some kind of "rolling release" of vsphere.

Nate Amsden

As the article states, on a local SSD, or I suppose a spinning disk would work fine too. All of my systems boot from SAN (over fibre channel).

Never liked the thought of a cheap crap USB flash/SD card being a point of failure for a system that cost upwards of $30k+ for hardware+software (that goes all the way back to the earliest ESXi, 3.5 I think?).

If you are at large scale you may want to check out (or may already be using) stateless ESXi, which basically boots from the network directly into memory. Sounds neat (never tried it), though it seems like quite a bit of extra configuration work is required, hence it is more useful at larger scales (perhaps starting in the 100+ host range).

My two personal vsphere hosts (run in a colo) boot from local SSD RAID 10. My previous personal ESXi host (built in 2011) did boot from USB (and the flash drive died at one point; fortunately it was a graceful failure), because booting from the local RAID controller (3Ware) was not supported.

Nate Amsden

what is vsphere.next

I did a web search and found nothing.

VMware says server sprawl is back, and SmartNICs are the solution

Nate Amsden

offloading storage better

I would think offloading storage would be better, especially given their vSAN stack. Remove the CPU, memory etc overhead from the host and put it on the DPU (similar to SimpliVity, except I assume that only offloaded CPU, and also similar to what Nebulon does, I think; in fact perhaps VMware should just acquire Nebulon, though I have no experience with it).

I assume they haven't gone that route yet because storage is more complex than networking (hence my comment about acquiring Nebulon). With these SmartNICs I haven't seen any mention of the ability to offload SSL, for example (perhaps they can and the news articles just haven't mentioned it). Commercial load balancers like BigIP and Netscaler, for example, all have SSL offload chips (at least the hardware appliances do). I could see a new type of virtual hardware you could attach to a VM to map an SSL offload "virtual DPU" or something to the VM to provide the hardware acceleration (similar to a virtual GPU), so a VM running intensive SSL stuff could leverage that (provided the SSL code supported the offload).

Meta strikes blow against 30% 'App Store tax' by charging 47.5% Metaverse toll

Nate Amsden

maybe should be other way around

facebook paying for/subsidizing half of all metaworld purchases to encourage people to make things worth purchasing.

(never have had a facebook account, well maybe they have a shadow account for me I don't care either way)

Day 7 of the great Atlassian outage: IT giant still struggling to restore access

Nate Amsden

Re: Ah....remember....."cloud" is cheaper......

really it comes down to too many eggs in one basket. Certainly service failures can occur on premises. But pretty much universally those failures affect only a single organization. Granted there can be times when multiple companies are experiencing problems but it's still tiny compared to the blast radius of a SaaS provider having a problem.

My biggest issue with SaaS, at least from a website perspective, is the seemingly constant need the provider feels to change the user interface around, convinced everyone will love the changes. Atlassian has done that tons of times and it has driven me crazy. Others are similar, so convinced all customers will appreciate the changes.

Go change the back end all you want as long as the front end stays consistent please.

At least with on prem you usually get to choose when you take the upgrade, and in some cases you can opt to delay indefinitely (even if it means you lose support).

Just now I checked again to confirm. Every few months I go through and bulk close resolved tickets (in Jira) that have had no activity for 60 days. I used to be able to add a comment to those tickets as part of the bulk operation; I would say "no activity in 60 days, bulk closing". Then one day this option vanished. I asked Atlassian support what happened and they said that functionality was not yet implemented on their new cloud product (despite us having been hosted in their cloud product for years prior). I can only assume it is a different code base to some extent. Anyway, that was probably 3-5 years ago, and I still don't have that functionality today. (There is an option to send an email to those people when the ticket closes; I don't want that, I just want to add a comment to the ticket.)
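For what it's worth, the bulk close itself is easy enough to script against the REST API instead of the web UI. A minimal sketch, assuming the python "jira" client library, a hypothetical site and API token, and a transition named "Closed" (workflow dependent):

```python
from jira import JIRA  # pip install jira

# Hypothetical Jira Cloud site and credentials.
jira = JIRA(server="https://example.atlassian.net",
            basic_auth=("user@example.com", "api-token"))

# Resolved tickets with no activity in the last 60 days.
issues = jira.search_issues("status = Resolved AND updated <= -60d",
                            maxResults=200)

for issue in issues:
    # Add the comment the cloud UI no longer lets me attach to a bulk change...
    jira.add_comment(issue, "no activity in 60 days, bulk closing")
    # ...then close the ticket; the transition name varies per workflow.
    jira.transition_issue(issue, "Closed")
```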

Don't get me started on the editor changes in confluence in recent years, just a disaster. Fortunately they have backed off their plans to eliminate the old editor (for how long I don't know, but it seems like it's about 2 years past when I expected them to try to kill it).

Then there was the time they decided to change the page width on everything in confluence (I assume to try to make it printable). At least in that case they left a (per user) option to disable that functionality (the change messed up tons of pages that weren't written for it).

The keyboard shortcut functionality drove me insane in confluence as well. Assuming it was there for years before (I don't know; I never used keyboard shortcuts in confluence going back to my earliest days of using it in 2006) it was not a problem, but in the past couple of years I would inadvertently trigger a series of events on documents that I did not want, just by typing. I was able to undo it every time, and finally disabled the keyboard shortcuts a few months ago.

Atlassian Jira, Confluence outage persists two days on

Nate Amsden

Re: Cloud vs On_Premise

While I knew the situation I had a good laugh anyway. I recently renewed a 10 user server license for confluence that I purchased(?) about 10 years ago(for extremely limited personal use) but had lapsed. The cost to renew was $110, to "true up" the license to current time. That's fine, not a big deal.

Then I saw the suggestion: hey, you can move to the Data Center edition. Again I knew the situation but was curious anyway. On the left, the $10 price to renew my existing license for another year; on the right, the lowest cost Data Center offering at a mere $27,000 I think it was (for 500 users).

But at least the license is still perpetual(for the given version of the product anyway).

I've been using confluence since early 2006, and with the cloud version (inherited from the orgs I worked at) the experience has gone significantly downhill in many aspects. My favorite version of confluence I think was probably version 3 (guessing here, it was a long time ago; the last version to support editing wiki markup). I have been somewhat relieved that their cloud folks seem to have postponed indefinitely their forced migration to the new editor. I had so many issues with it, and tickets and phone calls. They kept saying that the new editor would be forced on us soon and I'd have no choice but to use it. But that was about 2 years ago now and it hasn't happened. Surprising they have not been able to address whatever edge cases the old editor allowed that the new one does not yet. At least they fixed one of my most annoying issues, which was keys getting stuck and just printing the same character over and over and over again. Took them weeks to figure it out after trying to blame my computer/browser for the issue.

I use JIRA regularly as well but much less often. I don't use any other Atlassian products.

My regular wiki at home is Xwiki, which seems to work quite well; confluence is just for some other stuff that I want to be able to access that I haven't moved (yet).

Microsoft arms Azure VMs with Ampere Altra chips

Nate Amsden

Re: AWS Graviton2 is similar

Graviton2 is a different situation I think. I believe that chip is designed by Amazon, which means they reap more of the benefits of vertical integration (mainly cost savings, from not having to pay higher margins to another supplier).

Nate Amsden

what

Who would ever realistically compare an 8 core CPU with hyperthreading against a 16 core CPU (hyper threading or not)? Also the article makes it sound like the cost of an 8 x x86 CPU VM (with HT enabled) is the same/similar as a 16 x x86 CPU VM. I assume this is not the case (I have never used Azure; the fixed allocation models of all of the big public clouds have been a big turn off for me starting ~12 years ago, so I really haven't paid much attention to them over the years).

Things would be a lot simpler if they just spit out some numbers from some benchmarks to compare the systems. Benchmarks are of course questionable by themselves but the performance claims being made here seem even more vague than benchmark numbers.

However, if a single modern ARM CPU core can compete with a single modern x86 CPU core in server workloads, that would be interesting. Historically anyway it seemed ARM's designs went for just tons of cores on the chip (more than the standard x86 anyway), so in aggregate they may very well be on par with x86 (historically again they have had similar power usage from what I've read, that being 150W+/socket), but then you're not comparing core-to-core performance, because the chips don't have the same number of cores (which in many cases doesn't matter; I just mention it because the article seems to focus in on core-to-core performance).

Never personally been a fan of hyper threading myself, mainly because it's not easy to estimate how much extra capacity those threads give (but I haven't disabled it on any of my systems, I just measure capacity based on actual cores rather than some funny math to adjust for the extra threads).
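To illustrate the kind of arithmetic I mean, here is a minimal sketch with made-up numbers (the host specs and vCPU allocation are hypothetical):

```python
# Hypothetical host: 2 sockets x 16 cores = 32 physical cores, 64 HT threads.
physical_cores = 32
ht_threads = physical_cores * 2
allocated_vcpus = 96  # total vCPUs assigned to VMs on this host

# Capacity based on actual cores (what I plan against):
print(f"vCPU:pCPU ratio by cores:   {allocated_vcpus / physical_cores:.1f}:1")

# The "funny math" version that counts HT threads as full capacity:
print(f"vCPU:pCPU ratio by threads: {allocated_vcpus / ht_threads:.1f}:1")
```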

Linux kernel patch from Google speeds up server shutdowns

Nate Amsden

HDDs too?

is this an issue with HDDs too? (I assume it would be?) I've never noticed anyone with lots of hard disks (assuming they are not abstracted by a RAID controller) complaining about slow reboots over the years.

GitHub explains outage string in incidents update

Nate Amsden

A lot do have this issue; not many companies are public about what causes their outages. DB contention is a pretty common issue in my experience over the past 18 years of dealing with databases in high load (relative to what the app is tested for) environments. I've seen it on MySQL, MSSQL and Oracle, and in all cases I've been involved with, the fault was with the app design rather than the DB itself (which was just doing what it was told to do). (Side note: I am not a DBA but I play that role on rare occasions.)

I remember in one case on MSSQL the "workaround" was to restart the DB and see if the locking cleared; if not, restart again, and again, sometimes 10+ times before things were ok again for a while. Fortunately that wasn't an OLTP database. The most critical Oracle DB contentions involved massive downtime due to that being our primary OLTP DB. MySQL contentions mainly just limited the number of transactions the app could do; adding more app servers, more CPU, more whatever had no effect (if anything it could make the issue worse), the row lock times were hogging up everything.
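On the MySQL side at least, spotting that kind of row lock pileup is easy enough to script. A minimal sketch, assuming the sys schema (MySQL 5.7+), the mysql-connector-python package, and hypothetical connection details:

```python
import mysql.connector  # pip install mysql-connector-python

# Hypothetical credentials; sys.innodb_lock_waits pairs each waiting
# transaction with the transaction currently blocking it.
conn = mysql.connector.connect(host="db.example.com", user="monitor",
                               password="secret")
cur = conn.cursor(dictionary=True)
cur.execute("SELECT * FROM sys.innodb_lock_waits")

for row in cur.fetchall():
    # Rows include the waiting/blocking queries and how long the waiter
    # has been stuck; exact columns vary by MySQL version.
    print(row)

cur.close()
conn.close()
```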