* Posts by l8gravely

27 posts • joined 11 Jun 2012

LTO-8 tape media patent lawsuit cripples supply as Sony and Fujifilm face off in court

l8gravely

Bwah hah ha!

I've got something like 18,000+ pieces of LTO-4 media kicking around in Iron Mountain. Talk about a horror show to restore! And I'm not even sure if that includes all my old SDLT tapes... must check out that setup as well...

Tapes aren't going anywhere. The other problem is filling tapes fast enough that they keep streaming instead of shoe-shining back and forth. That's what kills performance and tape heads. So yes, you do want disk in front, so you can spool there as fast or slow as you like, then rip it off to tape super fast when ready.
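A minimal sketch of that staging pattern, with a hypothetical spool path and tape device (and tar's blocking factor picked arbitrarily), just to show the shape of it:

```python
# Phase 1: backup clients write into a disk spool at whatever rate they manage.
# Phase 2: once the spool is complete, stream it to tape in one fast sequential
#          pass with a large blocking factor, so the drive never stops streaming.
import shutil
import subprocess

SPOOL_DIR = "/backup/spool/job-0042"   # hypothetical staging area on disk
TAPE_DEV = "/dev/nst0"                 # hypothetical no-rewind tape device

def stream_spool_to_tape():
    subprocess.run(
        ["tar", "-b", "2048", "-cf", TAPE_DEV, "-C", SPOOL_DIR, "."],
        check=True,
    )
    shutil.rmtree(SPOOL_DIR)           # reclaim the spool once it's on tape

if __name__ == "__main__":
    stream_spool_to_tape()
```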

Is that a stiffy disk in your drive... or something else entirely?

l8gravely

Re: Help! My stiffies stuck in the slot

God I remember the old Fred Fish disks for the Amiga... used to wait around for the latest collection to arrive via copies from a friend, or downloaded from the BBS at 9600, or 28.8k if I was lucky! Those were the days...

I always thought Commodore were fools for not giving away a C compiler with the base system. The old Lattice C compiler was so stupidly expensive that you couldn't afford to use it. And gcc hadn't been ported to M68k processors in those days.

I still miss my A2500 with Mech Force, a great PD top-down mech fighting game. Friends and I spent hours and hours playing it and building mechs.

Brit rocket boffins Reaction Engines notch up first supersonic precooler test

l8gravely

It's not going to work out well....

First off, LOX in bulk is *cheap*; doing all the testing and development of this sucker is not cheap at all.

Second, a rocket needs to accelerate, and accelerate a lot, to get into orbit. Jet engines are, to a large degree, optimized for cruise. You don't cruise on your way to orbit.

Third, how is this pre-cooler going to do when it sucks in a bird? How benign are the failure modes?

'Numpty new boy' lets the boss take fall for mailbox obliteration

l8gravely

Re: I've learnt to admit my mistooks!

Absolutely! Who has the time and energy to keep track of lies? Oh right, sales droids....

Can your rival fix it as fast? turns out to be ten-million-dollar question for plucky support guy

l8gravely

Re: "you need to keep really, really close tabs - and lots of comments...

Thanks for the pointer! This fits me to a T!

NASA gently nudges sleeping space 'scopes Chandra, Hubble out of gyro-induced stupor

l8gravely

It would be a shame if NASA weren't thinking about how to do another servicing mission to Hubble, using either SpaceX's or Boeing's capsules, to replace all six gyros with new ones, along with any other instrument upgrades that could be done easily. It's obviously not nearly as easy to do as it was with the Shuttle, but still... it would be a useful extension if possible.

Early experiment in mass email ends with mad dash across office to unplug mail gateway

l8gravely

Are you kidding? People come and bitch at me if email doesn't arrive instantly now; $DEITY help us if it isn't on the other side of the world within two minutes, or they're pissed.

Sysadmin shut down server, it went ‘Clunk!’ but the app kept running

l8gravely

Re: Long uptimes are a disaster waiting to happen

I agree that rebooting things more frequently is a nice thing to do, but when you have legacy applications which aren't horizontally scalable, then it can be extremely difficult to get the downtime. I had a bunch of Netapps with over three years of uptime before I was allowed to shut them down, and then only because we were moving them across town.

Let me tell you, when they booted and came up fine, I was very happy! They were off NetApp support, and disk failures were at the wrong end of the bathtub curve... they've got to be replaced one of these days, but I suspect they won't move until it all falls over.

Disk firmware can kill a whole cluster how exactly? Cisco explains

l8gravely

I love how a disk firmware problem requires a UCS manager update!

I love how a disk firmware issue requires an update of the entire UCS manager stack. And since I've recently gone through the pain of setting up a UCS HyperFlex cluster, I can tell you it will suck suck suck.

Hyperconverged is supposed to make things easier, not harder. UCS HyperFlex is all the flexibility of UCS with all the complexity and pain tripled or even more. It sucks... don't go there.

PC nerds: Can't get no SATA-isfaction? Toshiba flaunts NVMe SSD action

l8gravely

Re: What about endurance?

Right, I bet the Pro model has a much better warranty than the new version, and a better drive-writes-per-day figure as well. The details matter.

But dang, those numbers are nice! Now to get a PCI riser card for my old system(s) to use stuff like this.

Sysadmin hailed as hero for deleting data from the wrong disk drive

l8gravely

Re: Personal Tragedy

You do realize that these cards will fail over time? Just like CD-RW discs? The only real way to keep stuff like this safe is separate RAID storage, where you move the data to new media every five to ten years.

BOFH: But I did log in to the portal, Dave

l8gravely

Bravo!


‘I crashed a rack full of servers with my butt’

l8gravely

Re: Just finger trouble

I've got you beat in the little fingers department. Way back when, my son was around two years old and the wife's computer was in the basement next to my lair of untidiness. Which she did not appreciate at all! Anyway, I had an old PC (Gateway? HP?) where the case slid forward to add/remove stuff inside it. I had cannibalized it for some parts and left it on the floor *mostly* closed up. The wife was working with the nipper playing under the desk. Then all hell broke loose, because he pushed in the power switch and the flap of spring steel caught his finger and wouldn't let go. Normally he could have pushed that button all day until the cows (or I) came home, but it popped out and his finger got trapped.

I got a hysterical call from her, and since I was 30+ minutes away and she's not really the techie type, especially under pressure like this when the firstborn is wailing his head off... I told her to call the local police for help. They came, got the finger out without loss, and I got a royal reaming when I showed up just after they had left.

Consequently, her computer was moved upstairs, and I had to do a major cleanup of my mess.

Now the kid is as tall as I am... memories.

VCs palm Igneous $15m to give its Arm-powered backup boxen a leg up

l8gravely

So how big a WAN link do you need to back up 10 TB nightly?

So... how do you back up 10 TB of data nightly to the cloud? All of these guys who think that backups can move to the cloud don't seem to include network costs in their calculations...
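To put some rough numbers on it, here's a back-of-the-envelope sketch, assuming an eight-hour backup window and ignoring protocol overhead, deduplication and compression:

```python
# How fat a pipe do you need to push 10 TB inside an 8-hour window?
nightly_tb = 10
window_hours = 8

bits_to_move = nightly_tb * 1e12 * 8          # terabytes -> bits
seconds = window_hours * 3600
required_gbps = bits_to_move / seconds / 1e9

print(f"~{required_gbps:.2f} Gbit/s sustained")   # roughly 2.78 Gbit/s
```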

Death notice: Moore's Law. 19 April 1965 – 2 January 2018

l8gravely

It's the memory being so much slower...

The root cause of this is the mismatch between access speeds of CPUs to registers, local cache, level 2 cache, memory, then NVMe (or PCIe buses), then SSDs, then hard drives, then modems, then teletypes, ad infinitum.

The entire reason the CPU designers DID speculative execution was that memory wasn't fast enough to feed data as quickly as they wanted, which is why on-chip caches have been used for decades. Speculation is just one way of going down the *probable* path and getting work done as fast as possible. In the not-so-common case where you're wrong, you suffer a stall as you throw out all your (speculative) work and then wait for cache/memory/disk/etc. to feed you the instructions and/or data you need.

Remember how we used to add memory to systems so we wouldn't swap? Now we add SSDs to get better performance?

Today's CPUs are really quite super duper fast, when you can feed them data/instructions fast enough. But once you can't, you have to wait, which sucks.
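A rough illustration of that point (a hedged sketch; the absolute timings depend entirely on the machine, and NumPy's allocation of the gathered copy adds its own cost, but the gap between cache-friendly and cache-hostile access is what matters):

```python
# Sum the same data sequentially (streams through cache lines, prefetcher happy)
# and in a random order (gathers defeat the cache and prefetcher).
import time
import numpy as np

N = 20_000_000
data = np.ones(N, dtype=np.int64)
perm = np.random.permutation(N)          # random visit order

t0 = time.perf_counter()
seq_total = data.sum()                   # sequential pass
t1 = time.perf_counter()
rand_total = data[perm].sum()            # random gather, then sum
t2 = time.perf_counter()

print(f"sequential: {t1 - t0:.3f}s  random: {t2 - t1:.3f}s  same answer: {seq_total == rand_total}")
```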

I suspect that over the next few generations, increases in on-die cache sizes, memory bandwidth and memory access speeds will be the way forward to speed up systems again. We already see this with the DDR->DDR2->DDR3->DDR4 progression and single/double-ranked DIMMs. But that's going to have to change, possibly to a more serial bus to get improved performance out of DIMMs, which are then read in a more parallel way internally.

But it's going to cost. Is it smarter to reduce the number of cores, but use that transistor budget to drastically increase the local cache size? And the associated cache management machinery? Will we go back to off-chip level 3 caches to help right the imbalance?

Lots of possibilities; we've just hit another inflection point in CPU and system design where there will be a bunch of experimentation to figure out the best way forward. I foresee a bunch of companies all taking the x86_64 instruction set and trying to re-architect the underlying engine it runs on to investigate these various tradeoffs. Yes, process improvements will help, but I think the pendulum has swung more strongly to the design side now, and we'll see more interesting ideas (like the Crusoe processors from Transmeta, etc).

Will Intel and AMD go away? No. They'll be here. But I strongly suspect they'll be doing some radical rethinking. And probably buying some smart startup with the next interesting way forward. Which will be hard, since the motherboard and memory guys will need to plan out the next steps in the off-chip connections and interfaces to get data moved to/from memory faster and with less latency.

Curse of Woz strikes again – first Fusion-io fizzles out, now Primary Data goes down

l8gravely

Re: Dumbass idea - its just a scam to make money from idiot investors

I too ran Acopia boxes, until they just fell over and couldn't handle our load, which was an engineering environment with millions of files. And while the idea is great, the execution details matter. And if you want to do backups... you're screwed.

1. If you do backups *through* the Acopia or other box, then you put more load on the system. Restores are easy though...

2. If you backup the back end data stores directly to tape/disk/magic box, then backups are fast(er) and easy(er) to do. But restores *suck* *suck* *suck* because now you have to trawl through multiple backends to try and rebuild the contents to match reality. God that sucks.

So then we tried CommVault's integrated system, which managed backups and archives together. Much better... until it too fell over, because CommVault *sucks* at keeping file indexes online and useable for any serious amount of time. Again, restores work fine... until they fall over and it all starts to suck big time.

It's *so* much simpler to just buy more storage if that's what you need. 'Cause god knows you'll never get management to actually put their foot down and make the staff clean up their crap files from five years ago, because "we might need that!". Twats.

'The capacitors exploded, showering the lab in flaming confetti'

l8gravely

Re: Finding fault capacitors with high current PSUs used to be the norm

Actually, it worked with NetApps up until around 15 years ago, with the Fibre Channel based drives in the old DEC StorageWorks containers on the F330 and F740 series filers. We had a triple disk failure in a raid group on a *very* important volume. I managed to pull the working circuit board from the drive making the scraping noises and put it onto the drive which just sat there doing nothing. And presto, the darn thing booted up and served data again. There were a lot of very relieved engineers that day, since it held the main ClearCase database for the product the company made.

That was certainly a stressful day, since going to tape was going to take a *long* time no matter what with DLT7k drives to restore multiple terabytes of data. Ugh!

Official: Perl the most hated programming language, say devs

l8gravely

(((((((cdr lisp))))))))

Lisp is by far the worst... I never could wrap my brain around it. Which probably says more about me...

IT admins hate this one trick: 'Having something look like it’s on storage, when it is not'

l8gravely

Been there, done this for both styles of migration. They all suck

I manage enterprise NFS storage and over the past 15+ years I've worked with various tools and solutions to the problem of moving old files from expensive storage to cheaper storage, either disk or tape. And it all sucks sucks sucks.

If you do backups to Tape, then suddenly you need to have a tight integration with the storage and backup vendors, otherwise restores (and people only care about restores!) become horribly horribly painful to do.

We tested out a product from Acopia, which had an appliance that sat in front of our NFS storage. It would automatically move data between backend storage tiers. Really, really slick and quite neat. Then when we implemented it we ran into problems.

1. With too many files, it just fell over. Oops. We had single volumes with 10 TB of data and 30 million files. Yeah, my users don't clean up for shit.

2. Backups sucked. If you backed up through the Acopia box, then you were bottlenecked there. If you backed up directly from the NFS storage (either NFS mounts or NDMP), then your data was scattered across multiple volumes/tapes. Restores, especially single-file restores, were a nightmare.

We then tried out a system where our backup vendor (CommVault) hooked into the NFS Netapp filer and used the fpolicy to stub out files that were archived to cheaper disk and then to tape. This worked... backups were consistent and easy to restore since CommVault handled all the issues for us.

But god help you if a file got stubbed out and then the link between them got broken, or you ran into vendor bugs on either end. It also turned into a nightmare, though less of one I admit. But it also didn't scale well, and took up a ton of processing and babying from the admins.

I'd really like to get a filesystem (POSIX compliant please!) that could do data placement on a per-directory basis to different backend block storage volumes. So you could have some fast storage which everything gets written to by default, and then slower storage for long term archiving.

Again, the key is to make restores *trivial* and *painless*. Otherwise it's not worth the hassle. And it needs to be transparent to the end users, without random stubs or broken links showing up. Of course, being able to do NDMP from the block storage to tape would be awesome, but hard to do.

Replication, snapshots, cloning. All great things to have. Hard to do well. It's not simple.

Just scanning a 10 TB volume with 30 million files takes time; the metadata is huge. I use 'duc' (https://github.com/zevv/duc) to build up reports that my users can use via a web page to find directory trees with large file sizes across multiple volumes, so they can target their cleanups. And so I can yell at them as well, with data to back me up.

Running 'du -sk | sort -n' on large directory trees is just painful. Again, having a filesystem which could keep this data reasonably up to date for quick and easy queries would be wonderful as well. No need to keep it completely consistent. Do updates in the background, or even drop them if the system is busy. That's what late-night scanning during quiet times is for.
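The idea is basically "scan once during quiet hours, query cheaply later", which is what duc does for me. A minimal sketch of the same approach, with a hypothetical path and a throwaway SQLite index (error handling for files that vanish mid-scan omitted):

```python
# Walk a tree bottom-up, record cumulative per-directory sizes in SQLite,
# then answer "where did the space go?" from the index instead of re-running du.
import os
import sqlite3

def index_tree(root: str, db_path: str = "dirsizes.db") -> None:
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS dirsize (path TEXT PRIMARY KEY, bytes INTEGER)")
    totals = {}
    # Bottom-up walk: children are visited before their parents, so each
    # directory's total can include the subtotals already computed below it.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        size = sum(os.lstat(os.path.join(dirpath, f)).st_size for f in filenames)
        size += sum(totals.get(os.path.join(dirpath, d), 0) for d in dirnames)
        totals[dirpath] = size
        con.execute("INSERT OR REPLACE INTO dirsize VALUES (?, ?)", (dirpath, size))
    con.commit()
    con.close()

# Usage (hypothetical path): run index_tree("/nfs/projects") overnight, then query:
#   SELECT path, bytes FROM dirsize ORDER BY bytes DESC LIMIT 20;
```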

It's not a simple problem, and it's nice to see various people trying to solve the issue, but ... been there. Tried it. Gave up. It's just too painful.

Support team discovers 'official' vendor paper doesn't rob you blind

l8gravely

Of course it's odd, they did crap troubleshooting

Of course it's odd, the engineer did crap troubleshooting onsite. When we had problems, they always brought their own tape(s) and standard labels to check the robotics with known good stuff. Once they saw that their combo worked, they would berate you if they found out you had gone with sub-standard (theirs!) labels.

Who else remembers printing out DLT barcodes on plain paper with a laser printer, since the vendor-supplied ones were pure robbery? And of course the hardware at the time wasn't nearly as good and reliable.

The old Exabyte EXB-120 libraries were amazingly finicky about labels too.

GitLab freezes GraphQL project amid looming Facebook patent fears

l8gravely

Re: mm, graphql looks familiar

You'd think Alice would hold here... since a language is just a way of describing mathematics. On a computer.

Nokia snatches clump of 16nm FinFETs, crafts 576 Tbps monster router

l8gravely

Because they put the new in the center... and the old still works at the edge

My old company did the same thing. They would come out with a big huge core router which would be supported by the configuration management software. It would drop into the network (phone companies and network providers) in the core to satisfy a bottleneck. The older gear would be pushed to the edge to give a performance boost there, since the costs were already paid. Those costs include transitioning to a new management and configuration system which can provision the switch without problems.

That's the reason why Nokia put this monster out: to make sure customers are locked into their system and have a path forward for more growth as needed.

The biggest British Airways IT meltdown WTF: 200 systems in the critical path?

l8gravely

Management fears testing

The biggest problem is that management fears testing of failover, because those even higher up the chain will then blame them (middle management) for taking "risks" that weren't needed. I look at the stuff Netflix has done with the Simian Army and wish I could do the same here. But... Netflix has a totally different model and application stack. They scale horizontally and completely. So it's trivial to add/remove nodes. And killing them off is just good testing, because it does show you the problems.

But most applications are vertical. It's a DB -> app -> web server -> users. At each level you try to scale horizontally, but it's hard, and it takes testing and a willingness to break things and then fix them. Most management aren't willing to break things, because they view that through the short-term mindset of losing money or customers. Which are the easy things for them to measure.

But knowing that you have *tested* resiliency, and have a better understanding of your bottlenecks... that's good, but damn hard to quantify.

I had an issue where we got sold a large engineered DB setup, with integrated storage of two different types. We started using the slower basic tier for storage and DBs because it was there, and we had already paid all this money, so why not use it? And it turned out that as the load crept up, the thing plateaued and then fell off a cliff due to contention. Once we moved stuff off to other storage, things were OK.

The point I'm trying to make here is that until you have an event that is quantifiable or business-impacting, the tendency is to wring as much performance or utilization out of a system as possible. Having a spare system sitting around doing nothing costs money. Why not utilize the other 50% of your capacity when you have it available? So what if, when one node fails in an HA pair, you suddenly have 150% load on the survivor and it craps the bed? It meant you didn't have to buy another pair of systems to handle the growth! Which costs hard money, so denying it is easy to justify at the time.

Afterwards, you can just bet the money will flow to make things better. For a bit. Until the next bean counter comes in and looks for cost savings. But of course it can go too far in the other direction too, where you have lots and lots of idle systems sitting around wasting resources for an eventuality that doesn't come often.

But, back to my original argument: testing is hard on management, because if it does go wrong they're up the creek, which they don't like. If it goes well, then they don't care. So if you can, try to set up your systems so that you actually test your failover constantly, so you won't be surprised when it does happen.

And let me know how it goes, I want to learn too!

Primary Data's metadata engine 'speeds up' NAS, saves 'millions'. Leaps burning buildings, too?

l8gravely

Been there, done that. Failed

I've been there and done this using Acopia for NFS tiering of data. It's great until you run into problems with A) sheer numbers of files blowing out the cache/OS on the nodes. B) Now your backups are a horrible horrible horrible mess to implement.

Say you have directory A/b/c/d on node 1, but A/b/c/e on node 2. How do you do backups effectively? You can use the backend storage's block-level replication, or NDMP to dump quickly to tape. But god help you when it comes time to restore... finding that one file across multiple varying backends is not an easy case.

And this link to cloud storage is a joke when they're talking terabytes, much less petabytes, of information.

This is completely focused on scratch volumes and spreading the load of rendering and large compute farms with large IO loads of temp files. Once the data is set, it's obviously moved elsewhere for real backup and retention.

Then, once we kicked Acopia to the curb (don't get me wrong, it was magical when it worked well!) we moved to using CommVault's integrated HSM/backup solution, which worked better... but still not great. God help you if the index to files got corrupted, or if the link between CommVault and the NetApp (which intercepted accesses to files moved to tape or cheap other disk storage) got delayed or just broke. Suddenly... crap took a while to restore by hand.

I've seriously come to believe that this is truly a hard problem to solve. Now if you can do like what the people above are doing, targeting large scratch volumes where people want to keep stuff around for a while but don't care as much if backups are done (or can spend the money to tier off to cheap cheap cheap disk)... maybe it will work.

But once you have a requirement to go offsite with tape, you're screwed. Tape is still so cheap and understood and effective. Except for time-to-restore, which sucks.

Too many companies believe that sending fulls offsite each night will magically give them DR when they only have one datacenter. Gah!!!

Sorry, had to get that off my chest. Cheers!

Paint your wagon (with electric circuits) but leave my crotch alone

l8gravely

Re: A semi-serious question...

I agree, belt holster is the way to go. I tried carrying a wallet up front due to pains in the hip, but even with the "loose" jeans, it was still too much.

I'm amazed how many women carry their phones in their back pockets, and yet don't break them nearly as much as I would think. I did that with a second phone and managed to put a crack in the screen and a bend in the case, just from sitting on my not too considerable ass. I'm 185 lbs and 6'2", so I'm not that obese...

Behold this golden era of storage startups. It's coming to an end

l8gravely

And what about backups?

The elephant in the room here is backups. And DR. None of these adequately solve those issues in a useful way. I wish they did, but that's why they're still making LTO# drives, for those who need to move large amounts of data around.

It has gotten cheaper, but doing DR over 100 Mbit/s links when you have 1 TB of changes a day and 60-80 ms of latency... not going to happen. Maybe, just maybe, if you have a 1 Gbit/s link you could do that. Assuming you can afford the price.
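Rough numbers behind that claim (a sketch assuming raw line rate, ignoring protocol overhead and what 60-80 ms of latency does to TCP throughput):

```python
# How long does 1 TB of daily change take to replicate at each link speed?
daily_change_bits = 1e12 * 8                     # 1 TB/day, in bits

for name, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9)]:
    hours = daily_change_bits / bps / 3600
    print(f"{name}: {hours:.1f} hours/day just moving the delta")

# 100 Mbit/s -> ~22.2 hours (the whole day becomes the replication window);
# 1 Gbit/s   -> ~2.2 hours, which is at least plausible.
```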

So how do all these hyperconverged people handle an outage of the entire cluster? Or a fire? Or a flood? It's all well and good until you need to recover it, and then you're up the creek. So I think there's plenty of space for innovation to happen in the storage arena for sure.

John

Ten... Sata 3 SSDs

l8gravely

Ranking suggestion...

Looking over the selection, it's still hard to make a choice. I'd love to see a graph which shows size vs cost vs warranty length. I'd be willing to pay extra for a longer warranty, since I could amortize that expense over a longer time. Maybe cost/GB/year would be an interesting metric to look at?
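A quick sketch of what that metric looks like, with made-up prices and warranty lengths purely for illustration:

```python
# Cost per GB per year of warranty: the dearer drive with the longer
# warranty can still come out ahead on this metric.
drives = [
    # (name, price_usd, capacity_gb, warranty_years) -- hypothetical figures
    ("Drive A", 120.0, 240, 3),
    ("Drive B", 180.0, 240, 5),
]

for name, price, gb, years in drives:
    print(f"{name}: ${price / gb / years:.3f} per GB per warranty year")
# Drive A: $0.167/GB/year, Drive B: $0.150/GB/year
```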

As others have said, random read/write performance is much more interesting than sequential, and would also influence my purchasing more. But really, it's all about reliability first, since the speed is good across all of them.

Now to figure out how hard it would be to move a Windows 7 install from HD to SSD without bringing along all the data. Or only some of it.
