Reader suggestion: Using HDFS as generic iSCSI storage • The Register Forums

Thursday 14th May 2015 17:52 GMT casaloco

Want

Something every small business has needed for some time. Quality reliable shared network drives... in the cloud.

2 0 Reply

Thursday 14th May 2015 18:01 GMT Steve Loughran

HDFS: the subset of posix it implements

HDFS lacks a couple of features which people expect from a "full" posix filesystem

1. The ability to overwrite blocks within an existing file.

2. The ability to seek() past the end of a file and add data there.

That is: you can only write to the end of a file, be it new or existing.

Ignoring details like low-level OS integration (there's always the NFS gateway for that), without the ability to write to anywhere in a file, things are going to break.

There's also a side issue: HDFS hates small files. They're expensive in terms of metadata stored in the namenode, you don't get much in return. It's while filesystem quotas include the number of entries (I believe; not my code)

What then? Well, I''d be interested in seeing Greg's prototype. Being OSS, Hadoop is open to extension, and there's no fundamental reason why you couldn't implement sparse files (feature #2) and writing within existing blocks via some Log-structure writes-are-always-appends thing: it just needs someone to do it. Be aware that the HDFS project is one of the most cautious group when it comes to changes, protection against data loss comes way, way ahead of adding cutting edge features.

without that you can certainly write code to snapshot anything to HDFS; in a large cluster they won't even notice you backing up a laptop regularly.

One other thing Hadoop can do is support an object store API (Ozone is the project there), so anything which uses object store APIs can work with data stored in HDFS datanodes -bypassing the namenode (not for failure resilience but to keep the cost of billions of blob entries down. Anything written for compatible object storage APIs (I don't know what is proposed there) could backup to HDFS, without getting any expectations that this is a posix FS (that's untrue: I regularly have to explain to people why Amazon S3 isn't a replacement for HDFS).

To close then: if this is a proof of concept, stick the code on github and everyone can take a look.

stevel (hadoop committer)

11 0 Reply

Thursday 14th May 2015 21:23 GMT Alistair

Re: HDFS: the subset of posix it implements

Dunno *why* it waited for me to get you an upvote Steve.

I've *read* part of the code, since there is an app team in my shop that thinks HDFS is a good place for SAS temp files and report data.

Update, edit are not options on HDFS in large scale - I noticed the hook for object stores, but even there it looks to me that its more worm style than anything else.

Explaining that HDFS doesn't do these things takes some time - especially with some of the moronic marketing crap out there. (I've seen stuff that should have made the sales techs blush..... and I think I may have made one or two of them actually blush.)

2 0 Reply
Friday 15th May 2015 12:54 GMT Dr. Mouse

Re: HDFS: the subset of posix it implements

Discaimer: I know nothing about HDFS. However:

The only one I can see as an issue with the idea he suggests is #1. He is suggesting iSCSI use, in which case we are talking about a disk image. Therefore small files, seeking past the end (unless you are creating sparse images) and other points you mention don't apply. We are talking about exporting a single large file over iSCSI to be used as a disk on another system, which will partition and format it with it's own filesystem.

0 0 Reply

Thursday 14th May 2015 18:11 GMT Matt Piechota

Since iSCSI is block based, we're talking about storing a ton of (say) 4k blocks in Hadoop and the iSCSI service asks for or updates block #12309854? It doesn't sound terrible. I could see some concurrency issues with multiple iSCSI services, but I'm not sure they're worse than iSCSI to traditional block storage as they'd still need a cluster-aware file system on the clients.

The big question is: how/why is this better than something like DRBD?

1 0 Reply

Thursday 14th May 2015 18:21 GMT Henry Wertz 1

Seems reasonable to me

Stevel's comments might be a real problem -- in particular, overwriting blocks within an existing file might be something a lot of software expects to be able to do. And, in particular, if you have disk images served via iscsi I would think this software would be particularly likely to want to scribble into either a disk image or a differences file.

Otherwise, I think this sounds quite reasonable -- why spend the huge bucks on specialized SAN hardware when this will do the same thing? The one reason ordinarilly would be stability but a) HDFS is proven software with known stability. b) SAN vendors have flubbed it now and then too.

0 0 Reply

Thursday 14th May 2015 18:53 GMT Nate Amsden

sounds terrible

Reminds me of a suggestion my VP of technology had at a company I was at several years ago, he wanted to use HDFS as a general purpose file storage system. "Oh, let's run our VMs off of it etc.." I just didn't know how to respond to it. I use the comparison as someone asking for motor oil as the dressing for their salad. I left the company not too long after and he was more or less forced out about a year later.

There are distributed file systems which can do this kind of thing but HDFS really isn't built for that. Someone could probably make something work but I'm sure it'd be ugly, slow, and just an example of what not to even try to do.

If you want to cut corners on storage, there are smarter ways to do it and still have a fast system, one might be just ditch shared storage altogether (I'm not about to do that but my storage runs a $250M/year business with high availability requirements and a small team). I believe both vSphere and Hyper-V have the ability to vmotion w/o shared storage for example (maybe others do too).

Or you can go buy some low cost SAN storage, not everyone needs VMAX or HDS VSP type connectivity. Whether it's 3PAR 7200, or low end Compellent, Nimble or whatever.. lots of lower cost options available. I would stay away from the real budget shit e.g. HP P2000 (OEM'd Dot hill I believe), just shit technology.

3 0 Reply

Thursday 14th May 2015 18:55 GMT Ben Liddicott

Then provision disks from the SAN and add them to the Hadoop cluster

Rinse and repeat, Hey presto!!! Infinite storage!!!!

3 0 Reply

Thursday 14th May 2015 19:35 GMT SecretBatcave

Yes, but what do you want to do with it?

I mean, lets be honest, HDFS isn't a generalised block storage system. Its not ever particularly well designed for the job it intended to do.

If you want to cluster bits of block storage into one name space, there are many, many better ways of doing it.

For a start HDFS only really works for large files, large files that you want to stream. Random IO is not your friend here. So that makes it useless for VMs.

If you want VM storage, and you want to host critical stuff on it you need to do two things:

* capex some decent hardware (two 4u 60drive Iscsi targets) and let VMware sort out the DR/HA (which it can do very well) 60grand for hardware plus lics. That'll do 300kiops and stream 2/3 gigabytes a second.

* capex some shit hardware, ZFS, block replicate, and spend loads of money on buying staff to support it. and the backups

Seriously there is/are some dirt cheap systems out there that'll do this sort of thing without the testicle ache of trying to fiure out why your data has been silently corrupting for the last 3 weeks, and your backup has rotated out any good data.

So you want a custom solution:

1) GPFS and get support from pixit (don't use IBM, they can hardly breathe they are that stupid) <-fastest

2) try ceph, but don't forget the backups <- does fancy FEC and object storage

3) gluster, but thats just terrible. <- supported by redhat, but lots of usersapce bollocks

4) lustre, however that a glorified network RAID0, so don't use shit hardware <- really fast, not so reliable

5) ZFS with some cheap JBODs (zfs send for DR) <- default option, needs skill or support

Basically you need to find a VFX shop and ask them what they are doing. You have three schools of thought:

1)netapp <- solid, but not very dense

2)GPFS <- needs planning, not off the shelf, great information lifcycle and global namespacing

3)gluster <- nobody likes gluster.

Why VFX, because they are built to a tight budget, to be as fast as possibly, and reliable as possible, because we have to support that shit, and we want to be in the pub.

2 0 Reply

Thursday 14th May 2015 22:07 GMT Dan 10

I wondered recently whether HDFS would provide commodity iscsi storage for an ETL staging layer using an extract-once, read-many approach, but ISTR the random reads and variable file sizes (and other stuff I can't recall) didn't make it very friendly. The more obvious GPFS appears a better choice.

0 0 Reply

Friday 15th May 2015 02:02 GMT Tom 64

Fine, if you don't care about performance

HDFS is dog slow. I mean really, really sloooooow.

I'd probably not consider using it like this, you're much better off with a dedicated NAS unit with RAID, if you care at all about performance.

0 0 Reply

Friday 15th May 2015 12:29 GMT Alistair

Re: Fine, if you don't care about performance

HDFS is "slow" in certain contexts, but when you use it the way it was designed, it actually rocks. Its the right tool for certain types of work. Sadly most folks think its just a file system.

2 0 Reply

Friday 15th May 2015 07:20 GMT Anonymous Coward

Why go from file system to blocks to (on the client) file system? Seems a bit inefficient to me, and I would probably go for straight file level protocols right away and "do away" with some of the complexity.

IF block level protocols is a must, I'd say that there are way better choices for the backend. Just go with the standard file system of the server, or a fancy one like ZFS that has lots of bells n' whistles.

When it comes to backup & dr, I'd look into if that could not be solved at the application level, or if the server has some tools that could be used.

To be able to say anything useful and not overly generic like above, it would be useful to have more information about the environment and the setup.

0 0 Reply

Friday 15th May 2015 08:05 GMT Anonymous Coward

Wow not too many people took the bait and responded to something that is an obviously bad idea

1 0 Reply

Friday 15th May 2015 08:13 GMT joe.harlan22

Wrong tool for the job!!!

Square peg in a round hole much? Honestly, just because HDFS is a great data-hammer doesn't mean all your data is shaped like a nail. I would keep this idea locked firmly in a lab and take it no further.

0 0 Reply

Friday 15th May 2015 08:15 GMT Anonymous Coward

Maybe not HDFS

Depending on your use case maybe Ceph, Gluster, Swift, or Scality would work better. Ceph does block if that's a requirement, but all of these are commodity server based storage grids.

0 0 Reply

Sunday 17th May 2015 16:41 GMT dellock6

I vote too for Ceph for this kind of use case, it's designed to support (among other things) block devices and has a lot of implementations in production environments. Sorry but this one with HDFS sounds more like "I want to do something different from anyone else".

1 0 Reply

Monday 18th May 2015 09:50 GMT ntevanza

Location, location

Do VSANs have Hadoop-style location-aware redundancy? If not, why not?

0 0 Reply

Tuesday 19th May 2015 04:12 GMT Anonymous Coward

Re: Location, location

Ceph does, you specify it in the CRUSH map where you want replicas placed. The CRUSH map is a tree, ultimately the leaves are individual HDDs/SSDs. Branches represent things like data centres, buildings, rooms, racks, nodes … whatever your physical topology is.

The default is to consider all nodes as branches of the root, and the disks as leaves off those branches. It'll pick a branch (node) that presently does not hold a copy of the object to be stored, then choose a leaf (disk). It repeats this for the number of replicas.

0 0 Reply

Friday 3rd July 2015 18:09 GMT david@stanaway.net

TwinStrata

Sounds like TwinStrata, and other cloud tiering appliances.

0 0 Reply

Saturday 11th July 2015 19:03 GMT Anonymous Coward

Topics

Special Features

Vendor Voice

Resources

COMMENTS

Want

HDFS: the subset of posix it implements

Re: HDFS: the subset of posix it implements

Re: HDFS: the subset of posix it implements

Seems reasonable to me

sounds terrible

Then provision disks from the SAN and add them to the Hadoop cluster

Yes, but what do you want to do with it?

Fine, if you don't care about performance

Re: Fine, if you don't care about performance

Wrong tool for the job!!!

Maybe not HDFS

Location, location

Re: Location, location

TwinStrata

other better ways to solve the problem

POST COMMENT House rules

Enter your comment

Add an icon

Other stories you might like

AWS must pay $525M to cloud storage patent holder, says jury

AI boom is boosting demand even for HDDs, raising prices by up to 20% since Q3

San Francisco's light rail to upgrade from floppy disks

Samsung enterprise SSD prices skyrocket thanks to AI's appetite for storage

RISC-V PCIe 5 SSD controller for the rest of us hits 14GB/s

We talk to W3C board vice-chair Robin Berjon about the InterPlanetary File System

Microsoft sends OneDrive URL upload feature to the cloud graveyard

China breakthrough promises optical discs that store hundreds of terabytes

Snowflake share price falls after revenue forecasts dip below expectations

FOSS replacement for Partition Magic, Gparted 1.6 is here to save your data

Backblaze's geriatric hard drives kicked the bucket more in 2023

You're not imagining things – USB memory sticks are getting worse

About Us

Our Websites

Your Privacy