Igneous ARM CPUs: What if they tossed the blindfold?

Igneous's Ethernet-accessed, ARM-driven disk drives give its dataBox/dataRouter array some serious collective CPU chops, but the poor little suckers work blindfolded. Why would I say that? Think about it like this: a data object, say a photo image, comes into the array to be stored and is split into 20 data …

  1. AMBxx Silver badge
    Facepalm

    Perhaps

    You could ask one of the vendors?

  2. The Count

    The addition of some logic to the drive firmware for interconnection and communication would do what you're looking for. It may need slightly higher-performing CPUs, but it is possible.

    1. Anonymous Coward
      Anonymous Coward

      <panto> Oh no it isn't </panto>

      "The addition of some logic to the drive firmware for interconnection and communication would do what your looking for. May need slightly higher performing CPUs, but it is possible."

      Perhaps you can expand on how that's possible without changes to the existing *applications* running on the main system, for it is only those applications that know how they intend to use the stored blocks of data?

  3. Alan Sharkey

    Hummm

    It's all very well doing object storage at drive level, but how do you know what objects you are going to index (ref. the metadata for a photo, for example)? Wouldn't it be better to have dumber storage (with some basic intelligence) and do the indexing at a higher level - possibly still on the storage somewhere, but with the object design provided externally?

    Or am I talking bo**ocks?

    Alan

  4. Anonymous Coward
    Anonymous Coward

    If only there'd been an Interweb in the 1960s

    when ICL invented the Content Addressable File Store. Maybe you remember it (you've been around a while); many readers won't.

    "Wouldn't it be better to have dumber storage (with some basic intelligence) and do the indexing at a higher level - possibly still on the storage somewhere, but with object design provided externally."

    Excellent question.

    "Or am I talking bo**ocks?"

    Don't think so. A stream of bits in storage is just that. Its interpretation depends entirely on what the application(s) in question think(s) it means. The very same bitstream may mean entirely different things to different applications, and a particular item of real-world data may be represented in multiple, entirely different ways in storage. Think of the various ways a number may be represented, let alone anything more complex, such as a simple picture (JPG? PNG? PDF?) or a complex multi-part document. Today's disk farm doesn't know which one it is, so someone has to tell it. Now, who's going to tell the disk farm what any given bit stream means?

    Doubtless there's a place for a content-addressable (or object-oriented) file store, but it isn't going to work nicely with the applications that are already in wide use, which ultimately expect data to be read back as data and are not accustomed to telling the disk farm what kind of data it is.

    The concept might suit the likes of TwitBook and what have you, whose general technical needs are somewhat restricted, but whose buying power is not so restricted.

  5. Detective Emil
    Boffin

    Don't think it'll fly

    My understanding of erasure codes is that, given any 20 of the 28 chunks that 20 chunks' worth of data is spread across, you can reconstruct the original 20 chunks. Conversely, looking at any one of the 28 chunks tells you nothing about any part of the original 20; you need another 19 before you can do that. This means that, by the time a chunk has hit a (somewhat) intelligent drive, it can't usefully be indexed or searched on the drive itself. So indexing has to be done on the original 20 chunks before they hit the array, resulting in more chunks which, with added redundancy, also hit the array.

    "Or am I talking bo**ocks?" (as another poster put it).

    1. Alan Sharkey

      Re: Don't think it'll fly

      I think we are saying the same thing - the intelligence for object storage (and, more importantly, indexing) needs to be provided by the viewing platform, not the storage device.

      Now, it may be that the storage device could offer up an object language which could interpret the request for an object and translate it to however it had split the file (based on its understanding of file types, content and metadata), but that once again limits what files can or cannot be stored - unless the language is extensible to add new file types [Oooh - can I feel a new startup company firing up? :)]

      Alan

      1. Anonymous Coward
        Anonymous Coward

        Re: Don't think it'll fly

        Spot on! The goal in providing a cheap storage platform is to provide a dumb, cheap storage platform. Having the storage system do the indexing and sorting of complex, higher-level meta-content defeats that purpose and adds overhead. This reminds me of Hadoop (to make a simplified generalization of it): you have a bunch of dumb boxes providing a highly distributed, not very smart storage array, another tier of servers to control and maintain HDFS, and all the content "magic" lives in the layer doing the mapping and reducing; the dumb box farm can withstand a reasonable amount of failure and still provide all the data. So, in the end, best to let the disks do the striping and protecting of the data, and let the system that connects to and consumes it do the content indexing and metadata provisioning. Yes!

    2. Wayland

      Re: Don't think it'll fly

      Yes, that's my understanding of RAID too. The data lives across all the drives, and you can lose some of them, up to a point, and still have all the data. On a mirrored system you can lose either of the two drives and be OK. On a RAID with three drives you could lose any one and be OK.

      Storage is actually quite cheap, and for most people there's far more of it than they need. It would make sense to have RAID redundancy of greater than 50%. Maybe a RAID where you have three drives and can lose any TWO would be good.

      As for the drives knowing what they are storing, that would make sense for drives specifically storing, say, photos. The ISO 7-layer model for networking makes sense here: the top, application layer knows what the data is, but the lower layers are simply moving and storing data.

      A visual memory system would be cool but you may want a GPU on each drive to create facial recognition indexes. Maybe better to have that on the RAID card and just let the drives be drives.

    3. kdkeyser

      Re: Don't think it'll fly

      There are two classes of erasure codes: systematic and non-systematic.

      For systematic ones (e.g. the common Cauchy-Reed-Solomon), the first n chunks are simply the original object chopped into n pieces. Only the additional ones (e.g. 8 in the example) are actually "parity" and do not consist of the raw original data. So in this case, each drive could in fact see part of the original object.

      For non-systematic codes, your description is accurate.

      Of course, all of this becomes moot if the object is encrypted somewhere higher up the stack.
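
      To make the systematic case concrete, here's a minimal Python sketch. The filename is made up, and XOR stands in for real Cauchy-Reed-Solomon parity (which is a matrix multiply over GF(256)); the point is only the layout.

        # Systematic layout: the first n chunks are verbatim slices of the object,
        # so a drive holding one of them sees raw, recognisable original bytes.
        # XOR stands in for real Reed-Solomon parity, purely for illustration.

        def chop(obj: bytes, n: int) -> list[bytes]:
            """Split obj into n equal-sized data chunks (zero-padded at the end)."""
            size = -(-len(obj) // n)  # ceiling division
            return [obj[i * size:(i + 1) * size].ljust(size, b'\0') for i in range(n)]

        def xor_parity(chunks: list[bytes]) -> bytes:
            out = bytearray(len(chunks[0]))
            for c in chunks:
                for i, b in enumerate(c):
                    out[i] ^= b
            return bytes(out)

        obj = open('photo.jpg', 'rb').read()        # hypothetical object
        data_chunks = chop(obj, 20)                 # chunks 0..19: raw slices of the JPEG
        parity_chunks = [xor_parity(data_chunks)]   # chunks 20..27 would be parity
        # A drive holding data_chunks[0] can scan the JPEG header in situ;
        # a drive holding only a parity chunk sees nothing recognisable.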

  6. Anonymous Coward
    Anonymous Coward

    IBM S/360 channel adapters and ISAM files

    The choice between dumb and intelligent peripherals is as old as computers themselves. The optimal choice at any given time tends to follow the usual Wheel of Reincarnation.

    For example, many eons ago IBM S/360 ISAM files were implemented largely using channel programs, running on the channel processor, which was (at least logically) separate from the main CPU.

    From the CPU's point of view, this was pure magic - you asked for a record indexed by the desired key, and the disk just delivered it. Which worked great as long as the CPU was slow and memory was limited. Once the memory became less constrained and CPUs became faster, maintaining your own B-tree together with a dumb record-addressed file would beat the intelligent ISAM channel program. And so the wheel keeps turning ...
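
    To make the second half of that trade-off concrete, here's a tiny sketch of "keep your own index over a dumb record-addressed file". A dict stands in for the B-tree and a list of fixed-size records stands in for the disk; it's entirely illustrative.

      RECORD_SIZE = 80
      records = []            # "dumb" storage: addressed only by record number
      index = {}              # in-memory index (the B-tree's job): key -> record number

      def put(key: str, payload: bytes) -> None:
          records.append(payload[:RECORD_SIZE].ljust(RECORD_SIZE, b'\0'))
          index[key] = len(records) - 1

      def get(key: str) -> bytes:
          # one index lookup, one record fetch: no channel program required
          return records[index[key]]

      put('ACC0001', b'first customer record')
      print(get('ACC0001'))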

    All this just goes to prove that while inventing something is easy, inventing something new is much harder.

    1. Wayland

      Re: IBM S/360 channel adapters and ISAM files

      That gives me an idea for a super-fast database or indexed file system. Instead of having a disk operating system between your drive and your application, simply access the drive directly. The data would be arranged according to the drive's parameters. A record would be held at each physical drive address. There might be a bit of wasted space, as each record would have to fit into the drive's block size.

      The speed of this database would be very deterministic, since there would be no opportunistic performance increases, or slowdowns at inopportune moments: it always has to access the drive rather than its buffer, so everything would come off the drive.
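
      A rough sketch of the idea, assuming fixed-size records padded to the drive's block size. The 4096-byte block size and the path are assumptions, and opening with O_DIRECT would be needed to genuinely bypass the OS cache on Linux.

        # Block-aligned, directly addressed records: record N lives at byte
        # offset N * BLOCK_SIZE, so every lookup is exactly one seek plus one read.
        import os

        BLOCK_SIZE = 4096                 # assumed drive block size
        PATH = '/tmp/rawstore'            # stand-in for a raw device such as /dev/sdX

        def write_record(fd: int, record_no: int, payload: bytes) -> None:
            padded = payload[:BLOCK_SIZE].ljust(BLOCK_SIZE, b'\0')   # the wasted space lives here
            os.pwrite(fd, padded, record_no * BLOCK_SIZE)

        def read_record(fd: int, record_no: int) -> bytes:
            return os.pread(fd, BLOCK_SIZE, record_no * BLOCK_SIZE)

        fd = os.open(PATH, os.O_RDWR | os.O_CREAT)
        write_record(fd, 42, b'a record at a known physical address')
        print(read_record(fd, 42)[:36])
        os.close(fd)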

  7. Anonymous Coward
    Anonymous Coward

    two points

    First, RS is effectively a streaming algorithm. Using your example of 20 + 8, to encode a byte at a given offset, you seek to some multiple of 20 in the original file, then multiply the vector of 20 bytes at that location by the coding matrix to get back a 28-byte vector of redundantly-encoded data. Each byte in this vector corresponds to a byte in what you call a "stripe". The decoding step is similar: take any 20 bytes, of the 20 + 8, from the same offset in each "stripe", calculate an inverse matrix and multiply by it to get a decoded 20-byte vector.

    You should be able to see from that that the RS-encoded data is actually seekable. For applications such as those you mentioned (extracting metadata from files once they're in a redundant form), you usually don't need to read the full file. There will often be markers in predictable places to indicate where the metadata resides, or the format will use some other scheme that allows finding the metadata block without reading the full file. So, depending on the software interface that the storage layer provides (i.e. whether it provides sequential/seek access), it should be totally possible to scan the "slices" in situ without needing to decode the full file.
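
    The seek arithmetic itself is trivial. A sketch, assuming the systematic layout kdkeyser describes above, where original bytes are dealt round-robin across the 20 data slices (with a non-systematic code you would still seek to the same per-slice offset, but need any 20 slices at that offset to decode):

      K = 20  # data slices in the 20+8 scheme

      def locate(offset: int) -> tuple[int, int]:
          """Map an offset in the original object to (slice index, offset within slice)."""
          return offset % K, offset // K

      # e.g. a metadata block known to start at byte 1_000_000 of the object:
      slice_no, slice_off = locate(1_000_000)
      # -> read slice `slice_no` at offset `slice_off` in situ; no full decode needed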

    Second, there is the possibility of doing distributed analysis/crunching of files on the storage nodes. I don't think there is an off-the-shelf solution for this, but it's easy to understand and implement. With a standard 20+8 scheme, you will have a node controller that users interact with. The users just see files or blocks or whatever, and the controller is responsible for encoding/decoding as well as communicating with the backend of 20+8 storage nodes. To store a file, the control node creates and distributes the 28 "slices", resulting in a bandwidth overhead of 8/20 = 40% (8 extra redundant slices). An alternative would be to use a broadcast/multicast setup, wherein the full file is transmitted to all storage nodes, which would then be responsible for calculating their own slice locally. A good reliable broadcast algorithm should add no more than 5-10% overhead, more or less independent of the number of storage silos.

    Besides having lower write overheads, a multicast system also has the potential to do the sort of local processing of files that you're talking about. Each storage node will receive a full copy of the file at check-in time, so you can script whatever analysis function you want (using something simple like node number mod 28 to distribute processing across nodes, or something like a DHT to handle temporary node failures) on the full local copy, discarding the copy once it's complete or if it's determined that some other node is doing that work. The storage node would also be responsible for creating and storing its own slice, so the cost of encoding would be shared across them rather than being centralised on the controlling node.

    (I'm assuming that the 28 storage nodes are all on a single shared net here. Multicast has no overhead advantage if there are actually 28 separate channels)
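
    For what it's worth, the per-node behaviour at check-in time might look something like this sketch. The node count, the owner rule and the helper functions are all invented for illustration; this isn't any real Igneous API.

      import hashlib

      NUM_NODES = 28
      NODE_ID = 7                       # this node's position in the array, 0..27

      def owner_of(object_id: str) -> int:
          # deterministic "mod 28"-style assignment of the analysis work;
          # a DHT could replace this to ride out temporary node failures
          return int(hashlib.sha1(object_id.encode()).hexdigest(), 16) % NUM_NODES

      def encode_slice(payload: bytes, node_id: int) -> bytes:
          return payload[node_id::NUM_NODES]      # placeholder for the real local EC step

      def store_locally(slice_bytes: bytes) -> None:
          open(f'/tmp/slice.{NODE_ID}', 'wb').write(slice_bytes)   # illustrative path

      def run_analysis(payload: bytes) -> None:
          print(f'node {NODE_ID} indexing {len(payload)} bytes')   # metadata extraction etc.

      def on_multicast_receive(object_id: str, payload: bytes) -> None:
          store_locally(encode_slice(payload, NODE_ID))   # every node keeps only its own slice
          if owner_of(object_id) == NODE_ID:
              run_analysis(payload)                       # exactly one node does the crunching
          # the full payload is discarded either way once the slice is written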

  8. eranr

    And now for something completely different...

    What if we take striping out of the equation? That is, what if one uses, e.g., 3x replication for redundancy, so that devices keep objects whole?

    On the one hand, expect to pay 3x instead of 28/20 for data redundancy (following the above-mentioned EC scheme); on the other hand, you do not need to move any data in order to process it using the local device's CPU. So which is more expensive, BW or raw storage? Those finding this an interesting question might be interested in: http://itsonlyme.name/node/7
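
    A back-of-envelope way to put numbers on that question, under the 20+8 assumption above (rebuild traffic and the multicast option are ignored here):

      object_size = 1.0                          # normalise the object to 1 unit

      ec_storage   = object_size * 28 / 20       # 1.4x raw capacity consumed
      repl_storage = object_size * 3             # 3.0x raw capacity consumed

      # bandwidth needed before a device-local CPU can process the object:
      ec_gather    = object_size                 # must pull >= 20 slices together somewhere
      repl_gather  = 0.0                         # any replica already holds the whole object

      print(f'storage: EC {ec_storage:.1f}x vs replication {repl_storage:.1f}x')
      print(f'processing traffic: EC {ec_gather:.1f}x vs replication {repl_gather:.1f}x')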

    Indexing, IMHO, needs to be done outside of the storage system. What can be done, though, is to have the device CPU extract the metadata during ingest (which it can do while seeing the whole object) and send that information to an external cluster (there is Ethernet in the mix...).
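
    A sketch of what that ingest-time hook might look like on the device. The endpoint, the field names and the crude type sniffing are all made up; a real extractor would pull EXIF and the like.

      import json
      import urllib.request

      INDEX_URL = 'http://index-cluster.local:8080/ingest'   # hypothetical external cluster

      def sniff(payload: bytes) -> str:
          if payload[:3] == b'\xff\xd8\xff':
              return 'jpeg'
          if payload[:8] == b'\x89PNG\r\n\x1a\n':
              return 'png'
          return 'unknown'

      def on_ingest(object_id: str, payload: bytes) -> None:
          # the device sees the whole object exactly once, so extract metadata now
          meta = {'id': object_id, 'size': len(payload), 'type': sniff(payload)}
          req = urllib.request.Request(INDEX_URL,
                                       data=json.dumps(meta).encode(),
                                       headers={'Content-Type': 'application/json'})
          urllib.request.urlopen(req)   # ship the metadata, not the data, over Ethernet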

    If all this isn't bo**ocks, then the problem I do see here is the rigid compute-to-storage ratio it implies. Another approach to compute and data co-location might be a software-only solution such as OpenStack Storlets.

    1. Wayland

      Re: And now for something completely different...

      That's like mirrored RAID but with three drives. I see no problem with that: drive space is cheap but data is valuable. Perhaps it's worth looking at various file systems and seeing if any could have some of their processing pushed down onto the hard drive.

  9. WageSlave

    It's in the Object software, not the underlying disks

    Content-intelligent storage management is much more complex than a simple index of text content.

    Pictures can have lookalikes, songs can be meta-tagged to sound like other songs, etc., etc., but none of that can be done at the segment level. The indexing would need to be done on data ingest, by an intelligent overlay.

    Sadly, although it's a WIBNI, this is a non-article.
