* Posts by Yaron Haviv

52 publicly visible posts • joined 22 Nov 2014


'Lambda and serverless is one of the worst forms of proprietary lock-in we've ever seen in the history of humanity'

Yaron Haviv

Re: It's not the serverless part that is the lock-in

Lambda can only be triggered by AWS services (e.g. Kinesis); you can't use cross-cloud or open-source triggers (e.g. Kafka, RabbitMQ, ...), and the API you code against contains AWS-specific elements like ARNs and a fixed event schema (see the sketch below).
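To make the schema point concrete, here is a minimal sketch of a Kinesis-triggered function in Go (assuming the aws-lambda-go SDK and its event types); the AWS-specific ARN and record layout are baked right into the handler:

```go
// Minimal sketch (assuming the github.com/aws/aws-lambda-go SDK) of a
// Kinesis-triggered Lambda handler. Note the AWS-specific event schema and the
// stream ARN the code ends up coupled to.
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"
)

func handler(ctx context.Context, evt events.KinesisEvent) error {
	for _, rec := range evt.Records {
		// EventSourceArn is an AWS ARN, e.g. arn:aws:kinesis:...:stream/my-stream;
		// swapping the trigger for Kafka or RabbitMQ means rewriting this handler.
		fmt.Printf("source=%s data=%s\n", rec.EventSourceArn, string(rec.Kinesis.Data))
	}
	return nil
}

func main() {
	lambda.Start(handler)
}
```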

If you use Lambda for HTTP only, with some S3 storage for state, you can get by, but that is quite a limited use of serverless.

Yaron Haviv

You can have serverless w/o the tradeoffs

Right, Lambda is (very) slow and inefficient, and it locks you in, since it depends on AWS data, logging and streaming services (assuming you go beyond a simple web-hook function). Yet CoreOS is not providing an alternative, just the underlying container infrastructure.

Check out nuclio (https://github.com/nuclio/nuclio): it's 100x faster, completely open and portable, and has open/pluggable event and data sources (it supports AWS Kinesis, but also Kafka, RabbitMQ, NATS, MQTT, ...).

It can run as a native Kubernetes extension or as a standalone Docker container; read more in:

https://thenewstack.io/whats-next-serverless/
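If you're curious what a function body looks like, here is a minimal nuclio handler sketch (assuming the nuclio-sdk-go package; the trigger is declared in the function's configuration rather than in code, so the same handler works against Kafka, Kinesis, RabbitMQ or plain HTTP):

```go
// Minimal nuclio handler sketch (assumes github.com/nuclio/nuclio-sdk-go; deployed
// with nuctl or as a Kubernetes resource rather than run as a standalone binary).
// The event source is configured outside the code, so the handler stays portable.
package main

import "github.com/nuclio/nuclio-sdk-go"

func Handler(context *nuclio.Context, event nuclio.Event) (interface{}, error) {
	context.Logger.Info("got event of %d bytes", len(event.GetBody()))
	return nuclio.Response{
		StatusCode:  200,
		ContentType: "text/plain",
		Body:        []byte("processed"),
	}, nil
}
```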

Yaron

Hey blockheads, is an NVMe over fabrics array a SAN?

Yaron Haviv

Re: NMVeF does not require RDMA

So if it runs over, say, TCP or FC, what makes it different from iSCSI or FCP? Just another protocol for shipping blocks?

I also fail to see the big value of NVMe-oF. EMC Thunder used iSCSI over RDMA (iSER) with a few million IOPS 1:1 server:client five years ago, and at least it had the iSCSI control layer to deal with discovery, failure and multipath.

We don't need more block protocols; we need to move up the storage stack and provide files, objects and databases over native flash plus fabric. Is there a single new database today which isn't distributed? If not, why do we need to distribute the blocks as well? Just to keep the fabrics busy?

Yaron

Splunk slam dunk as FC SAN sunk by NVMe hulk

Yaron Haviv

Re: Nice, if you don't know how Splunk works

Nate, the point is not Supermicro; use any server you like with a few NVMe drives directly attached. There's no need for a SAN, not to mention FC with its high $/bandwidth.

BTW, if you are so sensitive to the brand, I assume Apeiron probably won't be on your short list of SAN vendors.

Yaron

Yaron Haviv

Nice, if you don't know how Splunk works

Splunk has a few data tiers, and tiers #1-2 are optimized for DAS, so why the heck would you use Splunk with a dual-head array?

Instead of an NVMe array/SAN, just plug 24 NVMe cards into the box (with Supermicro) or attach a JBOF to each indexer; HA, replication and dedup are handled by Splunk.

Nice marketing. BTW, Splunk and its open-source twin Elasticsearch can benefit from a clustered FS with some special configuration, but a SAN is a stretch :)

Yaron

Shrinkage!? But it's sooo big! More data won't leave storage biz proud

Yaron Haviv

Re: Cloud versus on-prem in depth.

Trevor,

Yes, you come on strong, but you don't have all the facts straight. I happen to be deeply involved in those clouds and know how they really work. I'm not a fan of the public cloud, but I think they have a better architecture than traditional ones, and we should adopt their approach regardless of whether it's public, SP or on-prem. We can learn a lot from their innovation; most enterprise tech hasn't changed in 20 years.

If so many people have left the cloud, that doesn't explain the surge in AWS and Azure revenue. I attended re:Invent and can tell you many enterprise guys were there; some have a cloud-ONLY strategy ("the CEO said we won't own any HW").

Fact #1: Azure doesn't use SAN/HCI; they use a distributed FS (SMB3) and a blob store over 40/50GbE and RDMA (I can point you to public links). VM images are files, but many SaaS/PaaS offerings use files directly, and they perform faster than most block storage; on the storage side you can see their design is lots of NVMe and JBODs. There are millions of Office 365 users, including myself, and it works like a charm, better than the Exchange I used to have running over NetApp.

Fact #2: most data in the cloud is in those PaaS, SaaS and object services, not AWS EBS, which is more expensive and limited; and all those file/object/DB PaaS and SaaS offerings use DAS or object for capacity. Again, hardly any SAN or HCI.

Fact #3: 80-100-bay 3.5" JBODs cost ~$5K; attach four of those to a couple of $5K servers and you have a $30K overhead on 3-4PB of HDD storage. You can add NVMe in those two servers for metadata and cache, or use 2U 2.5" 60-72-bay JBOFs if you are after performance. This is probably 25-50% of the raw HW cost of those HCI/scale-out storage products (Coho, Cohesity, Rubrik, ...) with 2-4 servers in a 2U box and only 12-24 drives. We can go over the Excel offline; I've done it so many times.

Fact #4: file and object don't have to be slow. MS can demo a more-than-1M-IOPS FS, and iguazio does >2M IOPS file access and 700K IOPS S3 (Chris saw that first hand); traditional all-flash NAS guys are <200K IOPS and traditional S3 is ~10K IOPS, but that's an implementation issue. I don't know how many block storage providers do 2M IOPS on 1:1 client:server.

Fact #5: databases have evolved quite a bit. No more fixed schemas, rather dynamic schemas plus column compression, and more use of flash and tiering at the DB level; you will soon see others adopting our approach of using NVRAM for DB journals, hot metadata and cache. How do you snapshot something that stores data in 3-4 independent tiers? How can you compress/dedup data that is already compressed at the DB level? Traditional DBs don't scale across more than a few nodes simply because they work in lock-step with the SAN, but new DBs from ALL vendors use DAS and distribute table spaces, with coordination at the DB transaction level, which is the only way to scale. Yes, you can shout as much as you want, but you seem to miss a lot of background.

Fact #6: vertically integrated databases are faster than databases over SAN or HCI; check out Exadata, Vertica, AWS Aurora or iguazio as examples. 1. With HCI you have two layers of coordination and network chatter. 2. New databases organize data in a disk-optimal fashion; striping and other storage smarts just degrade performance. Yes, if you use MySQL, which has a miserable architecture (it reads all rows/columns in a range for every query), it won't matter much and performance will always suck; just give it some flash so updates and indexes won't kill you.

MySQL will give you tens of thousands of TPS even if you give it 1M-IOPS flash; OTOH iguazio will get you 2M TPS (100x faster), because we manage the transaction all the way from the incoming RPC to the low-level memory page or sector without any page-cache/FS/array serialization.

Yes, I agree orgs have many "pets" and can't transition so fast, but they do it in steps, e.g. move files from NAS to object, move to consuming the DB as a service via ODBC/JDBC, etc. Some use consultants that do "app modernization" (BTW, not sure you've heard, but today most clouds offer non-disruptive live data migration for your DBs and apps).

At the same time they build new strategic apps that focus on cloud and mobile first, continuous customer engagement and real-time analytics. They do it because they are so afraid that a startup like Uber or PayPal, or their competitor, will disrupt their business. For the new apps they need agility and elasticity; most IT shops can't handle those new "cattle" apps at the agility they need, so some do it in the cloud.

I would love to give you a 1:1 deep dive to show you why some of the myths you've got used to are just myths.

Yaron

Yaron Haviv

Re: Storage collapsing into services

Oh right, another one of those IT guys who thinks the cloud will just go away and we can keep messing with FC SANs...

IT shops are migrating to the cloud. You can put your head in the sand as much as you want, and the cloud doesn't have a SAN. For guys like you they invented "migration services" to seamlessly move your Oracle or VMware. Guess what: those tools you are so fond of, like Exchange and MS SQL, don't run on any SAN or HCI in Azure, and guess what, they have the same or better HA and global DR, and consistent snapshots don't require any coordination with the storage.

Some just have a hard time accepting that the IT world will never be the same. The writing, and the financial reports, are already on the wall.

Yes, like mainframes, legacy won't just go away; it will stay forever, but I'm not sure you can grow a company based on it.

Yaron

Yaron Haviv

Storage collapsing into services

Take the cloud: is anyone there using a SAN? Even HCI/vSAN is not really used.

In the modern stack, storage collapses into the data services (object, database, streaming, ...), connected directly using DAS and backed up into object. So the overall SAN/NAS/HCI storage pie is shrinking; within it you may see one eating the other, e.g. HCI or NVMe-oF stealing from arrays.

Why build a box with CPUs whose only purpose in life is to take blocks from the wire and organize them on disk, when they could do something more useful like process a protocol, search, index, etc.? And instead of trying to figure out caching or compression at 4KB granularity, it can be done once at the application; also, guess what, there is no need for three layers of journaling that kill performance. Storage is not a service, rather infrastructure that serves a service/app, and now those new services/apps are consolidating with the disks/flash. There is no more room for blocks over a wire, regardless of whether you call it SAN, HCI or NVMe-oF.

Yaron

EMC crying two SAN breakup tears

Yaron Haviv

Re: What's new?

Nate, check out Amazon Aurora, a MySQL clone running over tiered object storage; it's 6x faster and scales linearly.

Also look at Oracle Exadata, MS SQL over SMB 3.0 and Google Spanner: way faster and more scalable than block-based DBs.

Block-based DBs are outdated: they do full scans, locks and so on. They don't scale and don't benefit from flash's random-access attributes.

Yaron

Yaron Haviv

What's new?

Not sure where the novelty is.

Many vendors today support flash with a back-end object tier.

Exposing it as a SAN is like riding a horse on a highway; why would you do that when the world is moving to consuming data through shared access patterns like files, objects, records and streams, with scale-out built into the data/app layer?

It seems like enterprise storage vendors still haven't internalized the cloud era.

Yaron

Iguazio: Made from Kia parts but faster than a Ferrari with 1,000 drivers

Yaron Haviv

BigData != Hadoop...

I assume some people's immediate reaction to BigData is to think of MapReduce or Hadoop flavors, but it goes well beyond that and is growing like crazy.

Any modern service today makes heavy use of data and analytics, whether it's Uber, your bank, Google poking around your Gmail or search history, an insurance company, or enterprise device data aggregated in Splunk...

In all those places Arrays, HCI, and even NAS are quite irrelevant tech

Yaron

File this: XtremIO to fling forth filer functionality

Yaron Haviv

Better match

Exanet's architecture of HA pairs is a good match with XtremIO (which uses the same approach).

One of XtremIO's founders was the system architect of Exanet, so it's probably not a coincidence :)

Yaron

IO, IO, it's profiling we do: Nimble architect talks flash storage tests

Yaron Haviv

Re: But Nimble's data is mostly SMB customers

This is exactly why block is not going to keep up with new DB design

The right way is to put the DB log and indexes in NVRAM (no disk I/O) and to encode data in a way that avoids full scans or long tree traversals. It's time to leverage the random-access nature of NVMe flash; most databases are designed around decades-old HDD limitations and avoid random access at all costs with inflated memory caches.

Even newer things like HBase, Cassandra, GoogleFS and AWS DynamoDB all copied the same LSM research paper, which is only relevant to HDDs, and that is why they can't make optimal use of SSDs.

So instead of reverse engineering the I/O patterns of DBs, it's time to build DBs that run on flash and NVRAM natively, do caching, compression and dedup in the data layer, and build a metadata memory hierarchy for optimal search.

If DBs are fast, we can build searchable file systems over those DBs, instead of DBs over file systems.
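To make the idea concrete, here is a toy illustration (my own sketch, not any product's code) of a write path where the log and index never touch a block device:

```go
// Toy illustration of the "log and index in NVRAM" idea: the write path appends a
// record to a persistent, byte-addressable region and updates an in-memory hash
// index. No block I/O and no tree traversal on the hot path.
package main

type nvram struct {
	buf []byte // stands in for a byte-addressable persistent region (e.g. an mmap'd NVDIMM)
	off int
}

// append persists a record and returns its offset; on real NVRAM this would be
// followed by a cache-line flush and fence rather than a disk write.
func (n *nvram) append(rec []byte) int {
	off := n.off
	copy(n.buf[off:], rec)
	n.off += len(rec)
	return off
}

type kvStore struct {
	log   *nvram
	index map[string]int // key -> offset of the latest record in the NVRAM log
}

func (s *kvStore) put(key string, value []byte) {
	off := s.log.append(append([]byte(key+"="), value...))
	s.index[key] = off // point-lookup index, so reads need no full scan
}

func main() {
	s := &kvStore{log: &nvram{buf: make([]byte, 1<<20)}, index: map[string]int{}}
	s.put("sensor-17", []byte(`{"temp":42}`))
}
```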

Watch for iguazio announcement tomorrow

NetApp facelift: FAS hardware refresh and a little nip ONTAP

Yaron Haviv

FAS perf?

300K IOPS? That's less than a single NVMe device.

Any IOPS or bandwidth numbers for the new FAS boxes?

Yaron

Seventeen hopefuls fight for the NVMe Fabric array crown

Yaron Haviv

Re: NVMe right, but block fabrics less relevant in the new stack

Yes, you are right; let's look at Oracle, MS SQL and MySQL.

They don't use block: Oracle Exadata uses RDS over InfiniBand and Smart Scans.

MS SQL runs best over their SMB3 distributed file system over RDMA.

MySQL is extremely primitive and slow, doing a full scan on every query with no column compression. I suggest you look at the AWS fork of it (Aurora), which delivers 6x the performance with linear scaling and uses object as the back end, NOT block; or at Google's internal Spanner SQL DB, which uses yet another distributed file/object layer.

I agree that many of the open NoSQL DBs suck: extremely serialized, eventually consistent, no security... Many came from university grads, but new ones are coming, and they all understand that block doesn't allow scale (distributed locks, journals, ...) and cannot really be shared.

Specifically, if you look at what we did at iguazio, our multi-model DB engine is faster than the fastest block storage around (2.5M ops/sec on one server at 100us at the 99th percentile), and our AWS-compatible web APIs deliver 800K HTTP req/sec with a single client/server pair. It's way faster than your fastest ALL-FLASH NAS and at a fraction of the cost; the only way to do it is to cut through the stack and keep CPUs and memory 100% busy, not by using Linux's poor page cache, or CPU locks, or blocking threads all around the OS stack.

Yes, legacy won't go away anytime soon, but it will shrink; not because IT dinosaurs want it to, but simply because new companies, developers and business owners understand that cloud and PaaS bring agility and efficiency, and they want to build modern digital apps like eBay, Uber and Netflix. No one wants to maintain Exchange; they'd rather use Office 365 and do more interesting things with their lives.

Yaron Haviv

NVMe right, but block fabrics less relevant in the new stack

Yes, NVMe is replacing SATA; the new generation with reduced write IOPS will be priced at 30-40 cents/GB, not the few dollars per GB it cost just a couple of years ago.

BUT having a SAN, any SAN, regardless of whether it uses SCSI (iSCSI/FCP) or NVMe-oF, is not so much the future. Just watch AWS or Google: the future is in object, files and new scale-out databases.

Pure got it and introduced an all-flash object store; DSSD has K/V.

Next week, watch for the iguazio announcement (a preview): 2.5M 4KB file, DB or object ops/sec per node, at under 100us latency, by integrating the DB engine with NVMe and NVRAM in a way that bypasses the entire stack. Its architecture is so efficient that it is priced significantly lower than any of today's vendors, even AWS.

The layered approach of building databases over files over block/SAN, with dedup/compression/caching at the block layer, is good for legacy stuff, but then you end up with all-flash NAS that can hardly scratch 200K IOPS and databases that do 20K ops/sec. Why compress data in 4KB sectors when I can do it in the DB layer and search the encoded/compressed data, instead of being forced to decompress it, pollute memory with garbage and scan the entire data set?

Yep, it is hard to make this mental shift from managing sectors to managing data; that's why no enterprise vendor can go after AWS or Google. They are stuck thinking in infrastructure layers, not apps or services.

Yaron

Wikibon sticking to server SAN takeover idea

Yaron Haviv

Re: Totally ignoring the real trends

If you install Cassandra, Mongo and many other modern apps, the clustering is in the app layer and that can NOT be avoided, since the cache, metadata and indexes must also be kept in sync; having HA at the disk layer doesn't eliminate app clustering, it just adds complexity. Modern apps have more search-efficient compression in the DB layer, making block dedup/compression useless.

Right, you don't want many independent "stateful" stacks, which is why it's time to converge and modernize them, or move the data API to a layer that doesn't duplicate clustering in both the app layer and the storage layer. The real data silos are in the DATA layer; the key to deciphering the blocks is not in the SAN but in the apps. Blocks are dumb: having blocks from different apps in the same pool doesn't eliminate silos, it creates them, forcing all data access through the specific VM which holds the "key".

The report talks about hyper-scale using HCI; that is misleading. The biggest repos they have use DAS with clustering in the app layer, for the reasons I outlined. You can read more details at: https://sdsblog.com/cloud-data-services/

Yaron

Yaron Haviv

Totally ignoring the real trends

This view assumes vSAN (block storage) will continue to dominate the market, when in fact file, object, NoSQL, BigData, etc. are gaining momentum faster and are dominant in the public cloud.

Hyperscale guys don't use SAN or HCI for their cloud data services, but pure DAS with clustering at the app/data layer. I also don't see why someone building a Splunk, HDFS or Cassandra cluster, or a distributed file/object store, would want to put it on top of vSAN/Nutanix (unless they really want to kill their performance or add significant cost).

Time to wake up: the infrastructure world is changing dramatically, driven by the public cloud's quest for efficiency without legacy BS.

Yaron

Behold this golden era of storage startups. It's coming to an end

Yaron Haviv

Moving up the stack is next!

Right, it is enough to look at the public cloud trend: block storage adds complexity and doesn't solve the fundamental app performance problems. The new front is data APIs and management.

We have mega-IOPS storage, with databases doing 10K IOPS and tens of milliseconds of latency.

The next wave is collapsing the stack: serving high-level APIs at the speed of NVMe and NVRAM by stripping out redundant layers, chatter and serialization; moving data search and vector processing to the storage; securing/monitoring data, not LUNs; and managing it all as a PaaS.

That is a much harder problem to solve, one for fewer startups and some cloud giants, but stay tuned for the iguazio announcement and demo.

Yaron

Enterprise storage is a stagnant – and slightly smelly – pond

Yaron Haviv

Nice, don't forget impact of DAS

Chris,

Nice work

Beyond file/block SDS, many are moving from traditional file and block to modern apps with Hadoop, Cassandra, Mongo, Elastic, Splunk, Vertica... All those new DBs have resiliency built in, without the need for (expensive) external RAID; some store petabytes in them, and I'm sure that has even more impact on the charts than HCI.

Yaron

Is VMware starting to mean 'legacy'? Down and out in Las Vegas

Yaron Haviv

Nice post, APIs will win

Enrico, nice post, I share your views.

It's not about containers, rather about moving to an API economy. If you use AWS or Azure you use APIs to store/query data, APIs to run functions, APIs for AI, APIs to route traffic... No infrastructure setup is involved as they go up the services stack.

If IT and VMware don't internalize this, the developers and business owners will bypass them and swipe the credit card at AWS (just like with Salesforce, which cut a deal with AWS), to gain access to the most modern services without the deployment hassle.

Yaron

Nutanix bought PernixData to slurp caching firm's IP brains

Yaron Haviv

Fat nodes

Nutanix has a fat 2U node, the NX-8150, if you don't mind paying >$100k/node (according to the web)

Yaron

Hasta la vista Lustre, so long Spectrum Scale: Everyday HPC is here

Yaron Haviv

Parallel FS is about large shared name space

It seems like we are confusing some terms. Lustre and GPFS address the need for a huge namespace shared by thousands of clients without the mutual exclusion you get with NFS.

They are pretty old and have major drawbacks, like being slow on metadata/small ops and layering an FS over an FS, but they are way faster than NFS on bandwidth (they use RDMA and fast stacks versus the highly serialized NFS stack).

The problems of inefficient caching and I/O scheduling/serialization have more to do with the local storage stack than with the distributed FS part, and indeed there is a good question whether there is room for client-side caching once the storage becomes low latency, but addressing that requires changing (or bypassing) Linux.

In any case, I'm not sure how Pure can do without its own sort of parallel FS to deal with the required metadata operations, or how it can scale beyond one array; or is it just a matter of semantics?

Yaron

Intel and pals chuck money at another Fibre Channel killer

Yaron Haviv

Great solution for an old problem!

NVMe-oF is a great replacement for a SAN or a SAS JBOD, good for the legacy stack using a local (unshared) file system.

Meanwhile most of the world's data is moving to shared file, object and NoSQL/NewSQL types of solutions that are based on DAS or OSDs (simply because data and metadata must be updated simultaneously; with a SAN that means shared locks and journals, and that just doesn't scale, with lots of write-ups from AWS, Google, Azure, etc. about it).

Even VM images, which typically use a local FS, are now 1% of their size with Docker.

I think the storage camp needs to spend more time with the apps camp and focus on the right problems to solve; e.g. adding a K/V notion to those NVMe-oF devices would make them way more useful.

BTW, I implemented the first NVMe-oF prototypes, before it even had a name, and thought it was the best thing since sliced bread, but the world around us has changed since then, and we need to adapt the infrastructure to the app stack.

Yaron

All right, pet? Getting owlish about Hedvig

Yaron Haviv

Re: Nice marketing, but ..

Your answer somewhat contradicts the architecture your team presented at TFD10:

https://vimeo.com/168710028

You can see how the FS/object "proxy" is placed in a VM and mounted on your distributed "virtual disk" (i.e. limited IOPS, metadata ops and space per namespace); all references to "metadata" are in the context of the vDisk, not file/object metadata.

It won't make much sense to place a distributed file/object layer in those proxies on top of a distributed block layer; that means doing clustering traffic and consistency twice, not to mention that the proxies are stateless, so every file/object I/O commit needs to issue multiple block-layer I/Os over the network (for journaling, data, metadata, ...).

This design is cool for block/VMware-focused customers, for a small NAS or object deployment, or for many such small file volumes, but it is not a scale-out file solution; those are much harder to build.

BTW, re the "hyper-scale" slogan: none of those guys use a distributed "virtual disk" approach; they all use distributed object/file/NoSQL over DAS for the reasons I outlined, just like Cassandra :)

We can continue the discussion offline if you want.

Yaron

Yaron Haviv

Re: Nice marketing, but ..

I'm not sure what the difference is between "user" and "app" data; I thought there was one definition of a scale-out FS.

It boils down to simple questions:

Do you process the (heavy) NFS protocol and metadata ops in a single node, or is it fully distributed?

Same for object protocols like S3: are metadata and protocol handling fully sharded?

If you get an object query to find the ones with attr1=X and attr2=Y, how many nodes will participate in looking up the matching objects? One, two, fully distributed? Are the metadata and search co-located with the data?

IMO it doesn't make sense to run an object store like S3 on virtual/scale-out block storage, since sharding and transactions are on object boundaries and not sector boundaries, which is why all native object solutions use DAS.

You may have a wonderful product, and indeed the DR feature is a differentiator, but I think storage vendors should exercise transparency; having an object API option doesn't make one a true object store.

Yaron

Yaron Haviv

Nice marketing, but ..

Today it's hard to tell with all the marketing spin, but AFAIK Hedvig is not a scale-out NAS or object solution, rather scale-out block storage, no different from EMC ScaleIO with a standard Linux FS installed on the LUN.

Correct me if I'm wrong, but what they do is implement a (single) NAS or object protocol head over a distributed virtual block device (LUN). This means the FS or object layer doesn't scale, and there is no (distributed) metadata awareness or built-in metadata search capability, which are fundamental for modern object or scale-out NAS solutions.

Yaron

Contain yourself – StorageOS is coming

Yaron Haviv

FC SAN and Containers? Give me a break..

I think people who put SAN and containers in the same sentence don't understand the concepts of micro-services and continuous integration.

If you've got legacy apps, keep them in a VM; if you want to move to the agile and elastic world of cloud-native apps, adopt a cloud-like practice. Do you know of any SaaS offering using a SAN? Any NoSQL stack requiring a SAN?

You can read my post for an in-depth explanation: https://sdsblog.com/cloud-native-will-shake-up-enterprise-storage/

Yaron

Big Data upstart Iguaz unveils furiously complex exercise to remove complexity

Yaron Haviv

Re: More details

Adam,

If you read the details, only the protocol/API layer is stateless (i.e. there is a server-side NVM cache to enable full consistency, concurrency and elasticity).

You are right that if you store the data as files you cannot accelerate performance much, nor can you support streaming or K/V. As I stated, we go higher level, into the data; we are definitely not doing full scans the way you described, that wouldn't be much of a challenge :)

See my blog post on data "re-structuring".

You can contact me 1:1 for more technical details, register at our site for a demo, or watch my theCUBE interview:

https://youtu.be/KyEqZ4oQw9M

Yaron

Yaron Haviv

More details

Chris, thanks for the post

Note that we are already in early customer deployments and can demo the claimed performance and functionality today to relevant customers; this is far from slideware.

PD and Actifio address the SAN/NAS space and have no data awareness, while iguaz.io sits higher in the stack, with data- and record-level insight, analytics offloads, ACID semantics, etc., and addresses the new app space. I suggest reading http://iguaz.io/technology

BTW, we do all that at higher IOPS than the fastest AFA on the market today (and can demo it).

Yaron

Iguaz.io CTO and Founder

One (storage) protocol to rule them all?

Yaron Haviv

Everything is an Object

Why not view everything as an object: blocks, files, ...?

Today in storage we have a narrow vision of what objects are (e.g. they cannot be modified, they need to be served via HTTP, ...). If we think of objects the way programmers do, and add the notions of properties, methods and events plus a flexible yet fast Ethernet-based protocol, we could map block sectors, files, records or any other form of data to such "data objects" without compromising on performance.
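A rough sketch of what such a "data object" abstraction could look like (my illustration of the idea, not a spec or an existing API):

```go
// Rough illustration of the "everything is a data object" idea: a programmer-style
// object with properties, methods and events that blocks, files and records could
// all be mapped onto. Not a spec, just a sketch of the concept.
package dataobject

// DataObject is the common abstraction: a block extent, a file or a DB record
// would all implement it over the same fast Ethernet-based transport.
type DataObject interface {
	// Properties: extensible, indexable metadata.
	GetProperty(name string) (interface{}, bool)
	SetProperty(name string, value interface{})

	// Methods: data access at arbitrary offsets, so blocks and files both fit.
	Read(offset, length int64) ([]byte, error)
	Write(offset int64, data []byte) error

	// Events: notify subscribers on change (e.g. replication, cache invalidation).
	Subscribe(event string, handler func(obj DataObject))
}
```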

My 2c

Yaron

SDSBlog.com

Whip out your blades: All-flash Isilon scale-out bruiser coming

Yaron Haviv

Re: Flash but low on IOPs

After the number games we had, Chad updated his blog post to mention that a node "may" be composed of multiple blades, as I guessed; I bet it's four blades per node (each 3.75GB/s), which makes the numbers work out.

Yaron

Yaron Haviv

Flash but low on IOPs

Good observations; things in Chad's post don't add up, and I posted some comments on that as well.

They may have mixed up the terms blade and node and have a few blades per node, or something like that.

Re the IOPS number: they mention 250K IOPS, and the S210 model from 2014 did (on paper) ~100K IOPS, so growing only ~2x in ~2 years means they are just riding Moore's law, and can't really make the 10-year-old OneFS flash-optimized storage. Same for latency: 1ms is better than the previous model, but still much slower than what the new flash and NVM media can offer (1-100us).

I wrote about what's needed to get to the new levels of performance in file and object: http://sdsblog.com/wanted-a-faster-storage-stack/

We will soon see vendors delivering >1M IOPS and bare-metal latency on file and higher-level APIs/features, so I'm not sure I buy Chad's description of it as "Extreme Performance" for a 2017 product.

Yaron

High performance object storage: Not just about reskinning Amazon's S3

Yaron Haviv

Re: Not so fast ;)

Objects can be accessed via different APIs, not just S3/HTTP.

E.g. Ceph has the librados API, as you noted, and one can access/modify data at arbitrary offsets; to some extent, key/value systems like Redis or Aerospike are a kind of object storage (they use a DHT) with more structured data and a focus on performance (1M IOPS) rather than capacity.

The key benefit of object is avoiding the metadata scaling challenges, and the overhead of maintaining directories, when the key is random by nature (e.g. a picture ID in a web page, a user record, ...).

In systems with millions or billions of files you want an atomic get/put that reads/writes the entire file or record, versus a nested directory lookup -> open -> read/write/lock -> close, as sketched below.
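A small illustrative sketch of the access-pattern difference (generic code, not any particular product's API):

```go
// Sketch of the access-pattern difference: a POSIX path walk versus a single
// atomic get keyed by the object's ID. Illustrative only.
package main

import (
	"fmt"
	"os"
)

// Filesystem access: every path component is a directory lookup, then
// open -> read -> close, each a separate (and potentially blocking) metadata op.
func readViaFS(path string) ([]byte, error) {
	return os.ReadFile(path) // e.g. /users/1234/photos/2016/img-789.jpg
}

// Object access: the random key (picture ID, user record, ...) hashes straight
// to a location; one atomic get for the whole record, no directory tree.
func readViaObject(store map[string][]byte, key string) ([]byte, bool) {
	val, ok := store[key] // stands in for a DHT/hash-based lookup
	return val, ok
}

func main() {
	store := map[string][]byte{"img-789": []byte("...jpeg bytes...")}
	if v, ok := readViaObject(store, "img-789"); ok {
		fmt.Println(len(v), "bytes via one get/put-style lookup")
	}
	_, _ = readViaFS("/tmp/example")
}
```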

Yaron

SwiftStack CPO: 'If you take a filesystem and bolt on an object API'... it's upside down

Yaron Haviv

Re: The wheel turns...

People also used to keep email sub-folders; as the email piled up, they now just use the search bar.

Same with object: you use the metadata fields to find what you are looking for, and you have more context to work with.

Object storage doesn't need a full file system; it can get by with hash functions and has fewer dependencies between data structures.

Yaron

Hybrid cloud thingies, new media and everything is software-defined: Storage reinvents itself

Yaron Haviv

Re: Software defined this, software defined that...

"Software Defined" originate from the idea you can use software abstractions (i.e. REST calls) to define infrastructure and services, SDN started with OpenFlow which allowed to bypass the rigid Cisco CLI based ACLs with something much more flexible and it controlled HARDWARE OR SOFTWARE, it evolved from there, many of the network services are now implemented in software to provide better flexibility.

the issue is that many guys in storage saw that as a marketing opportunity to re-brand their rigid/legacy products, and hijacked the term to mean "Software Implemented", useless term in days when ALL storage products run with software over x86.

the real software defined is AWS, which provide pretty high-level abstractions and automation on top of their infrastructure and services, in a way you dont even know or care how things are implemented. Check out the UI for most of the "Software-Defined" products, why is there so much data about low-level/infrastructure stuff and hardly any self-service abstractions ?

Yaron

SDSBlog.com

Private cloud: Strategy and tactics from the big boys

Yaron Haviv

Many will skip to containers

Enrico,

Many people I talk to plan on skipping the OpenStack phase and going directly to micro-services and cloud-native architectures, i.e. DevOps with Mesos/Kubernetes, Docker, ...

See: http://www.americanbanker.com/news/bank-technology/why-tech-savvy-banks-are-gung-ho-about-container-software-1078145-1.html

OpenStack is still an IaaS cloud, not too different from VMware, just much harder to deploy; you still mess with vNICs, vDisks, etc., i.e. infrastructure. People want to run elastic services, generate images from Git, and not care about a VM going down. They use VMware today for robust IaaS and play with Docker to gain real agility and change how they treat the app lifecycle. VMware and OpenStack are now chasing containers to fill the gaps before someone else does.

BTW, for a real AWS/GCP/Azure-like cloud you need all the platform and data services (see my last blog post on that); unfortunately there are no such enterprise alternatives yet.

Yaron

SDSBlog.com

Tiering up: Our man struggles to make sense of the storage landscape

Yaron Haviv

Ignoring the dark horse?

Chris,

I believe the post ignores the dark horse in storage: the fastest growth for cloud vendors is not in object storage, but rather in managed data services/lakes, part of the overall trend of moving from IaaS to PaaS and SaaS, i.e. (micro) service-centric architectures.

AWS storage services like Dynamo, Redshift, Aurora, Kinesis, etc. are growing exponentially AFAIK. Just open the GCP front page (cloud.google.com): out of 14 listed services, 8 are data services (i.e. storage), one is object, and NONE match the categories listed in the post (SAN, NAS, hyper-converged, ...); visit Azure and you will see the same. It is only reasonable to expect this trend to reach the on-premises data centers (the ones which survive the cloud assault).

The reason we consume lots of SAN is mainly DBs (a data service) and VM images. DBs are moving to service-based consumption like the ones above (using NoSQL and DAS, or data lakes); even Oracle figured it out and is transforming, and this will be reinforced by the growth in BI/analytics and IoT demanding much larger and faster data services. VM images will move to stateless containers to enable workload elasticity and DevOps (see my post: http://sdsblog.com/2015/09/16/cloud-native-will-shake-up-enterprise-storage/)

Yes, there will be lots of legacy; people still buy FICON and mainframes. But in a few years the storage landscape will be very different, and the current turmoil we see is only the tip of the iceberg.

Yaron

Startup Iguaz.io is creating real-time Big Data analytics storage

Yaron Haviv

Founders

Chris,

Good to see someone is reading my blog :)

For the sake of accuracy, the third founder is Yaron Segev, ex-founder and VP R&D/COO of EMC XtremIO (the #1 AFA, selling at >$1B/year).

Thanks, Yaron

Shares tumble at flash-disk array maker Nimble: Time to crack open the all-flash?

Yaron Haviv

need innovation to survive a fixed market

SAN is not a growing market; the storage growth is in hyper-converged, BigData, public clouds, object, data-aware storage, etc. All-flash, hybrid and legacy SAN all fight for the same $$, and customers are not willing to pay the same margins, so it's even less $$.

To stand out in such a market you need major differentiation, i.e. being 5-10x better at something. Between all the SAN vendors it's a wash: slightly better RAID protection, some better UI/config tools, a better feature here or there, 30% faster performance... How much can you re-invent in an array?

Nimble started with some pretty cool and differentiated ideas, like cloud-based support, app profile optimization (which for some reason is now toned down in the materials; maybe it didn't work?), reduced cost and decent performance with hybrid, etc. But everyone copies their InfoSight, and with those small arrays (<100TB) the price difference between flash and HDD becomes meaningless, especially as SSD is 5x cheaper and denser than it was a few years ago when they started.

So they invest in "brute-force" sales (loaners, dinners, leased gear, ...), not innovation and radical new features, which is why they lose so much money. I'm not sure there is a way out of it, and I believe it's going to rain on some of the other players in this category just as it does on EMC and IBM; it's simply a matter of time.

So, back to basics: either be in a new, fast-growing market, or do things 10x better; best if you can do both :)

Yaron

SDSBlog.com

Networking ace reveals: Intel planned NVMe for XPoint

Yaron Haviv

Xpoint software challenges

See my blog post on the current Linux storage stack architecture and the challenges of fully utilizing NVMe and 3D XPoint benefits:

http://sdsblog.com/2015/11/19/wanted-a-faster-storage-stack/

Yaron

Block storage is dead, says ex-HP and Supermicro data bigwig

Yaron Haviv

Re: Object can be faster, but UDP?

Caitlin, we haven't talked for years :)

FCoE never got to work at scale with multi-hop networks; I can't point to many customers doing it.

Cisco's main push was on a single switch hop (only to the rack switch).

As you know, I built a bunch of IB and Ethernet clusters with thousands of nodes. Ethernet and IP are not well designed for lossless operation; Pause is not credits, and it requires careful end-to-end configuration of PFC and the switches.

Packets do drop due to HOL blocking and require re-transmission. If a bunch of senders push lots of packets to one destination, you will surely hit congestion at the switch egress; new switches don't have enough buffers to hold it, and issuing pause will propagate the congestion through the network.

We spent a lot of time adding capabilities to RoCE NICs to make them more robust at cloud scale, and there is more to do. Doing it over UDP would require a pretty complicated layer, so why not just stick to TCP for software, or RoCE for hardware acceleration?

As you know from your TOE experience, TCP and DCTCP can be fast; the issues are actually more about DDP (DMA): allowing storage data to be gathered/scattered from/to app buffers without a copy, doing header/data split, etc., just like SCSI, FC, NVMe or RDMA do. Another critical challenge is how to deliver the notifications and doorbells to/from the app CPU core/thread to avoid locking (something NVMe and RDMA do well). So should we re-invent it all now for UDP?

Yaron

SDSBlog.com

Yaron Haviv

Object can be faster, but UDP?

I agree with most of the points.

Yes, object can be faster than block and is the future. Most block vendors use some form of versioned B-tree, while object or K/V stores use faster hashes; the current implementations of object over file are not really efficient, since they double the overhead and add a slow HTTP protocol in front.

Yes, we need metadata; it can be encoded in yet another K/V set. K/V is not an alternative to object, which has indexed and extensible metadata, security, management, tiering, EC/RAID/DR and so on, but rather the best underlying tech for storing the object chunks.

Note that a big advantage of K/V is eliminating the "double fragmentation" in flash hardware; I wish some of the NVMe or NVMe-over-Fabrics (Ethernet) guys would extend their APIs to K/V. Many new-age apps are already using K/V (in the form of RocksDB or the like) since it's faster to develop with and leaves the hard problem to someone else; hardware K/V is the natural evolution.

Re UDP, I don't think it works at scale. Coraid pioneered a similar notion and is now closed; you must have ways to deal with network congestion, and mechanisms like the TCP congestion window, or RDMA credits and congestion avoidance.

A key problem with Kinetic is the CPU overhead on the client side: assume I access many drives or flash devices and it would eat up all my CPU, versus a SAS HBA or NVMe, which do 6GB/s all in hardware.

For that reason I think NVMe over Fabrics (RDMA) or Intel Omni-Path will be more efficient when it comes to remote flash or remote K/V.

Yaron

SDSBlog.com

OASIS: Refreshment for dehydrated secondary storage users?

Yaron Haviv

4 servers to manage 12 drives?

I'm not sure why it would make sense to have so much CPU and memory power for only 12 HDDs in secondary storage; it sounds like overkill.

In an age when object storage vendors support a few petabytes in a rack behind a few server nodes and talk in cents, it sounds pretty inefficient to have a >$1/GB price label on a backup system, even with all the fancy features. There still seems to be quite a big dissonance between enterprise storage products and how it is done in the cloud; no wonder companies like EMC and IBM are suffering revenue declines.

Many vendors try to push the notion that hyper-scale storage is designed like hyper-converged; that was some time ago. The only way cloud vendors can now sell at $0.01/GB/month is by using much more efficient and dense architectures.

Yaron

SDSBlog.com

Overcoming objections: Objects in storage the object of the exercise

Yaron Haviv

from https://aws.amazon.com/s3/details/

"Amazon S3 is designed for 99.999999999% durability" (i.e. every put has 11 9s durability)

Again, in implementations which support offsets (Scality, Ceph, ...), your partial-write example is incorrect.

Few-year-old, beta-level Ceph benchmarks are not a good measure; see more recent ones by SanDisk:

http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2015/20150813_S303E_Roy.pdf

Even the one you sent doesn't show a knockout on IOPS, just slightly better results.

Jeff, please don't take it to the personal level; as the Gluster architect you are not free from bias either.

I suggest we continue this discussion offline.

Yaron

Yaron Haviv

Re: Not so fast

This seems like a religious discussion, so I will stop with this post.

Let's stick to facts: ALL object stores, even S3, provide atomicity and durability as base attributes (with better guarantees than NAS); most object stores, like S3, are eventually consistent, and some are fully consistent (implementation dependent).

I would be happy if you could point me to a benchmark backing your thesis, one which shows Gluster significantly knocking out Ceph or Coho Data on IOPS; all the ones I saw show they are on par or better. It's not fair to pick on a cloud archiving product like S3 to make performance claims.

Let's see what the future holds. It seems like object is penetrating more use cases in which NAS was king, and is getting better, a trend backed by IDC figures; you may propose it's a temporary trend, but that seems less likely to me.

Yaron

SDSBlog.com

Yaron Haviv

Re: Not so fast

I agree with all your comments when you relate to legacy object (S3, Swift, Amplidata, Cleversafe, ...).

The other ones I mentioned do support mutable objects (i.e. no need for get/modify/put or chunked files), and some have consistency and other features to simplify NAS gateways.

CephFS is a poor way to implement an FS over object; knowing the details, I wouldn't use that as an example. But Coho Data seems to have nice NFS performance and scale. Even if Ceph and Gluster are more or less on par performance-wise, it somewhat contradicts your theory that file should be way faster.

I assume you know the stats on the percentage of (slow and blocking) metadata ops in NFS (NFSv4 added compound operations to help, but unfortunately no one uses them). In the real world it is pretty hard to get NFS to perform (and I have personally done many of those benchmarks); if you disable the client cache or sync() on every I/O to be on par with object atomicity/durability (required for micro-services), it's even worse. The exponential growth in data now adds more dependency on object-metadata-related indexing and operations, which are not possible in NFS.

We know web-scale moved from NFS to object (e.g. Facebook Haystack), but now even heavy users/proponents of POSIX like the DoE/DoD are working to relax the POSIX dependency and have government-funded object projects.

Anyway, you are right about the limitations some of those products have which limit object usability; it's time for new vendors to come up with better solutions.

Yaron

SDSBlog.com

Yaron Haviv

Re: Not so fast

Platypus,

Never say Never :)

Yes, object today is mostly slow, but that is due to vendor implementations focused on archiving and HDDs, not an architectural limitation.

Object is not tied to HTTP: Scality, Ceph and Coho Data have native TCP APIs, and in benchmarks I did, Ceph was actually faster than Gluster. In my previous job my team did an RDMA transport for Ceph (now upstream) which has better bandwidth and lower CPU usage compared to NFS.

If you add object's concurrent APIs (one get versus NFS open, lookup, get-attr, read, close), then for small files object can be way faster and can challenge the slow and non-scalable metadata access of file and NFS.

I expect emerging object solutions will put more emphasis on performance and consistency

Yaron

We can give servers more memory, claims Diablo. Well, sort of

Yaron Haviv

Seems like quite an expensive NAND

Quick math: it's $5/GB, and you only get a maximum of 1TB of NAND per server.

And a proprietary, non-standard stack.

This is at a time when NVMe drives are going below $2/GB, can hold more than 4TB per drive, and come with a standard stack and a growing ecosystem.

Diablo needs to come up with innovation on how to combine DRAM and NAND, and lower the cost quite a bit, to make this interesting.

Yaron

Server storage slips on robes, grabs scythe, stalks legacy SANs

Yaron Haviv

SAN/vSAN Centric, ignores reality

This report is too SAN/vSAN-centric; while I agree the current SAN model will shrink, vSAN is not the main alternative.

vSAN is used for co-located storage (VM images, app data, ...), and with new technologies like dedup and containers (a Docker image is only 200MB versus a 10GB vDisk), the relative amount of server co-located storage and block storage will go down.

The real hyper storage growth is in shared unstructured data, i.e. IoT, video streams, logs, BigData, etc. Such storage does not use any SAN/vSAN protocol and cannot be co-located with the app cluster (vSAN), simply because: a. any app in a different compute cluster/region, or even a mobile device, may want to access it; b. it grows at a rate of >100% per year, and adding servers/CPU/memory just for the sake of adding petabytes is neither economical, dense, nor power efficient enough.

It's enough to look at the hyper-scale titans, which are not growing their vSANs significantly as the post/report may imply, but rather are growing exponentially and investing most of their energy in shared data lakes and data services supporting object, scale-out NAS and NoSQL models.

If hyper-scale trends are an indication of where the enterprise will go, enterprise data lakes, next-gen object storage and scale-out NAS will probably store more data than vSANs (which will host small Docker images and private app data).

Yaron

SDSBlog.com
