NetApp musters muscular cluster bluster for ONTAP busters

Data-vault biz NetApp has updated its ONTAP operating system to handle bigger clusters and require less downtime. It is presented as software-defined storage, although it is not typically sold separately from the NetApp controller hardware on which it runs. The company wants to tout ONTAP as an enterprise-class storage system as …

COMMENTS

This topic is closed for new posts.
  1. Lorddraco

    Aggregate up to 400TB - LUN Size?

    Post only the positives, eh? I wonder how big the LUN size is, and how big the actual shared filesystem can be.

    1. dikrek
      Gimp

      Re: Aggregate up to 400TB - LUN Size?

      Howdy, Dimitris from NetApp here (www.recoverymonkey.org).

      Max LUN size: 16TB.

      Max exported share: 20PB.

      Mr. Draco - I did check all your previous posts here on The Register. Highly critical of every vendor but one. It's good etiquette to disclose your affiliation.

      Thx

      D

  2. dikrek
    Thumb Up

    Clarification Re the cache

    Hi Chris, Dimitris from NetApp here.

    To clarify: the size of a SINGLE cache board can be up to 2TB.

    You can have several of those in a system.

    Max Flash Cache is 16TB usable (before dedupe) on a 6280 HA pair.

    Then that times 4 for an 8-controller cluster... :)

    Thx

    D
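
    A quick back-of-the-envelope check of the arithmetic above, as a purely illustrative Python snippet (figures taken from dikrek's post; nothing here is an ONTAP interface):

        # Flash Cache sizing from the figures quoted above (illustrative only).
        # 16TB usable per 6280 HA pair (before dedupe); an 8-controller
        # cluster is four HA pairs.
        per_ha_pair_tb = 16
        ha_pairs = 8 // 2                      # 8 controllers = 4 HA pairs
        cluster_flash_cache_tb = per_ha_pair_tb * ha_pairs
        print(f"Flash Cache across the cluster: {cluster_flash_cache_tb} TB")  # 64 TB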

    1. Nate Amsden

      Re: Clarification Re the cache

      Wondering if that flash cache can be mirrored between controllers? E.g. to avoid the situation where a controller goes down, the other controller takes over without a hot cache ready to go, and performance tanks. A year or two ago this wasn't possible (from what I recall); not sure about now.

      1. dikrek
        Angel

        Re: Clarification Re the cache

        There are 2 forms of Flash Caching possible within a NetApp FAS system.

        1. Flash Cache. Custom boards that slot into the controller. Upon normal failover, the cache contents are preserved and there's no need for re-warming upon fail-back. But since it's within a node, if you lose that node you lose part of your usable cache.

        2. Flash Pool. SSD-based (lives in the disk shelves). This is per disk pool and follows disk pools around if a node goes down. Never needs re-warming no matter the type of outage.

        Nate, I think #2 is what you're after. Yes we have it.

        Thx

        D
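
        To make the distinction concrete, here is a toy Python model (illustrative only, not ONTAP internals or its API) of the failover behaviour described above: Flash Cache is tied to a controller and is unavailable if that node dies, while Flash Pool belongs to the disk pool and moves to the partner on takeover, still warm. All names and sizes are made up.

            # Toy model of the two cache types; not ONTAP code.
            from dataclasses import dataclass, field

            @dataclass
            class Aggregate:
                name: str
                flash_pool_tb: float          # SSD cache that travels with the aggregate

            @dataclass
            class Controller:
                name: str
                flash_cache_tb: float         # node-local cache, lost if the node dies
                aggregates: list = field(default_factory=list)

            def takeover(failed, survivor):
                # Aggregates (and their Flash Pool) move to the partner; Flash Cache does not.
                survivor.aggregates += failed.aggregates
                failed.aggregates = []

            a = Controller("node-a", 2.0, [Aggregate("aggr1", 4.0)])
            b = Controller("node-b", 2.0, [Aggregate("aggr2", 4.0)])
            takeover(a, b)
            warm_tb = sum(ag.flash_pool_tb for ag in b.aggregates)
            print(f"node-b: {len(b.aggregates)} aggregates, {warm_tb} TB warm Flash Pool, "
                  f"{b.flash_cache_tb} TB Flash Cache")   # node-a's 2 TB Flash Cache is gone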

        1. Anonymous Coward
          Anonymous Coward

          Re: Clarification Re the cache

          Flash Cache - no need to re-warm on fail-back? I'm not sure that's really the question being asked.

          What happens to the contents of that cache on fail-over: is it immediately available on the other node via mirroring, or does it require a warm-up period?

          Assuming a warm-up is required, then since the surviving node is now supporting twice the disk, surely it only has half the cache available, so what does it dump out of its cache to accommodate the failed-over disks?

          Sounds like Flash Pool does do this, but is it now system-wide, or still per aggregate and as such per node?

        2. TheGreatDonkey

          Re: Clarification Re the cache

          Two notes -

          - On #2: I believe SSD is grouped at the aggregate layer, not the disk pool layer, which can be significant depending on the size of the environment. I also believe there is/was a limit of 2x on the number of aggregates you can have with an SSD pool?

          - I believe it's a bit disingenuous for NetApp to continue to advertise "non-disruptive controller upgrades" the way they do in comparison to other major vendors' solutions. NetApp continues to rely on host-side configuration to cache IO during any controller failover, which creates a fair amount of nail-biting each time we need to go through the process (typically once or twice a year per unit), as hosts inherently don't have this feature for this explicit purpose. The more reasonable solution most major vendors employ is a shared front-end bus, so the host is unaware of the failover and doesn't require extensive custom host tuning (see HP, EMC, IBM, etc.).

          NetApp's answer of mitigating fundamental design failures (see above, or why do I need to physically remove the unit to get at battery packs that are known to expire every few years?) by asking/telling customers to simply buy more NetApps and scale out wide is just dumb. Fix the foundation, then build the master bath.

          (I use EMC, NetApp, IBM, and HP storage solutions, so I understand none are perfect, but I prefer transparency on these things.)

          1. JohnMartin

            Re: Clarification Re the cache

            - Disclosure NetApp Employee -

            There is a maximum amount of Flash/SSD that can be assigned to a given controller, depending on the amount of primary DRAM cache in the controller. Within that maximum you are free to allocate SSD-based flash to as many aggregates as you wish. Having said that, with 8.2 supporting very large aggregates, one or two Flash Pool aggregates per controller would usually be the best design choice.

            A small correction to the assertion that "NetApp continues to rely on host-side configuration to cache IO during any controller failover": there is no caching done at the host, ever, during either planned or unplanned events.

            As with any storage array serving LUNs, there are events which will cause I/O to suspend at the host. These include SAN path failures, controller failures or replacements, etc., and there are host settings to address these situations (generally waiting around 30 seconds before re-issuing the command). It is certainly true that NetApp recommends that these settings be increased from the defaults to cover edge cases where a fail-over takes longer than normal due to an overloaded or poorly configured array. I would like to stress that this NEVER places data at risk. While I can understand that you would feel nervous about increasing these values if you believed there was uncommitted I/O cached at the server, I can emphatically state that increasing these values is safe.

            During an upgrade or maintenance event there is a small pause in I/O during cutover, and then things proceed as before. New features such as "Volmove" and "Aggregate Relocate", along with a bunch of other things included in ONTAP 8.2, significantly reduce the time that I/O is suspended (generally well within the default timeout setting on most operating systems), but even so, upping the timeout values protects against edge cases where this takes longer than normal.

            While continuing to minimise cut-over times is a worthwhile engineering exercise, and something the frame array vendors already do very well, the ONTAP architecture allows you to do things that I don't believe will be possible in a traditional symmetric frame array. This is due in part to the fact that in ONTAP 8.2 all the control and data elements you need to manage (array configuration, front-end interfaces, back-end flexvols and LUNs) are completely independent of the underlying hardware. You can replace every single part of the underlying hardware at whatever pace and schedule suits you, and those control and data elements remain the same, which massively simplifies things like data migrations and the host-level impact of storage array hardware upgrades. Making small adjustments to HBA timeout settings is, for most people, a small price to pay for the added performance, flexibility and efficiency that clustered Data ONTAP gives you.

            As far as changing host defaults to accommodate longer SCSI timeout settings goes, following best practice including host adapter settings and path timeout settings is something I'd recommend for all vendors, not just NetApp, and we provide utilities such as the host utilities kit to make that as easy as possible.
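
            As a concrete, purely illustrative example of the host-side tuning discussed above, the minimal Python sketch below raises the Linux SCSI command timeout via sysfs. It assumes a Linux host that exposes the timer (in seconds) at /sys/block/<dev>/device/timeout, needs root, and uses an example value; it is not the NetApp Host Utilities Kit, which exists to do this kind of tuning for you.

                # Minimal host-side timeout tuning sketch (not the NetApp Host Utilities Kit).
                # Assumes Linux exposes the SCSI command timer, in seconds, at
                # /sys/block/<dev>/device/timeout. Run as root; the value is an example only.
                import glob
                import sys

                TIMEOUT_SECONDS = 120   # illustrative; follow your vendor's documented best practice

                def raise_scsi_timeouts(pattern="/sys/block/sd*/device/timeout"):
                    for path in glob.glob(pattern):
                        try:
                            with open(path, "w") as f:
                                f.write(str(TIMEOUT_SECONDS))
                            print(f"set {path} -> {TIMEOUT_SECONDS}s")
                        except OSError as err:
                            print(f"skipped {path}: {err}", file=sys.stderr)

                if __name__ == "__main__":
                    raise_scsi_timeouts()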

  3. Kebabbert

    DTrace?

    NetApp engineers discussed porting DTrace to ONTAP on a blog. Has DTrace been ported to ONTAP yet? ONTAP is based on FreeBSD, which has already ported DTrace, so it should be easy to do.

