Working with 60 million files pushes the boundaries of any storage system. Windows underpins most of my storage, so the limits of NTFS and Distributed File System Replication (DFSR), and the gap between their theoretical and practical limits on the number and size of files they handle, are important …
Virtual drives can be a useful workaround
I use ImDisk quite a lot for this reason. Anything that involves lots of small files tends to be directed to one of a few virtual hard disks. There are costs, and it won't work for everyone, but it solves certain problems quite well. In particular, it's OK for a desktop system with a lot of files, but I can't see it being useful on a server. Sure, servers are often virtual, but that's a different thing.
Before I started doing this, deleting an old set of Doxygen files for a large app/library might take several minutes. Now, I just unmount the virtual drive and delete the image file - a batch file can do that in a fraction of a second.
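For what it's worth, here is a minimal sketch of that "unmount and delete" step. It is written in Python rather than as the batch file mentioned above, and it assumes ImDisk's command-line tool is on the PATH and that `imdisk -d -m <mount point>` detaches a drive. The drive letter, image name, and function names are hypothetical.

```python
import subprocess
from pathlib import Path


def detach_command(mount_point):
    """Build the (assumed) ImDisk command line that detaches a virtual drive."""
    return ["imdisk", "-d", "-m", mount_point]


def discard_virtual_drive(mount_point, image_path, dry_run=True):
    """Detach the virtual drive, then delete its backing image file.

    With dry_run=True nothing is executed; the command is only returned,
    so it can be inspected before anything is removed.
    """
    cmd = detach_command(mount_point)
    if dry_run:
        return cmd
    subprocess.run(cmd, check=True)             # stop here if the detach fails
    Path(image_path).unlink(missing_ok=True)    # one file delete, not millions
    return cmd


# Hypothetical drive letter and image name, dry run only.
print(discard_virtual_drive("X:", "doxygen-old.img"))
```

The point is that the file system only has to remove one large image file, instead of walking every small file inside it.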
Also, I mostly don't have any of these virtual drives mounted - so no RAM wasted on caching something I'm not using, MFTs etc included. It's *very* rare that I have more than one mounted at a time.
References, metrics, best practice?
Uh, can we have some references for the assumptions made in this article? I think you're confusing cache hits with loading the entire MFT into RAM. I've never heard of this being an issue before, so I had a search in Google and the only place I can find it mentioned is... this article.
ZFS has some interesting limitations
I nearly had a career-limiting event caused by an unknown (to me) ZFS limitation recently.
ZFS performance drops off considerably when using snapshots AND your used space is greater than 50% of the total pool capacity.
So when/if you buy a storage product that uses ZFS as its file system, don't listen to the 'experts': buy big, much bigger than you need, or it will bite you.
Anon - We settled and I promised not to bad mouth them.
ZFS performance drops off when using snapshots. Any snapshot-capable file system (or LVM for that matter) has that problem, especially if you are mounting your snapshots elsewhere, and especially if you are using them as writeable. The reason for this is that for every snapshot you have, you have to write additional undo-logs for every FS write. Performance degradation with snapshots is linear with the number of snapshots you have.
Yes I think, and I say it anyway
"Do people stop to think before blurting out "use Linux""
Yes I stop to think. And I say it anyway, because it doesn't have these ridiculous limits. I don't have any filesystems with 60,000,000 files, but I do have one with 1.4 million, and the machine has 448MB of effective RAM (512, and it's pulling 64MB of that for video RAM). I can assure you it doesn't use 100s of MB of RAM to keep track of those files, and there are no speed problems accessing files. I'm using ext3.
From a design standpoint, ext2/ext3/ext4 use inodes and trees; there's not some huge bitmap that has to be crammed into RAM. From a practical standpoint:
Using some custom-built disk array Red Hat had, they tested a few filesystems with 1 billion files (1000 directories with 1 million files apiece). With ext4, mkfs took 4 hours, it took 4 days to make 1 billion files (ext3 made files about 10x faster), and fsck'ing the filesystem with 1 billion files took 2.5 hours and 10GB of RAM. They are now working on a patch to fsck to hugely cut RAM usage; it turns out there's nothing inherent in fsck that needs that much RAM, it just hasn't been optimized to reduce RAM usage yet. The actual usage of the filesystem did not use abnormal amounts of RAM.
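As a back-of-envelope check on those figures, the sketch below turns them into a file creation rate and an fsck memory cost per file. It uses only the numbers quoted above and assumes "10GB" means decimal gigabytes; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def files_per_second(file_count, days):
    """Average creation rate implied by a total file count and elapsed days."""
    return file_count / (days * SECONDS_PER_DAY)


def bytes_per_file(total_bytes, file_count):
    """Average memory (or storage) cost per file."""
    return total_bytes / file_count


# Figures quoted above: 1 billion files created in 4 days, fsck using 10GB.
ext4_rate = files_per_second(1_000_000_000, 4)
fsck_cost = bytes_per_file(10 * 10**9, 1_000_000_000)  # assumes decimal GB

print(f"ext4 creation rate: about {ext4_rate:.0f} files/s")
print(f"fsck memory: about {fsck_cost:.0f} bytes per file")
```

So even the unoptimized fsck works out to roughly ten bytes of RAM per file, and that is a one-off check rather than normal operation.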
Here's someone that had (at the time) 113 million files on an ext3 filesystem in a box with 4GB of RAM. No sweat in normal operation -- they had trouble with fsck though, because they had a 32-bit kernel and fsck wanted to use >2GB of RAM. But they did have 2 ways around it. First, run a 64-bit kernel; fsck in fact only needed a few hundred MB more RAM. Second, fsck has an option (that I didn't know about 8-) ) to write temp files out to a filesystem (obviously not the one being checked) instead of keeping stuff in RAM. Apparently this is not accessed too randomly, so it only slows the check by about 25% even using a regular hard disk.
Sorry, but if NTFS really needs 1GB of RAM per million files (more or less)... well, wow. Just wow.
@Henry Wertz 1
"Sorry, but if NTFS really needs 1GB of RAM per million files (more or less)... well, wow. Just wow."
NTFS does not *need* 1GB of RAM per million files. As I said in the article, it will work just fine without that RAM. NTFS can of course run with as little RAM as any other file system, but you suffer a (fairly massive) performance penalty for not following the stated 1GB per million files rule of thumb. The scenarios I find this to be true in are:
a) You generally access all of the files on your drive in a reasonably short timeframe. (For example a nightly backup crawl, or a large website where virtually everything will get read at least once during the course of a day.) Remember that your first access to a file incurs a hit, because the MFT has to be read from disk in order to find the file. Subsequent access to that file will naturally be significantly faster. (NTFS MFT records are huge!)
b) You have lots of medium-sized files. Small files (less than 2K?) are actually stored WITHIN the MFT itself. This is optimal from an IOPS standpoint: read the MFT record and you read the file. The real advantage shines when Windows caches the MFT information into RAM; by doing so it’s also caching the data for that file! By contrast, larger files (say JPEGs or other things in the 10s of kB or higher) don’t live inside their MFT records. Reading them means loading a huge MFT record (or more than one, if they are heavily fragmented) as well as the data. The proportion of MFT to data on medium-sized files is enough that you will seriously notice the speed difference of “enough RAM” versus “not enough RAM” in any scenario where you are reading the same files more than once, or in a multi-user environment. Large files (100M+) are largely immune to this unless heavily fragmented, because the proportion of MFT to data is so small.
c) Random accesses of files because you have many users constantly hammering the same volume. While there are limitations based entirely on the spindles themselves, not having to send the heads flying back to the beginning of the drive every few milliseconds to read new MFT information makes all the difference in the world. Straight-line read time for linear access may well not be all that affected by lack of RAM; modern spindles can probably feed the drive’s cache fast enough to compensate for the flying heads. Get even five users accessing large numbers of files on different areas of the drive, however, and the performance penalty of these enormous MFT records becomes apparent.
d) You aren’t using flash or at least 10K SAS for your spindles. The more you utilise technologies that have stupidly low latency, the less anything I’ve talked about here matters. In fact, I’d go so far as to say that everything I’ve talked about is almost meaningless when using flash. Flash has no real seek/random I/O penalty. 10K SAS penalties are low enough that you might not notice enough of a difference to be worth putting in that RAM. If your arrays are 7.2K disks, however…you’ll notice it. The longer your seek times, the more everything I’ve talked about makes a huge difference.
Remember: take this all (and the article itself) with a grain of salt. These aren’t the whitepaper numbers or figures. They aren’t the Official Guidance from Microsoft. Microsoft will tell you NTFS can run with virtually no RAM and work just fine. They are indeed correct. You must however be prepared to accept the performance penalty that comes from merely “working” instead of “working remotely close to optimally.”
My rules of thumb for NTFS and DFSR are not the textbook answers. They are the practical ones from over a decade of experience in trying to push my hardware to the absolute max. Not because I am obsessed with getting every erg of performance out of my gear…but because I can very rarely afford new gear. I stand by them, and I would love to see someone prove or disprove them in a real production environment. One where you are hammering the arrays underlying the volumes in question with multiple random accesses 24/7. Most especially one in which almost all files are accessed more than once during the course of a day.
Oh, and one where 60M files live on several volumes located on the same array. 60M files, multiple NTFS volumes, one physical array. Transactions off that array in the 10s of TB/day. This is an extreme example, but one that shows exactly how I can arrive at the data I have. That’s *my* practical environment.
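For anyone who wants to plug in their own numbers, here is a minimal Python sketch of the rule of thumb. The 1 GB per million files figure is the one from this thread. The 1,024-byte MFT record size is an assumption (the usual NTFS default, not a number given above), and the function names are made up for illustration.

```python
GIB = 1024 ** 3


def rule_of_thumb_ram_gb(file_count, gb_per_million=1.0):
    """RAM suggested by the '1 GB per million files' rule of thumb."""
    return file_count / 1_000_000 * gb_per_million


def estimated_mft_bytes(file_count, record_bytes=1024):
    """Lower bound on MFT size: one record per file.

    record_bytes=1024 is an assumption; heavily fragmented files can
    need more than one record, so real MFTs may be larger.
    """
    return file_count * record_bytes


files = 60_000_000  # the 60M-file environment described above
ram_gb = rule_of_thumb_ram_gb(files)
mft_gib = estimated_mft_bytes(files) / GIB

print(f"Rule of thumb: {ram_gb:.0f} GB of RAM")
print(f"Estimated MFT size: at least {mft_gib:.1f} GiB")
```

Under that record-size assumption the two numbers land close together, which suggests the rule of thumb amounts to having roughly enough RAM to keep every MFT record cached.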
"The numbers I live by: for every million files on your *server* you should have a gigabyte of RAM "
Well, there's your problem, guv.
Windows is not a server OS.