@Henry Wertz 1
"Sorry, but if NTFS really needs 1GB of RAM per million files (more or less)... well, wow. Just wow."
NTFS does not *need* 1GB of RAM per million files. As I said in the article, it will work just fine without that RAM. While NTFS can of course work just fine with as little RAM as any other file system, you suffer a (fairly massive) performance penalty for not following the stated 1GB per Million files rule of thumb. The scenarios I find this to be true in are:
a) Generally access all of the files on your drive in a reasonably short timeframe. (For example a nightly backup crawl, or a large website where virtually everything will get read at least once during the course of a day.) Remember that your first access to that file will incur a hit such that your MFT is read from disk in order to find the file. Subsequent access to that file will naturally be significantly faster. (NTFS MFT records are huge!)
b) You have lots of medium-sized files files. Small files (less than 2K?) are actually stored WITHIN the MFT itself. This is optimal from an IOPS standpoint: read the MFT record and you read the file. The real advantage shines when Windows caches the MFT information into RAM; by doing so it’s also caching the data for that file! To contrast; larger files (say JPEGs or other things in the 10s of kB or higher) don’t live inside their MFT records. They have to load a huge MFT record (or more, if they are heavily fragmented,) as well as the data. The proportion of MFT/data on medium sized files is enough that you will seriously notice the speed difference of “enough RAM” versus “not enough RAM” in any scenario where you are reading the same files more than once, or in a multi-user environment. Large files (100M+) are largely immune to this unless heavily fragmented because the proportion of MFT to data is so small.
c) Random accesses of files because you have many users constantly hammering the same volume. While there are limitations based entirely on the spindles themselves, not having to send the heads flying back to the beginning fo the drive every few milliseconds to read new MFT information makes all the difference in the world. Straight-line read time for linear access may well not be all that affected by lack of RAM; modern spindles can probably feed the drive’s cache fast enough to compensate for the flying heads. Get even five users accessing large numbers of files on different areas of the drive however, and the performance penalty of these enormous MFT records becomes apparent.
d) You aren’t using flash or at least 10K SAS for your spindles. The more you utilise technologies that have stupidly low latency, the less anything I’ve talked about here matters. In fact, I’d go so far as to say that everything I’ve talked about if almost meaningless when using flash. Flash has no real seek/random I/O penalty. 10K SAS penalties are low enough that you might not notice enough of a difference to be worth putting in that RAM. If your arrays are 7.2K disks, however…you’ll notice it. The longer your seek times, the more everything I’ve talked about makes a huge difference.
Remember; take this all (and the article itself) with a grain of salt. These aren’t the whitepaper numbers or figures. They aren’t he Official Guidance from Microsoft. Microsoft will tell you NTFS can run with virtually no RAM and work just fine. They are indeed correct. You must however be prepared to accept the performance penalty that comes from merely “working” instead of “working remotely close to optimally.”
My rules of thumb for NTFS and DFSR are not the textbook answers. They are the practical ones from over a decade of experience in trying to push my hardware to the absolute max. Not because I am obsessed with getting every erg of performance out of my gear…but because I can very rarely afford new gear. I stand by them, and I would love to see someone prove or disprove them in a real production environment. One where you are hammering the arrays underlying the volumes in question with multiple random accesses 24/7. Most especially one in which almost all files are accesses more than once during the course of a day.
Oh, and one where 60M files live on several volumes located on the same array. 60M files, multiple NTFS volumes, one physical array. Transactions off that array in the 10s of TB/day. This is an extreme example, but one that shows exactly how I can arrive at the data I have. That’s *my* practical environment.