The Register® — Biting the hand that feeds IT

* Posts by Kebabbert

710 posts • joined Wednesday 22nd July 2009 09:09 GMT

Kebabbert
Bronze badge

Re: Why go with Power when Xeons are so much cheaper?

"Yes, Power is far behind SPARC on performance." There, I have corrected your typo for you. Just to add: the SPARC T4 was faster than POWER7 on several benchmarks, and the new SPARC T5 servers are twice as fast as the T4 servers. Next year the T6 will arrive, doubling performance again.

Here are some world records so you can catch up on the latest news in the cpu market. There are more benchmarks on the right-hand side of this web page:

https://blogs.oracle.com/BestPerf/entry/20130326_sparc_t5_8_tpc

No, IBM cannot match this world record. Just look at the IBM numbers and you will see how far behind IBM is.

Kebabbert
Bronze badge

Re: big working databases

The new Oracle M5 SPARC server has 32TB RAM and 32 cpus. With HANA compressing the data (assuming roughly 2:1 compression), the M5 will be capable of holding 64TB of data in RAM. Would that suffice for your needs? (Fujitsu also has a 32TB RAM server, the M10-4S, but it has 64 cpus.)

Kebabbert
Bronze badge

Re: Why go with Power when Xeons are so much cheaper?

You should try to see the whole picture. If you eliminate all other factors, then you can certainly find something that IBM is better at. But when you consider, for instance, price AND performance together, IBM lags behind Intel. Big time.

Kebabbert
Bronze badge

Re: Noisy little buggers...

EVERY YEAR?

OK, then I misunderstood what the article is about. I am talking about the cicadas with a 17-year cycle, which are about to hatch this year. Maybe the article is not about the 17-year cicadas. My apologies for the confusion.

Kebabbert
Bronze badge

A.C

"...There's definitely advantages to having a whole bunch of CPUs and shared memory connected together with a speed that a pile of PCs isn't going to get anywhere near. ..."

Uhm, did you miss that IBM Mainframes have really slow cpus? A high-end 8-socket x86 server has similar or more computing power than the biggest IBM Mainframe with 24 cpus.

Sure, IBM claims they can virtualize 1,500 x86 servers on a Mainframe. But if you dig a bit, it turns out that IBM assumes all the x86 servers are idling at a few percent utilization, while the Mainframe is 100% loaded. In fact, you can emulate an IBM Mainframe on a laptop using the open source "TurboHercules", which allows me to fire up five idling Mainframes on my laptop. But do I claim that my laptop can virtualize five IBM Mainframes? Hell no.

It is the same thing as when Microsoft claimed that Linux is more expensive than Windows. After I dug a bit, it turned out that MS assumed Linux was running on a small Mainframe costing $1 million, and Windows on a PC. No wonder MS concluded that Linux has a worse TCO than Windows.

BTW, IBM does the same trick when they claim that a POWER7 server can virtualize loads of x86 servers: all the x86 servers are old, like 1GHz PCs with 256MB RAM, and they are all idle.

Here is the "world's fastest cpu", according to IBM:

http://www.engadget.com/2010/09/06/ibm-claims-worlds-fastest-processor-with-5-2ghz-z196/

It is actually dog slow compared to x86.

Here is a developer who ported Linux to IBM Mainframes so he could compare Linux workloads on x86 and on Mainframes. He concluded that 1 MIPS is roughly equal to 4MHz of x86.

http://www.mail-archive.com/linux-390@vm.marist.edu/msg18587.html

So a 10,000 MIPS Mainframe equals 40 GHz of x86. A 10-core x86 cpu running at 2GHz gives 20 GHz. Thus, a small 10,000 MIPS Mainframe compares to two 10-core x86 cpus.
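The arithmetic above, written out as a small Python sketch. It only restates the rule of thumb from the linked mailing-list post (1 MIPS ≈ 4 MHz of x86) and treats aggregate clock (cores × GHz) as a proxy for throughput, which is a strong simplification that ignores IPC, turbo and SMT. The function names are illustrative, not from any library.

```python
# Rule of thumb from the linked linux-390 post: 1 MIPS ~ 4 MHz of x86.
MHZ_PER_MIPS = 4


def mainframe_mips_to_x86_ghz(mips: float) -> float:
    """Aggregate x86 clock (GHz) that the rule of thumb equates to `mips`."""
    return mips * MHZ_PER_MIPS / 1000


def x86_aggregate_ghz(cores: int, ghz_per_core: float) -> float:
    """Naive aggregate clock: cores x per-core GHz (ignores IPC, turbo, SMT)."""
    return cores * ghz_per_core


mainframe_ghz = mainframe_mips_to_x86_ghz(10_000)          # 10,000 MIPS Mainframe
cpu_ghz = x86_aggregate_ghz(cores=10, ghz_per_core=2.0)    # one 10-core 2 GHz x86 cpu
equivalent_cpus = mainframe_ghz / cpu_ghz                  # how many such x86 cpus

print(f"{mainframe_ghz} GHz ~ {equivalent_cpus} ten-core 2 GHz x86 cpus")
```

Changing `MHZ_PER_MIPS` shows how sensitive the conclusion is to that single conversion factor.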

Kebabbert
Bronze badge

Re: But does he ever use them?

Bill Gates' charity fund is for tax reasons: charity funds pay much lower taxes. The fund earns more money than it gives away; its income is greater than its outgoings. At the same time, Gates has said he will donate all his money before he dies. How does that add up when he is richer now than before? Because Gates has moved his money to the fund, he seems poorer, but he is actually richer.

Sure, the fund has given away some serious money, like $10 billion or so. But that is over the course of 20(?) years. So he gives away about $0.5 billion a year. That is a steal: much lower taxes, and he only has to give away half a billion a year.

Kebabbert
Bronze badge

"...It is a pretty high use approach from an energy perspective as compared to say a bunch of IBM mainframes running Linux...."

Yes, but you would never get that performance out of IBM Mainframes. Their cpus are much, much slower than a decent x86 cpu. A high-end 8-socket x86 server has similar or even more computing power than the biggest IBM Mainframe with 24 cpus.

Kebabbert
Bronze badge

Re: Noisy little buggers...

Yes, there is an explanation for why their cycle is 17 years. It goes something like this (I can't remember the details):

Originally the cicadas had a short cycle of 1 year, and so did their predators. Then the cicadas switched to 2 years, and the predators, who had a 1-year cycle, adapted. Then the cicadas switched to 3 years, and the predators followed. The cicadas could not switch to 4 years, because 4 = 2*2, which the predators could already handle (they knew how to handle 2-year cycles). So the cicadas tried prime-numbered cycles, because the predators could handle the non-prime ones. Finally, when the cicadas switched to 17 years, the predators had to try every combination (2 years, 3 years, 4 years, etc.), but the intervals between meetings were far too long, so the predators died out.

It has to do with the least common multiple. Matching cicada cycles of 2, 3 or 5 years is doable for the predators, but it takes time. But when the cicadas go to 17 years, the predators only meet them about once every 217 years (or so), which the predators could not survive. So they died out of starvation.
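A small Python sketch of the least-common-multiple argument, under a deliberately simplified model that is assumed here rather than taken from Singh's book: both species emerge together in year 0, and a predator with a p-year cycle only feeds when its emergence coincides with the cicadas'. A composite cicada cycle such as 16 or 18 always has some predator cycle that divides it, so that predator meets the cicadas every generation; a prime cycle such as 13 or 17 pushes even the best-matched predator out to twice the cicada cycle.

```python
from math import gcd


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def years_between_meetings(cicada_cycle: int, predator_cycle: int) -> int:
    """If both emerge in year 0, they next coincide after lcm(c, p) years."""
    return lcm(cicada_cycle, predator_cycle)


def best_predator_gap(cicada_cycle: int) -> int:
    """Shortest gap between meetings over all predator cycles 2..c-1,
    i.e. how often the best-synchronised predator gets to feed."""
    return min(years_between_meetings(cicada_cycle, p) for p in range(2, cicada_cycle))


for c in range(12, 19):
    print(c, best_predator_gap(c))
```

The loop pairs the composite cycles 12, 14, 15, 16 and 18 with themselves, while the primes 13 and 17 get 26 and 34.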

It is described in the book "Fermat's Last Theorem" by Simon Singh.

Kebabbert
Bronze badge

Re: @ the longest comment

eulampios,

Yes, I am very prolific with my credible links to Linux kernel developers, such as Linus Torvalds himself. I would not want to claim untrue things; who would? Your claims should also be backed by credible links.

.

My point about Linux being modified and tailored on the supercomputers is this: "Linux does not scale, but it is easy to tailor". They never run stock Linux. No one does. You need to heavily change Linux. Linux is just a skeleton that you can rip out and add things to as you like. Linux does not scale, but it is easy to tailor up to large clusters, or down to small devices. OTOH Solaris does scale, because it is the same Solaris kernel that powers huge SMP servers with as many as 64-128 cpus, down to small devices. Solaris does not need to be modified; this is true scalability. Linux needs to be modified; if Linux were scalable, you would not need to modify it. Ergo, Linux is not scalable. You can make it scale by modifying it. So, instead of saying "Linux scales", say: "Linux is easy to tailor to your needs", which is why it is so popular with startups. Solaris is a mature and complex kernel, not easy to modify. And besides, you don't need to modify Solaris; it does everything you need out of the box.

.

"Red Hat Enterprise Linux 5 supports up to 255 processors (theoretical) and 64 processors (certified). Red Hat Enterprise Linux 6 supports up to 4,096 processors (theoretical)."

Yes, I know that claim by Red Hat. They probably have something like a

#define NR_CPUS 4096

somewhere in the source code. Why not change it to 100000000000? Does that change make Linux scale well? No. You need to rewrite everything, not change a number.

Let me ask you again: show me a huge SMP Linux server for sale, with as many as 16, 32 or 64 cpus. What is the price? And which vendor delivers it? Can't find any such SMP Linux servers for sale? Why? Maybe Linux cannot handle 32 or 64 cpus? Maybe Linux has bad scaling on SMP servers?

I know that until recently, there were no Linux SMP servers with 32, 64 or 128 cpus for sale. Until now, Linux did not exist on such servers. So how could the developers make Linux scale well on such SMP servers, when the servers did not exist? It takes decades to scale well on SMP servers. Scaling well on clusters (SGI Altix) is easy by comparison.

When (or if) Linux SMP servers with 64 or 128 cpus arrive, it will take another 10-15 years before Linux can scale on them. It is like BTRFS: when BTRFS v1.0 arrives, it will take another 10-15 years before it is stable enough to be trusted in production. ZFS is 10 years old, and we still find bugs in it! When it comes to production, it takes decades after v1.0.

Kebabbert
Bronze badge

Re: Kebabfart Kebabfart implicateorde Billl HAHAHAHAA!

"...Oh Kebbie, you KNOW it's true, as otherwise Larry wouldn't need to badge Fudgeitso's SPARC64 kit! Duh!..."

That was not a very convincing argument, don't you think? It is well known that the best tech does not always win. For instance, Windows has a larger market share and is more profitable than OpenVMS or HP-UX, so by your logic, Windows must be better, right? Wrong.

Can you answer the question? I have asked you this question here again for the umpteenth time. Why is it that a "cache-starved cpu" can be 10x faster than POWER6 and CELL running at 3-5GHz? Can you answer?

Kebabbert
Bronze badge

Re: Important style question

"...even when you're both bald and have a ponytail..."

It could be possible if the ponytail is not on your head?

Kebabbert
Bronze badge

Re: It's just a bunch of kids with a hobby

"...That amount of code churn implies one or both of two things:

Firstly, it ain't finished and certainly isn't well tested.

The upshot of one or both of these things is instability and uncertainty over whether something that works today will work tomorrow when you apply those updates...."

Yes, a lot of sysadmins complain about this. They say that new code is not mature and ironed out. There are sysadmins who would never touch Linux with a 9-yard stick. It would be interesting to see how fast the entire Linux kernel code base gets replaced. On average, does it take 9 months before all the code is rewritten? And with no testing done?

It is said that you need to wait for Windows Service Pack 1 before you can deploy Windows, because only then has the source code started to mature. But until SP1 comes out, the Windows code does not change; it is fixed and not a moving target, so SP1 can iron out the bugs. But Linux is a moving target; there are 7 changes per hour, for Christ's sake! As soon as you correct a bug, that piece of code is likely to be replaced. New. Code. Is. Never. Stable. This is a fact.

BTW, it is said that Linux powers the majority of all supercomputers. That is doubtful. For instance, Blue Gene uses Linux to distribute the load to every compute node, and each compute node uses a special OS tailored to do number crunching and nothing else. Supercomputers never use stock Linux; they strip out everything and use a minimalistic Linux kernel strongly tailored to number crunching, and nothing else. Linux is very easy to tailor, but not scalable. It is not the same Linux that runs everything from small embedded systems up to huge supercomputer clusters. (OTOH it is the very same Solaris kernel that runs the biggest SMP servers down to the smallest PCs; that is true scalability.)

BTW, Linux scales badly, except on clusters. Linux scales very well on clusters: large networks of many PCs doing HPC work, i.e. embarrassingly parallel workloads. These computers typically have 2048 cores and 64TB RAM or more. For instance, the SGI Altix server with 4096 cores is a cluster.

But Linux does not scale on SMP servers: one huge fat server with as many as 32 or 64 cpus (like IBM P795, Oracle M9000/M32, HP Superdome/Integrity) and up to 2TB or 4TB RAM. There are no Linux SMP servers with 32 or 64 cpus for sale. The biggest SMP Linux server has 8 cpus today (just an ordinary 8-socket x86 server such as the Oracle M4800). There are two different kinds of Linux servers for sale today: 1) large HPC clusters with 1000s of cpus and 10s of TB of RAM, and 2) SMP servers with 8 sockets. Nothing in between, no 64-cpu Linux SMP servers. Here is another huge Linux server that uses one single Linux image running on 2048 cores and 64TB RAM. It turns out to be a cluster that is tricked into looking like an SMP server by a software hypervisor; it is not a true SMP, it is just an HPC cluster:

http://www.theregister.co.uk/2011/09/20/scalemp_supports_amd_opterons/

"...Instead of using special ASICs and interconnection protocols to lash together multiple server modes together into a shared memory system, ScaleMP cooked up a special hypervisor layer, called vSMP, that rides atop the x64 processors, memory controllers, and I/O controllers in multiple server nodes. Rather than carve up a single system image into multiple virtual machines, vSMP takes multiple physical servers and – using InfiniBand as a backplane interconnect – makes them look like a giant virtual SMP server with a shared memory space..."

.

Incidentally, Linux kernel hackers agree that new code is unstable. And unstable code makes Linux unstable - that sounds logical, yes? Read what Linux kernel hackers say about the fast code turnover and the instability new untested code brings:

http://kerneltrap.org/Linux/Active_Merge_Windows

"The [linux source code] tree breaks every day, and it's becoming an extremely non-fun environment to work in....We need to slow down the merging, we need to review things more, we need people to test their f--king changes!"

.

Linus Torvalds says Linux is bloated and huge:

http://www.theregister.co.uk/2009/09/22/linus_torvalds_linux_bloated_huge/

"Citing an internal INTEL corp study that tracked kernel releases, Bottomley said Linux performance had dropped about two per centage points at every release, for a cumulative drop of about 12 per cent over the last ten releases. "Is this a problem?" he asked.

"We're getting bloated and huge. Yes, it's a problem," said Torvalds."

.

As Linux kernel developer Andrew Morton says:

http://lwn.net/Articles/285088/

"I used to think [code quality] was in decline, and I think that I might think that it still is. I see so many regressions which we never fix....it would help if people's patches were less buggy."

.

Linux hacker and ext4 maintainer Ted Ts'o says that Linux developers often cheat and cut corners just to win a benchmark. Never mind if the solution is unstable; that is not important as long as Linux wins the benchmark.

http://phoronix.com/forums/showthread.php?36507-Large-HDD-SSD-Linux-2.6.38-File-System-Comparison&p=181904#post181904

"In the case of reiserfs, Chris Mason submitted a patch 4 years ago to turn on barriers by default, but Hans Reiser vetoed it. Apparently, to Hans, winning the benchmark demolition derby was more important than his user's data. (It's a sad fact that sometimes the desire to win benchmark competition will cause developers to cheat, sometimes at the expense of their users.)...We tried to get the default changed in ext3, but it was overruled by Andrew Morton, on the grounds that it would represent a big performance loss, and he didn't think the corruption happened all that often (!!!!!) --- despite the fact that Chris Mason had developed a python program that would reliably corrupt an ext3 file system if you ran it and then pulled the power plug "

.

Non-Linux people:

http://milek.blogspot.se/2010/12/linux-osync-and-write-barriers.html

"This is really scary. I wonder how many developers knew about it especially when coding for Linux when data safety was paramount. Sometimes it feels that some Linux developers are coding to win benchmarks and do not necessarily care about data safety, correctness and standards like POSIX. What is even worse is that some of them don't even bother to tell you about it in official documentation"

.

OpenBSD developer Theo de Raadt says the Linux code is bad:

http://www.forbes.com/2005/06/16/linux-bsd-unix-cz_dl_0616theo.html

"It's terrible," De Raadt says. "Everyone is using it, and they don't realize how bad it is. And the Linux people will just stick with it and add to it rather than stepping back and saying, 'This is garbage and we should fix it.'"

.

Linux as a file server lacks some abilities.

http://www.enterprisestorageforum.com/sans/features/article.php/3749926

"Go mkfs a 500 TB ext-3/4 or other Linux file system, fill it up with multiple streams of data, add/remove files for a few months with, say, 20 GB/sec of bandwidth from a single large SMP server and crash the system and fsck it and tell me how long it takes. Does the I/O performance stay consistent during that few months of adding and removing files? Does the file system perform well with 1 million files in a single directory and 100 million files in the file system?...My guess is the exercise would prove my point: Linux file systems have scaling issues that need to be addressed before 100 TB environments become commonplace. Addressing them now without rancor just might make Linux everything its proponents have hoped for."

Kebabbert
Bronze badge

Re: Kebabfart implicateorde Billl HAHAHAHAA!

Matt Bryant,

"...And the main reason is because it [Niagara] chokes on heavy single-thread apps and doesn't have enough cache...."

If your claim is true, how can Niagara beat much higher-clocked cpus? If Niagara is cache-starved, how can four 1.6GHz Niagara cpus equal 14 (fourteen) 5GHz POWER6 cpus on Siebel v8 benchmarks? You have never answered this question. Every time you claim the Niagara is cache-starved, I ask this question, and every time you are silent. Something does not add up in your posts. :)

.

.

Jesper Frimann,

OK, so the POWER7+ does not simply double the number of cores? Thanks for that information; I will not say that again.

I've googled a bit now, and it seems the POWER7+ has a higher clock rate and a 2.5x larger cpu cache, plus some hardware accelerators, just like T4 and T5. Is that it? A higher clock and a 2.5x larger cpu cache give 20% better performance? Not that impressive, if you ask me. It is just like Intel Haswell, which is 10-15% faster than Ivy Bridge; that is not impressive either. 100% better performance, THAT is impressive. The SPARC T6 will be much faster than T5: it will again double throughput, and have 1.5x stronger threads than T5. And in two years, we will see a 16,384-thread SPARC server with 64TB RAM.

I am more interested in the POWER8. POWER7+ is too small an upgrade to be interesting for geeks. All geeks can appreciate a good cpu and cool tech, no matter who makes it: IBM, Oracle or HP. HP has no fast, cool servers, but instead they have extremely stable OSes. Unix is unstable compared to OpenVMS, and HP-UX is the most stable Unix out there, sysadmins say. I wish OpenVMS were open sourced so we could try it out for free. That would be cool: OpenVMS has the best clustering out there.

Kebabbert
Bronze badge

Funny

IBM sold off its hard disk division because, according to IBM, disk drives would soon be so large that one disk would suffice for an entire family, and just a few for an office. Now IBM is going into storage again. Maybe IBM did not realize that the larger our disks get, the more we store on them? For instance, 4K video will arrive soon, and those movies will make 4TB disks look small.

It is the same reasoning as "we won't manufacture cpus any more, because soon cpus will be unimaginably fast". But the more cpu power we have, the more we use. And we come up with new areas that were inaccessible earlier with weak cpus.

Kebabbert
Bronze badge

Re: implicateorde Billl HAHAHAHAA!

Matt Bryant,

Please don't talk about the CELL cpu, it was a freak of nature. IBM has killed it, and there will be no further development of CELL. For a reason. It performed awfully on real-life workloads. As soon as the workload did not fit into the cache, performance dropped by 95%. Yes, only 5% of the performance remained. Terrible design.

For instance, in string pattern matching, the 3.2GHz CELL was 70% faster than the 1.6GHz Niagara T2+ cpu. That result was for small workloads. When the workload exceeded the cache, performance dropped radically: you needed 13 (thirteen) CELL cpus to match one single Niagara T2+ cpu. And what's worse, the IBM team did heavy optimization in the string pattern matching benchmark; they used assembler, loop unrolling, etc. The Sun team just implemented the Aho-Corasick algorithm in pure C, with no optimization. And still, the Niagara T2+ was 13x faster than CELL on workloads too big to fit into the cache.
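For readers unfamiliar with Aho-Corasick, here is a minimal Python sketch of the algorithm itself; it is not the benchmark code, which is not shown here, and the function names are illustrative. It builds a trie of all patterns, adds failure links with a breadth-first pass so the scan never backtracks in the text, and then reports every occurrence of every pattern in a single pass.

```python
from collections import deque


def build_automaton(patterns):
    """Build the Aho-Corasick trie with failure links and merged outputs."""
    goto = [{}]      # goto[state][char] -> next state
    fail = [0]       # failure link of each state (root fails to itself)
    out = [set()]    # patterns recognised on reaching each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)

    # Breadth-first pass: depth-1 states fail to the root; deeper states
    # fail to the longest proper suffix that is also a trie path.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]   # inherit matches ending at the suffix
    return goto, fail, out


def search(text, patterns):
    """Return sorted (start_index, pattern) pairs for every match in text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return sorted(matches)


print(search("ushers", ["he", "she", "his", "hers"]))
```

The text is scanned once regardless of how many patterns there are, which is why the algorithm's speed depends heavily on how well the trie fits in cache.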

It is funny how the Niagara T2+ could perform an order of magnitude better than IBM CELL with a very small cache. I think the T2+ has something like 2MB of cache in total. This benchmark proves yet again that the Niagara is not cache-starved, but has a different design that makes it superior to cpus with several times higher clock rates and several times larger caches.

1.6GHz 2MB cache > 5GHz 20MB cache

That does not look cache-starved to me.

IBM CELL is not being developed further; it was a dead end. For a reason. Don't talk about the CELL, please. Talk about a good cpu instead, like the POWER7. POWER7+ seems to suck, because it has double the number of cores and is only 20% faster. Sign of a bad design.

Kebabbert
Bronze badge

Re: Confusing a server filesystem and enterprise storage again

There is an article here where EMC and NetApp executives talk about the new kid on the block: Nexenta, which offers OpenSolaris/ZFS servers. EMC and NetApp say they use normal commodity hardware running their own software on top. Just like Nexenta does. And Tegile. COTS.

So I don't see the difference between ZFS servers and EMC and NetApp. I am just citing NetApp and EMC executives. Do you mean they are wrong? That they don't use commodity off-the-shelf hardware? Maybe their engineers have been lying to the executives?

Or what do you mean?

Kebabbert
Bronze badge

Re: Confusing a server filesystem and enterprise storage again

"...Confusing a server filesystem and enterprise storage again..."

What do you mean? These enterprise storage systems are basically just a server with lots of RAM and CPU, some flash/SSD-based caches, and some SATA or SAS disks. They just run some special software. Even EMC and NetApp confirm this, although they also add hardware raid cards into the mix.

So where is the difference between an enterprise storage system and these ZFS storage systems? Virtually none. Oracle sells its big 7420 petabyte-scale servers that beat NetApp on price and performance, and they are exactly as I have described: a server running special software. In Oracle's case it is Solaris/ZFS. Tegile does this too. And both are enterprise-grade. There are people building similar servers themselves, running Solaris/ZFS.

Just build a heavy server and run ZFS on it, and you are basically done. (Not quite, but almost.)

Kebabbert
Bronze badge

Re: Tentri based on ZFS?

And more confirmation:

http://www.theregister.co.uk/2012/06/01/tegile_zebi/print.html

7x IOPS? That is good indeed, with ZFS dedupe. Today, Oracle's implementation of ZFS dedupe is not good enough. Oracle should buy Tegile and use their dedupe instead, because Tegile's ZFS dedupe implementation seems to be really good: it increases performance while using little RAM and cpu. Tegile seems to know what they are doing. And they are cheap too.

Kebabbert
Bronze badge

Re: Tentri based on ZFS?

Confirmed. Tegile is running Solaris and is based on ZFS.

https://www.facebook.com/permalink.php?story_fbid=331644460203544&id=196835830351075

"...Tegile Systems: Hi Stephen - Some of the baseline functionality is on ZFS, but we've put a lot of work into areas we found needed improvement or where we built our differentiation...."

Also:

http://communities.vmware.com/thread/398035

"...The Tegile OS is based on OpenSolaris/ZFS with some custom improvements that Tegile calls MASS (Metadata Accelerated Storage System), which implement deduplication and compression in a different way than ZFS...."

Tegile is just a Solaris/ZFS combo, just like Nexenta and several other storage solutions.

Kebabbert
Bronze badge

Re: Tentri based on ZFS?

Sorry, I meant to ask "Tegile based on ZFS?", not Tentri.

Kebabbert
Bronze badge

Tentri based on ZFS?

What does this mean?

http://www.tegile.com/blog/george_tintri

"...gave me the platform to clear the air on how Tegile’s wicked smart engineers have used ZFS asa base platform for many of the boring parts of storage (who really wants to write their own NFS stack these days anyways??), and focus their time and energy on maximizing our value and differentiation in the market..."

Kebabbert
Bronze badge

Re: Kebabfart XFS not safe

"...<Yawn> Most servers and arrays I know of can do this for themselves already by seperate PSU monitoring software..."

Well, good for them. But the point is, ZFS can detect a faulty PSU without additional software. The data corruption detection of ZFS is so strong that it can even detect a faulty PSU. People report that ZFS detected faulty SATA cables, faulty fibre channel switches, faulty ECC RAM DIMMs, etc. All this without any additional software.

This is a true testament to the extremely strong data integrity of ZFS, which surpasses every other filesystem on the market. Or do you know of any other filesystem or storage system that can do this?

As CERN says about hardware raid:

Measurements at CERN

- Wrote a simple application to write/verify 1GB file

- Write 1MB, sleep 1 second, etc. until 1GB has been written

- Read 1MB, verify, sleep 1 second, etc.

- Ran on 3000 rack servers with HW RAID card

- After 3 weeks, found 152 instances of silent data corruption

- Previously thought “everything was fine”

- HW RAID only detected “noisy” data errors

- Need end-to-end verification to catch silent data corruption
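A rough Python sketch of the kind of write/verify probe described in the bullets above. The file name, chunk size and `pause_s` parameter are illustrative assumptions (1 MiB chunks, 1 GB total and `pause_s=1.0` would mirror the bullets more closely), and the bit flip is a simulation of silent corruption, not a real hardware fault. Note that reading the file back immediately will usually be served from the OS page cache, so a faithful test would have to read it back much later or bypass the cache.

```python
import hashlib
import os
import tempfile
import time


def write_probe(path, total_bytes, chunk_bytes, pause_s=0.0):
    """Write random data in chunk_bytes pieces, sleeping pause_s after each
    write. Returns the SHA-256 digest of every chunk as the reference."""
    digests = []
    with open(path, "wb") as f:
        for _ in range(total_bytes // chunk_bytes):
            chunk = os.urandom(chunk_bytes)
            digests.append(hashlib.sha256(chunk).digest())
            f.write(chunk)
            f.flush()
            time.sleep(pause_s)
    return digests


def verify_probe(path, digests, chunk_bytes, pause_s=0.0):
    """Read the file back chunk by chunk and return the indices of chunks
    whose digest no longer matches the one recorded at write time."""
    bad = []
    with open(path, "rb") as f:
        for i, expected in enumerate(digests):
            chunk = f.read(chunk_bytes)
            if hashlib.sha256(chunk).digest() != expected:
                bad.append(i)
            time.sleep(pause_s)
    return bad


CHUNK = 1 << 16  # 64 KiB, kept small so the example runs quickly
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "probe.bin")
    digests = write_probe(path, total_bytes=16 * CHUNK, chunk_bytes=CHUNK)
    clean = verify_probe(path, digests, CHUNK)

    # Simulate silent corruption: flip one bit inside chunk 3 behind our back.
    with open(path, "r+b") as f:
        f.seek(3 * CHUNK + 100)
        byte = f.read(1)
        f.seek(-1, os.SEEK_CUR)
        f.write(bytes([byte[0] ^ 0x01]))
    corrupted = verify_probe(path, digests, CHUNK)

print("clean:", clean, "after bit flip:", corrupted)
```

The key design point is that the reference digests are taken by the application before the data enters the storage stack, so a mismatch is caught no matter which layer below corrupted it.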

This shows that hardware raid does not offer data integrity at all, and should not be trusted. I know that you trust hardware raid, but you shouldn't. I also know that you don't think ECC RAM is necessary in servers, but it is. I have told you umpteen times to read the research on data corruption, but you refuse. I don't really understand why you reject all research on this matter...

Kebabbert
Bronze badge

Re: XFS not safe

Malcolm Weir,

"....It is NOT true to assert that XFS (or anything else) is "unsafe" simply because they do not have those error checks. Error checking can be implemented in many different places and in many different ways, and the fact that the ZFS folks have decided there is One True Way is irrelevant ..."

Yes, my assertion is TRUE. Let me explain. There are lots of checksums in every domain. There are checksums on disk, on ECC RAM, on interfaces, etc. As my Amazon link above shows, there are checksums everywhere. Every piece of hardware has checksums, implemented in many different places and in different ways. Does all this checksumming help? No. Let me explain why.

The reason all these checksums do not help is this:

Have you ever played this game as a kid? A lot of children sit in a ring, and one kid whispers a word to the next kid, who whispers it on, and so on. At the end of the ring, the words are compared, and they always differ. The word got distorted along the chain.

Lesson learned: it does not help to have checksums within each domain. You must have checksums that pass through the boundaries; you must be able to compare the checksum from the beginning of the chain with the one at the end. Are those checksums identical? End-to-end checksums are needed! When data passes a boundary, it might get corrupted, and the next domain then computes a fresh, valid checksum over the corrupted data. So within that domain, the corrupted data has a good checksum, which does not help. You must have end-to-end checksums: you must always compare the first checksum with the last one. This is what ZFS does.
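A toy Python model of the whispering-game argument. Each hypothetical "layer" verifies the checksum it was handed and then computes a fresh one for the next layer, like independent per-component checks; one layer silently flips a bit after its own check has passed. Every local check succeeds, yet comparing the final data against a checksum taken at the very start exposes the corruption. The layer structure and names are assumptions for illustration only; this is not how ZFS is implemented internally (ZFS, for example, keeps each block's checksum in the parent block pointer rather than next to the data).

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def flip_first_bit(data: bytes) -> bytes:
    # Stand-in for a flaky cable, switch port or power supply.
    return bytes([data[0] ^ 0x01]) + data[1:]


def send_through_layers(data: bytes, layers: int, faulty_layer: int):
    """Each layer verifies the checksum handed to it by the previous layer,
    then computes a fresh checksum for the next layer. The faulty layer
    corrupts the data after its own check passes."""
    checksum = digest(data)
    for layer in range(layers):
        if digest(data) != checksum:
            raise IOError(f"per-layer check failed at layer {layer}")
        if layer == faulty_layer:
            data = flip_first_bit(data)
        checksum = digest(data)  # re-checksum at the boundary
    return data, checksum


original = b"the kid at the start of the ring whispers this"
received, last_checksum = send_through_layers(original, layers=4, faulty_layer=2)

per_layer_ok = digest(received) == last_checksum      # every local check passes
end_to_end_ok = digest(received) == digest(original)  # start-vs-end comparison
print(per_layer_ok, end_to_end_ok)
```

This prints `True False`: the per-layer checks report nothing wrong, while the end-to-end comparison catches the flipped bit.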

ZFS is monolithic: it is a raid manager, a filesystem, etc., all in one. Other solutions have a separate raid layer, a filesystem, a separate raid card, etc. There are many different layers, and the checksum cannot be passed between the layers. ZFS controls everything from RAM down to disk because it is monolithic, and can therefore compare end to end. Other layered solutions cannot do this.

For instance, ZFS can detect faulty power supplies, whereas other solutions cannot. If the power supply is flaky, ZFS will notice data corruption within minutes and warn immediately. Earlier filesystems on the same computer did not notice:

https://blogs.oracle.com/elowe/entry/zfs_saves_the_day_ta

And ZFS also immediately detects faulty RAM DIMMs. ZFS even detects faulty switches!!! Here is a fibre channel switch with a faulty port. ZFS was the first to detect it; it had gone unnoticed earlier:

http://jforonda.blogspot.se/2007/01/faulty-fc-port-meets-zfs.html

Please don't tell me that other filesystems or hardware raid can detect faulty switches, because they cannot. If ZFS stores data on a storage server via a switch, ZFS can detect all problems in the path, because ZFS compares what is on disk with what is in RAM. End to end. No one else does that; they cannot detect faulty switches, or faulty power supplies, or....

Sun learned that checksums alone do not help. CERN confirms this in a study: "checksumming is not enough, you need end-to-end checksums (zfs has a point)". I can google this CERN study for you, if you wish to read it. The point is that ZFS does end-to-end checksums, whereas other solutions do not. It does not suffice to add checksums everywhere; that will not give you a safer solution. You need end-to-end. Which is what ZFS does.

Do you understand now why ZFS is safe, and other solutions are not?

Kebabbert
Bronze badge

Homeopathy??

ukgnome

"...It is well known that Steve Jobs was into homoeopathy,..."

Have you read the Steve Jobs biography?? Jobs never mentioned homeopathy. Where did you hear this? Any links?

The book says that Steve was uncomfortable with someone cutting into his body; that is the reason he did not want to undergo surgery. Also, Steve had a very strong will and a habit of ignoring problematic things. That is the reason he waited before having surgery. Nowhere does it say that Steve tried diets or homeopathy to battle his cancer. Steve was into diets, but he did that his whole life and it had nothing to do with the cancer.

Kebabbert
Bronze badge

Re: But ...

Oddjobz

"...So you think checksumming is the only reason to use ZFS? And you think hardware RAID is a good solution?

Seriously?!...."

Yes, the major reason to use ZFS is its strong data protection. The rest of its features are just icing on the cake. Remember, ZFS computes a checksum every time something is read. Like computing an MD5 checksum, it takes time and cpu. I heard of an NTFS driver for Linux that was faster than NTFS on Windows, but if you cut corners and omit all the safety nets, you can achieve whatever speed you want. For instance, an XFS fsck of 10TB of data took something like 5 minutes. Actually traversing that much data would take many hours. The conclusion is that XFS fsck cuts corners and does not check all the data. In fact, fsck normally only checks metadata, so the data itself might still be corrupt.

And no, I don't think hardware raid is good. But I am saying that if you turn off ZFS checksumming, you have an unsafe solution. You might as well use hw-raid, which is also unsafe. You have missed the point of using ZFS if you turn off checksumming.

ZFS on Linux might be a bit unstable. But I don't think you should draw the conclusion that ZFS on Solaris is unstable.

Kebabbert
Bronze badge

Re: Kebabfart M5-32 is not the only one with 32TB RAM

@Matt Bryant,

SETI@home is an HPC workload, and HPC runs on clusters. There is a difference between SMP servers and HPC clusters: HPC clusters cannot replace SMP servers. That is the reason you need to buy Oracle/IBM/HP if you need big SMP servers. And IBM Mainframes are also SMP servers, not HPC clusters.

Kebabbert
Bronze badge

Re: POWER7 780 SpecInt results are higher than Larry's chart

"... Larry is comparing a last generation (although only a + generation in POWER terms) of his competitors product to his own brand new sparkling product. And furthermore a competitor product that is not sold any more...."

But the question is not whether the 780-MHC is still sold. The question is: does it use POWER7 or POWER7+? If it uses POWER7+, it does not matter that it is no longer sold; it is still a valid benchmark. But if the 780-MHC is sporting an old POWER7 CPU, then Larry is not being fair. He should compare T5 to POWER7+, not T5 to POWER7. As long as Larry compares T5 to POWER7+, I don't care whether the POWER7+ server is sold or not.

"...If he really wanted to compare against a current POWER product why didn't he compare the four socket T5-4 against the four socket POWER 760 ?..."

Because T5 scales better. By comparing 8-socket servers, he shows off the T5's scaling: the more CPUs, the better the T5 servers perform. Of course, if the 780-MHC sports older POWER7 CPUs, then Larry is being unfair and should never have made that comparison. In that case it is better to compare 4-socket servers. Alternatively, double the IBM 4-socket benchmark result to simulate an 8-socket server (but that assumes IBM scales linearly from 4 to 8 sockets, which is not obvious).

Kebabbert
Bronze badge

Re: But ...

@oddjobs,

"...For a start, if you want performance then you won't be using all the frills like Checksumming and compression because they simply don't perform in a real environment - that's not to say it could be done better, it's just turning this stuff on can decrease your throughput by 75% or so - which is going to make many baulk..."

If you turn off checksumming, there is no point in using ZFS. The whole point of ZFS, the reason for the hype, is that it protects your data with checksums. If you don't want that, use another filesystem together with hardware RAID instead.

If you really need more performance from ZFS, there are better ways than turning off checksumming. First, get lots of RAM: with enough RAM, frequently read data is served from memory and the disks are not touched at all. Second, add a fast SSD as a cache. There are two kinds of SSD cache, a read cache (L2ARC) and a separate log device for synchronous writes (ZIL):

http://en.wikipedia.org/wiki/ZFS#ZFS_cache:_ARC_.28L1.29.2C_L2ARC.2C_ZIL

According to the investment bank Morgan Stanley, ZFS performs better than Linux ext4 while also allowing a threefold reduction in the number of servers needed:

http://conferences.inf.ed.ac.uk/eakc2012/slides/AFS_on_Solaris_ZFS.pdf

Kebabbert
Bronze badge

Re: Kebabfart Add / Remove disks?

"...Yeah, that works, as long as you can stomach the ridiculously long rebuild time for each disk. Oh, and please do pretend that is also not a problem with ZFS just like hardware RAID isn't a problem!..."

Yes, long resilver times on huge disks are a problem. I have never denied this. What, do you expect me to deny there are any problems with ZFS, to claim that ZFS is 100% bulletproof? Have I ever said that?

What mitigates the resilver time is that ZFS only resilvers data; empty space is not resilvered. Hardware RAID cards always rebuild the whole disk, empty blocks and data alike. Because ZFS controls the whole stack, it knows which blocks hold data and which are empty space, and it only repairs the data. Hardware RAID does not control the whole stack, so it cannot know.
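The difference between a block-level rebuild and a ZFS-style resilver can be sketched in a few lines of Python. This is a toy model under stated assumptions: a `Disk` is a dict of written blocks, and the `allocated` set stands in for what ZFS actually learns by walking its block tree. None of these names come from ZFS itself.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Toy disk: a fixed number of block slots, of which only some hold data."""
    size_blocks: int
    blocks: dict[int, bytes] = field(default_factory=dict)


def hw_raid_rebuild(source: Disk, target: Disk) -> int:
    """Block-level rebuild: the controller cannot tell used blocks from empty
    ones, so it copies every slot on the disk. Returns blocks copied."""
    for i in range(source.size_blocks):
        target.blocks[i] = source.blocks.get(i, b"")
    return source.size_blocks


def zfs_style_resilver(source: Disk, target: Disk, allocated: set[int]) -> int:
    """Filesystem-aware resilver: only blocks the filesystem knows are live are
    copied. `allocated` is a hypothetical stand-in for ZFS's block-tree walk."""
    for i in sorted(allocated):
        target.blocks[i] = source.blocks[i]
    return len(allocated)
```

On a disk that is 10% full, the resilver copies a tenth of what the controller rebuild copies; on a nearly full disk the advantage mostly disappears, which is why huge, full disks still resilver slowly.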

Kebabbert
Bronze badge

Re: Gordon Gordon AC Destroyed All Braincells Gordon BTRFS? You must be joking...

@Matt Bryant

"...Your blanket denial of hardware RAID speaks volume of your ZFS zealotry. Other file systems work fine with hardware RAID..."

We tried to explain to you that other filesystems do not work fine with hardware RAID. First of all, there might be errors in the filesystem, in the hardware RAID system, on the disk, or ...

There are many different domains where errors might creep in. Each domain has its own checksums, but that does not help, because nothing checks the data as it passes from one domain to another. That is the point.

Did you play the game as a kid where you sit in a ring and whisper a word to your neighbour? The word at the end always differs from the first word uttered. The reason is that nobody compares the beginning with the end: there are no end-to-end checksums. End-to-end checksums are exactly what ZFS has. It checks that the data in RAM is exactly the same as the data on disk, across all the domains in between. These end-to-end checksums are why ZFS has superior data integrity: it compares the beginning and the end. No one else does that.

ZFS can compare end to end because it is monolithic and controls the whole chain, from RAID down to disk. ZFS contains both the RAID manager and the filesystem. The purpose of this design is to make end-to-end checksums possible.

So when other filesystems separate the RAID manager from the filesystem, or add yet another layer (the hardware RAID), that is bad from a data integrity standpoint. Checksums within each domain are not enough; you need to checksum end to end. That is the only solution, and ZFS achieves it by being monolithic and controlling everything. Incidentally, Linux hackers mocked the ZFS design for being monolithic, and Andrew Morton called ZFS a "rampant layering violation" because it had no layers. The point is that only if you have no layers can you do end-to-end checksums!

Finally, Linux kernel hackers seem to have understood this and created BTRFS, which violates layers just as ZFS does, since BTRFS controls everything: RAID, filesystem, etc. But you never hear complaints from Linux hackers that BTRFS violates layers; you only hear complaints when non-Linux tech does it. Apparently ZFS is a bad design according to Linux hackers, but BTRFS, which is a clone of ZFS, is not. :)
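The whisper-game argument can be made concrete with a small Python sketch of a lost ("phantom") write. It contrasts a block that carries its own checksum with a block whose checksum lives in the parent's pointer, which is how ZFS arranges it (each parent is in turn checksummed by its own parent, up to the uberblock). Assumptions: `Device`, `BlockPointer` and the helpers are hypothetical, SHA-256 stands in for whatever checksum is configured, and blocks are overwritten in place for brevity, whereas real ZFS is copy-on-write.

```python
import hashlib
from dataclasses import dataclass


class ChecksumError(Exception):
    pass


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class Device:
    """Hypothetical lower layer (disk, firmware, RAID card) that can misbehave."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.drop_writes = False  # simulate a phantom write: acked but never stored

    def write(self, addr: int, data: bytes) -> None:
        if not self.drop_writes:
            self.blocks[addr] = data

    def read(self, addr: int) -> bytes:
        return self.blocks[addr]


# Per-domain check: the checksum is stored next to the data it protects.
def write_self_checked(dev: Device, addr: int, data: bytes) -> None:
    dev.write(addr, digest(data) + data)


def read_self_checked(dev: Device, addr: int) -> bytes:
    raw = dev.read(addr)
    stored, data = raw[:32], raw[32:]
    if digest(data) != stored:
        raise ChecksumError("block does not match its own checksum")
    return data  # a stale-but-intact old block passes this test


# End-to-end check: the checksum travels with the *pointer* held by the parent.
@dataclass(frozen=True)
class BlockPointer:
    addr: int
    checksum: bytes


def write_block(dev: Device, addr: int, data: bytes) -> BlockPointer:
    dev.write(addr, data)
    return BlockPointer(addr, digest(data))


def read_block(dev: Device, bp: BlockPointer) -> bytes:
    data = dev.read(bp.addr)
    if digest(data) != bp.checksum:
        raise ChecksumError("data on disk is not what the parent expects")
    return data
```

If `drop_writes` is set before writing a second version, `read_self_checked` silently returns the old version, because the old block still matches its own checksum. `read_block` raises `ChecksumError` instead, because the pointer expects the new version.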

Kebabbert
Bronze badge

Re: Kebabfart Gordon Destroyed All Braincells Gordon BTRFS? You must be joking...

@Matt Bryant

"...So first you say that ZFS having problems with hardware RAID is a lie,..."

Que? Can you quote me on this? Everybody knows that ZFS + hardware RAID is a major no-no. I am quite active on forums where we discuss ZFS, and I always say that hardware RAID + ZFS should be avoided.

ZFS can work correctly with a hardware RAID card only if the card's RAID functionality is shut off, i.e. set to JBOD or flashed away. If you insist on using hardware RAID with ZFS, then ZFS can detect all errors but cannot repair them all. That is your problem, not a ZFS problem.
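Why ZFS can detect but not always repair on top of hardware RAID can be sketched like this: repair needs a second copy that ZFS itself can read and verify, such as the other side of a mirror or an extra copy from the `copies` property. A hardware RAID LUN shows ZFS only one copy. The function below is a hypothetical illustration, not ZFS code; SHA-256 stands in for the configured checksum, and raidz (which repairs via parity) is not modelled.

```python
import hashlib


class ChecksumError(Exception):
    pass


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def self_healing_read(copies: list[bytearray], expected: bytes) -> bytes:
    """Return verified data and rewrite any copy that fails the checksum.

    `copies` are the replicas ZFS can see itself (mirror sides, or extra
    copies from the `copies` property). With a single copy, e.g. one
    hardware-RAID LUN, corruption is detected but cannot be repaired.
    """
    good = next((bytes(c) for c in copies if digest(bytes(c)) == expected), None)
    if good is None:
        raise ChecksumError("every copy is corrupt: detected, but cannot repair")
    for c in copies:
        if digest(bytes(c)) != expected:
            c[:] = good  # self-heal: overwrite the bad copy with verified data
    return good
```

With two copies, a corrupted side is silently repaired on read; with one copy, the same corruption raises `ChecksumError`, which is the "detect but not repair" situation described above.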

Kebabbert
Bronze badge

Re: Gordon Destroyed All Braincells Gordon BTRFS? You must be joking...

Another thing about my link above: the investment bank Morgan Stanley, which is migrating from Linux + ext4 to ZFS because of huge cost savings (a threefold reduction in Linux servers) and increased performance, is using OpenAFS with ZFS:

http://conferences.inf.ed.ac.uk/eakc2012/slides/AFS_on_Solaris_ZFS.pdf

OpenAFS is distributed. It seems that distributed, clustered filesystems such as Lustre and OpenAFS rely on a normal local filesystem for the actual data storage. That is where ZFS fits in. Lustre + ZFS rocks, OpenAFS + ZFS rocks, etc.

Kebabbert
Bronze badge

Re: Gordon Destroyed All Braincells Gordon BTRFS? You must be joking...

@Matt Bryant

Actually, the Wikipedia article you linked contains false information. This is not correct:

"...When using ZFS on high end storage devices or any hardware RAID controller, it is important to realize that ZFS needs access to multiple devices to be able to perform the automatic self-healing functionality[39]...."

If you read the link [39], it says the opposite:

"As an alternative you could use a special feature of ZFS: By setting the property copies with the zfs command you can tell ZFS to write copies of your data on the same LUN. As you control this on a "per dataset" granularity you can use it just for your most important data. But the basic problem for many people is the same: It's like RAID1 on a single disk "

Hence, it says that ZFS can protect data integrity on a single disk: with the copies property set, ZFS does not need access to multiple devices to self-heal; one disk will do. Read the link. Wikipedia is wrong on this.

Another incorrect statement in the Wikipedia article is "the hardware raid should be configured as JBOD or RAID 0 mode". That is not correct, because some hardware RAID cards add extra information to the disks, and each additional layer confuses ZFS, so ZFS cannot guarantee data integrity on top of hardware RAID. If you are using a hardware RAID card, you need to configure it in JBOD mode, but the best option is to reflash the card's firmware so the RAID functionality disappears ("IT mode"), turning the card into a simple HBA. In fact, reflashing RAID cards into IT mode is common among ZFS users.

Here is the large investment bank Morgan Stanley talking about the benefits of migrating from Linux ext4 to ZFS (huge cost savings, and increased performance):

http://conferences.inf.ed.ac.uk/eakc2012/slides/AFS_on_Solaris_ZFS.pdf

Kebabbert
Bronze badge

Re: Add / Remove disks?

@BristolBachelor,

Each ZFS raid (pool) consists of one or several "vdevs". A vdev is a group of disks. Or files. Or partitions. Or...

Each vdev should be configured as raidz1 (similar to raid-5) or raidz2 (raid-6) or a mirror.

You can never change the number of disks in a vdev. If it is 5 disks, then 5 disks it is.

You can always add another vdev to a ZFS raid, but you can never remove a vdev from a ZFS raid.

Say you have 5 disks and create a ZFS raid with raidz1. This means there is one vdev, configured as raidz1. You cannot expand this 5-disk vdev to 6 disks; the number of disks in the vdev is fixed at 5. But you can add another vdev, for example a mirror. Then the ZFS raid consists of one raidz1 vdev of 5 disks and one mirror vdev of 2 disks, and you can add as many vdevs as you like. Actually, you could add a single disk to a ZFS raid, but that would be stupid: if that single disk crashes, you lose all your data, even the data in the raidz1 vdev. Always add disks with redundancy; never add a single disk.

You can also swap disks for larger ones. Say you have a ZFS raid of five 1TB disks. Replace one disk with a 2TB disk and repair (resilver) the raid. After the repair, replace another disk with a 2TB disk and repair again. Rinse and repeat, and finally you have five 2TB disks, and the vdev can grow to use the extra space.

If you have three 500GB disks and two 1TB disks and create a 5-disk raidz1, every disk is treated as a 500GB disk, so the raw size is 5 x 500GB (roughly 4 x 500GB usable after one disk's worth of parity). The smallest disk decides the storage capacity.
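The capacity rules above fit in a small Python helper. It is a back-of-the-envelope sketch with hypothetical function names: it treats every member of a vdev as the size of its smallest disk and subtracts whole disks for parity or mirroring, and it ignores metadata, padding and reserved space, so the numbers ZFS itself reports will differ somewhat.

```python
PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def vdev_capacity(kind: str, disk_sizes_gb: list[int]) -> tuple[int, int]:
    """Return (raw_gb, approx_usable_gb) for one vdev."""
    n = len(disk_sizes_gb)
    smallest = min(disk_sizes_gb)          # the smallest disk decides
    raw = n * smallest
    if kind == "mirror":
        redundant = n - 1                  # every extra side is a full copy
    elif kind in PARITY:
        redundant = PARITY[kind]           # about one disk's worth per parity level
    else:
        raise ValueError(f"unknown vdev type: {kind}")
    if n <= redundant:
        raise ValueError(f"{kind} needs more than {redundant} disks")
    return raw, (n - redundant) * smallest


def pool_usable(vdevs: list[tuple[str, list[int]]]) -> int:
    """A pool stripes across its vdevs, so usable space simply adds up."""
    return sum(vdev_capacity(kind, sizes)[1] for kind, sizes in vdevs)


if __name__ == "__main__":
    # Three 500GB disks and two 1TB disks in one raidz1 vdev.
    print(vdev_capacity("raidz1", [500, 500, 500, 1000, 1000]))   # (2500, 2000)
    # One 5 x 1TB raidz1 vdev plus a 2 x 2TB mirror vdev.
    print(pool_usable([("raidz1", [1000] * 5), ("mirror", [2000, 2000])]))  # 6000
    # After replacing all five 1TB disks with 2TB disks, one by one.
    print(vdev_capacity("raidz1", [2000] * 5))                    # (10000, 8000)
```

The last line shows why the disk-swapping trick only pays off once the final small disk is gone: until then, the smallest member still caps every disk in the vdev.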

Kebabbert
Bronze badge

Re: Destroyed All Braincells Destroyed All Braincells Gordon BTRFS?......

@Matt Bryant

"...you have to give up all concepts of hardware RAID and instead use a really big, monolithic server, with massive amounts of RAM..."

Who wants to use old-fashioned hardware RAID?

http://en.wikipedia.org/wiki/RAID#Problems_with_RAID

Hardware RAID is not even safe. Just read the research papers from NetApp, who rely heavily on hardware RAID:

http://research.cs.wisc.edu/adsl/Publications/latent-sigmetrics07.pdf

"A real life study of 1.5 million HDDs in the NetApp database found that, on average, 1 in 90 SATA drives will have silent corruption which is not caught by hardware RAID verification process; for a RAID-5 system, that works out to one undetected error for every 67 TB of data read."

Kebabbert
Bronze badge

Re: Ibm marketing...

@Mr Nelson

"...No, Kebbabert, you are the one distorting the real facts...Customers do buy per core and that's what matters when choosing servers. It matters for licensing and it matters for server costs...."

And I quote you again, from a post above:

"Larry Ellison is it again claiming he surpassed IBM Power's performance. Don't believe that for a second. Here are the facts. Very conveniently Oracle compares server performance based on number of processors, a.k.a, chips or sockets. Processors are not a real measurement of server performance for the simple reason that different processors have different core counts....In reality the Power 760 beats the T5-8 in Per-Core performance... It seems like Power is still the performance king and by a good margin".

This is just too weird. I don't understand how you can reason like this. Am I the one who is "distorting the facts", or is it someone else? Look, who has the world record today, Oracle or IBM? According to you, it is IBM. Are you telling the truth? If you need the highest possible performance, which vendor do you go to, IBM or Oracle? Assume a customer says:

-We need to get the highest SAP performance for 8-socket servers. Whom should I choose, IBM or Oracle?

Mr Nelson's answer:

-You should choose IBM because they have faster cores.

-But according to official benchmarks, IBM reaches 139,000 SAP and Oracle reaches 221,000 SAP on 8-socket servers.

-Yes, that is true. But remember that IBM has faster cores, so therefore IBM is faster!

-Que? 139,000 SAP is faster than 221,000 SAP???

-Yes.

Mr Nelson, how on earth do you explain that 139,000 > 221,000? Mathematically and logically, you are wrong.

Of course, if you look at performance per core, you and IBM are correct. But Oracle is not discussing performance per core; Oracle is discussing performance per CPU, nothing else. Oracle claims to have the fastest CPUs today. Do you mean they are wrong? Do you mean that the following is logically true?

IBM has faster cores (true) => IBM has faster CPUs (not true)

Back in the day, what would people have said if Sun had reasoned like this:

-The 1.6GHz T2+ is half as fast as the 5GHz POWER6 on this benchmark. But if you compare GHz for GHz, you see that the T2+ gets more work done per GHz than the POWER6. Ergo, the T2+ is the faster CPU! IBM is distorting the facts, don't believe a word IBM says! Sun gets more work done per GHz and therefore Sun has the performance crown, by a good margin. The T2+ is the fastest CPU!

-But... POWER6 has a higher score on the benchmark, shouldn't we buy POWER6 servers?

-No, that would be silly, because the T2+ gets more work done per GHz!

-But we don't care about performance/GHz, we only care about the highest performance?

-Trust me on this. Don't believe a word IBM says. The T2+ is faster.

-Que???

If Sun had reasoned like this back then, everybody would have thought Sun was crazy and could not be trusted. IBM reasons like this today, but apparently that is not a problem. I have heard this reasoning from other IBM supporters. For instance, someone here claimed that because POWER6 has faster cores than Intel Xeon, the POWER6 CPU is faster than the Intel Xeon, even though the Intel Xeon scored higher on LINPACK benchmarks. Strange. POWER6 scored lower on LINPACK and Intel Xeon scored higher, and still POWER6 is the superior choice if you need the highest LINPACK performance? I will never understand IBMers. They have turned the laws of logic upside down.
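The whole disagreement is about which denominator you divide a benchmark score by. A small Python sketch makes the point with the two 8-socket SAP scores quoted above. The core counts are assumptions for illustration only (16 cores per T5 chip, 8 per POWER7 chip) and should be checked against the actual benchmark disclosures.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    score: float   # benchmark result for the whole system
    sockets: int   # number of CPU chips
    cores: int     # total cores in the system


def rank(results: list[Result], per: str) -> list[str]:
    """Order systems by score per system, per socket (chip) or per core."""
    keys = {
        "system": lambda r: r.score,
        "socket": lambda r: r.score / r.sockets,
        "core": lambda r: r.score / r.cores,
    }
    return [r.name for r in sorted(results, key=keys[per], reverse=True)]


SYSTEMS = [
    Result("Oracle 8-socket", 221_000, sockets=8, cores=128),  # assumed 16 cores/chip
    Result("IBM 8-socket", 139_000, sockets=8, cores=64),      # assumed 8 cores/chip
]

if __name__ == "__main__":
    for per in ("system", "socket", "core"):
        print(per, rank(SYSTEMS, per))
```

With these assumed core counts, the per-core ranking flips while the per-chip and whole-system rankings do not, so "faster cores" and "faster CPUs" can both be true at once.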

Kebabbert
Bronze badge

Re: POWER7 780 SpecInt results are higher than Larry's chart

But the POWER 780-MHC is still a POWER7 server, right? So it gives a good picture of how fast an 8-socket POWER7 server is, i.e. the ballpark SPECint numbers an 8-socket POWER7 would give. If IBM sold another 8-socket POWER7 server, it would not be much faster than the 780-MHC; it would be about as fast.

8-socket POWER7 versus 8-socket T5 sounds fair to me. If you want to compare a 16-socket POWER7 to an 8-socket T5, why stop there? Why not compare a 32-socket P795 to a T5 as well? Does that sound fair?

Kebabbert
Bronze badge

Re: Kebabfart M5-32 is not the only one with 32TB RAM

@Matt Bryant,

You are missing the point. The business DOES care. You cannot run SMP workloads on a cluster. There is a reason Linux targets 4-8 socket SMP servers and 2048-4096 CPU servers (clusters), with nothing in between. There are no 16- or 32-CPU Linux servers for sale. If it were easy to build SMP servers, we would see many cheap 32/64/128-CPU Linux servers at a fraction of the price of the IBM P795. But the only 32-CPU servers are from Oracle, IBM and HP. There are no such Linux servers.

If you could run SMP workloads on a cheap cluster, there would be no market for the P795 / M9000 / HP Superdome(?). Why would anyone pay huge amounts for a Unix server when they could get a cheap Linux cluster? No one would. For instance, the P595 that IBM used for the old TPC-C record cost $35 million at list price. One single f-cking 32-CPU server. Why not buy 128 cheap PCs, stick them on a fast switch and run Linux instead? Because SMP servers are different from clusters.

Here is an article about one of those 2048-CPU Linux servers (which turns out to be a cluster):

http://www.theregister.co.uk/2011/09/20/scalemp_supports_amd_opterons/

"...Instead of using special ASICs and interconnection protocols to lash together multiple server modes together into a shared memory system, ScaleMP cooked up a special hypervisor layer, called vSMP, that rides atop the x64 processors, memory controllers, and I/O controllers in multiple server nodes. Rather than carve up a single system image into multiple virtual machines, vSMP takes multiple physical servers and – using InfiniBand as a backplane interconnect – makes them look like a giant virtual SMP server with a shared memory space..."

@Jesper,

Yes, I know about the IBM P795. If you read my post again, you will see that I specifically mentioned it. Although Larry Ellison mocks it and other SMP servers, I think the P795 is mighty cool. It is probably the most powerful server for sale today. It would be cool just to see and touch one of those beasts in real life! :) I hope Oracle can beat it in two years, when Oracle releases the 16,384-thread, 64TB RAM SPARC server. But IBM is not waiting; IBM will release POWER8-equipped servers by then. It will be an interesting future for all of us! :)

Kebabbert
Bronze badge

Re: @integr8d

No, the only place the "1GB RAM per TB of disk" recommendation applies is when you use deduplication on ZFS. Deduplication requires lots of RAM; otherwise performance grinds to a halt. Dedup on ZFS is not production ready. Avoid it!

ZFS in itself does not require much RAM. I have used it on Pentium 4 systems with 1GB RAM. The thing is, if you have lots of RAM, ZFS turns it into a huge disk cache called the ARC, which speeds things up. If you do not have much RAM, ZFS has to go to the disks more often, which is slow. So you should use lots of RAM if you can, but it is not a requirement!

You can also use fast SSDs as a second-level cache, called the L2ARC. With fast SSDs you can reach hundreds of thousands of IOPS and several GB/sec of bandwidth on ZFS servers.

Also, IBM's latest supercomputer, Sequoia, is using Lustre + ZFS to deploy 55 PB of clustered storage with 1TB/sec of bandwidth. Just google it; it is a very interesting read for those of you who are geeks.
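The effect of RAM on ZFS read performance can be illustrated with a toy read cache in Python. This is a plain LRU cache, a deliberate simplification: the real ARC balances recently used and frequently used blocks and is considerably smarter. The class and the `read_from_disk` callback are hypothetical; the point is only that once the hot working set fits in the cache, repeat reads stop touching the disks.

```python
from collections import OrderedDict
from typing import Callable


class LRUBlockCache:
    """Toy read cache (plain LRU, not the real ARC algorithm)."""

    def __init__(self, capacity_blocks: int, read_from_disk: Callable[[int], bytes]) -> None:
        self.capacity = capacity_blocks
        self.read_from_disk = read_from_disk
        self.cache = OrderedDict()  # block_id -> data, least recently used first
        self.hits = 0
        self.misses = 0  # every miss is a trip to the (slow) disk

    def read(self, block_id: int) -> bytes:
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1
        data = self.read_from_disk(block_id)
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used block
        return data


def disk_reads(capacity_blocks: int, working_set: int, passes: int) -> int:
    """Read the same working set repeatedly; return how many reads hit the disk."""
    cache = LRUBlockCache(capacity_blocks, lambda i: b"block %d" % i)
    for _ in range(passes):
        for block_id in range(working_set):
            cache.read(block_id)
    return cache.misses


if __name__ == "__main__":
    print(disk_reads(capacity_blocks=200, working_set=100, passes=10))  # 100
    print(disk_reads(capacity_blocks=50, working_set=100, passes=10))   # 1000
```

With room for the whole working set, only the first pass goes to disk. With a cache half that size, this strictly cyclic pattern makes plain LRU miss on every read. An L2ARC on SSD adds a second, larger tier, so such misses can be served from flash instead of spinning disks.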

Kebabbert
Bronze badge

XFS not safe

@Malcolm Weir,

Recent research shows that XFS is not really safe. It does not protect your data against corruption, nor does it detect all types of errors. Here is an article about a PhD thesis on the data protection capabilities of XFS, JFS, ext3, etc.:

http://www.zdnet.com/blog/storage/how-microsoft-puts-your-data-at-risk/169

The conclusion is that none of those filesystems were designed for data protection.

When you have a small filesystem, say a few TB, there is not much risk of silently corrupted data. But when you venture into Big Data, with many TB or even PB, there is always silently corrupted data somewhere. Just read about the experience at Amazon.com:

http://perspectives.mvdirona.com/2012/02/26/ObservationsOnErrorsCorrectionsTrustOfDependentSystems.aspx

"...Another frequent question is “non-ECC mother boards are much cheaper -- do we really need ECC on memory?” The answer is always yes. At scale(!!!!!!!!), error detection and correction at lower levels fails to correct or even detect some problems. Software stacks above introduce errors. Hardware introduces more errors. Firmware introduces errors. Errors creep in everywhere and absolutely nobody and nothing can be trusted....Over the years, each time I have had an opportunity to see the impact of adding a new layer of error detection, the result has been the same. It fires fast and it fires frequently. In each of these cases, I predicted we would find issues at scale. But, even starting from that perspective, each time I was amazed at the frequency the error correction code fired..."

Most of the time, you won't even notice that you have corrupted data, because the system neither knows about it nor detects it. For instance, look at the spec sheet of any high-end Fibre Channel or SAS disk: it will say something like "One irrecoverable error for every 10^16 bits read". Those errors are not recoverable, and some errors are not even detectable. There are always cases that error-correcting algorithms cannot handle: some errors are uncorrectable, and some are undetectable. Here is more information, with lots of research papers on error detection:

http://en.wikipedia.org/wiki/ZFS#Data_integrity
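The spec-sheet figure translates into expected errors with simple arithmetic, sketched here in Python. Assumptions: errors occur independently at exactly the rated worst-case rate (real drives may do better or worse), and the chance of at least one error uses the Poisson approximation 1 - e^(-expected). The function names are hypothetical.

```python
import math

TB = 10**12  # decimal terabyte, as on drive spec sheets
PB = 10**15


def expected_errors(bytes_read: float, bits_per_error: float = 1e16) -> float:
    """Expected number of unrecoverable read errors for a given amount of data read."""
    return bytes_read * 8 / bits_per_error


def p_at_least_one(bytes_read: float, bits_per_error: float = 1e16) -> float:
    """Chance of at least one error, assuming independent errors (Poisson)."""
    return 1 - math.exp(-expected_errors(bytes_read, bits_per_error))


if __name__ == "__main__":
    # 10^16 bits = 1.25 PB, so reading 1.25 PB is expected to hit one error.
    print(expected_errors(1.25 * PB))          # 1.0
    for size_tb in (10, 180, 1000):
        p = p_at_least_one(size_tb * TB)
        print(f"{size_tb} TB read: {p:.1%} chance of at least one error")
```

Under these assumptions, reading 180TB once gives roughly a 13% chance of hitting an error the drive itself admits to; errors the drive never reports are not counted at all.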

@Alan Brown

"They all work - and ZFS is the only FS for linux which can detect and repair disk ECC failures (others can detect, but not repair)"

This is not true. Read the research above. Other filesystems cannot even detect these errors, let alone repair ECC failures or other types of failures such as ghost writes.

OTOH, researchers have also tried to provoke ZFS by injecting artificial errors on disk, and ZFS detected and recovered from all of them. No other filesystem or hardware RAID can do that. THAT is the reason ZFS is hyped: not because it is fast or because of features such as snapshots. Who cares about performance if your data is silently altered without the system even noticing?

http://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf

ZFS production ready on Linux? I doubt that. Linux has a long history of cutting corners just to win benchmarks, and safety suffers for it. See what Ted Ts'o, the ext4 maintainer, writes:

"In the case of reiserfs, Chris Mason submitted a patch 4 years ago to turn on barriers by default, but Hans Reiser vetoed it. Apparently, to Hans, winning the benchmark demolition derby was more important than his user's data. (It's a sad fact that sometimes the desire to win benchmark competition will cause developers to cheat, sometimes at the expense of their users.)...We tried to get the default changed in ext3, but it was overruled by Andrew Morton, on the grounds that it would represent a big performance loss, and he didn't think the corruption happened all that often (!!!!!) --- despite the fact that Chris Mason had developed a python program that would reliably corrupt an ext3 file system if you ran it and then pulled the power plug "

The conclusion is that Linux cannot be trusted, because of all the cheating. Linux users prematurely declare Linux tech safe when it is not. It would be like Microsoft declaring ReFS and Storage Spaces production ready; that would be funny. Just google people's experiences with them.

I really doubt BTRFS will be production ready soon. ZFS is over ten years old, and we still find bugs in it. There are sysadmins who do not trust ZFS because it has not been tried enough; it is too new and fancy. It takes decades for a filesystem to be proven. Even when/if BTRFS becomes production ready, it will take years before it is proven.

ZFS on linux is production ready? Hmmm....
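The barrier issue in the Ted Ts'o quote above is about write ordering: a journal's commit record must not reach stable storage before the data it commits. Here is a minimal Python sketch of that rule, using a hypothetical line-based journal file. Real ext3/ext4 journaling (jbd2) is block-based and far more involved, and whether `os.fsync` really flushes the drive's write cache depends on the OS, the filesystem and the barrier settings.

```python
import os


def append_durably(f, record: bytes) -> None:
    """Append one record and wait until the OS reports it is on stable storage."""
    f.write(record)
    f.flush()              # move Python's buffer into the OS page cache
    os.fsync(f.fileno())   # ask the OS to push it to the device (a barrier point)


def commit_transaction(journal_path: str, txid: int, payload: bytes) -> None:
    with open(journal_path, "ab") as f:
        # 1. Journal the data and force it out first.
        append_durably(f, b"DATA %d %s\n" % (txid, payload.hex().encode()))
        # 2. Only then write the commit record. If the device were free to
        #    reorder these writes, a power cut could leave a commit record
        #    pointing at data that never reached the disk.
        append_durably(f, b"COMMIT %d\n" % txid)


def recover(journal_path: str) -> dict[int, bytes]:
    """Replay only transactions that have both a DATA and a COMMIT record."""
    data: dict[int, bytes] = {}
    committed: set[int] = set()
    with open(journal_path, "rb") as f:
        for line in f:
            parts = line.split()
            try:
                if len(parts) == 3 and parts[0] == b"DATA":
                    data[int(parts[1])] = bytes.fromhex(parts[2].decode())
                elif len(parts) == 2 and parts[0] == b"COMMIT":
                    committed.add(int(parts[1]))
            except ValueError:
                continue  # a torn, half-written record: skip it
    return {tx: data[tx] for tx in sorted(committed) if tx in data}
```

A transaction interrupted before its commit record is simply dropped on recovery. Turning barriers off is exactly what breaks the assumption that step 1 is on disk before step 2.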

Kebabbert
Bronze badge

Re: Ibm marketing...

The above post of mine was addressing Mr Nelson.

Kebabbert
Bronze badge

Re: POWER7 780 SpecInt results are higher than Larry's chart

Maybe because IBM used 16 CPUs in your link, whereas Oracle used 8 CPUs? To draw any conclusions, you need to normalize and use an equal number of CPUs.

Kebabbert
Bronze badge

Re: Oracle needs to get these results published when the processor is available...

What "memory starving" or "cache starving"? You do know that even the old T2+, with 2MB of cache in total, was 10x as fast as the POWER6 on some workloads? If Niagara CPUs had cache or memory problems, they would never have set any world records back in the day, nor beaten 5GHz CPUs with huge caches such as the POWER6.

Kebabbert
Bronze badge

Re: Ibm marketing...

No, you are wrong and distorting the facts.

Let us be clear here: exactly what is Oracle claiming? To have the world's fastest CPUs. And they have them. This is a fact; look at the benchmarks, for instance SPECint, where Oracle crushes IBM.

IBM and you are reasoning like this: IBM has faster cores (true) => IBM has faster CPUs (false). This is called distorting the facts at best, and pure FUD at worst.

Sure, IBM might have faster cores, which is an economic factor when you license software, but the discussion is not about price/performance. The discussion is about who has the fastest CPUs, not the fastest cores. And who has the fastest CPUs today?

If you think stronger cores are an important pricing factor, then say so. Don't try to turn that argument into a claim that IBM still has the world's fastest CPUs; that would be an outright lie and FUD.

I am not surprised that the T5 is faster than POWER7, because it is newer. I want to compare same-generation CPUs: POWER8 vs T6(?). That would be interesting. Anyone can beat old tech; just don't brag about it. There is nothing honourable in beating old tech. I don't consider POWER7+ new tech; sure, it was a slight improvement, but it is still the same generation as POWER7. Hence, the T5 should beat POWER7+ too. If not, the T5 sucks, because it is the newer generation. Same-generation comparisons say something.

Kebabbert
Bronze badge

Re: CISC vs. RISC

No. The original idea of RISC was to discard the microcode and do everything directly in hardware instead. Discarding the microcode and simplifying the instruction set was the big thing. Microcode made everything that much slower.

Kebabbert
Bronze badge

Re: M5-32 is not the only one with 32TB RAM

ToddR

If you read Wikipedia on ccNUMA, it says that ccNUMA servers are just a cluster. This means that the SGI Altix servers with 1000s of cores and 64(?) TB of RAM are just HPC clusters. Sure, they might present a single system image, by using a hypervisor that tricks Linux into believing it is a single image, but they are not SMP servers.

The IBM P795 and Oracle servers are SMP servers, not clusters. Have you asked yourself why IBM, Oracle and HP release Unix servers with only 32 or 64 CPUs, whilst Linux servers have 2048 CPUs? A hint: one kind are (disguised) clusters and the other are SMP (one big fat server).

It is much more difficult to build a big SMP server with 32-64 CPUs than a cluster. It takes decades of research and development to build scalable SMP servers.

Linux also has SMP servers; the biggest on the Linux market has 8 CPUs. There are no 16-CPU Linux SMP servers for sale, because Linux cannot scale to 16 CPUs SMP-wise. But there are 2048-CPU Linux servers, and they are disguised clusters. Can you show a link to a 16-CPU Linux server that is SMP? No. For a reason: Linux cannot scale SMP-wise.

Kebabbert
Bronze badge

Re: Fastest?

Bill neal,

Maybe you missed it, but the zEC12 is much slower than a decent x86 CPU, which I have explained earlier with links to mainframe experts. As a rule of thumb, one MIPS equals 4MHz of x86 CPU (according to someone who ported Linux to IBM mainframes); in other words, 50,000 MIPS equals 200,000MHz, i.e. 200GHz. But an x86 CPU with 10 cores running at 3GHz has 30GHz in total. Thus you need only a handful of x86 CPUs (about seven) to match the biggest IBM mainframe with 24 zEC12 CPUs.
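The back-of-the-envelope conversion above, written out in Python. It simply treats aggregate clock rate as a proxy for throughput, which is exactly the crude assumption behind the 1 MIPS ~ 4MHz rule of thumb quoted here (not an official IBM figure); the helper name is hypothetical.

```python
import math

MHZ_PER_MIPS = 4  # rule of thumb quoted in the post, not an official conversion


def x86_chips_needed(mainframe_mips: float, cores_per_chip: int, ghz_per_core: float) -> int:
    """How many x86 chips match a mainframe's capacity, comparing total MHz."""
    mainframe_mhz = mainframe_mips * MHZ_PER_MIPS
    chip_mhz = cores_per_chip * ghz_per_core * 1000
    return math.ceil(mainframe_mhz / chip_mhz)


if __name__ == "__main__":
    # 50,000 MIPS -> 200,000 MHz; one 10-core 3GHz chip -> 30,000 MHz.
    print(x86_chips_needed(50_000, cores_per_chip=10, ghz_per_core=3.0))  # 7
```

Rounding up matters: 200,000 / 30,000 is about 6.7, so seven chips are needed to fully cover the quoted figure.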

Why do you think IBM never releases mainframe benchmarks? If mainframes were faster than x86, IBM would have released loads of benchmarks. Just look at all the POWER7 benchmarks, loads of them! There are no mainframe SPECint benchmarks, for a reason: they are dog slow. Show me any IBM mainframe benchmark!

Mainframes do have good I/O, but that is because they have loads of I/O helper CPUs. If you had that many I/O helper CPUs on an 8-socket x86 server, it would have better I/O than a mainframe.

Kebabbert
Bronze badge

Ibm marketing...

That must be one of the weirdest arguments in existence. Just because IBM has faster cores, IBM has faster CPUs? You don't see the error in this IBM reasoning? Is Oracle claiming to have the fastest CPUs or the fastest cores? This is pure marketing from IBM and its supporters.

Let me rephrase it this way: on Siebel v8 benchmarks, a 5GHz POWER6 achieved a tenth of the work of one 1.6GHz SPARC T2. IBM needed 56GHz of CPU power, and Sun used 6GHz, to reach equal performance in official Siebel v8 benchmarks. IBM used about 10x more GHz than Sun, so this must mean Sun was 10x faster, right? Does this sound logical? Looking GHz for GHz, Sun had a clear lead over POWER. Does this mean Sun had a faster CPU in general? No. It was faster on some benchmarks.

You need to look at the entire CPU, not GHz for GHz, nor core for core. Oracle claims to have faster CPUs, and this is true. If IBM claims to have faster cores, that is also true. But nobody can buy cores; they can only buy CPUs.

Kebabbert
Bronze badge

Re: Fragmentation

>>The reason he hasn't needed to compile the kernel is that he doesn't have the option.

Maybe he doesn't need to compile the kernel? Maybe you can use Mac OS X without recompiling it every time you want to tailor it? For instance, on Solaris you can change the task scheduler on the fly, without recompiling or rebooting. I consider that a good thing. You don't recompile on Solaris either; there is no need, because it is easily configurable.

Kebabbert
Bronze badge

"Preserving the data"? Data corruption!

Backblaze is using normal Linux filesystems (is it XFS?), and as research has shown, those filesystems are susceptible to data corruption. There is always a very small risk of data corruption, but with very large amounts of data you will surely get some. 180TB is bound to have data corruption. The more data, the more problems.

Research about Linux filesystems being unsafe:

http://www.zdnet.com/blog/storage/how-microsoft-puts-your-data-at-risk/169

Large amounts of data always face data corruption, says an Amazon engineer:

http://perspectives.mvdirona.com/2012/02/26/ObservationsOnErrorsCorrectionsTrustOfDependentSystems.aspx

Research shows that ZFS is indeed safe, and protects your data:

http://research.cs.wisc.edu/wind/Publications/zfs-corruption-fast10.pdf
