Stripping out old records from primary databases could pay big bucks in terms of reclaimed disk capacity, faster database operations and backups. Clearpace has technology to do this and archive the extracted records in de-duplicated form on cheap SATA arrays. UK-based HP reseller 2e2 will resell and use Clearpace's NParchive …
Nice concept, but...
How do I then archive it off for "reasonable" availability any better than I do now? A good DBA has already taken measures to ensure critical production data is available, and production database access is fast. What if you're in the situation where your production environment generates 30 million rows per day that you need to archive off nightly (like my company...) and keep available for 2 years? How does this system improve that? It sounds to me like a very good idea, but if your company has DBAs that are worth a spit, you don't need it. Everything it does is BETTER done by good management of partitions and disk space, and by good backup/archive procedures. Once the procedure is regular, it can be scripted out, like everything else...
Do these idiots really think they have invented near-line storage?
I'm beginning to think that all the PaperEngineers[tm] that came out of the WWW boom are starting to gain a little seniority ... with predictable results.
On the bright side, my generation will be able to earn enough to retire properly, by fixing the problems the kids are creating. Understanding basics is important, regardless of the technology you are trying to deal/cope with.
The current kids, on the other hand, are going to find themselves in serious trouble.
What comes around, goes around. Hint: Buzzwords don't matter. Working systems do.
small DBs = faster OMG! NEWS! not
Nice ad. Can we have some news please?
Smaller != Faster, necessarily
A well-indexed database with all data access performed via appropriate indexes performs as well with 1 million rows as with 100 million rows. The index will still have around 3 levels, and will be largely (or totally) cached in memory. Thus each data lookup still performs 4 data reads (3 index, 1 data) for each record accessed.
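To put rough numbers on that, here is a back-of-the-envelope sketch. The fanout of 500 keys per index page is purely an assumption for illustration (the real figure depends on key size and page size), and the function names are made up for this example.

```python
def index_levels(rows, fanout=500):
    """Number of B-tree levels needed to address `rows` entries.

    `fanout` (keys per index page) is an assumed value, not a measured one.
    """
    levels = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


def reads_per_lookup(rows, fanout=500):
    """Index page reads plus one read for the data row itself."""
    return index_levels(rows, fanout) + 1


for rows in (1_000_000, 100_000_000):
    print(rows, index_levels(rows), reads_per_lookup(rows))
# prints:
# 1000000 3 4
# 100000000 3 4
```

Under that assumed fanout, both table sizes land on a 3-level index, so an indexed lookup costs the same 4 reads either way.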
The only databases where smaller = faster are ones with reporting-type characteristics, where you are looking at all the data anyway. These will not be sped up by archiving half your data off to external, slow disk if it is then accessed in its entirety anyway.
I've been trying to explain that over here at work (large UK bank) and it isn't sinking in. Yes, storage costs will be reduced by using offline storage, but performance won't typically be any better.
Size, not speed
As I understand this tech, it isn't really about sp33ding up your DBs, but reducing occupied space on the DB while keeping all data accessible; any "idle" records are transferred to the slower, compressed partition (tablespace?) while those that are frequently used/modified are kept on the main, uncompressed one.
Of course, the compressed data will be slower to retrieve, as it requires decompression. So this solution doesn't give any speed gains, but it will save on storage requirements, especially for those who are required to keep stale data online for a long time, like SOX-bound organizations.
Performance is gravy
I've seen this technology from Clearpace so I think it's worth reiterating that the first comment hit the nail on the head. The real drivers for using this stuff are around storage reduction (about 40:1 with Oracle data) and making old, infrequently accessed data much simpler and cheaper to manage. These guys aren't trying to be a better database, they're trying (pretty successfully from what I've seen) to be a better and simpler long term data store for cold data...improved performance in production is gravy.
This is already in DB2
DB2 has partitioned tablespaces. You can then partition your tables on, e.g., insert time, like a partition for each month. Then you keep the active partition down to a month and can detach and do whatever with the other months.
Those tablespaces can be running on whatever disks you like. For performance reasons you keep your primary indexes, the active month, and the next month waiting on the fastest tablespaces. So when data becomes some months old and less frequently used, you can mirror it to slower tablespaces, and detach the old and attach the new. Since all the writing is on the active month, you can do this mirroring online.
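For anyone who hasn't worked with range partitioning, the idea can be sketched as a toy model in Python. This is not DB2 syntax or any real database API; the class and method names are hypothetical and only illustrate routing rows by insert month and detaching an old month as a unit.

```python
from collections import defaultdict
from datetime import date


class MonthlyPartitionedTable:
    """Toy model of a table range-partitioned by insert month (hypothetical)."""

    def __init__(self):
        # Maps (year, month) -> list of rows stored in that partition.
        self.partitions = defaultdict(list)

    def insert(self, inserted_on, row):
        # Route the row to the partition for its insert month.
        self.partitions[(inserted_on.year, inserted_on.month)].append(row)

    def detach(self, year, month):
        # Remove a whole month from the table and hand its rows to the caller,
        # e.g. to be moved to slower storage. Returns [] if no such partition.
        return self.partitions.pop((year, month), [])


table = MonthlyPartitionedTable()
table.insert(date(2008, 1, 15), "jan-order")
table.insert(date(2008, 2, 3), "feb-order")
table.insert(date(2008, 2, 20), "another-feb-order")

archived = table.detach(2008, 1)          # January leaves the active table
active_months = sorted(table.partitions)  # only February remains
```

The point of the design is that old data leaves in whole-partition units, so the active month is never touched by the archive step.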
So why does anyone need this Clearpace solution then? Does Oracle not have partitioned tablespaces?
Maybe it is all about educating DBAs.