Swiss Windows data reduction startup balesio has introduced an appliance to suck the fat out of unstructured Windows files. These include Office, Excel, PowerPoint and SharePoint files and other unstructured – meaning not-database – files in the Windows PC and server world. The software, which balesio announced last week, …
I read that as "liposuctions fat out of flies" and was very confused.
I assumed it was the start of lab tests for a new fat-reduction method, and couldn't work out why you'd start with flies...
Ah, well. Slightly more interesting than the actual content ;)
It's strange reading that "structured storage" format files (i.e. every version of Office documents for a very long time, at least as far back as the last versions of Office on Win3.1, 1994/5, say) are suddenly classed as "unstructured".
This is, of course, not to say that there isn't fat to be removed from them, but they are far from unstructured. Indeed, the reason they can be defatted is precisely that structure. (And, let's face it, because many users aren't sufficiently interested in this stuff to turn off "Fast Save", which just appends deltas to the end of the document...)
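To make the "far from unstructured" point concrete, here is a minimal Python sketch that tells the two Office container types apart from their first few bytes. The byte signatures are the standard ones for OLE2 compound files (old .doc/.xls/.ppt) and ZIP archives (.docx/.xlsx/.pptx); the function name and return labels are made up for illustration, and this says nothing about how balesio's product actually works.

```python
# Well-known leading signatures of the two Office container formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # structured storage / compound file
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP local file header (OOXML)


def sniff_container(header: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper and labels)."""
    if header.startswith(OLE2_MAGIC):
        return "ole2-compound-file"
    if header.startswith(ZIP_MAGIC):
        return "zip-container"
    return "unknown"


if __name__ == "__main__":
    # Synthetic headers stand in for real files here.
    print(sniff_container(OLE2_MAGIC + b"\x00" * 8))
    print(sniff_container(ZIP_MAGIC + b"\x14\x00"))
    print(sniff_container(b"just some text"))
```

Anything that wants to strip the fat has to start from a check like this and then walk the container's internal directory, which is only possible because the structure is there.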
..that it's based on the mainframe view of files. Also encountered in the VMS world. Basically these OSes provide the means to identify the content/structure of a file. VMS, for instance, has attributes that can indicate things like "CR/LF text", "LF text", "Bucket database"(*). This isn't just like Windows' extension recognition. You actually can't load a text file into a text editor under VMS unless you change the attribute to indicate the correct file structure.
Under Windows "ren *.doc *.xls" is perfectly valid. The equivalent under VMS would be tantamount to corrupting the file.
By that definition everything Windows does is 'unstructured' because, as far as the OS is concerned, every file is just raw binary data. It's up to the application processing it to know how to read it. If you want to get Notepad to try to open an Excel sheet, you just have to force it to by specifying the extension manually. Doubtless you could code a VMS app to do that if you wanted, but it would be perverse.
NB: I haven't used VMS myself, but I used to be involved in data recovery and I was responsible for writing the Files-11 extraction tool. That taught me that copying the files from a Windows machine to a VMS machine was not enough. They need to have their attributes fixed up once they get onto the VMS machine to be of any use.
(*)Not sure about that one but VMS does include database functionality in the OS.
No, it is not
We read and optimize the unstructured files (e.g. a PPTX file, a PPT file or an image) in their binary form; we don't need the application.
The technology is able to look inside these unstructured files and runs its content-aware optimization process within each single file, completely independently of the storage resource. No need for the application.
>We read and optimize the unstructured files (e.g. a PPTX file, a PPT file or an image) in their binary form; we don't need the application
Well, I never said the original application was needed. I just explained why the phrase 'unstructured files' might have been used. In this case I'm sort of right... but probably sort of not :)
My assumption was that the term was coming from VMS/Mainframe terminology - perhaps I'm wrong there. Still the basic point remains - it refers to treating the data as a binary file rather than trying to parse the contents according to a specific structure.
I'm a little surprised if it really just sees them as unstructured data though. That doesn't seem to fit in with the article. If you're going to reduce bit depth you need to know what you're reading, and that implies /something/ being able to read the structure. I would expect you either have your own structure-following logic or else, like other companies, you have licensed Oracle's Outside In libraries.
So I think I know what the product is doing and how. It's now Chris' reply that doesn't seem to make sense :)
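As a small illustration of why "just binary" can't be the whole story, here is a Python sketch that pulls the bit depth out of a PNG header. The layout used (8-byte signature, then an IHDR chunk holding width, height, bit depth and colour type) is from the PNG specification; the helper names are invented, and the header built for the demo is deliberately incomplete (no CRC, no image data), so it is not a valid PNG file.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_bit_depth(data: bytes) -> int:
    """Return the bit depth stored in the IHDR chunk of a PNG byte stream."""
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    # First chunk after the signature: 4-byte big-endian length, 4-byte type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    # IHDR data starts with width, height, bit depth, colour type.
    _width, _height, bit_depth, _colour_type = struct.unpack(">IIBB", data[16:26])
    return bit_depth


def make_png_header(width: int, height: int, bit_depth: int, colour_type: int = 2) -> bytes:
    """Build signature + IHDR only, for demonstration (hypothetical helper)."""
    ihdr = struct.pack(">IIBBBBB", width, height, bit_depth, colour_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I4s", len(ihdr), b"IHDR") + ihdr


if __name__ == "__main__":
    print(read_png_bit_depth(make_png_header(640, 480, 16)))
```

Without knowing where that one byte lives, there is no way to decide whether an image is a candidate for bit-depth reduction, never mind carry it out.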
Not everyone will like this
Stored documents have manager bloat automatically removed, silly backgrounds on Excel charts, pointless audio in PowerPoint files, gone... is there a Plain English plug-in as well?
Have I misunderstood what this does?
What's wrong with "Save as text"? :-)
have an article about this last week?
Slightly better description of the process in this one, though.
The software is great
Assuming that this is using the Fileminimizer software (engine?) I have to say that the software is the dog's dangly bits.
We have Office 2000 at work and I started off using PPTMinimizer. I had to send them a stern e-mail criticising their claim of "Up to 96% reduction" in file sizes, as I had managed 99.6% :) It does this by dumbing down graphics resolutions to only that which can be viewed, getting rid of the cropped-out image areas that PowerPoint retains, and removing embedded applications caused by foolish use of copy & paste.
I must have a go at their latest Office Fileminimizer for Excel and Word, as their recent versions haven't worked so well with MSO 2000.
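For anyone wondering how much difference there is between 96% and 99.6%, a quick Python sketch of the arithmetic. The function names are mine and the byte counts in the demo are made-up round numbers, not measurements.

```python
def reduction_percent(original_bytes: int, reduced_bytes: int) -> float:
    """Percentage by which a file shrank."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return (1 - reduced_bytes / original_bytes) * 100


def shrink_factor(percent_reduction: float) -> float:
    """How many times smaller the file is after a given percentage reduction."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("reduction must be in [0, 100)")
    return 100 / (100 - percent_reduction)


if __name__ == "__main__":
    # Illustrative sizes only: a 10 MB deck squeezed down to 40 KB.
    print(round(reduction_percent(10_000_000, 40_000), 1))
    print(round(shrink_factor(96)), round(shrink_factor(99.6)))
```

A 96% reduction leaves a file 25 times smaller, while 99.6% leaves it roughly 250 times smaller, so that last few percent is a tenfold difference.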
An appliance that's continuously wiping out...
... non-visible data/info from my asset library?
Wow, what a great idea...
A few weeks back
I was sent an Excel spreadsheet with a list of about 100 IP addresses. I was asked to identify all of the subnets they were from.
I opened the file in OpenOffice, copied the list into "sheet 2" and then proceeded to rearrange them all into subnets, using colours to distinguish the different subnets.
Once done, I saved the file as an XLS file and emailed it back.
It was 75% smaller than the original file.
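Incidentally, the subnet sorting itself doesn't need a spreadsheet. A minimal Python sketch using the standard ipaddress module, assuming every address sits in a /24 (the prefix length is my assumption, since the real subnet masks aren't stated) and using documentation-range addresses rather than the real list:

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(addresses, prefix_len=24):
    """Group IPv4 address strings by containing network (prefix_len is assumed)."""
    groups = defaultdict(list)
    for text in addresses:
        ip = ipaddress.ip_address(text.strip())
        # strict=False lets us pass a host address and get its network back.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(network)].append(str(ip))
    return dict(groups)


if __name__ == "__main__":
    sample = ["192.0.2.17", "198.51.100.5", "192.0.2.200"]
    for network, members in sorted(group_by_subnet(sample).items()):
        print(network, members)
```

If the subnets have mixed prefix lengths, the list of known networks would have to be supplied and each address tested for membership instead.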