
DOCX disaster recovery: How I rescued my wife from XM-HELL

What do you do when a critical Word document won’t open? Even in today’s world of versioned documents, it is entirely possible for corruption to squeak in and go unnoticed, wrecking your entire version history. But all is not lost. My wife had this happen to her; here’s how we solved it. Real world example In my case, Word …

COMMENTS

This topic is closed for new posts.


Proper version control

It does rather make the point that Proper Version Control (you know, those things with 3-letter names) would have no trouble here, as it's decoupled from the editor in question. I suspect it's probably not as good for diffs in Word etc, but you do at least have the history going back to commit #1.


Re: Proper version control

Actually it could support diffs, as long as the tool can handle zipped files. Office 2007+ files are just zipped XML, and you can diff the XML alone quite easily.
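
A minimal sketch of that idea, assuming both files are readable ZIP containers and that the main body lives in `word/document.xml` (the usual location in a .docx). The function names are made up for illustration. Word tends to write each part as one very long line, so the sketch breaks the XML between tags before diffing.

```python
import difflib
import re
import zipfile


def docx_xml_lines(path, part="word/document.xml"):
    """Return one XML part of a .docx, split so that each tag starts a line."""
    with zipfile.ZipFile(path) as archive:
        # Assumption: the part is UTF-8 encoded, which is the common case.
        xml_text = archive.read(part).decode("utf-8")
    # Break between adjacent tags so a line-based diff has something to compare.
    return re.sub(r">\s*<", ">\n<", xml_text).splitlines()


def diff_docx(old_path, new_path, part="word/document.xml"):
    """Unified diff of the same XML part taken from two .docx files."""
    return list(difflib.unified_diff(
        docx_xml_lines(old_path, part),
        docx_xml_lines(new_path, part),
        fromfile=old_path, tofile=new_path, lineterm=""))
```

Printing `"\n".join(diff_docx("monday.docx", "tuesday.docx"))` (hypothetical file names) gives something a human can read, although formatting-heavy documents will still produce noisy output.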

Anonymous Coward

Re: Proper version control

I'm not sure a version control system would resolve the issue.

As I understand it, the problem is the editor is saving garbage to a file but the document looks OK as long as the document is not closed and reopened.

If your editor is writing garbage to a file or a version management system, you'll still need to recover any work between the last usable document and the current document, and assuming sufficient edits/time you just have a better organised mess.

If the XML file was parsed to check for problems after a save and a sensible reconcile between the saved and active document occurred, this should solve the majority of the mismatched tags issues (assuming there isn't some common fault between the editor and file parser - I'm sure that would never happen.....)
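
The parse-after-save check can be sketched in a few lines. This assumes the saved file is a zipped-XML document (DOCX or ODF) and only tests well-formedness with Python's standard parser; it does not validate against any schema, and `find_malformed_parts` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
import zipfile


def find_malformed_parts(path):
    """Return {part_name: error_message} for XML parts that fail to parse."""
    problems = {}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Only look at parts that are supposed to be XML.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as err:
                # The message includes the line and column of the failure.
                problems[name] = str(err)
    return problems
```

An empty result only means every part is well-formed XML. A mismatched tag shows up as an entry for the affected part, with the position where the parser gave up.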


Re: Proper version control

It wouldn't necessarily solve it, but it might help. The main issue is relying on a tool as both editor and version manager. If it screws up, you've lost your version history. That's an eggs-in-one-basket risk avoided by using an external tool.

If you have an external VCS you've at least got guaranteed access to everything in the history. Some of those might be corrupt, but there will be a known-good version. You can diff the last known-good against the first corrupt to see what changed. It may or may not be straightforward to port that change forward into the latest version.

Plus you can see what you're changing - if the editor decides to reduce the file size to zero bytes, diff will show you that before you commit.

The main headache being they don't always play nicely with binary files, but as mentioned there may be a plugin to support zipped files which would help in this case.


Re: Proper version control

>but you do at least have the history going back to commit #1.

Unless you work where I do where we unnecessarily change version control systems/servers so often that it can be hard to look up history past the last six months. Import/Exporting history doesn't seem to be a priority. Oh well not my company.


Re: Proper version control

If we had used Dropbox instead of Sync for version control we wouldn't have version history to commit #1. It would have eventually wrapped as it jettisoned the older versions.

I agree that a proper version control system is a really good idea...but very few people have them.


Re: Proper version control

Version control only helps with completed works; it falls down when dealing with errors.

My irritation with the majority of office tools is the lack of proper journalling: diff will tell you what the byte-level differences are, whereas a transaction journal would tell you that the differences are down to the section numbering changing.

Anonymous Coward

Re: Proper version control

"The wife was using an old version of LibreOffice Writer"

That's your problem right there....You get what you pay for.


Re: Proper version control

Prove it.


WTF

Odd - I've been using WordPerfect for years (like 20 years now I think - I started under DOS and VMS) and I've NEVER had a file become unreadable in all this time.

Am I just lucky, or is MS Office simply the application from hell?


Re: WTF

I've had MS in all incarnations except 2010 (don't have 2013) issue a message totally unrelated to IO or the actual task being done (e.g. I am trying to redefine a style, and it tells me I can't insert an image or some such), die, and the file I was working on disappear as if it never existed. It happened rarely, no more than once a year, so I now close and copy the doc every 2 or 3 pages or so.

I remember opening WP 5.0 & 5.1 files in SPF-PC to fix some spectacularly garbled tag fest (usually my fault - badly coded macro)


Re: WTF

According to the article, MS Office didn't corrupt the file, LibreOffice did. MS Office just refused to open a corrupt file.


Done something similar

Back in 2003, I can't remember the full circumstances but I think I managed to hose X, so I was stuck at a command prompt with no GUI, and needing to know where my next lecture would be from an OpenOffice Calc file.

(Yes, Linux laptop. I was too poor to afford anything more than a Pentium II and no one in their right mind would touch Windows 98.)

I recall unzipping it and then doing a grep for the approximate text. I was then able to open a text editor on the file, look for the text, and found the information I was after.
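
The same unzip-and-grep trick, as a rough sketch. It assumes the text lives in a part called `content.xml` (true of OpenOffice and ODF files) and strips tags crudely with a regular expression, so XML entities such as `&amp;` are left undecoded. `grep_odf` is a hypothetical helper name.

```python
import re
import zipfile


def grep_odf(path, needle, context=40):
    """Yield snippets of text from content.xml that contain needle."""
    with zipfile.ZipFile(path) as archive:
        xml_text = archive.read("content.xml").decode("utf-8")
    # Crude tag stripping: swap every tag for a space so cells stay separate.
    text = re.sub(r"<[^>]+>", " ", xml_text)
    text = re.sub(r"\s+", " ", text)
    for match in re.finditer(re.escape(needle), text, flags=re.IGNORECASE):
        start = max(match.start() - context, 0)
        yield text[start:match.end() + context].strip()
```

Calling `list(grep_odf("timetable.ods", "lecture"))` (file name invented for the example) returns each hit with a little surrounding text, which is usually enough to find a room number.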


Which is why...

I save my Word docs as RTF.


Re: Which is why...

How does that help? Is it somehow immune to errors?

My experience with editing RTF the hard way is that it is even less amenable to fixing than XML (not least because the tooling is far less developed).

Anonymous Coward

Re: Which is why...

"I save my Word docs as RTF."

At which point you've lost so much functionality, you might as well use Wordpad.

Anonymous Coward

Re: Which is why...

"At which point you've lost so much functionality, you might as well use Wordpad."

Which for almost all the documents I see a word processor used for would be perfectly fine. RTF is actually a very nice format - easily parsed, human readable, compresses well and quite feature rich.


Re: Which is why...

"RTF is actually a very nice format - easily parsed, human readable"

My recollection of hand-fixing RTF files is that they were only readable in the sense that they consisted of printable ASCII characters. Understanding the RTF was another matter entirely. Maybe I'm subhuman.

The problem, I suspect, may be that the RTF emitted by Word suffers from the same lack of structure as the HTML emitted by Word. Editing a Word-generated HTML document isn't a pleasant experience.

Anonymous Coward

Re: Which is why...

"I save my Word docs as RTF."

I save mine in ODF format, or (if I absolutely have to use an MS format) .doc. I tend to avoid the X formats wherever I can, and I thus never had the problems as described (or I've been lucky).

I have one single copy of MS Office in the whole office, and that's only for compatibility reasons - internally we switched to LibreOffice. That's the benefit of being small :).


Re: Which is why...

"I tend to avoid the X formats wherever I can, and I thus never had the problems as described (or I've been lucky)."

You've been lucky. The ODF formats are also "zipped XML" and so if the code is willing to emit an ill-formed XML document then it seems perfectly plausible that it would be willing to spit out bad ODF as readily as bad DOCX.


Re: Which is why...

Given the "how" this occurred, I'm 100% positive that LibreOffice Writer 4.1 would have caused the same error (with the same fix) for ODF as it would have for OOXML. I'm less sure that many of the oMath or Excel errors in Word would have been the same in ODF as under OOXML, as they are most frequently issues with "order of tags" rather than "when the tags are committed". Thus it is theoretically possible that Word could write an oMath error to DOCX but not to ODF.

Either way, I'm now glad that both formats exist as they do; in a human-fixable fashion.

Anonymous Coward

Re: Which is why...

"At which point you've lost so much functionality, you might as well use Wordpad."

Or Google Apps / Libre Office. Microsoft Office is still miles ahead of the alternatives.


Re: Which is why...

"Microsoft Office is still miles ahead of the alternatives."

List the 'features' in Microsoft Office that I, personally, my family or my clients care about that are available in Microsoft Office that are not available in LibreOffice or Google Apps. Present a solid commercial rationale for why these features are worth the price delta on a per user basis.

Please include an analysis of the "value" of my data being made available to the NSA/GCHQ/etc on demand so that they can scan it in order to send innocents to jail and/or steal whatever innovations I may have to give their own companies commercial advantage. Please include an exacting means by which I can ensure that closed source software - let alone American cloud-integrated stuff - is free of such snooping, should I choose not to avail myself of the "feature" of governmental integration.

If you cannot provide a credible analysis of exactly why and how Microsoft Office provides a better value than the competition, in real dollars and cents for features that I actually care about then I have only two conclusions to draw:

1) Your absolutist statements are false because they do not apply to everyone.

2) You are completely and utterly full of shit.

Please note that both conclusions are not mutually exclusive.

Anonymous Coward

Re: Which is why...

Miles ahead?

Maybe. But not necessarily in a direction I'd particularly like to go (aka "Yesterday, we were looking into the abyss. Today, we took a great step ahead.")

The ribbon UI still manages to confuse me after a couple of years of use, and today's wide-and-low 16:9 screens do not take kindly to having a HUGE ribbon bar at the top. Why, o why can't we at least move that to a column on the side, where screen real estate is not nearly as precious? I resort to having the ribbon auto-hide, but that is still not good UI design.

The rather obnoxious fixation on "persuading" users to store their stuff on Microsoft's servers (Office 2013 got really pushy about that) is also not to my liking. One year after Snowden, there should at least be an easy to find, global switch (controllable by group policy) to once and for all disable the cloud functionality should a user elect to keep a modicum of privacy.

Unlike freebies, where the user's data is the currency paid for the privilege of using the service, Office is a product the user pays for with their own money, so the business model would not require revenue from data mining.

Anonymous Coward

Re: Which is why...

"Microsoft Office is still miles ahead of the alternatives."

"List the 'features' in Microsoft Office that I, personally, my family or my clients care about that are available in Microsoft Office that are not available in LibreOffice or Google Apps. Present a solid commercial rationale for why these features are worth the price delta on a per user basis."

IMHO none whatsoever, but Word does have ONE (1, uno) feature that I would dearly love to see in LibreOffice: shift-F5 cursor position replay. It means you can zip to another part of a document (using, for instance, Document Map, which LibreOffice's Navigator knocks into a cocked hat with consummate ease), do some editing and then use Shift-F5 until you're back where you came from. For work on larger docs it's very helpful.

However, as that is about the only feature I can recall that LibreOffice lacked vs Word after a good 20+ years of working on docs I reckon I can live with that (and I simply changed approach to compensate - I just open another window on the same document). Personally, I would love to see a Reveal Codes as it existed in Wordperfect.

I default to LibreOffice with pleasure. It certainly saves us money, but that's less an issue than that it saves me staff retraining (as its UI is stable, and resembles the Office 2003 layout, pre-rubbish, sorry, ribbon), works identical across platforms and doesn't expose us to license compliance risks. God knows how much time we must have saved not having to worry about licensing.


Re: Which is why...

If RTF is "very nice", then nearly every other markup language ever invented must be at least "deliriously wonderful".

I save most of my writing as LaTeX. That also has the advantage of not being editable by Microsoft Word.

Personally, my feeling is that Microsoft Word is somewhat worse than OpenOffice / LibreOffice for editing other people's Word documents, or creating Word documents when I absolutely must; and unsuitable for any other purpose. (Powerpoint still had the edge over OO/LO Impress the last I checked, and for some kinds of presentations remains more suitable than the various LaTeX and HTML alternatives. Excel, on the other hand, I loathe with every fiber of my being.)

Anonymous Coward

Re: Which is why...

"Powerpoint still had the edge over OO/LO Impress the last I checked, and for some kinds of presentations remains more suitable than the various LaTeX and HTML alternatives. Excel, on the other hand, I loathe with every fiber of my being"

To be honest, I originally found Powerpoint more usable than OOs/LOs equivalent until its UI got destroyed or "ribbonised" which *seriously* got in the way of usability, together with the absolutely stupid load of gadgetry that MS tends to use to foul up any usable application (RIP, Visio).

However, it was at that time we switched to OSX, and Keynote is just *so* much better that we pretty much abandoned Powerpoint completely, not least because it somehow promotes austerity in slides, which only benefits the quality. Most of the time we use HaikuDecks or Powtoons instead, but if we have to work on slides internally, Keynote is what we use, and export to PDF later.

AFAIK there are no real issues with Excel nor LO's equivalent, but we're not a finance house so we don't use all the functions that may make a difference. We do, however, have a hangover from the past which stopped us early from using Excel: we work in multiple languages, and an early Excel spreadsheet in English would not work in, for instance, German because the formulas were not tokenised (no, really, I'm serious, for instance "SUM" in English would fail in a German version of Excel which required that to be "SUMME"). If you're shaking your head by now, so did we; I still can't quite believe it. This got us into using OpenOffice pretty early, even before the days when LibreOffice was forked.

Resuming the original topic, I actually never had a format failure from OpenOffice other than when the first file format change appeared. As someone explained to me, I must have been lucky :)


Unusual error

By far the most common corruption I've found is in the form of truncation - people yanking out their USB sticks before the file is fully written. The ZIP container stores the index at the end of the file, so you can't even open it unless it's all there. After finding every so-called document and zip recovery utility quite useless, I just wrote my own one. It'll let you recover at least partial contents of the zip, hopefully including the document.xml from which raw, unformatted text can be easily extracted.

Then I just pass it back to the user and tell them to fix the layout and unmount properly next time.

https://birds-are-nice.me/programming/zipfilerecover.shtml - if anyone ever needs it.
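
For readers curious how that kind of recovery can work, here is a rough sketch of one approach. It is not the linked tool's code, and `salvage_zip_entries` is a hypothetical name. Because the index (central directory) sits at the end, the sketch ignores it and walks the local file headers, each of which starts with the signature `PK\x03\x04`, inflating whatever data is present. It handles only stored and deflated entries, and an archive written with streaming data descriptors could confuse the scan if the signature bytes happen to occur inside compressed data.

```python
import struct
import zlib

LOCAL_HEADER = b"PK\x03\x04"


def _inflate_partial(raw):
    """Inflate a raw deflate stream, keeping whatever decodes before it ends."""
    inflater = zlib.decompressobj(-zlib.MAX_WBITS)
    pieces = []
    for i in range(0, len(raw), 4096):
        try:
            pieces.append(inflater.decompress(raw[i:i + 4096]))
        except zlib.error:
            break  # corrupt data: keep what was decoded so far
        if inflater.eof:
            break  # reached the real end of this entry's stream
    return b"".join(pieces)


def salvage_zip_entries(data):
    """Yield (name, content) pairs found by scanning local file headers."""
    pos = data.find(LOCAL_HEADER)
    while pos != -1 and pos + 30 <= len(data):
        # Fixed 30-byte local header: signature, 5 shorts, 3 longs, 2 shorts.
        (_sig, _ver, _flags, method, _time, _date, _crc,
         comp_size, _size, name_len, extra_len) = struct.unpack_from(
            "<4s5H3L2H", data, pos)
        name_start = pos + 30
        body_start = name_start + name_len + extra_len
        name = data[name_start:name_start + name_len].decode("utf-8", "replace")
        if method == 8:    # deflate
            yield name, _inflate_partial(data[body_start:])
        elif method == 0:  # stored without compression
            yield name, data[body_start:body_start + comp_size]
        # Skip the compressed body when its size is known, then rescan.
        pos = data.find(LOCAL_HEADER, body_start + comp_size)
```

Feeding it the bytes of a truncated .docx, for example `dict(salvage_zip_entries(open("thesis.docx", "rb").read()))` with an invented file name, returns every complete part plus the leading portion of the part that was cut off.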


Re: Unusual error

Interesting tool. I suspect that this is the type of tool the author of the article was really in need of: namely one that could handle office document XML with errors in a sensible and user friendly way.

Suspect your tool will become even more useful as documents move into the cloud: a truncated network connection also plays havoc with the readability of office documents.


Re: Unusual error

That thing has nothing to do with XML at all. It just extracts truncated ZIP files. When it comes to actually making some sense of the half-a-document you get from it, you're on your own! In many cases, even recovering nothing but the textual content of a document is still valuable.


Re: Unusual error @Suricou Raven

Sorry wasn't clear.

The tool you created (and helpfully linked to, thank you) understands the Zip file format and so whilst it doesn't repair the zip file, it is a big step forward in recovering the contents of a damaged zip file.

I was suggesting that a tool that understood office XML could 'recover' an office document so that it could be loaded into say Word, leaving the user to make sense of what was recovered.


Re: Unusual error

Yes, USB "truncation" is probably much more common than XML Tag errors. But even worse than storing the index at the end of the ZIP is that the Word X format files store the body text "document.xml" at the end of the file. So very commonly you lose most if not all of the text content.
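
Where the body text sits can be checked on any intact file. This sketch, with a hypothetical function name, lists the parts in the order they physically appear in the container. It reads the index, so it only works on a file that has not been truncated.

```python
import zipfile


def entries_in_file_order(path):
    """Return (offset, name, compressed_size) tuples sorted by file position."""
    with zipfile.ZipFile(path) as archive:
        infos = sorted(archive.infolist(), key=lambda info: info.header_offset)
        return [(info.header_offset, info.filename, info.compress_size)
                for info in infos]
```

The later `word/document.xml` appears in that list, the more likely a pulled USB stick is to take the text with it. The order depends on which application wrote the file.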


With apologies to Ray Parker, Jr.

"When there's something strange in your closing tag, who you gonna call?"

"CodeBusters!"


Re: With apologies to Ray Parker, Jr.

That's deep in the uncanny valley of humour.

So bad it's not funny, but not quite bad enough to be funny for the badness itself.


What if the corruption had been more than just a missing tag?

A sensible policy would be Dropbox and a five-minute auto save. That way, you can go back through the file's history and get the last saved version.

Also, if your content is really important to you, use actual Microsoft Word. Yes, yes, I know. It doesn't run on Linux, it isn't free, it locks you in, etc. But it's bloody well tested and at the end of the day it's your content on the line.

Anonymous Coward

MS Word is not immune to this either

I have PhD students compiling documents with tables and images from all types of sources and very often Word throws a wobbly and loses something, refuses to open or stops working with some add-on from a third party reference manager.

MS Word is not perfect.

Anonymous Coward

Re: MS Word is not immune to this either

"I have PhD students compiling documents with tables and images from all types of sources and very often Word throws a wobbly and loses something,"

Then insist they use a real document formatting system like LaTeX, rather than a piss poor desktop publishing application.


Re: MS Word is not immune to this either

"or stops working with some add-on from a third party reference manager.

MS Word is not perfect."

I think the add-on might be the culprit here


Dropbox only keeps so many versions in history. What do you do if Word introduces the error on page 2, autosaves every minute, but you don't close word until page 32? Every single save in the dropbox history would have the error.

Remember: these errors can be introduced, but go unnoticed until you close the application and attempt to re-open the document.

Besides, if your last good version was "two pages of text" you are going to want that last 30 pages!


Culprit

The culprit is Microsoft and its historical Godzilla-eats-world approach to software. If the composition and finalization of documents in Word (or OO for that matter) followed a rational work flow scheme, these errors would be far fewer and fixes would be immensely easier. As is, you get software that assumes that it can do no wrong, and can do anything you need. Commonly the assumptions are wrong on both counts.


Re: Culprit

How does the approach taken to finalizing a document in MS Word guide the hand of the better, more conscientious people who wrote LibreOffice Writer, which is the actual culprit in *this* scenario? Are you saying that after all the fuss of "don't use crappy Microsoft, use our better alternative", the design thinking is identical?


Re: Culprit

Wow. Jesus shit-pickling Christ, will you please get your head out of your very biased ass? Where did I - in the article or the comments - say "don't use Microsoft, use LibreOffice?"

I said "this is a class of error that at a minimum Word and Writer can and do both cause. In my case, the error was caused by Writer, but anything that writes to a DOCX can theoretically cause it, here's how you fix it."

Saying "Word does this too" is not saying "LibreOffice is better". They are distinct concepts. Saying "Word does this too" is to reinforce that this is an issue that can occur with the document format, and that any application could theoretically cause it. Or a single-byte corruption error could cause it. Or $deity knows what else.

The point is that "which productivity suite is 'better'" has absolutely no place in the discussion at all. It doesn't matter. Since both Word and Writer have been proven to cause this class of error then knowing about the class of error and how to solve it are what matter.

Take your religious issues elsewhere.


The easy solution is to allow Windows to automatically install every last update as soon as they roll out. This way your computer is constantly restarting so you'll never get too far into writing your document :)


RE: allow Windows to automatically install every last update

This way your computer is constantly crashing and restarting so you'll never get too far into writing your document :)

FTFY!!!


Dropbox keeps an unlimited number of versions for up to a month, and it keeps a completely unlimited number of versions if you pay a bit extra for "pack rat". I agree that an error could be introduced, but it's still a better safeguard than relying on a plain old filing system.


Re: Culprit

Although in your particular case, the bad formatting would appear to be caused by a bug in Libre Office, rather than being a stand-alone issue relating to generic file I/O errors.


Re: Culprit

Take a breath, Trevor. I was responding to the post immediately before mine written by "Marshalltown", not to you. I replied to his post but the indent seems not to have happened.

As for my "bias", it was induced by your article and I wasn't the only person to take away the message I did - suggesting to anyone not in complete turtle defense mode that perhaps your own prose is working against you.

Indeed, I've tried to write my questions to you in a neutral tone throughout and have been clear I'm trying to understand the issue, not pick on you - with perhaps the sole exception of one comment pointing out that not having documents open for weeks on end has been common wisdom trotted out in these pages for years in response to "the SA booted my machine and I lost days of work" posts. All that following good practice would have done was alert you to the issue sooner of course.

If you wanted to convey the same sort of neutral tone in your article it was, in my opinion, unwise to illustrate a problem you saw with NOT (MS Office) with examples drawn from the library of problems of the Big Bad Bugger on Campus. *That* is the source of this particular misunderstanding and it is entirely down to an editorial choice made by your good self.

As for my "religious issues", I already said I use OpenOffice myself. In fact I don't possess a copy of MS Office older than '97 for my personal use. I have to use MS Office at work.


Re: Culprit

With this lot, neutrality doesn't matter. If you aren't fellating Microsoft you're absolutely against them. There is a pack of absolutely rabid anti-open-source types that occupy the comments, and I'm sorry if I inaccurately lumped you in with them. I think it's fairly easy to understand why I did.

Really, however, it's this comment that does it:

"But why go to so much trouble trying to pin this (in the reader's mind's eye) on MS - by the headline which strongly suggests the problem will lie with yet another DOCX issue and by "the most famous example" which is still pretty obscure to be honest - when what you are really up against is an OfficeLibre Write bug?

10 points for style but minus a couple of hundred for mendacity."

You outright accuse me of lying by somehow attempting to "pin this" on Microsoft. What the fuck? The article in no way attempts to "pin this" on Microsoft. There's absolutely nothing in that article at all that says "Microsoft Word is bad" or "LibreOffice is better". I mention - in the article and in the comments - that Word and LibreOffice can both give rise to errors where you might care about this kind of fix...and I go to some length to discuss the different ways it can occur, with examples using each product.

It doesn't really get much more neutral than that. Yes, the error I personally experienced was with LibreOffice writer, but that is completely irrelevant; the error class can be caused by multiple products, and thus mentioning that - with examples - is in the public good.

Yet you come out and accuse me of lying to people and somehow trying to "blame" Microsoft. So yeah, you know what? You get lumped in with the batshit-crazy Anonymous Coward and LDS as "rabidly and irrationally pro-Microsoft", to the point where I can't - and won't - take anything you have to say seriously. There's no neutrality or objectivity present in what you said there, there's a massive assumption followed by an attack.

The majority of people who read this article didn't walk away with a "Trevor was trying to blame this on Microsoft" vibe in any way shape or form. Some folks, however, see monsters where none exist. I've no time, patience, or respect for them.

I don't feel the need to "imply" Microsoft - or anyone else - is at fault for things. If I think Microsoft fucked up, I say so openly. If I think LibreOffice is better and you should buy that, then I say so. There's no pussyfooting around.

Comments like "Are you saying that after all the fuss of 'don't use crappy Microsoft, use our better alternative'", responding to an article in which I did not in any way shape or form recommend one product over the other, would seem to indicate that you fall into the "you didn't fellate Microsoft, you're obviously an evil, open source economy destroying wretch" camp.

So no, sir, I don't accept your "neutral tone" argument. You waltzed in here and accused me of lying. When I said "bullshit", you doubled down. Maybe you aren't rabidly pro-Microsoft, but your presentation was in no way a "neutral tone". If you come in guns blazing, don't get all shocked and shaken if'n I fire back.


Lot of work?

Seems like a lot of work.... First, I would have tried to open it on a Mac with Text Edit, and, if successful, converted to plain text, then back to an office document flavor of your choice. I can’t begin to count the times Text Edit worked the charm. If no success, then it would have been the lot of work route.
