Adobe Reader supports SAPI for synthesised text output, provided the book/PDF exposes its textual data and grants the right (say, could that be what Adobe means by "Permission to read aloud"?). Where publishers are paranoid about text copying, we - the blind users - have a little war to wage against them, whilst the screen reader vendors work with Adobe to make the reader work just well enough to be usable. Oh, the joys; the unadulterated pleasures of it all.
Speech synthesis, I think, is something you get used to. I'm blind and use it all the time, and I've heard pretty much all synthetic variations from the 1930s onward. Perhaps, rather than using concatenated elements, you should look into formant-based synthesis by rule? It's more unnatural (more computer-sounding), but much more agreeably consistent. The inflection is really there, carried by punctuation, and the synthesisers do their best to make things sound smooth and responsive. ScanSoft now owns most of the best commercial synthesisers. There's a heavy licence tag on them, though, and the licensees in the AT market pass a bit of it on to the end-victim. Then there's Fonix, whose latest generation is a mix of samples and DSP for a minimal footprint - a runtime for Linux can be had at $30. Open-source synthesisers are available too, quite good in some cases and mostly in the by-rule category. Festival, which uses diphones and sounds not bad-ish, was once truly open; it seems to have become dubiously licensed since. But eSpeak, FreeTTS and Flite are still in evidence, and have their approvers (Flite being, in essence, a fast-performing Festival - until recently, anyway).
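For anyone wondering what "formant-based" means in practice, here is a minimal sketch in Python, not how any of the products named above actually work. It passes a buzzy impulse train through a cascade of second-order resonators, one per formant. The sample rate, the pitch, the helper names and the formant and bandwidth figures are all my own assumptions, chosen only to land roughly in the range usually quoted for a male "ah".

```python
import math

SAMPLE_RATE = 16000  # Hz; an assumption for this sketch


def resonator(signal, freq, bandwidth, rate=SAMPLE_RATE):
    """Second-order digital resonator: shapes the input around one formant."""
    t = 1.0 / rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c  # gives unity gain at 0 Hz
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def impulse_train(pitch, duration, rate=SAMPLE_RATE):
    """Crude voicing source: one unit pulse per pitch period."""
    period = int(rate / pitch)
    return [1.0 if i % period == 0 else 0.0 for i in range(int(rate * duration))]


def synth_vowel(formants, pitch=110.0, duration=0.25):
    """Run the source through one resonator per (frequency, bandwidth) pair."""
    signal = impulse_train(pitch, duration)
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth)
    peak = max(abs(s) for s in signal) or 1.0
    return [s / peak for s in signal]  # normalise to the range -1..1


# Illustrative formant (Hz) and bandwidth (Hz) pairs for an "ah"-like vowel.
AH = [(730.0, 80.0), (1090.0, 90.0), (2440.0, 120.0)]
samples = synth_vowel(AH)
```

Because the sound comes entirely from a handful of numbers and a fixed filter, the same input always gives exactly the same output, which is the consistency I mean. A real by-rule synthesiser layers rules on top for moving those numbers over time, for pitch contours and for unvoiced sounds. To hear the result, the samples can be scaled to 16-bit integers and written out with Python's standard wave module.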
Human-sounding? Try http://www.nextup.com/ for all that's best in synthesis for use by the easily-impressed great unwashed. Go on, surprise yourselves. You'll pay for your sin with large disk space requirements.
I've read Project Gutenberg (I *love* Project Gutenberg! Check them out at gutenberg.org ) from Shakespeare through JKJ through Crompton using TTS. I think I've been most impressed by ScanSoft Eloquence's rendition of As You Like It. It's just amazing. And beautiful. Synthesisers that make you concentrate too hard on their output (i.e., those not blessed with very intelligent exception dictionaries and grammatical processing, and those with huge gobs of diphone data at high-quality rates that spend a big bite of CPU on a less-than-average blessing of emphasis rules) are just hideous to use for anything serious. These are, I think, what you really want in a narrator for your e-book, though, and what people who don't need synthesis for assistive purposes think of as somehow necessary. It isn't that you don't need a better synthesiser, it's just that you're already hard put to get anything more human-sounding onto your desktop computer. It's surprising, though, how many people can listen to Eloquence (rule-based) for little more than a few minutes and then understand it flawlessly, clearly and consistently. Hmm. If you can take the stereotypical robot-sounding voice, you'll soon master it and love the privileges it brings you.