Unicode bloat blights SAP upgrades

A relatively small change to the way SAP represented characters four years ago is threatening to complicate upgrades to the latest edition of its software. SAP's implementation of Unicode - now widely adopted - has massively expanded the amount of data held in the databases underpinning many customers' SAP systems, increasing …

COMMENTS

This topic is closed for new posts.
  1. Anonymous Coward

    Four bytes?

    I wonder why they need four bytes? Unicode only requires two bytes. I recall that there was a pre-Unicode standard that simply mashed all the known character sets together and this required four bytes - perhaps that's what they've done?

  2. Rich

    IT overloading

    A terabyte disk costs a few hundred bucks. That's enough for 4000 Unicode chars on everyone in the UK.

    It's hard not to conclude that this is another instance of the IT industry making a huge meal of a very simple problem.
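
    The 4000-character figure can be checked with a rough calculation. The sketch below is only a back-of-the-envelope estimate: the decimal terabyte, the round UK population of 60 million, and the fixed 4 bytes per character are all assumptions, not figures from the thread.

```python
# Back-of-the-envelope check of the "4000 chars per person" figure.
# Assumptions (not from the thread): a decimal terabyte, a UK population
# of roughly 60 million, and a fixed 4 bytes per character (as in UTF-32).
DISK_BYTES = 10**12
UK_POPULATION = 60_000_000  # assumed round figure
BYTES_PER_CHAR = 4          # assumed fixed width


def chars_per_person(disk_bytes, population, bytes_per_char):
    """Return how many whole characters each person could be allotted."""
    return disk_bytes // (population * bytes_per_char)


print(chars_per_person(DISK_BYTES, UK_POPULATION, BYTES_PER_CHAR))
```

    Under these assumptions the result lands a little above 4000 characters per person; halving the bytes per character doubles it.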

  3. BlueGreen

    "Macro4 SAP product specialist Markus Fehr told The Reg..."

    "... that Unicode requires four bytes of memory per character"

    Is he a Unicode expert then? I'm not convinced at all.

    AFAIK most Unicode (western) code points can be represented with two bytes (exceptions are Han & ilk, I understand) and if most of your data is pre-Unicode then it's presumably ASCII, which can be represented as UTF-8 with no change in representation at all.

    And if "It's static or historical information" then it doesn't need changing at all at all.

    They couldn't possibly have gone for UTF-32 without considering the implications, could they?
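
    The size differences discussed above are easy to measure. The following is a minimal sketch that encodes a few sample strings (chosen here purely for illustration) with Python's built-in codecs; it says nothing about which encoding a given SAP installation actually uses.

```python
# Compare the encoded size of the same text in three Unicode encodings.
# The "-le" codec variants are used so that no byte-order mark is added.
SAMPLES = {
    "ascii": "SAP",
    "latin": "M\u00fcller",  # contains U+00FC (u with diaeresis)
    "han": "\u6f22\u5b57",   # two Han characters from the Basic Multilingual Plane
}


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


for label, text in SAMPLES.items():
    print(label, encoded_sizes(text))

# Pure ASCII text is byte-for-byte identical when stored as UTF-8.
assert "SAP".encode("ascii") == "SAP".encode("utf-8")
```

    In this sketch the ASCII sample doubles in UTF-16 and quadruples in UTF-32 but is unchanged in UTF-8, while the two Han samples, being in the Basic Multilingual Plane, take two bytes each in UTF-16 and three each in UTF-8.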

  4. Steve Bush

    utf8/32

    Oracle seems to have gone for UTF-8 whereas SAP has gone for UTF-32.

  5. This post has been deleted by its author

  6. Anonymous Coward

    as someone

    who has recently been looking into the Windows API, Unicode is a massive headache all around. Well, actually it's pretty straightforward as a concept, and as long as everything's using it, but any programs you have written that make silly assumptions, like how big a character is (strangely this comes up occasionally when trying to process a string), need to be re-written with this in mind.

    @Rich, corporate storage bears no relation to real world storage. You have to take into account daily backups and transaction logs, possibly needing to be kept for years. Suddenly doubling, or quadrupling, the size of every character will have a massive financial impact.
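
    The "how big is a character" assumption can be illustrated with a short sketch. It is written in Python rather than against the Windows API, and the sample string is an arbitrary choice; the point is only that code points, UTF-16 code units and bytes are three different counts.

```python
def utf16_code_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


# 'A' followed by one character from outside the Basic Multilingual Plane.
text = "A\U0001F600"

print(len(text))                  # number of code points
print(utf16_code_units(text))     # number of UTF-16 code units
print(len(text.encode("utf-8")))  # number of bytes in UTF-8
```

    Code that sizes a buffer as "number of characters times a fixed width" gets a different answer depending on which of these three counts it meant.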

  7. Lostweekend

    UTF-16

    "Macro4 SAP product specialist Markus Fehr told The Reg that Unicode requires four bytes of memory per character" ... which as a sound bite is technically correct re. UTF-32

    Trouble is, SAP uses UTF-16 on the application server layer and either UTF-8, CESU-8, or UTF-16 on the database layer.

    Perhaps the 'SAP product specialist' should read SAP's FAQs.
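
    For readers unfamiliar with CESU-8: it matches UTF-8 for characters in the Basic Multilingual Plane, but stores each supplementary character as a UTF-16 surrogate pair with each surrogate encoded in three bytes. Python ships no CESU-8 codec, so the helper below (`cesu8_encode`, a hypothetical name) is a hand-rolled sketch for illustration, not SAP's or any database vendor's implementation.

```python
def cesu8_encode(text):
    """Encode text as CESU-8 (illustrative sketch).

    BMP characters are encoded exactly as in UTF-8. Each supplementary
    character is split into a UTF-16 surrogate pair, and each surrogate
    is then encoded as its own 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)   # high (lead) surrogate
            low = 0xDC00 + (cp & 0x3FF)  # low (trail) surrogate
            for surrogate in (high, low):
                # "surrogatepass" lets the UTF-8 codec emit a lone surrogate.
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


supplementary = "\U0001F600"
print(len(supplementary.encode("utf-8")))  # bytes in UTF-8
print(len(cesu8_encode(supplementary)))    # bytes in CESU-8
```

    For text that stays within the Basic Multilingual Plane the two encodings produce identical bytes, so the difference only shows up for supplementary characters.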

  8. Markus Fehr

    Clarification on the impact of Unicode encoding for SAP upgrades

    Unicode can involve using up to 4 bytes for some characters, depending on the encoding scheme used. For SAP systems the encoding scheme in the database varies from vendor to vendor. For example, Oracle uses CESU-8 and MS SQL Server uses UTF-16.

    The key issue with upgrades to ERP 6.0, however, is that all data in the database needs to be converted to Unicode - and this means additional downtime for the SAP system. A data archiving strategy is therefore best implemented before an upgrade, as it will reduce the overall volume of data and hence the costly conversion time.

    The fact that the database can be larger after the upgrade because of the Unicode encoding can lead to additional data management problems. Again, data archiving could help here by reducing volumes.

    Aside from the Unicode issue, we are finding that SAP users generally can make their upgrade process easier by using archiving to reduce the size of their database.

  9. Supreme Guru of Everything

    SAP following the MS strategy of ever increasing hardware requirements?

    Bloat the software and force a hardware upgrade. Makes $en$e to me.
