back to article Data's democratisation: Because there's no doh in Type 0

There has been a slow but steady democratisation of business intelligence (BI) and data science over the years with Excel (and PowerPivot), through introduction of self-service BI and growth of R as the language of choice for statistics. For those from a traditional programming background, Python has become the analytical …

  1. allthecoolshortnamesweretaken

    Interesting, especially the link. Actually put me on the right track to solve a problem I have. (At least I think so; will have to think it through over the next few days.)

    What's puzzling me: there are six types of SCDs listed and described, but there is no type #5. Were there once seven types of SCDs and one was dropped? Inquiring minds, etc.

  2. Whitter
    Boffin

    Re. SCD link: Are those recommended usages?

    Questions only: I have not worked in this area myself.

    The "type 2" example seems to assume that Customer_Name is unique and fixed, else one could not traverse from the current record to a previous record unless one also adds a previous Customer_ID column, which would make it "not type 2". What did I miss here?

    As for the type 6 example, is that Current_Flag column really worth the extra column when the data is derivable from the End_Date column? Would a Date/Time test on End_Date be notably slower than a bool check on Current_Flag Y/N? Would that depend if the date was a string or a DateTime?

    1. Adam 52 Silver badge

      Re: Re. SCD link: Are those recommended usages?

      "The "type 2" example seems to assume that Customer_Name is unique and fixed"

      Yes. That's what they mean by "natural key". The trouble with natural keys is that they don't exist in the real world.

    2. Adam 52 Silver badge

      Re: Re. SCD link: Are those recommended usages?

      "As for the type 6 example, is that Current_Flag column really worth the extra column when the data is derivable from the End_Date column? Would a Date/Time test on End_Date be notably slower than a bool check on Current_Flag Y/N?"

      Yes. Consider a big table. Your date/time column is what, 10 bytes and a flag 1 bit. That's a huge anount of IO saved right there and IO is the real performance bottleneck when you're scanning multi-TB databases so your scan just got 100 times faster. Plus you can index the column, and in a column store it'll compress to almost nothing.

POST COMMENT House rules

Not a member of The Register? Create a new account here.

  • Enter your comment

  • Add an icon

Anonymous cowards cannot choose their icon

Other stories you might like