Data and its problems
Data is just that: "data". One of the problems, as this article shows, is that we never know whether the data is complete, how to complete it if it is not, or how much that completeness matters.
The initial dataset was incomplete and had the potential to become "unhelpful", but after the inclusion of the anaesthetists' data it suddenly became useful again, almost like the bimodality of the cancer itself.
Data is whatever we want it to be, depending upon what is or is not included and how that information is interpreted.
I tried to defend a colleague this week who complained that she was overworked. Upon pulling out the data it appeared that she was doing less than half the work of previous colleagues; everyone was amazed and ready to start disciplinary proceedings. I went further and interviewed her. What was not apparent in the data was that contract procedures had changed and she was in fact now doing more work than her previous colleagues. Our initial dataset did not include the contract changes. They would have been very difficult to model, and I doubt that anyone would have realised their importance, and yet they were fundamental to obtaining a useful conclusion.
Data is just that, "data". It is the understanding of the data that is important, not the data itself.
I found this to be a very interesting article, kudos El Reg.