The value of big data versus confirmation bias

The first thing is a massive Yaaayyyy! And possibly Hurrah, hurray, hurrah. We have released. Doctors’ surgeries all across the land are getting their beautifully designed boxes, or their download invitations, and installing OUR software on their systems. Ian and Mr Grumpy are busy debugging as fast as they can, ready to release an update straight away, but marketing and sales are hap hap happy. Gavin and David are cheerful for the first time in weeks.

Obviously one of the things that sells the software is the way you can analyse your data. So you can see hours of work and rotas in pretty little patterns. In theory it will even calculate the nurses’ and doctors’ rotas for you. You can connect it to your appointments system and see who works hardest and spends the most on prescriptions and so on and so forth et cetera et cetera et cetera. GandD’s cunning plan is to allow people to register their data (anonymously) and see how it compares with other people’s data.

The question then is what you do with the data. As I’ve mentioned earlier, quite frequently, there are people who change their minds because of information, and people who don’t. There may be two reasons for not changing. One is that well-known problem (or achievement) of human psychology: we prefer information that backs up the position that we hold. We look for facts that support our viewpoint, rather than facts that will disprove it. This is why the null hypothesis is such a fabulous idea and is more common in myth than in reality.

The other problem is that the data may be useless. Firstly, it may be data rather than information. Information is data which has been judged, and it takes a human to judge (or, of course, an omniscient deity). Given humanity’s penchant for partial information and prejudice, we would often prefer to pass our judgement elsewhere. Secondly, it may be useless if one can take no action based upon it. What is the use of information that cannot be used?

So maybe it’s best to hide data. Leave it in its data buckets in the garage, unmined and ignored. After all, all that unused data will only clutter up the minds and computers of the people who have to look at it. It will distract them from the decisions that they wish to take.

Perhaps I should introduce the concept of small data. Instead of “just-in-time” information, “just enough” information. Let the rest of it sit around if people want to go fossicking through it because they have an idea, but otherwise, don’t encourage them to mess about with it. Concentrate on the things they want to do, such as where to go for lunch and how many times you can read the same detective story with a different title.
