One of the novelties of our connected world is the amount of data that can be collected. This may be a worry when the collector is unknown or untrusted, but it can also be an opportunity if one has access to the data oneself — especially when one might have no other record available. At the very least, it gives an insight into the way larger datasets might be used.
I signed up to last.fm ten years ago today. From that moment, almost every time I listened to a piece of music, the fact was recorded by the service (a function they call ‘scrobbling’). There are some gaps: I didn’t use it assiduously in the first few months, and the service itself didn’t timestamp scrobbles until later in 2005 (my first timestamped scrobble was on 18 December 2005, but music played before that is still included in the overall statistics). Apart from that, any music played on my iPod or in iTunes was captured. As other services and devices became available (such as the iPhone and Spotify), they also sent data to last.fm. However, I have no record of CDs played other than through iTunes, nor of music heard on YouTube.
So, I can say with a degree of certainty that in the last ten years, I have listened to over 134,000 tracks performed by over 5,000 different artists. I can also see which tracks, artists and albums I have heard most frequently.
There are some other things that I can do with the data, thanks to various tools developed using the last.fm API. All this is very satisfying for the side of me that treasures useless information and therefore does quite well at general knowledge quizzes. I can even compare myself with others, and I am sure that information about patterns of listening could be useful to the music industry more generally. (That is one inference that can be drawn from the purchase of last.fm by CBS in May 2007.)
But, on reflection now, all this just leaves me cold.
Last.fm cannot tell you why I was listening to La Traviata, Leonard Cohen and Dave Brubeck in the week ending 21 May 2006, any more than it can say why this week last year resounded to Goldfrapp, Maria Callas and Sidney Bechet. Nor can I. There might be an interesting story there, but it cannot be told without additional prompts (such as might be found in my emails or notebook, and possibly not even then).
That is the problem. The real story, the real knowledge, is as much emotional as analytical. And data cannot give access to the emotional truth. Given enough data, we might be able to see that something happened at a particular moment in time, and even what else was going on, but the data cannot tell the truth of that moment.
This is the real challenge for data analytics (whether of ‘big data’ or otherwise). The analysis itself may be flawed because of the necessary exclusion of unmanageable information (the human factor). Even if the analysis is perfect, the way it is received is unpredictable (another human factor). Some people may react badly to what the data appears to be telling them. Some may react well, but act inappropriately.
I think this imposes a burden on those engaged in data collection and analysis to work carefully on understanding its limitations and its potential impact. I have read many excited articles and heard many breathless presentations about the power of data to make our lives better. I have rarely heard anyone refer to the corresponding responsibility to be sure that things won’t be made worse. That requires emotional intelligence, which is a purely human capability.
Data on its own will solve nothing.