Friday, 23 November 2012

A pillar of data journalism: The Guardian

I have long been an avid reader of The Guardian (albeit in its online incarnation, except on Saturdays, when you can't beat stretching out on a sofa with the oversized supplements), and I have been impressed by its skill at using data to uncover truths that are often hidden behind complexity.

Today I stumbled upon not one but two exemplary data visualisations from The Guardian, which struck me not for their aesthetic value but for the simplicity of their message.

The first relates to the EU budget: you select your country of origin to see how much you contribute to the EU budget (the average for your country, not YOU personally!) and how much you receive in return. I never realised how much cash Poland receives from the EU coffers!

The second is again financial data, but this time looking at average earnings in the UK. I was shocked to see that there are only 4 local authority areas in which women out-earn men (funnily enough, one of them is mine!).

What these two visualisations prove is that the key to impact and insight is simplicity. To my mind, simplicity is one of the hardest things to achieve in visualising data. I recall a diagram from Andy Kirk on his Visualising Data course in which he describes the journey of the data/message from the visualisation to the viewer's brain, and how that journey needs to be as direct and unimpeded as possible. It's something I strive for (though don't always achieve), and something at which The Guardian has become very adept.

Friday, 9 November 2012

Using Pinterest to inspire new ideas

There are so many amazing data visualisation sources out there in the cyber world, be they blogs, portals, websites, or applications, that I struggle to keep up with it all. Like most people, I imagine, I save a lot of stuff into my Favourites, but that folder is starting to resemble my wardrobe: overflowing, and impossible to find what I'm looking for! I recently opened an account on Pinterest, thinking it might be an easier way to store the things that inspire me and spark my own ideas, and I'm finding it a real revelation. Unlike Favourites, Pinterest stores each item under an image, which suits data visualisation links much better. Do check out my own page. If I'm struggling with writer's block, I have a quick look at my Pinterest page and the block becomes unblocked!

Thursday, 20 September 2012

Infographic #3 - an alternative visualisation of Infographic #2

Early on, when I was looking at the pharmaceutical data that I used for my first two infographics, I knew I wanted to create some sort of network diagram. I soon realised, though, that Illustrator wouldn't cut it, and that there was a distinct lack of software that was free, relatively easy to learn, and fit for purpose. Had I known about NodeXL back then, the final infographic would have looked very different. In the space of two hours, I created a network diagram! I was amazed! It's an open-source Excel template created by the Social Media Research Foundation, and it took me no more than about half an hour to work out how to use it. The network diagrams are customisable: you can alter colours, sizes, fonts, layout, and most other things. There are a couple of drawbacks, of course: I couldn't seem to add a legend (so I added one in Paint), and I couldn't label every vertex without making the diagram too crowded (though that is a drawback of network diagrams in general, not of NodeXL). I do think this infographic does a better job of conveying how interconnected the conditions are across the medications, whereas #2 shows more the sheer variety of conditions that are possible side-effects. What do you think?
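For anyone wanting to try something similar, here is a minimal Python sketch of the data prep a network diagram needs: a list of drug-condition links (edges), from which you can count how many drugs each condition connects to, i.e. which conditions are the "interconnected" ones. The drug and condition names are hypothetical placeholders, not the real prescription data, and the "Vertex 1"/"Vertex 2" header is an assumption meant to mirror the column names of NodeXL's Edges worksheet; check it against your own copy of the template.

```python
import csv
import io
from collections import defaultdict

# Hypothetical placeholder rows: (drug, condition, relationship).
# Real rows would come from a one-condition-per-row spreadsheet.
EDGES = [
    ("drug_a", "condition_1", "treats"),
    ("drug_a", "condition_2", "side_effect"),
    ("drug_b", "condition_2", "treats"),
    ("drug_b", "condition_3", "side_effect"),
    ("drug_c", "condition_2", "side_effect"),
]


def condition_degree(edges):
    """Return {condition: number of distinct drugs linked to it}."""
    drugs_by_condition = defaultdict(set)
    for drug, condition, _relationship in edges:
        drugs_by_condition[condition].add(drug)
    return {condition: len(drugs) for condition, drugs in drugs_by_condition.items()}


def shared_conditions(edges, min_drugs=2):
    """Conditions linked to at least `min_drugs` different drugs."""
    degree = condition_degree(edges)
    return sorted(c for c, n in degree.items() if n >= min_drugs)


def edge_list_csv(edges):
    """Render the edges as CSV text (the header names are an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Vertex 1", "Vertex 2", "Relationship"])
    writer.writerows(edges)
    return buffer.getvalue()


if __name__ == "__main__":
    print(shared_conditions(EDGES))  # ['condition_2']
    print(edge_list_csv(EDGES))
```

Keeping the relationship as a third column means a tool can colour the edges (side-effect versus treatment) without a second table.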

Friday, 14 September 2012

Data viz of the day #2

It's been too long since my last post, and I can only blame a mixture of busyness and lack of inspiration. I think it's only a matter of time before you stop being awestruck by new developments in a particular field of interest, and I have to confess that I have unknowingly developed a British cynicism towards a lot of the new data visualisations I've seen in the last few months. Maybe it's saturation, maybe it's a lack of innovation in the field, who knows. The point is, today I finally saw something which really took my breath away. It's actually a series of visualisations by a Brazilian called Icaro Doria, who works for the Portuguese magazine Grande Reportagem. Essentially, Icaro has taken data about each country and re-proportioned the coloured elements of its flag to illustrate a particular variable, be it country exports or genital mutilation. The facts have been well chosen to give a fascinating insight into the economic, political, social, and health situation in each country. Anyway, I beg you to take a look: these are phenomenal, and such a great example of the power of visualisation to communicate data.

Thursday, 9 August 2012

The making of Infographic #2

I thought readers may be interested to know how my second infographic evolved, particularly as it changed quite considerably from my initial conception.

A few months back I attended a course by Andy Kirk on data visualisation, in which he highlighted the importance of conceiving your visualisation design rather than just jumping head first into it. Having already familiarised myself with the data for Infographic #1, I wanted the second one to really highlight the sheer variety of side effects that these five drugs alone could potentially cause. I already knew that I did not want to use any numbers, as they would have been fairly meaningless here. I also knew that I wanted to show the circle of side-effects caused by one drug and the treatment of that side-effect by another. For example, Omeprazole is used in the "treatment and prevention of...NSAID-related ulcers"*. An NSAID is a non-steroidal anti-inflammatory drug, an example of which is Aspirin. Interestingly, the BNF does not list ulcers as a side-effect per se (which is why they are not in the diagram) but uses the less specific description "gastro-intestinal irritation or gastro-intestinal haemorrhage"; more of a catch-all, I suppose. So I had a few concepts in mind (and, crucially, had already ruled out some ideas), but sometimes a visualisation needs to be seen on screen before you can get a feel for whether it will work.
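The "circle" described above can be found mechanically once the data is in rows. Below is a minimal Python sketch, assuming each row is a hypothetical (drug, condition, relationship) triple with placeholder names rather than the real BNF entries: it lists every condition that one drug causes as a side-effect and a different drug treats.

```python
from collections import defaultdict

# Hypothetical placeholder rows: (drug, condition, relationship).
EDGES = [
    ("drug_a", "condition_x", "side_effect"),
    ("drug_b", "condition_x", "treats"),
    ("drug_b", "condition_y", "side_effect"),
    ("drug_a", "condition_z", "treats"),
]


def side_effect_treatment_loops(edges):
    """Return (causing_drug, condition, treating_drug) triples where one drug
    lists a condition as a side-effect and a different drug treats it."""
    causes = defaultdict(set)      # condition -> drugs listing it as a side-effect
    treatments = defaultdict(set)  # condition -> drugs that treat/prevent it
    for drug, condition, relationship in edges:
        if relationship == "side_effect":
            causes[condition].add(drug)
        elif relationship == "treats":
            treatments[condition].add(drug)

    loops = []
    # Only conditions that appear on both lists can close a loop.
    for condition in sorted(causes.keys() & treatments.keys()):
        for cause in sorted(causes[condition]):
            for treater in sorted(treatments[condition]):
                if cause != treater:
                    loops.append((cause, condition, treater))
    return loops


if __name__ == "__main__":
    print(side_effect_treatment_loops(EDGES))
    # [('drug_a', 'condition_x', 'drug_b')]
```

Note that this only matches identical condition names, so "NSAID-related ulcers" and "gastro-intestinal irritation" would never link up; that is exactly the BNF wording mismatch described above, and it would need a manual mapping step.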

My first idea was a table, with red pills showing side-effects and green pills showing conditions that are treated/prevented. The five drugs were the columns, the conditions the rows. I quickly realised that there were too many conditions for it to be readable. With fewer than 20 conditions, perhaps this could have been doable.

My next thought came to me in the middle of the night (they normally do): a Venn diagram using body organs instead of circles, with the conditions a drug treats in one shape, its side-effects in another, and the overlap showing the conditions that fall into both categories. On drawing it out on paper, though, I quickly discarded the idea: I couldn't figure out how to distinguish the five drugs.

Next up was a mindmap. This was the most time-consuming idea, as I didn't have any network diagram software that would automatically attach the connector lines to the text/circles (as Microsoft Word does when you're creating a mindmap). So I fiddled around with it for ages, having created vector images of body organs (again, something that took a LOT of time!!!), and then reluctantly gave up, as there were just too many lines. I really should have figured this out from the table!!! I knew the text would never be big enough to read, and it didn't have that "wow" factor yet.

My penultimate idea was to have the five drugs at the centre of the infographic, each marked out by an oval (so I could make the middle look like a pill, just to over-theme the whole thing!). This really started to look promising, as I could represent a side-effect as a red dot and a treatment as a green dot, and it allowed multiple dots for one condition. The one problem was the alignment of the lines leading from the conditions: it was still too jumbled.

It was then that I thought of switching from an oval to a circle to overcome this, which is what you see in the final version. I then grouped the conditions by body system (which seemed most logical to me!). Classifying the conditions was itself a little challenging, given that I am not a medical practitioner and Google can only take you so far, so some of the groupings are my best estimate!

So crucially what lessons have I learned?

  • First and foremost, I've got to keep it 'rough and ready' until pretty much the final draft. What I mean is, I spent ages early on creating nice vector images of the body organs, but they didn't even end up being a critical part of the infographic. The main thing should have been to get the outline down in Illustrator and leave the elaboration until last. What mattered was realising how many conditions I had to deal with, and then quickly ruling out visualisations on that basis.
  • Second, never underestimate the onerous task of data collection when the data is unstructured. The data here came from a website, so I had to get it all into an Excel sheet, with one condition per row. I faffed about with it for ages, looking up each condition on Google. I did this for all of the top ten drugs, when I only used the top five. A bit of wasted time, but not disastrous.
  • Third, I've learnt a lot more Adobe Illustrator tricks which I will use in the future. The best one here was using a pie chart of 88 segments (one for each condition listed on the infographic) to help align all of the lines. Using alt-drag to copy shapes, the eyedropper, and the various align and pathfinder tools also helped to speed things up a bit.
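The pie-chart trick works because equal segments split the circle into equal angles: 360/88, or roughly 4.09 degrees each. As a minimal sketch (not how Illustrator does it internally), this Python function computes where each of n evenly spaced line endpoints would sit on a circle; the function name and its parameters are hypothetical.

```python
import math


def segment_anchors(n_segments, radius, centre=(0.0, 0.0)):
    """Return (angle_degrees, x, y) at the midpoint of each of n equal pie
    segments, on a circle of the given radius around `centre`.

    Angles run anticlockwise from the positive x-axis with y pointing up;
    a design tool whose y-axis points down would show them mirrored.
    """
    step = 360.0 / n_segments  # 88 segments -> about 4.09 degrees each
    cx, cy = centre
    anchors = []
    for i in range(n_segments):
        angle = (i + 0.5) * step  # centre of segment i
        theta = math.radians(angle)
        anchors.append((angle, cx + radius * math.cos(theta), cy + radius * math.sin(theta)))
    return anchors


if __name__ == "__main__":
    for angle, x, y in segment_anchors(88, radius=100.0)[:3]:
        print(f"{angle:6.2f} deg -> ({x:7.2f}, {y:7.2f})")
```

Anchoring each line at a segment midpoint rather than a segment edge keeps the lines evenly spaced and clear of the segment boundaries.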

Now on to the next project.........

*From the BNF description of Omeprazole:

Tuesday, 7 August 2012

Infographic #2 - the side effects of the top five prescribed medications in England 2011

So, leading on from my previous infographic about the scale and cost of England's prescriptions in 2011, my next infographic looks at the side effects of these drugs. This infographic has been, to put it mildly, a labour of love. It has taken me well over a month, an hour here and an hour there, of conceiving it, drafting it, scrapping it, starting again, tweaking, redesigning..... you get the idea. The complexity came from wanting to show the connection between the conditions these drugs treat/prevent and their side effects, as many conditions popped up on both lists. Showing a many-to-many relationship is not easy: at first I thought of a Venn diagram type thing using body organs instead of circles (yes, I like to over-complicate things), then I was thinking of a flow chart, but there were way too many lines, so I settled on the circle format you see now. I may well do another post showing the stages of this infographic at some point, as it has been a useful learning curve for me that others may find interesting and/or helpful.

Anyway, back to the subject matter at hand. There were a few things in particular about these data that really concerned me. First, as I've touched on before, these drugs are primarily prescribed to tackle our most prevalent Western diseases: high cholesterol, high blood pressure, cardiovascular disease, hypothyroidism, conditions all largely caused by unhealthy lifestyles. To my amateur eye, it seems that many of these drugs are not only papering over the cracks of these conditions but also, perversely, allowing patients to continue with their harmful lifestyles (see this article for a very worrying quote from an Omeprazole patient about how she could indulge her love of pastries after suffering from heartburn for many years!!). Second, I was particularly concerned at the sheer number of varied and sometimes gruesome side effects of these prescriptions. Admittedly, some are rare, but on reading some of the descriptions, I am not sure I would want to take such a risk (just Google rhabdomyolysis or Stevens-Johnson Syndrome - warning: it may turn your stomach!).

To me, it seems the only winners in all this are the food and pharmaceutical industries. As long as we keep spending vast amounts of money on alcohol, sugar, and processed foods (and consuming lots of them), there will always be a pharmaceutical company ready with a magic pill to reduce the effects of consuming them. It's a virtuous circle for them, and a vicious one for us.

Tuesday, 24 July 2012

Data viz of the day #1

I'm an avid reader of all things health-related, particularly nutrition and natural health. So this little beauty caught my eye this morning when I opened a Dr Mercola newsletter that had popped into my inbox. The newsletter contained a link to a recent study by the UK-based Alliance for Natural Health, which looked at the relative risk of death from a diverse set of hazards such as drowning, car accidents and being struck by lightning. Crucially, this bubble chart shows that adverse reactions to pharmaceuticals are 62,000 times more likely to kill you than food supplements. From a personal perspective, that scuba diving bubble concerns me somewhat, being a scuba diver myself, eek! Anyway, I digress..... This information, while perhaps not a shock, sits uncomfortably with me: not only is there a relatively high societal risk of fatalities from pharmaceuticals, but also a high individual risk, simply because of the sheer volume of pharmaceuticals prescribed each year, which links neatly to my Infographic #1.

Do check out the ANH website for more details; it's full of fascinating data.

Thursday, 19 July 2012

Infographic #1 - the scale and cost of England's pill-popping

Today I upload my very first infographic (well, the first I am prepared to share anyway!). It shows the scale and cost of prescriptions in England in 2011. I don't know how you will react to it (by all means, please do leave comments), but I was absolutely staggered, shocked and aghast at the sheer number of pills that we as a nation consume. There were 32 million prescriptions issued for Aspirin last year, which equates to approximately 1.078 billion tablets, sachets and suppositories. Remember, this is just England, not the whole UK!
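To put those two figures side by side: dividing the items by the prescriptions gives the average number of tablets, sachets and suppositories per Aspirin prescription. A quick Python check using only the numbers above (the 32 million is rounded, so the answer is approximate):

```python
# Figures quoted in the post; the prescription count is rounded,
# so the result is only an approximate average.
ASPIRIN_PRESCRIPTIONS = 32_000_000
ASPIRIN_ITEMS = 1_078_000_000  # tablets, sachets and suppositories

items_per_prescription = ASPIRIN_ITEMS / ASPIRIN_PRESCRIPTIONS

if __name__ == "__main__":
    print(f"About {items_per_prescription:.1f} items per prescription")
    # About 33.7 items per prescription
```

In other words, the average Aspirin prescription works out at roughly a month's supply of a once-daily dose.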

Our over-reliance on pharmaceuticals has long concerned me, not least because I lost my grandfather a few years ago to internal bleeding as a result of taking Aspirin, which he had been prescribed to help thin his blood. He was a big bloke with a fondness for beer and large portions (especially meat, potatoes, pastry, etc.), and boy did he load the salt on to every meal he ate!!

Of course, whilst I do not for a second doubt that there are many people out there for whom prescribed drugs provide a great deal of much-needed relief to their condition, the top five prescription items in their own right really paint a picture of the mess we are in as a country.

The number one drug is Simvastatin, which is described as a lipid-regulating drug. It is used to treat high cholesterol and to prevent cardiovascular "events" in at-risk patients. The NHS Choices website lists many modern-day vices as primary causes of high cholesterol, including an unhealthy diet, lack of exercise, obesity, drinking excessive amounts of alcohol, and smoking. Looking at the information on the remaining four drugs, it is almost certain that a large share of the conditions they treat are similarly lifestyle-related. Hypertension, peptic ulcers, hypothyroidism, and cardiovascular disease are undoubtedly symptomatic of unhealthy lifestyles.

Perhaps this would not be quite so alarming if we did not also consider the amount of taxpayer money paid to large pharmaceutical companies to combat these problems. The NHS spent nearly £9 billion on prescriptions last year; that's £9,000,000,000 - a lot of zeros!!! I often think millions and billions are pretty abstract numbers that are hard to get a handle on, but when I looked up the debt levels of African countries on the World Bank website I was amazed to find that our pill bill exceeded the debt of Somalia, Ethiopia and Zimbabwe combined. A sobering thought indeed.

Anyway, I intend to create a few more infographics from these data, as there is so much interesting information in there and, as you can probably tell, I am pretty passionate about this subject area. I do hope you like it.

PS: I'm trying out Closr for the image hosting, just click on the full-screen icon to see it up close. I'm hoping to find something a little better, but this will have to do for now!

Tuesday, 3 July 2012

The tentative first steps

So this being my first blog post, I feel I should "lay out my stall" so to speak, so here goes.

The purpose of this blog really is to share some of my thoughts about the fascinating and multi-faceted world of data visualisation. I am a data analyst by trade, but often find myself frustrated by my own inability to succinctly and impactfully convey the most pertinent and poignant features of the data. I find myself getting lost among the many realities that face an analyst in the real world: deadlines, the demands of many stakeholders (often conflicting), software, my own ineptitudes, and of course, not being able to find that little golden nugget within the dataset, that one revelation that can really take your breath away, leading to astonishment and a real feeling of discovery. Those moments are rare, but they certainly make the pain of analysis and data visualisation worth it.

I hope to share a few of my own revelations, frustrations, and visualisations here, and also pay tribute to those who do this kind of thing with much more creativity and simplicity than I can ever hope to do! 

I am, however, feeling a little shy today, so rather than exposing a shoddy piece of my own, I figured I'd share the top 5 visualisations that are inspiring me at the moment. In no particular order....

1. "Breaking Down Google's 2011 Revenues", presented by Wordstream and developed by For me, this conveys a whole raft of information but is neatly segregated so as not to overwhelm the beholder. I think there's a nice use of colour here and a restrained use of imagery, which is rare! The cost per click for some of those keywords really highlights what a buoyant economy the world of search is.

2. 3D London Underground Station Maps, by Andrew Godwin. This is a fairly new creation, and only came to my attention about a week ago, but I really like it. I'm a big fan of maps anyway, and as I live in London this is particularly useful for working out which stations to change at during a journey. It's very 'stripped-down' as well, just simple lines and colour, and is a doddle to navigate using your mouse.

3. "Life in Data" by Ben Willers. I had the good fortune to meet Ben on a recent data visualisation course in London led by the venerable Andy Kirk (do check out his site). Ben gave me his business card and shared a few neat tricks which I have since put to use (I had no idea you could copy and paste Excel charts into Adobe Illustrator!!). During a quiet moment one day, I rediscovered his business card and came upon his website. His Life in Data project for his MA Design really struck a chord with me, as I'm a little obsessive about counting calories and keeping track of finances myself. I admire Ben's restraint in using minimalist colour palettes (often sticking to just one or two colours for each piece) and very few words, just letting the data speak for itself. I can only imagine how dedicated he must have been to collect all that data!

4. "7 Billion: How Did We Get So Big So Fast?", produced by Adam Cole, cinematography by Maggie Starbard. Andy Kirk put me onto this one. What I like about it is how abstract it first appears (depicting population with what is really a glorified bar chart), and yet how, in doing so, it brings you so close to the issue of global population growth. The final shot of the cylinder filled to the brim is a powerful closing image indeed.

5. "When Sea Levels Attack" by David McCandless. I couldn't possibly have my first list devoid of the man who started my fascination with data visualisation. I bought his book "Information is Beautiful" on the passing recommendation of a management consultant and was instantly hooked. This, I thought, was how data should be presented as standard, not as a novelty. If you've ever sat through a data-heavy PowerPoint presentation in the private sector (or the public sector, for that matter), you'll know that we still have a long way to go. Anyway, back to this particular piece. It's really nothing more than a bar chart, but boy is it powerful. The use of silhouettes to depict each city, and the dual scale of years and water level, really pull you in, forcing you to give the piece your full attention. This is probably one of my favourite McCandless visualisations and just proves that you don't need to bombard your visualisation with colour, pictures, words and charts. It's a lesson in simplicity which I am trying hard to internalise myself!

Finally, I'm working on a visualisation at the moment concerning the use of pharmaceuticals in England, and already the figures have left me shocked, both in terms of the number of pills we pop as a nation, and the bill that the government (and therefore us taxpayers) have to foot as a result. Another week or so I think, and then hopefully I can share it here.

The Data Curator (in training)