Become a data journalist: The overview

Let’s start with baby steps. Think about how you can incorporate data in your story, irrespective of the subject. Data can always tell a convincing story.

To be a data journalist, you have to actually know what a data journalist does day in and day out.

There are three areas that a data journalist may work on:

  1. Data mining or data gathering: Collecting the data to work on can be exhausting if you don’t know where or how to look for it. It is also a bit of a chicken-and-egg game: we don’t know whether a journalistic question drives the journalist’s data search, or the data dump lands in his or her lap like ‘lightning out of a clear blue sky’. The latter is rare, but that is what woke up editors and journalists in newsrooms across the world when WikiLeaks released the War Diaries in 2010.

    ‘Finding data’ can involve anything from having expert knowledge and contacts to being able to use computer assisted reporting skills or, for some, specific technical skills such as MySQL or Python to gather the data for you.

    -Paul Bradshaw ( How to be a data journalist, The Guardian, October 1, 2010 )

    Execution: Start simple. Just search online for available data on the subject you are working on. Write to authorities requesting information that you know is meant to be public. You can also file a FOIA (Freedom of Information Act) or RTI (Right to Information) request with the relevant public body. Don’t forget to talk to experts in the field, or even other journalists who you know have written stories about the topic.


  2. Data analysis or making sense of the data set: Once you get the data, the clock starts ticking, and making sense of it under crazy deadlines can be a challenge. The most important requirement for effective analysis is sound background knowledge. Read and research the subject as much as possible even before the data reaches you. There is no point in stumbling upon a gold mine if you don’t know what gold looks like, right?

    The real question then is how a journalist can train their brains to look for patterns and associations under a deadline. After all, once you find data most journalists still have to clean it up so that it’s useful and that takes time. It’s also prone to accidental errors, the more time humans spend massaging it.

    – Pete Forde, Founder, BuzzData

    Execution: Learn MS Excel for starters. To make sense of a huge set of data, you first need to organize it (a.k.a. data cleaning) and make it workable. Remove duplicate rows and columns (de-duplication) and merge tables to create the perfect worksheet for you to play with. You can use MS Excel, R, a Relational Database Management System (RDBMS) such as MySQL or Postgres, a programming language such as Python, Ruby or Node.js, or QGIS for geographic data. There are also tools that help with data cleaning, e.g. Tabula or OpenRefine.


  3. Present the data: Now that you’ve spent a good chunk of time understanding the data and seeing how its parts relate to each other, it is time to show your readers what you discovered. The findings of a data analysis can sometimes be presented using simple lists, charts, maps and graphs, or in more interactive ways such as timelines and customized web applications built from scratch.

    Play around. If you’re good with a graphics package, try making the visualisation clearer through color and labelling. And always include a piece of text giving a link to the data and its source – because infographics tend to become separated from their original context as they make their way around the web.

    -Paul Bradshaw ( How to be a data journalist, The Guardian, October 1, 2010 )

    Execution: There are numerous tools available online to make data visualizations and infographics easier for reporters. Some of the more popular ones that newsrooms use are: Google Charts, Fusion Tables, Tableau, Timeline JS, StoryMaps JS, and Plotly. To build web applications, you need to learn a little coding; start with JavaScript and HTML5, which will come in handy.
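
To make step 2 a little more concrete, here is a minimal sketch of the de-duplication and table merging described above, written in Python using only the standard library. The sample tables, the column names (dept_code, dept_name, amount) and the helper functions are all made up for illustration; a real dataset will need its own handling, and a spreadsheet or OpenRefine can do the same job.

```python
import csv
import io

# Hypothetical sample data: a spending table containing one duplicated row,
# and a lookup table that maps department codes to names.
SPENDING_CSV = """dept_code,year,amount
D01,2010,1200
D02,2010,800
D01,2010,1200
D03,2010,450
"""

DEPARTMENTS_CSV = """dept_code,dept_name
D01,Health
D02,Education
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def dedupe(rows):
    """Drop exact duplicate rows, keeping the first occurrence in order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def merge(left, right, key):
    """Join two tables on a shared column; unmatched left rows are kept."""
    lookup = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        merged.append({**match, **row})
    return merged


spending = dedupe(read_rows(SPENDING_CSV))
departments = read_rows(DEPARTMENTS_CSV)
worksheet = merge(spending, departments, "dept_code")

# A first sanity check: the total after cleaning.
total = sum(int(row["amount"]) for row in worksheet)
for row in worksheet:
    print(row["dept_code"], row.get("dept_name", "UNKNOWN"), row["amount"])
print("Total:", total)
```

Note that the row for D03 has no matching department name. The sketch keeps such rows and labels them UNKNOWN when printing, because silently dropping unmatched records is one of the easier ways to introduce errors while cleaning.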


Why is the knowledge of handling data essential for journalists?

There is a promise in data and this is what excites newsrooms, making them look for a new type of reporter. Look at it this way: instead of hiring journalists to quickly fill pages and websites with low value content the use of data could create demand for interactive packages, where spending a week on solving one question is the only way to do it. This is a welcome change in many parts of the media.

– Mirko Lorenz, Deutsche Welle

It is not true that number crunching is only for investment bankers; everybody needs at least some knowledge of it.

I once shared a desk with one of the star IT & technology reporters of the newspaper, and I was shocked when I saw him working on his travel claims in MS Excel, but doing the arithmetic with a calculator on the side. I was unsure if I wanted to intrude, but I decided that I wouldn’t be able to forgive myself if I didn’t teach him how to use the SUM function in Excel. And yes, he was our star technology reporter!

If our best reporters find and write stories with antiquated skills, imagine the A1 bylines, Pulitzers and impact on readers they could have with some computing knowledge.

Some journalists (very few, actually) might ask, “How is data journalism going to help my reporting abilities?”


Journalists who master this [data journalism and programming knowledge] will experience that building articles on facts and insights is a relief. Less guessing, less looking for quotes; instead, a journalist can build a strong position supported by data, and this can affect the role of journalism greatly.

– Mirko Lorenz

This is exactly what some of the new-age media houses want to achieve today. A good example of this is what Atlantic Media does with its online business magazine, Quartz. Its reporters neither need to attribute every fact or quote to somebody, nor do they have to make their journalism dependent on traditional ‘sources’.

Data says a lot.

Leveraging electronic formats [of data] enables journalists to deal with large quantities of information quickly, evaluate data with depth and flexibility, and to share power with readers, giving them the ability to search and review information to suit their own interests. Readers can act as sources, and even become researchers, as when The Guardian shared thousands of legislator’s expense reports public and invited readers to review them.

– Meta S Brown, Contributor @ Forbes


Other than the most obvious reason why journalists must know how to handle data, which is to find and present better stories, here are a few more that may interest you.

  • Updating your skill set: If you are stagnating in your newsroom and want to jump a few levels up, data mining, databases and a little coding will go a long way. Data journalists are also paid more, because they are no longer restricted to newsrooms.
  • Data journalism is the future: In a conversation with Raju Narisetti, in 2011, I asked for advice about shifting from computer science to journalism, and he told me about data journalism. Today, journalists cannot run away from this term.
  • Getting rid of information asymmetry: Sometimes journalists run after sources and beg them for quotes and attributions. When you know how to get the right data, you can turn in stories faster and look more credible too.
  • One dataset, many stories: As a journalist, sometimes you are lucky enough to stumble across a loyal, well-placed source who can give you many scoops, but that is rare and unpredictable. An interesting dataset, however, can be interpreted in many ways, and the story possibilities are nearly endless.
  • The missing connections, associations and patterns: It is easier to see connections and patterns in data, rather than in blobs of text. Great truths have been uncovered by journalists, thanks to some form of data. Sometimes even a basic spreadsheet or phone logs can open up an entire world of deceit and foul play.

In my next post, I will talk about what you have to do to become a data journalist.



History of Data Journalism

Data journalism has largely evolved in the last two decades, and continues to evolve from newsroom to newsroom across the globe.

Before the term data journalism came into existence, journalists and editors used another term — Computer Assisted Reporting (CAR). The Data Journalism Handbook, First Edition, July 2012, describes this process as “the first organised, systematic approach to using computers to collect and analyze data to improve the news.”

The first time newsrooms used CAR was in 1952 when CBS tried to predict the result of the US presidential election.

A few decades later, in the 1970s, the term precision journalism was used to describe news-gathering techniques that applied social and behavioral science research methods. This practice was encouraged to overcome some of the gaping holes in journalism at the time, namely dependence on press releases or institutional statements, bias towards sources with authority, and so on.

An increasing number of readers wanted more information than just text blobs, and newsroom editors wanted reporters to provide it efficiently. That was how the term data journalism was born.

While it is said that people used data journalism techniques as early as the Han Dynasty, a more realistic example is The New York Tribune article in 1849, which had a chart showing the number of lives that were being lost to cholera at the time.

Though the above examples show that some sort of data journalism was practiced in newsrooms, The Guardian and The New York Times brought the term into prominence when WikiLeaks, in 2010, released airstrike footage and over 700,000 confidential documents pertaining to US operations in Iraq and Afghanistan.

The documents, called the ‘Iraq War Logs’, added 15,000 previously unknown civilian deaths to the public death count.

Julian Assange, founder of WikiLeaks, calls the quoting and sharing of source material and data behind the story one of the basic ways in which data journalism can improve journalism. He calls this “scientific journalism.”

There are many definitions of the term data journalism, but Paul Bradshaw, a renowned data journalist, author and professor at Birmingham City University, keeps it simple and defines data journalism as “journalism done with data.”

He goes on to say that there are three different stages at which a journalist can incorporate data journalism into the traditional news cycle: using programming languages to gather or mine datasets and information; using software algorithms to find patterns or connections in the documents or datasets; and lastly, telling a complex story with engaging infographics and visualizations.

In my next post, we can look at how data journalism is useful for journalists.


Useful resources:

The Data Journalism Handbook

The art and science of data journalism by the Tow Center for Digital Journalism.

Scott Klein on the history of data journalism.

The curious journalist’s guide to data