Digital Research Tools for Investigative Reporters
By Friedrich Lindenberg
Tools for documents
Most information that investigative journalists handle comes in the form of text documents, like Word files, PDFs or scanned images.
- Storing and searching sets of documents can be done with DocumentCloud, which is more appropriate than e.g. Dropbox.
- Getting text and tables out of PDFs is an ugly process, but Tabula, CometDocs ($) and ABBYY FineReader ($) make it possible.
- Exploring large sets of documents is done with tools like Overview, Jigsaw and Nuix ($).
- Crowd-sourcing the analysis of documents has worked for some topics, with tools such as CrowData and transcribable.
Tools for data in tables
- Analysis is best done in a spreadsheet program like Google Spreadsheets or Excel ($), but online tools like Statwing and J++ Benford can help find anomalies.
- Simple charts can be generated with DataWrapper ($), RAW, Tableau Public and Google Fusion Tables.
- Maps should be used in moderation, but CartoDB ($) and Google Fusion Tables can generate powerful visuals. For advanced analysis, use QGIS. MapStarter can map out statistical data, and has an excellent reading list of further tools.
- Data about networks can be visualized using Gephi, yED, NodeXL (for Excel) or Maltego ($).
- Sequences of events can be understood using timeline tools, such as Timeline.js and Storymap.js.
- Cleaning data is necessary when your information is too inconsistent for analysis. Use OpenRefine and Data Wrangler.
- Advanced statistical analysis is done using the programming language R, or a graphical tool like RStudio.
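One of the anomaly checks mentioned in the list above is a Benford's law test, which compares how often each leading digit (1 to 9) appears in a column of figures with the frequency Benford's law predicts. The following is a minimal sketch in Python, not the implementation of J++ Benford or any other tool named here; the function names `first_digit` and `benford_table` are hypothetical, and the sample figures are invented for illustration. A large gap between observed and expected shares is only a hint to look closer, not proof of manipulation.

```python
import math
from collections import Counter


def first_digit(value):
    """Return the leading non-zero digit of a number, or None for zero."""
    # Scientific notation puts the leading digit first, e.g. '4.2e+03'.
    text = f"{abs(value):.15e}"
    digit = int(text[0])
    return digit if digit != 0 else None


def benford_table(values):
    """Return (digit, observed share, expected share) for digits 1 to 9."""
    digits = [d for d in map(first_digit, values) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    rows = []
    for d in range(1, 10):
        observed = counts.get(d, 0) / total if total else 0.0
        expected = math.log10(1 + 1 / d)  # Benford's law
        rows.append((d, observed, expected))
    return rows


# Invented sample amounts, for illustration only.
amounts = [1200, 1850, 230, 19, 3100, 145, 2700, 110, 980, 47]
for digit, observed, expected in benford_table(amounts):
    print(f"{digit}: observed {observed:.2f}, expected {expected:.2f}")
```

The test needs a reasonably large set of figures spanning several orders of magnitude to be meaningful; a handful of values, as in the sample, will always look noisy.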
Tools for data on the internet
- Scraping data from the web means extracting data from web sites. The easiest way is Google Spreadsheets (tutorial) or browser plugins like Scraper and TableTools2.
- Advanced scraping for more complex web pages is possible using import.io, Kimono and OutWit Hub ($).
- Sharing files on the web can be done with SpiderOak and tarsnap. Avoid Dropbox and iCloud for security reasons.
- Whenever you work on the web, consider your digital security. Study Security in a Box to learn about tools that can help to protect your identity and data.
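To make the idea of scraping concrete, here is a rough sketch of what the tools above do behind the scenes when they pull a table out of a web page: they read the page's HTML and collect the text of each table cell, row by row. The sketch uses only Python's standard library; the class name `TableParser`, the helper `extract_table_rows` and the sample HTML are made up for illustration, and real pages are usually messier (nested tables, merged cells, content loaded by scripts).

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, if any
        self._cell = None  # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_table_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample page, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>Company</th><th>Country</th></tr>
  <tr><td>Example Holdings Ltd</td><td>UK</td></tr>
  <tr><td>Sample Mining SA</td><td>ZA</td></tr>
</table>
"""

for row in extract_table_rows(SAMPLE_HTML):
    print(row)
```

The resulting rows can be written to a CSV file and opened in a spreadsheet. Fetching the page itself is left out here; before scraping a live site, check its terms of use and keep the security advice above in mind.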
Investigating people and companies
- OpenCorporates freely publishes large amounts of company information from many countries. DueDil ($) and Arachnys ($) are commercial services with similar profiles.
- File a search request on the Investigative Dashboard, and use their list of company databases and government gazettes.
- If you’re investigating companies that are traded on US stock markets, the Securities and Exchange Commission has information in its EDGAR filings. See also CrocTail.
- If your target might use offshore tax havens, search their name on the ICIJ OffshoreLeaks (British Virgin Islands) database.
- For any persons with political exposure, search the WikiLeaks cables and influence tracking sites like LittleSis (US), Poderopedia (Chile), RISE (Romania), Siyazana (South Africa) or OpenInterests (EU) – full list.
Information about specific topics
- Court cases and the law can be important resources, often collected by legal information institutes (LIIs). BAILII (UK, IE), SAFLII (SA), KenyaLaw (KE), Cornell LII (US) and HKLII (HK) can be free alternatives to commercial products like LexisNexis.
- Extractive industries (e.g. oil, minerals) are beginning to introduce rudimentary transparency. EITI itself provides little concrete data, but organizations like OpenOil and NRGI do.
- Development aid is covered by a lot of data, released via the International Aid Transparency Initiative and the OECD. Tools and databases are documented at the Open Development Toolkit and Aid Data.
- Government spending, procurement and budgets are great sources of evidence. Visit OpenSpending, the Open Budgets Tracker and Open Contracting for high-level info; sites like WorldBank Finances, USA Spending and Spend Network have more detailed info.
- Land grabbing and ownership are hard to track, but the Land Matrix and LandPortal collect some deals. Compare this to SpatialDimension's mining cadastres.
- United States reporters must also consult IRE’s Database Library for investigative journalists and enigma.io.
There are many public listings of datasets, such as Awesome Public Datasets and the DataHub. Much of this data requires specialized processing, though, so investigative reporters will have to join forces with a technologist.
Connect with others
- School of Data is an online community for learning about using data for journalism and advocacy.
- NICAR-L is the mailing list of data journalists in the US, which carries lots of useful advice.
- The European Journalism Center and Open Knowledge offer a data-driven journalism mailing list for journalists across the globe.