Sunday, May 1, 2016

Articles on ICIJ's Panama Papers and Ramifications (5/1/16)

Introduction:  The following Wikipedia entries may offer updated information from time to time:
  • Wikipedia entry on Panama Papers, here.
  • Wikipedia list of people named in Panama Papers, here.
In addition, this searchable list from the Sunday Times might be worth consulting.  Josh Boswell, Tom Wills, Andrew Rininsland, Panama papers: the names: Search our database of 37,000 names linked to Mossack Fonseca companies in the tax haven of Panama (Sunday Times 4/10/16), here.  At the bottom of the linked page is a downloadable zip file with the data, here.  The zip file includes a csv file, apparently 102.54 MB in size (presumably it could be imported into MS Excel, although I have not yet done that), and a "README.TXT" file explaining certain matters about the data.  The csv file apparently lists the companies and their directors, shareholders, and legal agents.
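
For readers who would rather inspect the csv file before trying a spreadsheet import, here is a minimal Python sketch.  It reads the header row and counts the data rows without loading the whole file into memory.  The function name is my own, the file name in the usage note is hypothetical, and I am assuming the first row is a header and the file is UTF-8 encoded; I have not verified either against the actual download.

```python
import csv

def peek_csv(path, sample_rows=3, encoding="utf-8"):
    """Return (header, first few rows, total data-row count) for a CSV file.

    Streams the file row by row, so a file of 100+ MB does not need to fit
    in memory. The encoding is an assumption; adjust it if decoding fails.
    """
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        header = next(reader, [])  # first row is assumed to be the header
        sample = []
        count = 0
        for row in reader:
            if count < sample_rows:
                sample.append(row)  # keep only the first few rows for a preview
            count += 1
    return header, sample, count
```

Usage would look something like `header, sample, count = peek_csv("names.csv")`, where "names.csv" stands in for whatever the csv file in the zip is actually called.  The row count matters because an Excel worksheet holds at most 1,048,576 rows, so a file with more rows than that will not fit on a single sheet.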

Frederik Obermaier, Bastian Obermayer, Vanessa Wormer and Wolfgang Jaschensky, About the Panama Papers (Süddeutsche Zeitung), here.  A great report (with links) from the newspaper and reporters that originally obtained the data.

Coming Soon: ICIJ to Release Panama Papers Offshore Companies Data (ICIJ 4/26/16), here.
The International Consortium of Investigative Journalists will release on May 9 a searchable database with information on more than 200,000 offshore entities that are part of the Panama Papers investigation. 
The database will likely be the largest ever release of secret offshore companies and the people behind them. 
* * * * 
While the database opens up a world that has never been revealed on such a massive scale, the application will not be a “data dump” of the original documents – it will be a careful release of basic corporate information. 
ICIJ won’t release personal data en masse; the database will not include records of bank accounts and financial transactions, emails and other correspondence, passports and telephone numbers. The selected and limited information is being published in the public interest. 
Meanwhile ICIJ, the German newspaper Süddeutsche Zeitung which received the leak, and other global media partners, including several new outlets in countries where ICIJ has not been able to report, will continue to investigate and publish stories in the weeks and months to come.

Meta S. Brown, Why Panama Papers Journalists Use Graph Databases (Forbes 4/30/16), here.

All large organizations, and a whole lot of small ones, use databases. These tools keep names, dates, numbers and other tidbits of information neatly in order, organized into tables by common structure and function, each one laid out in columns and rows that define a single proper place for every little fact. 
These same organizations also possess a lot of complex data, such as contracts, email, photographs and many other forms of information that just can’t be neatly organized into uniform columns and rows. Since this stuff is hard to organize, it often remains unorganized, making it hard to find information when it’s needed. 
When investigative journalists set out to understand the implications of the Panama Papers, an enormous set of documents leaked from the Panama-based law firm Mossack Fonseca, organization was a primary issue. The leak presented them with a wealth of information, millions of documents, but no guide to structure. Organizing the information, making it searchable, identifying connections among the documents and the people, companies and other facts within them was all up to the journalists. 
The International Consortium of Investigative Journalists (ICIJ), a network of journalists from more than 65 countries, foresaw this situation. In anticipation of massive technology-enabled information leaks, ICIJ prepared by developing expertise and resources for data journalism on a grand scale. No one news organization has the resources to fully investigate such a large information leak, but ICIJ can provide technical assistance and coordinate fact-finding by journalists around the world.
Mar Cabra, head of ICIJ’s Data and Research Unit, understands the limitations of the databases used in most business applications, which are known as “relational databases.” They’re not designed for management of lengthy documents, or for the labyrinthine relationships that connect them. For this data journalism effort, she chose another type of database, one that was designed as a natural match for the sort of complex data in the Panama Papers, and well-suited to facilitating journalists’ research. This type of database is called a “graph database,” and it is much like a gigantic diagram of documents and relationships among them. 
Graph databases are a good match for applications that call for finding related documents quickly, especially when you need only a few documents at a time, and don’t need to use exact calculation methods like classical statistics. The Panama Papers application, facilitating painstaking journalistic research, meets those criteria. So would a dating application, which calls for quickly presenting an individual with several potential matches. Serving product recommendations for online retail is also a good match for a graph database.
JAT Note: I think the author is right that relational databases are not good for lengthy documents, except when they are.  I have worked with very large databases that could handle full-text searches as well as the fields found in traditional, less complex databases.  For any database, a user must understand what it does, its strengths and limitations, and then manipulate the data with those in mind.  I started building databases for my practice and other purposes in the 1990s.  I started with dBase III and then moved to Microsoft Access, where I built an office practice database and a litigation database.  Then for larger databases (principally litigation), I went to third-party providers, starting with Concordance around 2005.  All databases use more or less the same techniques but implement them in different ways in terms of user-friendliness and such.  But databases make the world hum, sort of.
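
To make the graph idea concrete, here is a minimal Python sketch of how people and companies can be stored as nodes, with relationships as edges, so that "who is connected to whom" becomes a simple traversal.  The names and relationships are invented for illustration and are not drawn from the Panama Papers data.  The in-memory class is my own stand-in for a real graph database and says nothing about how ICIJ actually built its system.

```python
from collections import defaultdict, deque

class TinyGraph:
    """A toy undirected graph: nodes are names, edges carry a relation label."""

    def __init__(self):
        self.edges = defaultdict(list)

    def relate(self, a, relation, b):
        # Store the edge in both directions so it can be followed either way.
        self.edges[a].append((relation, b))
        self.edges[b].append((relation, a))

    def connected_within(self, start, max_hops):
        """Breadth-first search: map each node reachable within max_hops to its hop distance."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for _relation, neighbor in self.edges[node]:
                if neighbor not in seen:
                    seen[neighbor] = seen[node] + 1
                    queue.append(neighbor)
        del seen[start]  # report only the other nodes
        return seen

# Hypothetical data, for illustration only.
g = TinyGraph()
g.relate("Person A", "shareholder_of", "Company X")
g.relate("Person B", "director_of", "Company X")
g.relate("Person B", "director_of", "Company Y")
g.relate("Agent Z", "legal_agent_for", "Company Y")

print(g.connected_within("Person A", 2))
# prints {'Company X': 1, 'Person B': 2}
```

Raising the hop limit to 4 would also reach Company Y and Agent Z.  In a relational database the same question typically needs a join per hop or a recursive query, which is the kind of "labyrinthine relationship" work the article says graph databases are designed for.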
