By the second quarter of 2011, we will see another wave of machine-to-machine interactive financial data become available directly from the U.S. government. In June 2011, approximately 8,700 companies will begin to file supplementary “xml” files accompanying their quarterly and annual financial statement filings with the Securities and Exchange Commission. XML files can be read directly by computers, allowing instant absorption by the analysis programs, institutional and individual, that evaluate public companies. At its most grandiose, it means that investors, litigants and policy makers will be able to examine and assess the official legal version of these filings as fast as Regulation FD (Fair Disclosure) will allow.
This latest technological jump in financial information transparency is the result of a number of years of work by the SEC. The process included developing a specific sub-dialect of xml called XBRL, a multi-year exercise to turn the free-form reporting of the U.S. Securities Act into a workable codified set of data construction rules. Because it blends management statements, the legal requirements of speaking about both “Financial Statements” and “Safe Harbor” discussions, and the numerical precision of form-like data enhanced by company-specific extensions, it is the most complex attempt of this type to date.
Not that the SEC is a stranger to XML. It started the process by requiring the relatively simple Forms 3, 4 and 5, which report on company ownership, to be submitted per an XML specification via an online form. Hundreds of thousands of these documents have been filed, and machines capable of reading them can make transparent who owns what at even the most complex companies. The SEC is also working on the XFDL specification that will codify many more form types submitted to the SEC.
Prior to this, the most complex government financial data collection exercise took place in the parallel universe of the U.S. Banking Act. The FFIEC had brought the reporting of FDIC Call Reports into a Central Data Repository (CDR) and pioneered a three-tongued publishing methodology that delivered perfect synchronization among a human-viewable HTML file and two computer-readable data files, one in xml format and one in CSV format, thus covering 99.99% of possible downstream analysis interfacing cases. It set a high standard for all direct government-to-public dissemination to follow.
Getting Ready for Prime Time
One of the true tests of a new idea is whether or not it still works when one shuts down all the “experimental versions” of the process. The U.S. government's version is entering this phase now. Being analysts who use filings data to assess companies, as opposed to XBRL developers seeking to make a living by filing documents with the SEC, we decided the time had come to do a “production acceptance test” of things as we wait for June 2011 to arrive.
Test number one was to ignore all experimental filings data feeds from the SEC and ask the key question, “Is it possible to find and catalog these documents using only the official EDGAR Accessions file?” And our favorite follow-up government accessibility and transparency question, “Is what it takes to do that a sufficiently low hurdle that anyone can do it for free or near free?” This is a critical operational issue because, in the end, if it doesn’t work via a truly publicly accessible librarian pathway, it isn’t soup yet.
We are happy to report that the answer is yes and yes. One needs nothing more than the SEC Accessions Catalog file to generate a complete table of the URL pointers to every xml filing. It’s implementable as a lights-out program, and we plan to create a lookup utility with the link pointers to all the xml support files that will be incorporated into our IRACorpFilings.com site. Each filing has a set of xml files that together constitute the XBRL submission supplement to the main filing document.
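As a rough illustration of how little is needed, here is a minimal Python sketch of the cataloging step. It is not our production program. It assumes the catalog looks like EDGAR's pipe-delimited index layout (CIK, company name, form type, date filed, file path) and that each filing's supporting files sit in a folder named after the accession number with the dashes removed. The base URL, the `FilingPointer` record, the `parse_index` function and the sample rows are all hypothetical names and data made up for this example. Picking out which files inside each folder are the XBRL ones is a further step not shown here.

```python
from dataclasses import dataclass

# Assumed base URL for the public archive; adjust if the layout differs.
EDGAR_ARCHIVE_ROOT = "https://www.sec.gov/Archives/"


@dataclass(frozen=True)
class FilingPointer:
    """One row of the lookup table (hypothetical record layout)."""
    cik: str
    company: str
    form_type: str
    date_filed: str
    accession: str
    folder_url: str


def parse_index(text, form_types=("10-K", "10-Q")):
    """Turn pipe-delimited index text into a list of FilingPointer rows.

    Lines that do not have five fields starting with a numeric CIK
    (headers, separators, blanks) are skipped.
    """
    pointers = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        cik, company, form_type, date_filed, path = parts
        if form_type not in form_types:
            continue
        # The accession number is the file name without its .txt suffix.
        name = path.rsplit("/", 1)[-1]
        accession = name[:-4] if name.endswith(".txt") else name
        # Assumed folder convention: accession number with dashes removed.
        folder = f"edgar/data/{cik}/{accession.replace('-', '')}/"
        pointers.append(
            FilingPointer(cik, company, form_type, date_filed,
                          accession, EDGAR_ARCHIVE_ROOT + folder)
        )
    return pointers


# Made-up sample rows, for illustration only.
SAMPLE_INDEX = """\
CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1234567|EXAMPLE CORP|10-Q|2011-05-10|edgar/data/1234567/0001234567-11-000042.txt
7654321|SAMPLE HOLDINGS INC|4|2011-05-11|edgar/data/7654321/0007654321-11-000007.txt
"""

if __name__ == "__main__":
    for p in parse_index(SAMPLE_INDEX):
        print(p.cik, p.form_type, p.folder_url)
```

Note that the lookup is keyed on CIK and accession number, both of which come straight from the catalog, so nothing from an outside source is required to build the table.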
The SEC’s implementation of downstream transmission support does not presently provide a CSV check file version of the data alongside the xml, as the FFIEC does. That makes it slower to process the data back into an RDBMS, but that’s just an inconvenience and not a show stopper. What bothers us is something else. We would like to see a CSV check file accompanying the xml file set, or at least the main xml file with the blocked data elements in it, because having two machine-readable versions of the same output file from the evidentiary source will help immensely for downstream users who need to automate testing for internal consistency in the incoming reports. We recommend that SEC OID look into this as a production feature to come online, hopefully by the end of 2011.
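To make the consistency-testing point concrete, here is a minimal Python sketch of the kind of automated cross-check a second machine-readable copy makes possible. The file layouts are invented for illustration: a flat xml document of `<item name="...">` elements and a two-column CSV with `name` and `value` headers. Neither is the FFIEC's nor the SEC's actual format, and real XBRL instance documents add namespaces, contexts and units that this sketch ignores. All function names and sample values are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET


def values_from_xml(xml_text):
    """Read {name: value} pairs from the hypothetical flat xml layout."""
    root = ET.fromstring(xml_text)
    return {
        item.get("name"): (item.text or "").strip()
        for item in root.iter("item")
    }


def values_from_csv(csv_text):
    """Read {name: value} pairs from the hypothetical two-column CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: row["value"].strip() for row in reader}


def find_mismatches(xml_values, csv_values):
    """Return {name: (xml_value, csv_value)} for every disagreement.

    A name present in only one of the two files is reported with None
    on the side where it is missing.
    """
    mismatches = {}
    for name in sorted(set(xml_values) | set(csv_values)):
        from_xml = xml_values.get(name)
        from_csv = csv_values.get(name)
        if from_xml != from_csv:
            mismatches[name] = (from_xml, from_csv)
    return mismatches


# Made-up sample data, for illustration only.
SAMPLE_XML = (
    '<report>'
    '<item name="TotalAssets">1000</item>'
    '<item name="NetIncome">50</item>'
    '</report>'
)
SAMPLE_CSV = "name,value\nTotalAssets,1000\nNetIncome,55\n"

if __name__ == "__main__":
    problems = find_mismatches(values_from_xml(SAMPLE_XML),
                               values_from_csv(SAMPLE_CSV))
    for name, (from_xml, from_csv) in problems.items():
        print(f"{name}: xml={from_xml} csv={from_csv}")
```

The values are compared as text on purpose. A check file is only useful if the two copies agree exactly as published, so rounding or reformatting either side before comparing would hide the very differences the test is meant to catch.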
We did note that the earliest 1,503 companies from the first and second wave of these filings did something odd … to us anyway. They prefixed the filenames of their xml files with their stock ticker symbols, an identifier that is not an internally verifiable construct. CIK is the real U.S. Securities Act legal identifier and is already in the header of the filing. With a ticker, you have to look up the stock symbol using an external “private” source, and we flagged it as something that will become a “human reader” issue later on. There will come a point when the filers reach beyond just the major exchange-traded public companies to what we like to refer to at IRA as the remainder of affected SEC Registrants, many of which will have no ticker symbol to use.
It’s not an issue for machine-to-machine reading, by the way. Computer programs don’t read, and any unique string of text constituting a valid filename is sufficient. The bottom line is that locating usable URL links to XBRL xml file sets in an SEC filing is not a make-or-break issue requiring any sort of global Legal Entity Identifier. The xml files accompanying each filing could be named “Fred” and could still be successfully targeted by any well-programmed computer. The SEC Accession Catalog is dandy, and we look forward to our program, and ones written by others, reading out and databasing the links to xml from these filings as they continue to appear.
In the next installment, we’ll talk about what’s in the files themselves and what we think about using them to do surveillance and assessment analytics. Once you know where the files are, the next question is, “Can you do anything with them besides print them out?” The real value, after all, is in the distillation.