For a while now, I have been talking publicly about the intersection between law, technology & finance – something I like to call 'Fin (Legal) Tech.' In the world of Fin Legal Tech, legal data is both an input to and an output from other information systems. Many of the data-driven insights that organizations seek lie in combining legal, financial and other data.  

One system which contains a large amount of legal and financial data is the SEC’s EDGAR database. The SEC’s EDGAR database contains terabytes of documents and data, including press releases, annual corporate filings, executive employment agreements, and investment company holdings. While EDGAR has existed for over twenty years, it’s been difficult for scholars and commercial providers to conduct or reproduce analysis based on EDGAR data. Often, researchers and commercial providers spend a lot of time and money developing and redeveloping code to retrieve and parse EDGAR data, with no common bottom-up framework.

Today we are happy to announce that our paper “OpenEDGAR: Open Source Software for SEC EDGAR Analysis” was published in MIT Computational Law Report. OpenEDGAR changes the way people interact with the EDGAR system. OpenEDGAR is an open-source Python framework that allows researchers and developers working with SEC data to share the costs and benefits of core functionality. In the same way that open source has contributed to the development of natural language processing (NLP) and machine learning (ML) resources, OpenEDGAR empowers researchers to find and develop answers to their questions.

PAPER ABSTRACT:  

OpenEDGAR is an open-source Python framework designed to rapidly construct research databases based on the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system operated by the US Securities and Exchange Commission (SEC). OpenEDGAR is built on the Django application framework, supports distributed compute across one or more servers, and includes functionality to (i) retrieve and parse index and filing data from EDGAR, (ii) build tables for key metadata like form type and filer, (iii) retrieve, parse, and update CIK to ticker and industry mappings, (iv) extract content and metadata from filing documents, and (v) search filing document contents. OpenEDGAR is designed for use in both academic research and industrial applications and is distributed under an Open Source License.

ABOUT THE AUTHORS:

Michael Bommarito was the Co-Founder of LexPredict, which was acquired by Elevate Services in 2018. At Elevate, Michael is the V.P. of Data Products & Innovation. Michael founds, builds, operates, consults for, invests in, and advises businesses in legal and financial services, tech, and logistics. His experience spans strategy, technology, business, and operations, ranging from top Am Law firms and $B+ AUM investment firms to idea-stage startups. Michael is a Fellow at Stanford CodeX.  

Daniel Martin Katz is a Professor of Law at Illinois Tech - Chicago Kent College of Law and Director of The Law Lab at Illinois Tech. He is also the Academic Director at the Bucerius Center for Legal Technology & Data Science and Affiliated Faculty at Stanford CodeX. Dan was a Co-Founder of LexPredict, which was acquired by Elevate Services in 2018. At Elevate, Dan is the V.P. of Data Science & Innovation.  

Eric Detterman was a partner and V.P. and Global Head of Products and Solution Engineering at LexPredict, which was acquired by Elevate Services in 2018. At Elevate Services, Eric is the V.P. of Data Engineering & Solutions. Eric’s past successes include creating, leading, and operating high-performing globally distributed software development teams that launch and bring to market innovative technology solutions. Eric has founded numerous technology companies with successful exits. He is most interested in projects in machine learning, technology, finance and law.