The job here is to screen-scrape the subsidiary pages in the EDGAR database and translate them into an open format that can be published to the web and fed into the Prefuse visualization software or another API.
The SEC's EDGAR website theoretically has lists of subsidiaries for all publicly traded companies in the USA. They are listed as "Exhibit 21" on forms S-1, S-4, S-11, F-1, F-4, 10, and the annual report filed on Form 10-K. Here are two examples:
http://www.sec.gov/Archives/edgar/data/831001/000119312507038505/dex2101.htm
http://www.sec.gov/Archives/edgar/data/12927/000119312506040952/dex21.htm
They are not in a standardized format; some are HTML and others are plain text. These would need to be screen-scraped, parsed, saved to a database, made available in an open format (so that others could use the data), and then plugged into the visualization API.
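As a rough sketch of the parsing step for the HTML exhibits, the snippet below uses only Python's standard library to reduce a page to rows of text cells. The names ExhibitRows and exhibit_rows are hypothetical, and the assumption that each subsidiary sits in its own table row or paragraph will not hold for every filing, so treat this as a starting point rather than a working scraper.

```python
from html.parser import HTMLParser


class ExhibitRows(HTMLParser):
    """Collect visible text as rows of cells (hypothetical helper)."""

    ROW_TAGS = {"p", "br", "tr", "div", "li"}   # tags that end a row
    CELL_TAGS = {"td", "th"}                    # tags that end a cell

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cells = []
        self._buf = []

    def _end_cell(self):
        # Join the buffered text and collapse runs of whitespace.
        text = " ".join("".join(self._buf).split())
        if text:
            self._cells.append(text)
        self._buf = []

    def _end_row(self):
        self._end_cell()
        if self._cells:
            self.rows.append(self._cells)
        self._cells = []

    def handle_starttag(self, tag, attrs):
        if tag in self.ROW_TAGS:
            self._end_row()
        elif tag in self.CELL_TAGS:
            self._end_cell()

    def handle_endtag(self, tag):
        if tag in self.ROW_TAGS:
            self._end_row()
        elif tag in self.CELL_TAGS:
            self._end_cell()

    def handle_data(self, data):
        self._buf.append(data)


def exhibit_rows(html):
    """Return the page's text as a list of rows, each a list of cell strings."""
    parser = ExhibitRows()
    parser.feed(html)
    parser.close()
    parser._end_row()  # flush any trailing text after the last tag
    return parser.rows
```

Each row would still need per-filing heuristics to decide which cell is the subsidiary name and which is the jurisdiction, since the exhibits do not share a layout.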
Can this data be GPL'ed? We would like to use copyleft to ensure that others using this data keep it free as well. To get the data into Prefuse we would have to convert it to either GraphML or this format.
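For the GraphML option, a minimal sketch of the conversion might look like the following. It assumes the scraped data has already been reduced to (parent, subsidiary) name pairs; the function name to_graphml and the single "name" attribute are illustrative choices, not something Prefuse prescribes.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(relationships):
    """Build a GraphML string from (parent_name, subsidiary_name) pairs."""
    relationships = list(relationships)
    ET.register_namespace("", GRAPHML_NS)

    def q(tag):
        # Qualify a tag name with the GraphML namespace.
        return "{%s}%s" % (GRAPHML_NS, tag)

    root = ET.Element(q("graphml"))
    # Declare one string attribute on nodes to hold the company name.
    ET.SubElement(root, q("key"), {
        "id": "name", "for": "node",
        "attr.name": "name", "attr.type": "string",
    })
    graph = ET.SubElement(root, q("graph"), {"edgedefault": "directed"})

    # First pass: one node per distinct company, in order of appearance.
    ids = {}
    for pair in relationships:
        for name in pair:
            if name not in ids:
                ids[name] = "n%d" % len(ids)
                node = ET.SubElement(graph, q("node"), {"id": ids[name]})
                data = ET.SubElement(node, q("data"), {"key": "name"})
                data.text = name

    # Second pass: one directed edge from parent to subsidiary.
    for parent, child in relationships:
        ET.SubElement(graph, q("edge"),
                      {"source": ids[parent], "target": ids[child]})

    return ET.tostring(root, encoding="unicode")
```

Calling to_graphml([("Example Corp", "Alpha Holdings LLC")]) with made-up company names returns a small document with two nodes and one edge, which could then be saved to a file and published alongside the database.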
Upon further investigation I found another potential source for this data, although it may not be as useful since it is historic:
ftp://ftp.sec.gov/edgar/Feed/
Comments
Data Sources
ftp://ftp.sec.gov/edgar/Feed/
Looks like a daily download, in XML.
I have been lurking in the group for a few weeks now, after a suggestion
from the people at MetaVid led me here. CorpWatch is one of the
winners of the NetSquared Mashup Challenge, and our proposal is to
gather information from the SEC's EDGAR database to create a
visualization of parent company/subsidiary relationships. Here is the
full proposal: tinyurl.com/2z5f4v
I am still in the phase of gathering information on the idea. I have had
a blessing from a few of the technologists at Google, but haven't
actually nailed down the chain of tasks that will be required to get the
job done. It seems like the tough part will be designing the screen
scrape, but hopefully if we do it right it can be run routinely so that
the data stays updated.
Any thoughts on the feasibility of such a project or suggestions for
ways to tackle it would be greatly appreciated.
__________________________
Submitted by: Bebes