Big data is still one of those concepts that baffle me. How big does the data have to be? Where does it come from? And how can nonprofits and public libraries actually make use of it? At SOCAP13, a social enterprise conference in San Francisco, I attended two back-to-back panel presentations on big data, both of which included nonprofit speakers. Hearing directly from people working in big data for good gave me some context for what's actually happening. Still, I had questions about the definition of big data, and it seemed like the presenters did too.
Data and Philanthropy
The first session, Big Data and Philanthropy, featured two interesting projects that can greatly benefit nonprofits. MapLight is a nonprofit, nonpartisan research organization that uses data to show money's influence on politics. The organization provides tools that show campaign contributions, legislative votes, and patterns of influence. The tools aggregate data from a few different sources: the Center for Responsive Politics, GovTrack.us, Follow The Money, and the Wisconsin Democracy Campaign. The data is open to any person or organization with an interest in government transparency and politics.
Another presenter, Planet Labs, is a new start-up based in San Francisco. This isn't your average tech company, however. Founded by three former NASA scientists, Planet Labs is building and running satellites called Doves, and it plans to have 28 of them in space by next year. The Dove satellites continuously capture high-resolution images of the Earth. According to the founders, you'll be able to count trees or study mountaintops, but you won't be able to see people. Planet Labs hopes that nonprofits, government agencies, and academic institutions focused on environmental issues will take advantage of this constantly updated visual and geographic data.
Data for Health
Another session, "How Big Data Is Enabling Innovative Approaches to Improving Health and Wellness," featured speakers involved in a variety of data projects. One of the sites discussed, CHNA.org (Community Health Needs Assessment), is a collaboration among Kaiser Permanente, the Institute for People, Place & Possibility (IP3), and the Centers for Disease Control and Prevention. This free web tool is designed to help nonprofits, local health departments, financial institutions, and hospitals better understand the needs and assets of their communities.
IP3 is a nonprofit social venture founded by the Center for Applied Research and Environmental Systems (CARES), a center at the University of Missouri that makes public data more accessible by turning it into visualizations. CHNA is part of a larger project called Community Commons, an interactive mapping, networking, and learning tool.
During the session, Dr. Chris Fulcher of CARES discussed taking a "prosumer" approach, in which people contribute to their own community's data. Community Commons encourages nonprofits and other health organizations to contribute data actively: it offers free webinars on how to use the tools, along with several ways to submit data and stories about their communities.
Privacy and Definitions
In conversations about data, privacy always comes up. Who owns the data? How is it being protected? Is it being monetized? Unfortunately, the speakers didn't have definitive answers to these questions (to be fair, the sessions were only an hour long!). It took a bit more digging to understand how these groups are actually protecting data and ensuring privacy. Planet Labs states that its photos show only topography and vegetation, not people, cars, or houses. With the Community Commons tools, people can choose to keep their maps and data private or share them with others.
The actual size of the data also came into question. Is a community-based data project really "big" data? Wikipedia defines big data as "a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications." During the health panel, the speakers seemed to agree that the CHNA and Community Commons projects weren't in fact big data projects, but data subsets that could contribute to larger projects. Interestingly, one speaker referred to the data as "better data."
Despite the confusion over definitions, learning about these projects and hearing first-hand from their creators was an excellent way to see data for good in action. I'm excited to see how these projects grow and what impact they'll have.