The final step of the planning process is also the most crucial to the successful execution of your project: determining what data you need and where you will find it. This step can be time-consuming and frustrating, but the effort you put in will pay for itself tenfold when you sit down to start data analysis. Accurate, validated, and comprehensive data is the cornerstone of any data-driven initiative, so prioritize the reliability and integrity of your data to ensure the legitimacy of your findings. In most data-driven companies, the “80/20 Rule” applies to data projects: 80% of your time will be spent finding, retrieving, cleaning, and organizing your data, and only 20% on actual data analysis. So don’t be surprised if this process seems daunting, and don’t rush through it.
In this section, you’ll find information on accessing Internal Data (both within your department and in others) as well as External Data (data owned by an outside agency or organization, typically publicly available). Use the Process Flow Chart on the following page to choose which resource (the Data Sharing Agreement, the Open Data Portal, your department’s stored data, or publicly available data) is appropriate for each of your data sources.
In most cases, you’ll be working with your Program Data: data that is owned by your department and collected by or for your program. This data resides within your department and is easily accessible through your department’s Data Coordinator, who should be your first point of contact when you need help deciding what data to source for your project or where to find it. Please email CHHS@osi.ca.gov for help contacting your department’s Data Coordinator.
In a few cases, you may find that your department does not have enough data for you to proceed with your analysis. To ensure you have enough data to begin, look to other departments’ data assets and determine whether they are appropriate for your project.
Your first step in finding data in other departments is to check the CHHS Open Data Portal, our database for all publicly available CHHS data.
Access to private data in other departments is governed by the CHHS Data Sharing Agreement, a legal document that entitles any department to access another’s data assets through a Business Use Case Proposal. Proceed with this section only if (1) some of the data you need is not available through your department and (2) it is NOT found on the Open Data Portal.
Note: Read the Data De-Identification Guidelines (in Section 2, Part 1: Cleaning/De-Identifying your Dataset) before sharing any data from your department.
The goals of the Data Sharing Agreement are the following:
To get data via the Data Sharing Agreement, you must contact your department’s Data Coordinator and submit a Business Use Case Proposal; this ensures proper documentation of what data you need, why you need it, and your commitment to several requirements, such as preserving the shared dataset in the form it was given to you. For more detailed instructions, visit the Business Use Case instructions or view the FAQ.
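The routing described so far can be sketched as a small decision function. This is only an illustration of the order suggested by the text (your own department first, then the Open Data Portal, then the Data Sharing Agreement, and external public data otherwise); it is not a reproduction of the Process Flow Chart, and the names `Resource` and `choose_resource` and the three yes/no inputs are hypothetical.

```python
from enum import Enum


class Resource(Enum):
    """Hypothetical labels for the four resources named in this section."""
    PROGRAM_DATA = "Your department's stored data (via your Data Coordinator)"
    OPEN_DATA_PORTAL = "CHHS Open Data Portal"
    DATA_SHARING_AGREEMENT = "Data Sharing Agreement (Business Use Case Proposal)"
    EXTERNAL_PUBLIC_DATA = "Publicly available external data"


def choose_resource(in_own_department: bool,
                    on_open_data_portal: bool,
                    held_by_other_chhs_department: bool) -> Resource:
    """Pick a resource for one data source, checking the options in order."""
    # Program Data is the usual case, so check your own department first.
    if in_own_department:
        return Resource.PROGRAM_DATA
    # Public CHHS data from other departments lives on the Open Data Portal.
    if on_open_data_portal:
        return Resource.OPEN_DATA_PORTAL
    # Private data held by another department requires the Data Sharing Agreement.
    if held_by_other_chhs_department:
        return Resource.DATA_SHARING_AGREEMENT
    # Assumption: anything else must come from an outside, public source.
    return Resource.EXTERNAL_PUBLIC_DATA


# Example: data that another department holds but has not published.
print(choose_resource(False, False, True).value)
```

Answering the three questions once for each data source on your list gives you a simple record of which resource to pursue for each one.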
In the past decade, public interest in big data and data-driven projects has skyrocketed. As a result, there is a wealth of data available for free that may help you contextualize your results, find baseline measurements, or contribute to your findings. This section showcases some of our favorite sources of publicly available data.
- USAFacts.org — A data-driven portrait of the American population, our government’s finances, and government’s impact on society that uses federal, state, and local data from over 70 sources.
- datacatalogs.org — Aims to be the most comprehensive list of open data catalogs in the world. It is curated by a group of leading open data experts from around the world, including representatives from local, regional, and national governments, international organizations such as the World Bank, and numerous NGOs.
- HealthData.gov — Dedicated to making high value health data more accessible to entrepreneurs, researchers, and policy makers in the hopes of better health outcomes for all.
- LOGD Dataset Catalog — The Linking Open Government Data (LOGD) project investigates opening and linking government data using Semantic Web technologies. The project translates government-related datasets into RDF, links them to the Web of Data, and provides demos and tutorials on mashing up and consuming linked government data.
- CIA World Factbook — Provides information on the history, people, government, economy, geography, communications, transportation, military, and transnational issues for 267 world entities.
- openFDA — Makes it easier to get access to publicly available FDA data. FDA’s goal is to make it simple for an application, mobile device, web developer, or researcher to use data from the FDA.
- Census Reporter — A Knight News Challenge-funded project to make it easier for journalists to write stories using information from the U.S. Census Bureau. Place profiles and comparison pages provide a friendly interface for navigating data, including visualizations for a more useful first look.
- CalEnviroScreen — A mapping tool that helps identify California communities that are most affected by many sources of pollution, and where people are often especially vulnerable to pollution’s effects.
- California Healthy Places Index — A tool to explore community conditions that predict life expectancy. It contains user-friendly mapping and data resources at the census tract level across California.
- CHHS Open Data Portal — Offers access to standardized data that can be easily retrieved, combined, downloaded, sorted, searched, analyzed, redistributed, and re-used by individuals, businesses, researchers, journalists, developers, and government to process, trend, and innovate.