The best way to learn the craft of predictive analytics, or even just data preparation, is to find information you're interested in and tell a story with it. Finding an interesting data set can be the most difficult part of creating an engaging data visualization. Hopefully this list will help you find that interesting data.
Effective, clear data visualization is a multi-step process.
First you have to find reliable data, then you need to clean it, possibly join multiple tables that have conflicting date formats, get it into the right format, and then uncover the story you will visualize.
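As a rough illustration of the "join tables with conflicting date formats" step, here is a minimal Python sketch using only the standard library. The two date formats, the column names, and the sample rows are all made up for the example; swap in whatever your own sources use. Note that a string like 03/01/2016 is ambiguous (month first or day first), so you need to know how each source writes its dates before trusting the result.

```python
from datetime import date, datetime

# Assumed formats for this example only; list the ones your sources really use.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text: str) -> date:
    """Try each known format in turn and return the first successful parse."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


def join_on_date(left, right, left_key="date", right_key="date"):
    """Inner-join two lists of dicts on their date columns after normalizing."""
    # Index the right-hand table by normalized date.
    index = {}
    for row in right:
        index.setdefault(parse_date(row[right_key]), []).append(row)

    joined = []
    for row in left:
        day = parse_date(row[left_key])
        for match in index.get(day, []):
            merged = {k: v for k, v in row.items() if k != left_key}
            merged.update({k: v for k, v in match.items() if k != right_key})
            merged["date"] = day.isoformat()  # one consistent output format
            joined.append(merged)
    return joined


# Hypothetical sample tables with different date styles.
sales = [{"date": "2016-03-01", "units": 12}, {"date": "2016-03-02", "units": 7}]
weather = [{"date": "03/01/2016", "high_f": 58}, {"date": "03/03/2016", "high_f": 61}]

combined = join_on_date(sales, weather)
print(combined)
```

Only the rows whose dates match after normalization survive the join, which is usually what you want before charting two series against each other.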
I just joined data.world, and it is pretty valuable: https://data.world/about/
Discover and share cool data, connect with interesting people, and work together to solve problems faster.
Yep, Amazon has some free data too, via AWS.
AWS hosts a variety of public data sets that anyone can access for free.
Previously, large data sets such as the mapping of the Human Genome required hours or days to locate, download, customize, and analyze. Now anyone can access these data sets via the AWS centralized data repository.
- infochimps.org (for example, http://infochimps.org/search?query=weight) is easy to search in general, and it certainly has many health-related datasets.
- Public Health Data Sites Index: As garish as it is, http://www.bettycjung.net/Phdata.htm actually has a wide variety of health data links and is probably your best bet.
- WHO Statistical Information System (WHOSIS)
- Public Health Library Statistical/Data Resources
- Health Data Tools and Statistics
Government and political data
- Data.gov: This is the go-to resource for government-related data. It claims to have up to 400,000 data sets, both raw and geospatial, in a variety of formats.
- The only caveat in using the data sets is you have to make sure you clean them, since many have missing values and characters.
- Socrata is another good place to explore government-related data. One great thing about Socrata is they have some visualization tools that make exploring the data easier.
- City-specific government data: Some cities have their own data portals set up for browsing city-related data. For example, at San Francisco Data you can browse everything from crime statistics to available parking spots in the city.
- The UN and UN-related sites like UNICEF and the World Health Organization are rich with all kinds of data, from mortality rates to world hunger statistics.
- The Census Bureau houses a ton of information about our lives, covering income, race, education, population and business.
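The cleaning caveat mentioned for Data.gov applies to most government downloads. Here is a minimal sketch, in standard-library Python, of one way to handle missing values and stray characters in a CSV. The list of "missing" markers and the sample data are assumptions for illustration, not something any particular portal guarantees.

```python
import csv
import io

# Assumed placeholders for "no value"; extend to match the file you are cleaning.
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "-", "--"}


def clean_cell(value):
    """Drop non-printable characters, trim whitespace, map missing markers to None."""
    text = "".join(ch for ch in value if ch.isprintable()).strip()
    return None if text.lower() in MISSING_MARKERS else text


def clean_rows(csv_text):
    """Yield cleaned dict rows from CSV text, skipping rows with no data at all."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        cleaned = {
            key.strip(): clean_cell(value or "")
            for key, value in row.items()
            if key is not None  # ignore extra cells beyond the header
        }
        if any(v is not None for v in cleaned.values()):
            yield cleaned


# Made-up sample: padded headers, an N/A marker, an empty row, a zero-width space.
RAW = (
    "city, population ,median_income\n"
    "Springfield, 30000 ,N/A\n"
    ",,\n"
    "Shelbyville\u200b,25000,41000\n"
)

rows = list(clean_rows(RAW))
for row in rows:
    print(row)
```

Keeping missing values as None, rather than silently turning them into 0 or an empty string, makes it obvious later which numbers were never reported.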
These are the places that house data from all kinds of sources. Sometimes it’s easier to find something here related to a specific category.
- Programmable Web: A really useful resource to explore APIs and also mashups of different APIs.
- Infochimps has a data marketplace that offers thousands of public and proprietary data sets for download and API access, in a wide range of categories, from historical Twitter and OkCupid data to geolocation data, in different formats. You can even upload your own data if you like.
- Data Market is a good place to explore data related to economics, healthcare, food and agriculture, and the automotive industry.
- Google Public Data Explorer houses a lot of data from the World Development Indicators, the OECD and the Human Development Indicators, mostly economic data about the world.
- Junar is a great data scraping service that also houses data feeds.
- Buzzdata is a social data sharing service that allows you to upload your own data and connect and follow others who are uploading their own data.
Usually, the best place to get social data through an API is the site itself: Instagram, GetGlue, Foursquare and pretty much all social media sites have their own APIs. Here are more details on the most popular ones.
- Twitter: Historical access through the Twitter API is fairly limited, to 3,200 tweets. For more, check out PeopleBrowsr, Gnip (which also offers historical access to the WP Automattic data feed), DataSift, Infochimps and Topsy.
- Foursquare: They have their own API, and you can also get the data through Infochimps.
- Facebook: The Facebook Graph API is the best resource for Facebook.
- Wunderground has detailed weather information and also lets you search historical data by zip code or city. It gives temperature, wind, precipitation and hourly observations for that day.
- Weatherbase has detailed weather stats on temperature, rain and humidity for nearly 27,000 cities.
These three sites have comprehensive information on teams, players, coaches and leaders by season.
- ESPN recently came up with its own API, too. You have to be a partner to get access to their data.
Universities and research
Searching the work of academics who specialize in a particular area is always a great place to find some interesting data.
If you come across specific data that you would like to use, say, in a research paper, the best way to go is to contact the professor or researcher.
UCLA: One university that makes some of the datasets used in its courses publicly available.
The New York Times has a great API and a really good explorer to access any article in the publication. The data is returned in JSON format.
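Since the data comes back as JSON, pulling out the fields you care about takes only a few lines. The sketch below parses a made-up sample payload rather than calling the live API. The field names (response, docs, headline, main, web_url, pub_date) are my understanding of how article search results are laid out and should be treated as assumptions, so check the official documentation before relying on them.

```python
import json

# Hypothetical payload for illustration; not real articles or a verified schema.
SAMPLE_RESPONSE = """
{
  "status": "OK",
  "response": {
    "docs": [
      {"headline": {"main": "Example headline one"},
       "web_url": "https://example.com/one",
       "pub_date": "2016-03-01T00:00:00Z"},
      {"headline": {"main": "Example headline two"},
       "web_url": "https://example.com/two",
       "pub_date": "2016-03-02T00:00:00Z"}
    ]
  }
}
"""


def extract_articles(raw_json):
    """Flatten the nested JSON into simple dicts that are easy to chart or tabulate."""
    payload = json.loads(raw_json)
    docs = payload.get("response", {}).get("docs", [])
    return [
        {
            "headline": doc.get("headline", {}).get("main"),
            "url": doc.get("web_url"),
            # Keep only the YYYY-MM-DD part of the timestamp.
            "published": (doc.get("pub_date") or "")[:10],
        }
        for doc in docs
    ]


articles = extract_articles(SAMPLE_RESPONSE)
for article in articles:
    print(article["published"], article["headline"])
```

Using .get() with defaults means a record that lacks a field produces None instead of crashing the whole run, which is handy when you are exploring an API you do not know well yet.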