What is Interpolation?

Personally, to understand difficult statistics procedures I find it easier to start with concrete examples, things I can relate to, and on rare occasions I'll even study the ugly math. I'm not in the mood for ugly math today, so let's use M&M's.
The colors spreading out around these M&M's are interpolated from the colors at the source.
Or maybe we could learn from some crime data from San Francisco.
The values between the measured locations have been estimated with colors and contour lines using interpolation.
Scientists use this technique because there is never enough time or money to measure every point in the area of interest.
Interpolation is based on:
Tobler’s Law of Geography, which states that everything is related to everything else, but near things are more related than distant things.
Points closer together in space are more likely to have similar values than points that are farther apart.
This is called spatial autocorrelation. By the way, I Googled most of this.
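To make that idea concrete, here's a minimal sketch of inverse-distance weighting (IDW), one of the simplest spatial interpolation methods built on exactly this principle: nearer samples get more weight. The sample locations and values below are made up for illustration.

```python
# Inverse-distance weighting: nearby samples count more than distant ones.
import math

def idw(samples, px, py, power=2):
    """Estimate a value at (px, py) from [(x, y, value), ...] samples."""
    num, den = 0.0, 0.0
    for x, y, v in samples:
        d = math.hypot(x - px, y - py)
        if d == 0:            # exactly on a sample point: return it directly
            return v
        w = 1.0 / d ** power  # closer points get larger weights
        num += w * v
        den += w
    return num / den

# Hypothetical counts measured at four street corners
samples = [(0, 0, 10.0), (1, 0, 20.0), (0, 1, 20.0), (1, 1, 30.0)]
print(idw(samples, 0.5, 0.5))  # midpoint is equidistant from all -> plain average, 20.0
```

Raising `power` makes the estimate cling more tightly to the nearest sample, which is the usual knob for tuning how local the interpolation is.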
Interpolation is used in many fields, from photography to geology.
There are many different computer algorithms used to interpolate data points.
Scientists choose between different algorithms based on the type of data and how the data will be used.
Here’s an example of how it’s done with software.
So interpolation is a method for estimating the value of a function between two known data points.
For example, suppose we have data for temperatures at corresponding air velocities from 0 to 2000, in steps of 200. Interpolation can be used to estimate the temperature for non-recorded values, such as an air velocity of 250.
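As a quick sketch of that example in plain Python: the velocities below follow the 0-to-2000-in-steps-of-200 setup (just the first few), but the temperature values are made up for illustration.

```python
from bisect import bisect_right

def linear_interp(x, xs, ys):
    """Linearly interpolate y at x, given sorted xs and matching ys."""
    i = bisect_right(xs, x) - 1          # index of the left-hand known point
    i = max(0, min(i, len(xs) - 2))      # clamp to a valid segment
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])

velocities = [0, 200, 400, 600]               # recorded air velocities
temps = [20.0, 30.0, 37.0, 42.0]              # made-up temperatures
print(linear_interp(250, velocities, temps))  # -> 31.75
```

At 250, we're a quarter of the way from 200 to 400, so the estimate is a quarter of the way from 30.0 to 37.0.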
First, you need to know there are different types of interpolation; here are just two to get started.
Linear interpolation estimates a new value by connecting two adjacent known data points with a straight line.
Spline, or cubic, interpolation takes a sequence of data points and constructs a curve whose shape closely follows that sequence.
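For the curious, here's a rough pure-Python sketch of one common spline variant, the natural cubic spline, using the textbook coefficient recurrence. This is for illustration only and is not necessarily the algorithm any particular software uses.

```python
def cubic_spline_coeffs(xs, ys):
    """Natural cubic spline coefficients (second derivative zero at the ends)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Solve the tridiagonal system for the quadratic coefficients c
    l, mu, z = [1.0] + [0.0] * n, [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):        # back-substitution
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])
    return b, c, d

def spline_eval(x, xs, ys, b, c, d):
    """Evaluate the piecewise cubic at x."""
    j = max(i for i in range(len(xs) - 1) if xs[i] <= x) if x > xs[0] else 0
    t = x - xs[j]
    return ys[j] + b[j] * t + c[j] * t * t + d[j] * t ** 3

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]  # points that happen to lie on a straight line
b, c, d = cubic_spline_coeffs(xs, ys)
print(spline_eval(1.5, xs, ys, b, c, d))  # a spline through a line stays linear -> 3.0
```

Between each pair of knots the curve is a cubic, and the coefficients are chosen so the pieces join with matching first and second derivatives, which is what gives splines their smooth look.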
In Graph Builder, we can see that plotting these data points with a tight spline allows us to visualize and forecast what the in-between data points would be.
If I turn on the crosshairs, I can hover over the curve and interpolate values on the fly.
To better analyze this and get some prediction values, we'll use Fit Y by X, which is a comparison of two variables. I'll choose Temperature as the Y (the response) and Air Velocity as the X (the predictor).
Here are the plotted data points. Using the red triangle I'll add a flexible line: under Fit Spline, I'll choose 1.
Now for predicting data points we don't have: under the red triangle of the fitted line, I'll save a prediction column back to our data table.
Save Predicteds. I didn't even know that word existed.
Now when I type in a value for Air Velocity I’ll have my interpolated or predicted value for Temperature.
So that’s Interpolation.
If you like this video, punch the thumbs up, and subscribe so I'll know to make more of these videos.
And remember your homework: throw 5 M&M's into a shallow clear dish with water in it and watch the colors merge.
When you're done, empty the dish and do it again, this time using several different arrangements of color and position.

 

Videos

bootstrap 201000
correlation 135000
histogram 90500
confidence interval 49500
z score 49500
barchart 40500
box plot 33100
cluster 33100
neural network 27100
chi square test 27100
scatter plot 27100
monte carlo simulation 18100
principal component analysis 14800
random sampling 12100
hypothesis testing 12100
factor analysis 9900
paired t test 9900
one way anova 6600
sample size 5400
two sample t test 5400
two way anova 4400
one sample t test 4400
manova 4400
multiple linear regression 2900
run chart 2900
simple linear regression 2400
non parametric test 1900
stepwise regression 1900
arima model 1900
discriminant analysis 1900
gauge r&r 1600
factorial anova 1300
market basket analysis 1300
regression tree 1300
mosaic plot 720
classification tree 720
multiple logistic regression 590
fractional factorial design 480
full factorial design 320
process capability analysis 260
association analysis 260
randomization test 210
non parametric correlation 210
repeated measures analysis 170
accelerated life testing 170
mixed model analysis 110
model comparison 110
attribute control chart 110
simple logistic regression 110
fit distribution 90
distribution fitting 90
assessing normality 70
variable control chart 70
finding the area under a normal curve 20
pareto plots 10
Finding standardized values
Time series smoothing models
Capability analysis for multiple responses
Msa continuous data
Msa attribute data
Full factorial analysis
Fractional factorial analysis
Screening experiment analysis
TOTAL 847200


Free Data

The best way to learn the craft of predictive analytics, or even just preparing data, is to find information you're interested in and tell a story. And finding an interesting data set can be the most difficult part of creating an engaging data visualization. Hopefully this list will help you find that interesting, cool data.

Effective, clear data visualization is a multi-step process.

First you have to find reliable data; then you need to clean it, possibly join multiple tables that have conflicting date formats, get it into the right shape, and then uncover the story you will visualize.
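As one small example of that cleaning step, here's a sketch of normalizing conflicting date formats before a join, in plain Python. The format list is hypothetical; in practice it would depend on what your actual sources emit.

```python
from datetime import datetime

# Candidate formats you might meet when joining tables from different sources
FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"]

def normalize_date(text):
    """Try each known format in turn; return the date in ISO form."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # not this format, try the next
    raise ValueError(f"unrecognized date: {text!r}")

# Three spellings of the same date from three hypothetical tables
print([normalize_date(s) for s in ["2017-03-05", "03/05/2017", "5 Mar 2017"]])
```

Once every table's date column is in one canonical form, the join key actually matches and the rest of the pipeline gets much simpler.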

I just joined this, pretty valuable.  https://data.world/about/
Discover and share cool data, connect with interesting people, and work together to solve problems faster.

Yep. Amazon has some free data.  AWS

AWS hosts a variety of public data sets that anyone can access for free.

Previously, large data sets such as the mapping of the Human Genome required hours or days to locate, download, customize, and analyze. Now, anyone can access these data sets via AWS's centralized data repository.


Health-related data 

Machine Data

Government and political data

  • Data.gov: This is the go-to resource for government-related data. It claims to have up to 400,000 data sets, both raw and geospatial, in a variety of formats.
  • The only caveat is that you have to clean the data sets yourself, since many have missing values and stray characters.
  • Socrata is another good place to explore government-related data. One great thing about Socrata is they have some visualization tools that make exploring the data easier.
  • City-specific government data: Some cities have their own data portals set up to browse through city-related data. For example, at San Francisco Data you can browse through everything from crime statistics to parking spot availability in the city.
  • The UN and UN-related sites like UNICEF and the World Health Organization are rich with all kinds of data, from mortality rates to world hunger statistics.
  • The Census Bureau houses a ton of information about our lives around income, race, education, population and business.

Data aggregators

These are the places that house data from all kinds of sources. Sometimes it’s easier to find something here related to a specific category.

  • Programmable Web: A really useful resource to explore API’s and also mashups of different API’s.
  • Infochimps has a data marketplace that offers thousands of public and proprietary data sets for download and API access, in a wide range of categories, from historical Twitter and OK Cupid data to geolocation data, in different formats. You can even upload your own data if you like.
  • Data Market is a good place to explore data related to economics, healthcare, food and agriculture, and the automotive industry.
  • Google Public Data Explorer houses a lot of data from world development indicators, OECD, and human development indicators, mostly related to economic data around the world.
  • Junar is a great data scraping service that also houses data feeds.
  • Buzzdata is a social data sharing service that allows you to upload your own data and connect and follow others who are uploading their own data.

Social data

Usually, the best place to get social data via an API is the site itself: Instagram, GetGlue, Foursquare; pretty much all social media sites have their own APIs. Here are more details on the most popular ones.

  • Instagram
  • GetGlue
  • Twitter: Access to the Twitter API for historical uses is fairly limited, to 3,200 tweets. For more, check out PeopleBrowsr, Gnip (which also offers historical access to the WP Automattic data feed), DataSift, Infochimps, and Topsy.
  • Foursquare: They have their own API and you can get it through Infochimps, as well.
  • Facebook: The Facebook Graph API is the best resource for Facebook.

Weather data

  • Wunderground has detailed weather information and also lets you search historical data by zip code or city. It gives temperature, wind, precipitation, and hourly observations for that day.
  • Weatherbase has detailed weather stats on temperature, rain and humidity of nearly 27,000 cities.

Sports data

These three sites have comprehensive information on teams, players, coaches, and leaders by season.

Universities and research

Searching the work of academics who specialize in a particular area is always a great place to find some interesting data.

If you come across specific data that you would like to use, say, in a research paper, the best way to go is to contact the professor or researcher. 

UCLA: one university that makes some of the datasets used in its courses publicly available.

News data

The New York Times has a great API and a really good explorer for accessing any article in the publication. The data is returned in JSON format.