
What is the difference between inferential analysis and predictive analysis?

Objective

To clarify which traits or attributes make an analysis inferential or predictive.

Background

I am taking a data science course that touches on inferential and predictive analyses. The explanations, as I understood them, are:

  • Inferential

    Induce a hypothesis from a small sample of a population, and check whether it holds in the larger or entire population.

    It seems to me this is generalisation. I think inducing that smoking causes lung cancer, or that CO2 causes global warming, are inferential analyses.

  • Predictive

    Induce a statement of what can happen by measuring variables of an object.

    I think identifying which traits, behaviours, and remarks people react favourably to, and which make a presidential candidate popular enough to become president, is a predictive analysis (this is touched on in the course as well).

Question

I am a bit confused by the two, as it looks to me like there is a grey area or overlap.

Bayesian inference is "inference", but I think it is used for prediction, such as in a spam filter or in identifying fraudulent financial transactions. For instance, a bank may use previous observations of variables (such as IP address, originator country, beneficiary account type, etc.) to predict whether a transaction is fraudulent.

I suppose the theory of relativity is an inferential analysis that induced a theory/hypothesis from observations and thought experiments, but it also predicted that light would be bent.

Kindly help me understand which must-have attributes categorise an analysis as inferential or predictive.

asked Dec 26 '15 08:12 by mon


1 Answer

"What is the question?" by Jeffrey T. Leek and Roger D. Peng has a nice description of the various types of analysis that go into a typical data science workflow. To address your question specifically:

An inferential data analysis quantifies whether an observed pattern will likely hold beyond the data set in hand. This is the most common statistical analysis in the formal scientific literature. An example is a study of whether air pollution correlates with life expectancy at the state level in the United States (9). In nonrandomized experiments, it is usually only possible to determine the existence of a relationship between two measurements, but not the underlying mechanism or the reason for it.

Going beyond an inferential data analysis, which quantifies the relationships at population scale, a predictive data analysis uses a subset of measurements (the features) to predict another measurement (the outcome) on a single person or unit. Web sites like FiveThirtyEight.com use polling data to predict how people will vote in an election. Predictive data analyses only show that you can predict one measurement from another; they do not necessarily explain why that choice of prediction works.
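To make the contrast concrete, here is a minimal sketch, not taken from the paper. It fits the same straight line to the same data and then asks the two different questions. The data are synthetic and the helper names (`fit_line`, `slope_interval`, `holdout_error`) are made up for illustration. The interval uses a normal approximation (z = 1.96) rather than an exact t quantile.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def slope_interval(xs, ys, z=1.96):
    """Inferential question: does the relationship hold beyond this sample?

    Returns an approximate 95% interval for the population slope.
    """
    n = len(xs)
    a, b = fit_line(xs, ys)
    mx = sum(xs) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b - z * se, b + z * se


def holdout_error(xs, ys, n_train):
    """Predictive question: how well can we predict the outcome for new units?

    Fits on the first n_train units and returns the mean absolute error
    on the remaining, unseen units.
    """
    a, b = fit_line(xs[:n_train], ys[:n_train])
    errors = [abs(y - (a + b * x)) for x, y in zip(xs[n_train:], ys[n_train:])]
    return sum(errors) / len(errors)


# Synthetic data (an assumption for illustration): the true slope is -0.5.
rng = random.Random(0)
xs = [rng.uniform(5, 25) for _ in range(200)]
ys = [80 - 0.5 * x + rng.gauss(0, 2) for x in xs]

low, high = slope_interval(xs, ys)
mae = holdout_error(xs, ys, n_train=150)
print(f"inferential: slope interval ({low:.2f}, {high:.2f})")
print(f"predictive: held-out mean absolute error {mae:.2f}")
```

The model is identical in both cases; what differs is what gets reported. The inferential output is a statement about a population-level relationship with its uncertainty, while the predictive output is a measure of how accurate per-unit predictions are on data the model has not seen.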

[Image: data analysis flowchart]

answered Sep 29 '22 14:09 by dranxo