Fundamental Models in Data Science

1. Classification (probability estimation or scoring): binary or categorical. Attempts to predict, for each individual in the population, which of a (small) set of classes that individual belongs to. Classification assigns each individual to a bucket; scoring quantifies the likelihood that the individual belongs to a particular bucket.

2. Regression (“value estimation”): numeric. Attempts to estimate or predict, for each individual, the numerical value of some variable.

3. Similarity Matching: attempts to identify similar individuals based on data known about them.

4. Clustering: attempts to group individuals in a population by their similarity, without being driven by any specific purpose.

5. Co-occurrence (also known as frequent itemset mining, association rule discovery, and market-basket analysis): attempts to find associations between entities based on transactions involving them.

6. Profiling (also known as behavior description): attempts to characterize the typical behavior of an individual, group, or population.

7. Link prediction: attempts to predict connections between data items, usually suggesting that a link should exist, and possibly also estimating the strength of the link.

8. Data reduction: attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set.

9. Causal modeling: attempts to help us understand what events or actions actually influence others.
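
To make a few of these task types concrete, here is a minimal Python sketch covering classification and scoring (1), regression (2), similarity matching (3), and co-occurrence (5). Everything in it is an assumption made for illustration: the toy customer data, the churn labels, and the function names (`most_similar`, `churn_score`, `classify`, `fit_line`, `pair_counts`) are hypothetical, and a nearest-neighbour vote and a least-squares line are just one simple way to perform each task, not the only one.

```python
import math
from collections import Counter
from itertools import combinations

# Toy data (hypothetical): each customer is (monthly_spend, support_calls).
customers = {
    "ann": (20.0, 1.0),
    "bob": (22.0, 1.0),
    "cat": (80.0, 6.0),
    "dan": (85.0, 7.0),
}
churned = {"ann": 0, "bob": 0, "cat": 1, "dan": 1}  # known class labels


def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return math.dist(a, b)


def most_similar(name):
    """Similarity matching: the other customer closest to `name`."""
    others = (n for n in customers if n != name)
    return min(others, key=lambda n: distance(customers[name], customers[n]))


def churn_score(features, k=3):
    """Scoring: fraction of the k nearest labelled customers who churned."""
    nearest = sorted(customers, key=lambda n: distance(features, customers[n]))[:k]
    return sum(churned[n] for n in nearest) / k


def classify(features, threshold=0.5):
    """Classification: turn the score into a class (1 = churn, 0 = stay)."""
    return 1 if churn_score(features) >= threshold else 0


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def pair_counts(baskets):
    """Co-occurrence: count how often each pair of items is bought together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return counts
```

Note how `classify` and `churn_score` differ only in their output: the score is a number between 0 and 1, while the class is the bucket obtained by thresholding that score. `fit_line`, by contrast, predicts a numeric value rather than a class, and `pair_counts` needs no labels at all, only transactions.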