1 Classification: Bayes    slide

2 Confusion Matrix    slide

  • What are the ways that classification can be wrong?
                 | Predict: Positive | Predict: Negative
Actual: Positive | True Positive     | False Negative
Actual: Negative | False Positive    | True Negative
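
As a minimal sketch, the four outcomes can be counted from paired actual and predicted labels. The function name and the toy labels below are illustrative assumptions, not part of the slides.

```python
def confusion_matrix(actual, predicted):
    """Count TP, FN, FP, TN for boolean labels (True = positive)."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["TP"] += 1      # actual positive, predicted positive
        elif a and not p:
            counts["FN"] += 1      # actual positive, predicted negative
        elif not a and p:
            counts["FP"] += 1      # actual negative, predicted positive
        else:
            counts["TN"] += 1      # actual negative, predicted negative
    return counts

# Made-up labels for illustration only.
actual    = [True, True, False, False, True]
predicted = [True, False, True, False, True]
counts = confusion_matrix(actual, predicted)
print(counts)
```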

2.1 Obtain Data    notes

  • How do we obtain this data?

3 Testing Data    slide two_col

  • Data used to test a learned model
  • Test data was not used to learn
  • Where does test data come from?


4 Training Data    slide

  • Set aside a portion of training data to test with
  • Test data:


4.1 Set Aside Testing    slide


Testing Data | Training Data
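
One way to set aside testing data, sketched under the assumption that rows can be shuffled freely (no time ordering). The helper name `train_test_split` and the 20% fraction are illustrative choices here.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then hold out the first part for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # Test rows are never shown to the learner.
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(10)))
print(len(train), len(test))
```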

4.2 Colors    notes

  • Red: Testing
  • Green: Training

4.3 Cross Validation    slide


Train and test model with different subsets of data

4.4 Testing the model    notes

  • This is used to test the model
  • How well does it perform with a variety of inputs?
  • Is it robust against outliers?

4.5 K-Fold Validation    slide


Test against K sections of the data
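
A rough sketch of how K folds might be generated: each index lands in exactly one test fold, and the remaining indices are used for training. The function name is an assumption, and shuffling is omitted for brevity.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs covering n items in k folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("test:", test_idx)
```

Training and scoring a model once per fold, then averaging the scores, gives the cross-validated estimate.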

4.6 Statistical Significance    notes

  • Similar to the concept in stats: the more distinct samples you have, the better you know your data

4.7 K-Fold Validation    slide


5 Bayes Theorem    slide


Can calculate a posterior given priors

5.1 Read    notes

  • The probability of A given B equals the probability of B given A times the probability of A, divided by the probability of B
  • This matters because we can estimate future probabilities from what we've already seen
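
Written out, the statement read aloud above is the standard form of Bayes' theorem, where P(A) is the prior and P(A | B) is the posterior:

```latex
\[
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
\]
```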

6 Spam    slide


Find the probability of spam given it contains a particular word
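
A small sketch of this calculation using Bayes' theorem. The message counts are invented for illustration, and the function and parameter names are assumptions.

```python
def p_spam_given_word(n_spam_with_word, n_spam, n_ham_with_word, n_ham):
    """P(spam | word) from message counts, via Bayes' theorem."""
    p_spam = n_spam / (n_spam + n_ham)            # prior P(spam)
    p_ham = 1 - p_spam
    p_word_given_spam = n_spam_with_word / n_spam
    p_word_given_ham = n_ham_with_word / n_ham
    # P(word) over both classes (law of total probability).
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

# Hypothetical counts: 100 spam messages (30 contain the word),
# 400 non-spam messages (5 contain the word).
p = p_spam_given_word(n_spam_with_word=30, n_spam=100,
                      n_ham_with_word=5, n_ham=400)
print(round(p, 3))
```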

6.1 Words    notes

  • What words would you associate with spam?
  • Are these the same across all people?
  • Why might you want to train a classifier per person?

7 Multiple Words    slide animate

  • How to calculate probabilities of multiple independent events occurring?
  • Model words as independent events
  • Multiply probabilities
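
To make the multiplication concrete, here is a sketch that scores a message under each class by multiplying per-word probabilities with the class prior. All probabilities are made-up values, and `naive_score` is a hypothetical helper name.

```python
from math import prod

def naive_score(words, word_probs, prior):
    """prior * product of P(word | class), treating words as independent."""
    return prior * prod(word_probs[w] for w in words)

# Invented per-class word probabilities for illustration.
p_word_given_spam = {"free": 0.30, "meeting": 0.02}
p_word_given_ham = {"free": 0.05, "meeting": 0.20}

message = ["free", "free", "meeting"]
spam_score = naive_score(message, p_word_given_spam, prior=0.2)
ham_score = naive_score(message, p_word_given_ham, prior=0.8)

# Normalize the two scores to get P(spam | message).
p_spam = spam_score / (spam_score + ham_score)
print(round(p_spam, 3))
```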

7.1 Naive    notes

  • Words are not independent
  • After "San", "Francisco" is more likely
  • But works surprisingly well in practice

8 Practical concerns    slide animate

  • What is the probability of a word we've never seen before?
  • Underflow: multiplying many small numbers until everything is rounded to 0
  • Normalizing words: v1agra

8.1 Solutions    notes

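  • Unseen words: a count of 0 gives a probability of 0 (or a divide by 0). Instead, add 1 to every word count
  • Underflow: sum the logs of the probabilities instead of multiplying them
  • Normalizing: apply rules that map variants such as v1agra to one word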
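
The three fixes might look like the sketch below: add-one (Laplace) smoothing, summing log probabilities, and a simple character-substitution rule. The counts, the vocabulary size, and the substitution table are all illustrative assumptions.

```python
import math
from collections import Counter

def smoothed_log_prob(word, counts, vocab_size):
    """log P(word | class) with add-one smoothing, so unseen words are not 0."""
    total = sum(counts.values())
    return math.log((counts[word] + 1) / (total + vocab_size))

def log_score(words, counts, vocab_size, prior):
    """Sum of logs replaces the product of probabilities, avoiding underflow."""
    return math.log(prior) + sum(
        smoothed_log_prob(w, counts, vocab_size) for w in words)

# Hypothetical substitution rules for disguised words.
LEET = str.maketrans({"1": "i", "0": "o", "3": "e", "@": "a", "$": "s"})

def normalize(word):
    return word.lower().translate(LEET)

spam_counts = Counter({"free": 3, "winner": 1})
vocab_size = 4  # assumed vocabulary: free, winner, meeting, lunch

unseen = smoothed_log_prob("lunch", spam_counts, vocab_size)
long_message = ["free"] * 2000
score = log_score(long_message, spam_counts, vocab_size, prior=0.2)
print(math.exp(unseen), math.isfinite(score), normalize("V1agra"))
```

With these counts, P("free" | spam) is 0.5, so the plain product over 2000 words would round to 0 in floating point while the log score stays finite.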

9 Ensemble    slide

  • Using multiple models simultaneously
  • Run all classifiers over new data, take majority vote
  • Netflix Prize won with combination of models from several teams
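
A minimal sketch of majority voting, assuming each classifier is a plain function from a message to a label. The three toy rules are invented for illustration.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Run every classifier on x and return the most common label."""
    votes = [clf(x) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# Three deliberately different (hypothetical) spam rules.
classifiers = [
    lambda m: "spam" if "free" in m else "ham",
    lambda m: "spam" if "winner" in m else "ham",
    lambda m: "spam" if len(m) > 40 else "ham",
]

print(majority_vote(classifiers, "free winner"))
print(majority_vote(classifiers, "lunch at noon"))
```

Using an odd number of classifiers avoids ties in a two-class vote.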

9.1 Requirements    notes

  • Conveniently, the diversity of the models matters more than the accuracy of any single model

10 Bootstrap Aggregating    slide two_col

  • Bagging: training data sampled with replacement
  • Learn models on different samples
  • Run models on new incoming data
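
Sampling with replacement can be sketched as follows. Each bootstrap sample has the same size as the original data, so some rows typically repeat and others are left out. The helper name and the toy data are assumptions.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement."""
    return [rng.choice(rows) for _ in rows]

rng = random.Random(0)
data = list(range(10))

# One sample per model; each model would then be trained on its own sample.
samples = [bootstrap_sample(data, rng) for _ in range(5)]
for s in samples:
    print(sorted(s))
```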


10.1 Trade-offs    notes

11 Boosting    slide

  • Train classifier to catch what the last one missed
  • Train and test first classifier
  • Find classification failures
  • Weight those failures more heavily when training the next model
  • Weight models by their accuracy
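
The slides do not name a specific algorithm. As one possible illustration, an AdaBoost-style round reweights samples and assigns the model a weight like this. The function name and the toy inputs are assumptions.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style round: return (model_weight, new_sample_weights).

    Assumes the weighted error is strictly between 0 and 1.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - error) / error)   # more accurate => larger weight
    # Shrink weights of correctly classified samples, grow the failures.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

weights = [0.25] * 4                       # start with uniform sample weights
alpha, new_weights = reweight(weights, correct=[True, True, True, False])
print(round(alpha, 3), [round(w, 3) for w in new_weights])
```

After the update, the single misclassified sample carries half of the total weight, so the next classifier is pushed to get it right.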

11.1 Trade-offs    notes

  • Boosting can be susceptible to outliers
  • Longer to train
  • Observed to be more accurate

12 Many Decision Trees    slide

  • Train trees with random selection of attributes, subset of data
  • Combine trees using majority or weights
  • What to call many arbitrarily picked trees?
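
A sketch of the randomization step only (tree learning itself is omitted): each tree gets a random subset of attributes and a bootstrap sample of the rows. All names and the toy data are hypothetical.

```python
import random

def tree_training_sets(rows, attributes, n_trees, n_attrs, seed=0):
    """Pick random attributes and a resampled set of rows for each tree."""
    rng = random.Random(seed)
    plans = []
    for _ in range(n_trees):
        attrs = rng.sample(attributes, n_attrs)     # attributes, no repeats
        sample = [rng.choice(rows) for _ in rows]   # rows, with replacement
        plans.append((attrs, sample))
    return plans

attributes = ["length", "has_link", "num_caps", "sender_known"]
rows = list(range(8))   # stand-ins for training examples
plans = tree_training_sets(rows, attributes, n_trees=3, n_attrs=2)
for attrs, sample in plans:
    print(attrs, sorted(sample))
```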

12.1 Random Forests    slide two_col


  • Used successfully in many recent competitions
  • Carry over robustness properties from individual decision trees
  • Can be trained in parallel

12.2 Parallel    notes

  • Potentially good fit for MapReduce paradigms

