2013-03-01-SVM

1 Linear Regression    slide

2 Types of Models    slide animate

  • Classifiers
  • Regressions
  • Clustering
  • Outlier

2.1 Details    notes

Classifiers
Describe and distinguish cases, e.g., Yelp may want to find a category for a business based on its reviews and business description
Regressions
Predict a continuous value, e.g., predict a home's selling price given square footage and number of bedrooms
Clustering
Find "natural" groups of data without labels
Outlier
Find anomalous data points, e.g., detecting credit card fraud

3 Case Study    slide

  • Housing prices: square footage

img/housing-regression.gif

3.1 Problem    notes

  • We'd like to know how to price a house based on the square footage
  • Let's pretend this is the data we have
  • How would we guess the price for 2500 sq ft?

4 Solution?    slide animate

  • Find a line that represents the data
  • y = m*x + b
  • A line that is not very far from the points

4.1 Prompts    notes

  • In English, how would you solve this?
  • How to mathematically represent the line?
  • What is a good line?

5 Similarity    slide

  • One of the main challenges in data mining: defining a precise metric for an intuition
  • Define distance for an individual point
  • Define how to aggregate distances together

5.1 Challenge    notes

  • This is a big problem for engineering and math (stats) in general
  • We'll cover some concepts, but if you're ever stuck, try looking in related fields
  • What are some of the ways we can measure distance between points? Euclidean, Manhattan; Euclidean == L2 norm
  • What is a way to aggregate numbers? Sum, sum of squares, sum of logs (sketched below)
  • Differences between the last two?
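
A minimal sketch of these distance and aggregation options in Python; the sample values are invented for illustration:

    import math

    def euclidean(p, q):    # L2 norm: straight-line distance
        return math.sqrt(sum((a - b)**2 for a, b in zip(p, q)))

    def manhattan(p, q):    # L1 norm: sum of per-axis differences
        return sum(abs(a - b) for a, b in zip(p, q))

    print(euclidean((0, 0), (3, 4)), manhattan((0, 0), (3, 4)))   # 5.0 7

    diffs = [3.0, -4.0, 1.0]
    print(sum(diffs))                            # plain sum: signs can cancel
    print(sum(d**2 for d in diffs))              # sum of squares: always non-negative
    print(sum(math.log(abs(d)) for d in diffs))  # sum of logs: damps large values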

5.2 Log & Square    slide two_col

Log
Useful for de-emphasizing large raw differences
Square
Useful as a smooth stand-in for the absolute value: squaring removes the sign

img/logx.gif

6 Point Distance    slide two_col

  • y distance from line
  • Intuitively: error in estimate
  • h(x) = m*x + b
  • err = h(x) - y

img/error.gif

6.1 Error    notes

  • We want the difference between what we estimate the value to be and what the value actually is

7 Aggregate    slide animate

  • Sum
  • What about negative error?
  • Sum of squares
  • err = sum( (h(x) - y)**2 for x,y in dataset) / len(dataset)
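
The same aggregate spelled out as a runnable sketch; the toy dataset and the default slope/intercept below are invented:

    def h(x, m=1.0, b=0.0):    # our line: h(x) = m*x + b
        return m * x + b

    dataset = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.1)]   # toy (x, y) pairs
    err = sum((h(x) - y)**2 for x, y in dataset) / len(dataset)
    print(err)   # mean squared error, normalized by the number of points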

7.1 Questions    notes

  • Now that we have the error for every point, how do we summarize?
  • Some points have negative error, some positive. Do they cancel each other out?
  • Imagine a data set of two points: one solution's line passes through both points, another passes between them so the errors cancel. Which is better?
  • Use our squaring trick to make sure we don't have any negative values
  • Normalize by the number of points

8 Fitness Function    slide

  • Measures the quality or cost of the solution
  • Key ingredient for data mining algorithms
  • If you can measure it, you can find the best solution

8.1 Fitness    notes

  • The function spits out a metric, which can be thought of as fitness or cost
  • Find the maximum or minimum of that metric
  • Depending on your fitness function, this can be easy or difficult
  • img: http://onlinestatbook.com

9 Understanding Error    slide

img/Linear_regression.svg.png
Several possible solutions

9.1 Error    notes

  • What happens to the error as we move line around?
  • Decreases until best fit, then increases
  • What happens if we plot this error? Say, slope (x) against error (y)?

10 Solution as Minimization    slide two_col

  • Error as a function of slope is a parabola (sketched after the figure)
  • Several methods for finding the minimum
  • Two categories: analytical and approximate

img/parabola.png
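
To see the parabola, sweep the slope m over a range (holding b fixed) and record the error at each value; a small sketch with invented data:

    dataset = [(1.0, 1.1), (2.0, 2.0), (3.0, 3.2)]   # toy (x, y) pairs
    b = 0.0
    for m in [0.0, 0.5, 1.0, 1.5, 2.0]:
        err = sum((m * x + b - y)**2 for x, y in dataset) / len(dataset)
        print(m, err)    # error falls toward the best-fit slope, then rises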

11 Solution Approximation    slide

  • Some fitness functions can be difficult to solve analytically
  • Alternative: iteratively get closer to the solution
  • Stop when answer is close enough

11.1 Analytical    notes

  • How to find the minimum of functions in general?
  • Take the derivative, set it equal to 0 (worked through below)
  • Taking the derivative can be complex or impossible (discontinuities) for some functions, or solving for 0 can be difficult
  • Instead, we'll keep getting closer to the minimum using the function we already have
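
For simple linear regression the derivative-equals-zero approach gives a well-known closed form: m = cov(x, y) / var(x) and b = mean(y) - m * mean(x). A minimal sketch with invented data:

    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [1.2, 1.9, 3.1, 3.9]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx)**2 for x in xs)
    b = my - m * mx
    print(m, b)    # slope and intercept that minimize the squared error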

12 Gradient Descent    slide two_col

  1. Estimate current gradient (derivative)
  2. Take a step (a * deriv) against the gradient, i.e., downhill
  3. If the step is small, stop; else repeat.

img/parabola.png

12.1 Steps    notes

  • Take the gradient by looking at the local derivative, or by perturbing x (see the sketch after this list)
  • Choose a as the step-size weight: a big a means a large step
  • If the derivative is large, it will also make your step size large
  • If the derivative is large, it probably means you are far from the minimum
  • Keep repeating
  • What happens if a is too small?
  • What happens if a is too big?
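
A minimal gradient descent sketch for one dimension, estimating the derivative by perturbing x; the target function, starting point, step weight a, and stopping tolerance are all arbitrary choices for illustration:

    def f(x):                        # the parabola we want to minimize
        return (x - 3.0)**2

    def deriv(f, x, eps=1e-6):       # estimate the gradient by perturbing x
        return (f(x + eps) - f(x - eps)) / (2 * eps)

    x, a = 0.0, 0.1                  # starting guess and step-size weight
    while True:
        step = a * deriv(f, x)       # big derivative => big step
        if abs(step) < 1e-8:         # step is small: stop
            break
        x -= step                    # move against the gradient (downhill)
    print(x)                         # converges near the minimum at 3.0

If a is too small, convergence is slow; if a is too big (here, above 1.0), the steps overshoot and the loop diverges.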

13 General Case    slide two_col

  • Formulate fitness function for your problem
  • Use analytical methods or approximations to find the min/max
  • Approximations: Newton's Method, Gradient Descent

img/error-reduce.png

13.1 Approximate visualization    notes

  • Desired output of the error as gradient descent runs
  • Maybe some local bumps if the step size is too big, but the error slowly moves down to a small amount

14 Support Vector Machines    slide

15 Decision Trees    slide two_col

  • Great for separable attributes
  • Rules operate on independent attributes
  • Classes separable along an axis/attribute

img/tree.png

15.1 Linearly Separable    slide

  • How to handle the case where the separating line is not aligned with an axis?

img/dataset_linsep.png

15.2 Details    notes

16 Possibilities    slide

  • Many lines could separate these classes

img/dataset_linsep.png

16.1 Best?    notes

  • Which is the best?
  • Why?

16.2 Best Separator    slide two_col

  • Best line gives the most distance between the two classes
  • Measure distance between closest points
  • Closest points == support vectors

img/separable.jpg

16.3 Points, Vectors    notes

17 Dimensions    slide

  • When separating data in two dimensions, we need a line
  • When separating 3 dimensions?
  • 4 dimensions?

17.1 Vocabulary    notes

  • Plane
  • Hyperplane

18 Expressing the Hyperplane    slide animate

  • y = m*x + b
  • x_2 = m*x_1 + b
  • 0 = m*x_1 + b - x_2
  • 0 = [m, -1] * [x_1, x_2] + b    (dot product)
  • 0 = w * x + b
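
With this form, classifying a point is just checking which side of the hyperplane it falls on, i.e., the sign of w * x + b. A small sketch with invented weights and points:

    w, b = [2.0, -1.0], 1.0          # the [m, -1] vector from above, plus offset

    def side(x):                     # sign of w . x + b picks the side
        s = sum(wi * xi for wi, xi in zip(w, x)) + b
        return 1 if s > 0 else -1

    print(side([0.0, 0.0]), side([0.0, 5.0]))   # opposite sides: 1 -1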

18.1 Questions    notes

  • How do you mathematically represent a line?
  • Now, we're not going to think of a new letter for every dimension, we're just going to say x_1, x_2, x_3
  • Rewrite mathematically
  • How to add more dimensions? x_22? Express x as a vector of all attributes
  • Again, we don't want to come up with a bunch more letters after m, so use w as the vector representing all the slopes m

19 Challenge    slide two_col

  • Find w, b such that the hyperplane w * x + b = 0 maximizes the distance between the support vectors (distance formula sketched below)

img/svm.png
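
For intuition: the distance from a point x to the hyperplane w * x + b = 0 is |w * x + b| / ||w||, so the margin is controlled by the size of w, and maximizing the margin amounts to minimizing ||w||. A small sketch with invented values:

    import math

    w, b = [3.0, 4.0], -5.0
    x = [2.0, 1.0]
    norm_w = math.sqrt(sum(wi**2 for wi in w))                     # ||w||
    dist = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
    print(dist)    # distance from x to the hyperplane: 1.0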

20 Maximizing Fitness Function    slide two_col

  • Now we have a fitness function and parameters we're trying to optimize
  • Sound familiar?

img/svm.png

21 Kernel Tricks    slide two_col

  • SVMs are good for linearly separable data
  • How to handle other data?

img/svm-circular.jpg

21.1 Polynomial Kernel    slide

  • Transform the data so that it is linearly separable
  • What function can we apply to these data points to make them separable?

img/svm-circular.jpg

21.1.1 Square    notes

  • Square all of the coordinates: points inside the circle stay near the origin, points outside move far away

21.2 Polynomial Kernel    slide

img/kernel-trick.jpg

Now apply SVM
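
A hedged sketch of the whole trick, assuming scikit-learn is available (the toy points are invented): square each coordinate, then fit a linear SVM in the transformed space.

    from sklearn.svm import SVC

    # inner-circle points labeled 0, outer points labeled 1
    X = [[0.1, 0.2], [-0.2, 0.1], [1.5, 1.5], [-1.5, 1.4]]
    y = [0, 0, 1, 1]

    X_sq = [[x1**2, x2**2] for x1, x2 in X]     # square each coordinate
    clf = SVC(kernel='linear').fit(X_sq, y)     # a straight line now separates them
    print(clf.predict([[0.0, 0.01]]))           # transformed inner point => class 0

In practice the polynomial kernel (kernel='poly') performs this kind of transform implicitly, without materializing the squared features.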

22 Break    slide

Date: 2013-03-15 10:19:25 PDT

Author: Jim Blomo
