Table of Contents
 
Series Foreword
Preface
1 Introduction
1.1 The Learning Problem and Statistical Inference
1.1.1 Supervised Learning
Classification Learning
Preference Learning
Function Learning
1.1.2 Unsupervised Learning
1.1.3 Reinforcement Learning
1.2 Learning Kernel Classifiers
1.3 The Purposes of Learning Theory
 
I Learning Algorithms
 
2 Kernel Classifiers From A Machine Learning Perspective
2.1 The Basic Setting
2.2 Learning by Risk Minimization
2.2.1 The (Primal) Perceptron Algorithm
2.2.2 Regularized Risk Functionals
2.3 Kernels and Linear Classifiers
2.3.1 The Kernel Technique
2.3.2 Kernel Families
Kernels on Inner Product Spaces - Polynomial and RBF Kernels
Kernels on Strings
Kernels from Probabilistic Models of the Data
2.3.3 The Representer Theorem
Reproducing Kernel Hilbert Spaces
2.4 Support Vector Classification Learning
2.4.1 Maximizing the Margin
2.4.2 Soft Margins - Learning with Training Error
Linear Approximation
Quadratic Approximation
2.4.3 Geometrical Viewpoints on Margin Maximization
2.4.4 The ν-Trick and Other Variants
Multiclass Support Vector Machines
Support Vector Regression Estimation
ν-Support Vector Machines for Classification
2.5 Adaptive Margin Machines
2.5.1 Assessment of Learning Algorithms
2.5.2 Leave-One-Out Machines
2.5.3 Pitfalls of Minimizing a Leave-One-Out Bound
2.5.4 Adaptive Margin Machines
2.6 Bibliographical Remarks
3 Kernel Classifiers From A Bayesian Perspective
3.1 The Bayesian Framework
The Likelihood
The Prior
Evidence of H
3.1.1 The Power of Conditioning on Data
3.2 Gaussian Processes
3.2.1 Bayesian Linear Regression
3.2.2 From Regression to Classification
3.3 The Relevance Vector Machine
3.4 Bayes Point Machines
3.4.1 Estimating the Bayes Point
3.5 Fisher Discriminants
3.6 Bibliographical Remarks
 
II Learning Theory
 
4 Mathematical Models of Learning
4.1 Generative vs. Discriminative Models
4.2 PAC and VC Frameworks
4.2.1 Classical PAC and VC Analysis
Confidence Intervals
4.2.2 Growth Function and VC Dimension
4.2.3 Structural Risk Minimization
4.3 The Luckiness Framework
4.4 PAC and VC Frameworks for Real-Valued Classifiers
4.4.1 VC Dimensions for Real-Valued Function Classes
4.4.2 The PAC Margin Bound
4.4.3 Robust Margin Bounds
Application to Adaptive Margin Machines
4.5 Bibliographical Remarks
5 Bounds for Specific Algorithms
5.1 The PAC-Bayesian Framework
5.1.1 PAC-Bayesian Bounds for Bayesian Algorithms
A Bound for the MAP Estimator
A Bound for the Gibbs Classification Strategy
The Gibbs-Bayes Lemma
Bounds with Training Errors
5.1.2 A PAC-Bayesian Margin Bound
5.2 Compression Bounds
5.2.1 Compression Schemes and Generalization Error
5.2.2 On-line Learning and Compression Schemes
5.3 Algorithmic Stability Bounds
5.3.1 Algorithmic Stability for Regression
5.3.2 Algorithmic Stability for Classification
5.4 Bibliographical Remarks
 
III Appendices
 
A Theoretical Background and Basic Inequalities
A.1 Notation
A.2 Probability Theory
A.2.1 Some Results for Random Variables
A.2.2 Families of Probability Measures
Probability Measures over the Natural Numbers ℕ
Probability Measures on the Real Line ℝ
Probability Measures on ℝⁿ
Exponential Family
A.3 Functional Analysis and Linear Algebra
A.3.1 Covering, Packing and Entropy Numbers
A.3.2 Matrix Algebra
A.4 Ill-Posed Problems
A.5 Basic Inequalities
A.5.1 General (In)equalities
A.5.2 Large Deviation Bounds 
Large Deviations of Functions of Random Variables 
B Proofs and Derivations - Part I
B.1 Functions of Kernels
B.2 Efficient Computation of String Kernels
B.2.1 Efficient Computation of the Substring Kernel
B.2.2 Efficient Computation of the Subsequence Kernel
B.3 Representer Theorem
B.4 Convergence of the Perceptron
B.5 Convex Optimization Problems of Support Vector Machines
B.5.1 Hard Margin SVM
B.5.2 Linear Soft Margin Loss SVM
B.5.3 Quadratic Soft Margin Loss SVM
B.5.4 ν-Linear Margin Loss SVM
B.6 Leave-One-Out Bound for Kernel Classifiers
B.7 Laplace Approximation for Gaussian Processes
B.7.1 Maximization of f_{T^{m+1}|X=x, Z^m=z}
B.7.2 Computation of 
B.7.3 Stabilized Gaussian Process Classification
B.8 Relevance Vector Machines
B.8.1 Derivative of the Evidence w.r.t. 
B.8.2 Derivative of the Evidence w.r.t. σ²
B.8.3 Update Algorithms for Maximizing the Evidence
B.8.4 Computing the Log-Evidence
B.8.5 Maximization of f_{W|Z^m=z}
B.9 A Derivation of the Operation ⊕_μ
B.10 Fisher Linear Discriminant
C Proofs and Derivations - Part II
C.1 VC and PAC Generalization Error Bounds
C.1.1 Basic Lemmas
C.1.2 Proof of the Basic VC Theorem 
C.2 Bound on the Growth Function
C.3 Luckiness Bound
C.4 Empirical VC Dimension Luckiness
C.5 Bound on the Fat Shattering Dimension
C.6 Margin Distribution Bound
C.7 The Quantifier Reversal Lemma
C.8 A PAC-Bayesian Margin Bound
C.8.1 Balls in Version Space
C.8.2 Volume Ratio Theorem
C.8.3 A Volume Ratio Bound
C.8.4 Bollmann's Lemma
C.9 Algorithmic Stability Bounds
C.9.1 Uniform Stability of Functions Minimizing a Regularized Risk
C.9.2 Algorithmic Stability Bounds
D Pseudocodes
D.1 Perceptron Algorithm
D.2 Support Vector and Adaptive Margin Machines
D.2.1 Standard Support Vector Machines
D.2.2 ν-Support Vector Machines
D.2.3 Adaptive Margin Machines
D.3 Gaussian Processes
D.4 Relevance Vector Machines
D.5 Fisher Discriminants
D.6 Bayes Point Machines
List of Symbols
References