Series Foreword Preface 1 Introduction
1.1 The Learning Problem and Statistical Inference 1.1.1 Supervised Learning Classification Learning Preference Learning Function Learning 1.1.2 Unsupervised Learning 1.1.3 Reinforcement Learning 1.2 Learning Kernel Classifiers 1.3 The Purposes of Learning Theory
2 Kernel Classifiers From A Machine Learning Perspective
2.1 The Basic Setting 2.2 Learning by Risk Minimization 2.2.1 The (Primal) Perceptron Algorithm 2.2.2 Regularized Risk Functionals 2.3 Kernels and Linear Classifiers 2.3.1 The Kernel Technique 2.3.2 Kernel Families Kernels on Inner Product Spaces-Polynomial and RBF Kernels Kernels on Strings Kernels from Probabilistic Models of the Data 2.3.3 The Representer Theorem Reproducing Kernel Hilbert Spaces 2.4 Support Vector Classification Learning 2.4.1 Maximizing the Margin 2.4.2 Soft Margins - Learning with Training Error Linear Approximation Quadratic Approximation 2.4.3 Geometrical Viewpoints on Margin Maximization 2.4.4 The ν-Trick and Other Variants Multiclass Support Vector Machines Support Vector Regression Estimation ν-Support Vector Machines for Classification 2.5 Adaptive Margin Machines 2.5.1 Assessment of Learning Algorithms 2.5.2 Leave-One-Out Machines 2.5.3 Pitfalls of Minimizing a Leave-One-Out Bound 2.5.4 Adaptive Margin Machines 2.6 Bibliographical Remarks
3 Kernel Classifiers From A Bayesian Perspective
3.1 The Bayesian Framework The Likelihood The Prior Evidence of H 3.1.1 The Power of Conditioning on Data 3.2 Gaussian Processes 3.2.1 Bayesian Linear Regression 3.2.2 From Regression to Classification 3.3 The Relevance Vector Machine 3.4 Bayes Point Machines 3.4.1 Estimating the Bayes Point 3.5 Fisher Discriminants 3.6 Bibliographical Remarks
4 Mathematical Models of Learning
4.1 Generative vs. Discriminative Models 4.2 PAC and VC Frameworks 4.2.1 Classical PAC and VC Analysis Confidence Intervals 4.2.2 Growth Function and VC Dimension 4.2.3 Structural Risk Minimization 4.3 The Luckiness Framework 4.4 PAC and VC Frameworks for Real-Valued Classifiers 4.4.1 VC Dimensions for Real-Valued Function Classes 4.4.2 The PAC Margin Bound 4.4.3 Robust Margin Bounds Application to Adaptive Margin Machines 4.5 Bibliographical Remarks
5 Bounds for Specific Algorithms
5.1 The PAC-Bayesian Framework 5.1.1 PAC-Bayesian Bounds for Bayesian Algorithms A Bound for the MAP Estimator A Bound for the Gibbs Classification Strategy The Gibbs-Bayes Lemma Bounds with Training Errors 5.1.2 A PAC-Bayesian Margin Bound 5.2 Compression Bounds 5.2.1 Compression Schemes and Generalization Error 5.2.2 On-line Learning and Compression Schemes 5.3 Algorithmic Stability Bounds 5.3.1 Algorithmic Stability for Regression 5.3.2 Algorithmic Stability for Classification 5.4 Bibliographical Remarks
A Theoretical Background and Basic Inequalities
A.1 Notation A.2 Probability Theory A.2.1 Some Results for Random Variables A.2.2 Families of Probability Measures Probability Measures over the Natural Numbers Probability Measures on the Real Line Probability Measure on Rn Exponential Family A.3 Functional Analysis and Linear Algebra A.3.1 Covering, Packing and Entropy Numbers A.3.2 Matrix Algebra A.4 Ill-Posed Problems A.5 Basic Inequalities A.5.1 General (In)equalities} A.5.2 Large Deviation Bounds Large Deviations of Functions of Random Variables
B Proofs and Derivations - Part I
B.1 Functions of Kernels B.2 Efficient Computation of String Kernels B.2.1 Efficient Computation of the Substring Kernel B.2.2 Efficient Computation of the Subsequence Kernel B.3 Representer Theorem B.4 Convergence of the Perceptron B.5 Convex Optimization Problems of Support Vector Machines B.5.1 Hard Margin SVM B.5.2 Linear Soft Margin Loss SVM B.5.3 Quadratic Soft Margin Loss SVM B.5.4 ν-Linear Margin Loss SVM B.6 Leave-One-Out Bound for Kernel Classifiers B.7 Laplace Approximation for Gaussian Processes B.7.1 Maximization of fTm+1|X=x,Zm=z B.7.2 Computation of Σ B.7.3 Stabilized Gaussian Process Classification B.8 Relevance Vector Machines B.8.1 Derivative of the Evidence w.r.t. θ B.8.2 Derivative of the Evidence w.r.t. σ2 B.8.3 Update Algorithms for Maximizing the Evidence B.8.4 Computing the Log-Evidence B.8.5 Maximization of the fW|Zm=z B.9 A Derivation of the Operation ¤ B.10 Fisher Linear Discriminant
C Proofs and Derivations - Part II
C.1 VC and PAC Generalization Error Bounds C.1.1 Basic Lemmas C.1.2 Proof of the Basic VC Theorem C.2 Bound on the Growth Function C.3 Luckiness Bound C.4 Empirical VC Dimension Luckiness C.5 Bound on the Fat Shattering Dimension C.6 Margin Distribution Bound C.7 The Quantifier Reversal Lemma C.8 A PAC-Bayesian Marin Bound C.8.1 Balls in Version Space C.8.2 Volume Ratio Theorem C.8.3 A Volume Ratio Bound C.8.4 Bollmann's Lemma C.9 Algorithmic Stability Bounds C.9.1 Uniform Stability of Functions Minimizing a Regularized Risk C.9.2 Algorithmic Stability Bounds
D Pseudocodes
D.1 Perceptron Algorithm D.2 Support Vector and Adaptive Margin Machines D.2.1 Standard Support Vector Machines D.2.2 ν-Support Vector Machines D.2.3 Adaptive Margin Machines D.3 Gaussian Processes D.4 Relevance Vector Machines D.5 Fisher Discriminants D.6 Bayes Point Machines
List of Symbols References |