Errata

First print (postscript,pdf)

• p. xvii, line -3: "Since the book uses a very rigorous notation systems, ..." should read "Since the book uses a very rigorous notation system, ..." (thanks to Neil Lawrence).
• p. 8, line 5: "... reader should refer Sutton and Barto (1998) ..." should read "... reader is referred to Sutton and Barto (1998) ..." (thanks to Thore Graepel).
• p. 19, line -12: "... (see Definition A.39)." should read "... (see Definition A.34)." (thanks to Marco Krüger).
• p. 33, line 15: "x K" should read "x X" (thanks to Arthur Gretton).
• p. 33, line -3: "... where U=(u'1;...;u'r) is ..." should read "... where U=(u1,...,un)=(v'1;...;v'r) is ...". Accordingly, on page 34, lines 2 and 4, each u is replaced by a v. Furthermore, the subscript K in line on page 34 is omitted (thanks to Malte Kuss).
• p. 34, line 5: "... and a mapping L into it ..." should read "... and a mapping f into it ..." (thanks to Petra Philips).
• p. 34, line 9: "the nth mapped object xn is" should read "the point i Ui,nf(xi)=LU'un is". Accordingly, the following line changes to "||LU'un||2=un'ULU'un=en'Len=ln<0" (thanks to Malte Kuss).
• p. 42, equation (2.26): The second case term should read "kr(u,v) + j=1|v| l2kr'(u1u,v[j:|v|])" (thanks to Michael Davy).
• p. 43, equation (2.30): The term l|v|-j should read l|v|-t (thanks to Vikas Sindhwani).
• p. 63, line -11: "... is tighter for less sparse solutions." should read "... is tighter for more sparse solutions." (thanks to Diego Andres Alvarez Marin).
• p. 77, Example 3.5: "PX=Binomial(n,p)" should read "PX|P=p=Binomial(n,p)" (thanks to Jaz Kandola).
• p. 84, line -5: "C(x,x) = x,x + t2Ixx=C(x,x) = k(x,x) + t2Ixx" should read "C(x,x) = x,x + t2Ix=x=C(x,x) = k(x,x) + t2Ix=x" (thanks to Arthur Gretton).
• p. 86, line 17: "... on the found local maximum." should read "... on the local maximum found." (thanks to Thore Graepel).
• p. 86, line 20: "...is of the ability of ..." should read "...is the ability of ..." (thanks to Thore Graepel).
• p. 87, Figure 3.2: "... of the 6 observations ..." should read "... of the 7 observations ...". Furthermore, "This local maxima ..." should read "This local maximum ..." (thanks to Thore Graepel).
• p. 87, line -12: " R+" should read " (R+)N".
• p. 92, Figure 3.5: "... this liklihood is not normalizable." should read "... this likelihood is not normalizable." (thanks to Neil Lawrence).
• p. 99, line -7: "... out be the single weight vector ..." should read "... out by the single weight vector ..." (thanks to Matthias Heiler).
• p. 101, line 5: "... we first run learning algorithm to find ..." should read "... we first run a learning algorithm to find ..." (thanks to Matthias Heiler).
• p. 110, line -11: Closing parenthesis at MacKay (1998) missing.
• p. 110, line -6: Closing parenthesis at Barber and Williams (1997) missing.
• p. 122, equation (4.7): The subscript should read "Remp[h,z] =0" instead of "Remp[h] =0" (thanks to Petra Philips).
• p. 123, line 13: "... real-valued loss functions conceptually similar ..." should read "... real-valued loss functions is conceptually similar ..." (thanks to Petra Philips).
• p. 123, line -8: "... for all (0,1], and all training samples sizes m, with probability at least 1- over the random draw of the training sample z Zm we have ..." should read "... for the zero-one loss l0-1 given by equation (2.10) and all > 0 we have ..." as in Theorem 4.7 (thanks to Simon Hill).
• p. 124, line 10: "... (see equation (4.7)) ..." should read "... (see equation (4.8)) ..." (thanks to Jürgen Schweiger).
• p. 125, line 5: "It we denote the maximum number ..." should read "If we denote the maximum number ..." (thanks to Petra Philips).
• p. 126, enumeration 1: "If the function NH fulfills NH(m)=2m ..." should read "If the function NH fulfills NH(2m)=22m ..." (thanks to Petra Philips).
• p. 127, line 2: "The best constants that can be achieved are 2 as a coefficient of and 1 in the exponent of the exponential term, respectively." should read "The best constants that can be achieved for the coefficients of the exponent in the exponential term are 2 and 1, respectively." (thanks to Jaz Kandola).
• p. 131, line 3: "Clearly, all functions in h are monotonically ..." should read "Clearly, all functions in H are monotonically ..." (thanks to Petra Philips).
• p. 135, line -1: In the statement, "R[h]" should read "R[h] - Remp[h,z] " (thanks to Petra Philips).
• p. 137, line 12: The function is typed :N x R x [0,1]  rather than :R x [0,1] because it depends on the training sample size m. Hence, in the next line, (L((Z1,...,Zm),h),) should be replaced by (m,L((Z1,...,Zm),h),). This change also appears in lines 18, -6 and -3 on page 137, line 2 on page 138, line 17 on page 139 and line 7 on page 140 (thanks to Petra Philips!).
• p. 140, line 5: "... in Appendix C.4 that L(z,h) = -eff(z) ..." should read "... in Appendix C.4 that L(z,h) = -vH(z) ..." (thanks to Petra Philips).
• p. 175, Figure 5.1: "... with (solid line) and without (dashed line) ..." should read "... with (dashed line) and without (solid line) ..." (thanks to Dongwei Cao).
• p. 182, Definition 5.20: The update function U maps from Y x X x H -> H rather than Y x X x Y -> H as stated in the text. This changes the definition of an online learning algorithm. As a consequence, the sentence preceding the displayed equation ("... and the prediction of the current hypothesis hj H ...") changes to "... and the current hypothesis hj H ..." (thanks to Thore Graepel!).
• p. 183, first equation: Accordingly, "U(x,y,h(x))=h" changes to "U(x,y,h)=h" (thanks to Thore Graepel).
• p. 183, line 15: Similar to p. 177, line -5, "(compression function Ci)" should read "(compression function C|i|)".
• p. 184, line -8: The fourth point should read "If all training examples are correctly classified, it outputs C and classifies according to (5.18).". Furthermore, the last two sentences of this example are wrong and should be deleted (thanks to Thore Graepel).
• p. 187, equation (5.19): This is a definition (similar to (2.14) on page 29) and not an equation.
• p. 194, line 19: "preceeding Shawe-Taylor and Williamson (1997, p. 4)" should read "preceding Shawe-Taylor and Williamson (1997)".
• p. 210, line -5: "... we also write Y~Normal(,)." should read "... we also write X~Normal(,)." (thanks to Jian Huang).
• p. 211, line -8: "... is a Gaussian measures, ..." should read "... are Gaussian measures, ..." (thanks to Petra Philips).
• p. 217, line 5: In the definition of np the second case should read maxi=1,...,n |xi| < rather than maxi=1,...,n |xi| (thanks to Arthur Gretton).
• p. 222, line 6: "... is the smallest number > 0 such that ..." should read "... is the smallest number 0 such that ..." (thanks to Petra Philips).
• p. 240, line -5: "a0" should read "a0" because otherwise exponentiation of 1+a/x with x invalidates the inequality (thanks to Peter Bollmann-Sdorra).
• p. 257, line 6: The braces must not include the summation over t (thanks to Vikas Sindhwani).
• p. 282, line 3: In the statement on the lhs. P(A) should read PZ(A) (thanks to Petra Philips).
• p. 283, Lemma C.2: There is a major flaw in the proof of this lemma. At the bottom of this page, it is argued that Theorem A.116 proves that the probability that a binomially distributed variable with mean of at least exceeds a value of m/2 is at least 1/2. This is wrong. In order to prove this statement we need a different theorem, which is given in the postscript/pdf version of the errata (thanks to Ulrich Kockelkorn, Mingrui Wu and Vu Ha for pointing out this mistake and helping with the additional proof). Note that Mingrui Wu has also provided an alternative proof which makes the fewest assumptions.
• p. 283, line 13: "... (z) z(A(z)) = 0 if such a sets exists ..." should read "... z(A(z)) = 0 if such a sets exists ..." (thanks to Petra Philips).
• p. 285, line -8: "... given in equation (22) and ..." should read "... given in equation (2.11) and ..." (thanks to Petra Philips).
• p. 302, Section C.8: The section heading should read "A PAC-Bayesian Margin Bound" rather than "A PAC-Bayesian Marin Bound" (thanks to Thore Graepel).
• p. 303, line 4: In the numerator, the integration is up to p rather than 2p (thanks to John Shawe-Taylor).
• p. 340: The entry Bartlett and Shawe-Taylor (1998) has the wrong year and is identical to Bartlett and Shawe-Taylor (1999).
• p. 340: In the entry Bennett (1998) there is one extra "19" in the paper title (thanks to Jaz Kandola).
• p. 347: In the entry Lauritzen (1981) "t. n. thiele" should read "T. N. Thiele" (thanks to Jaz Kandola).
• p. 349: In the entry Neal (1997b) "Technical Report" is spelt twice (thanks to Jaz Kandola).
• p. 350: In the entry Robert (1994) "Ney York" should read "New York" (thanks to Jaz Kandola).
• p. 354: The entry Watkins (1998) contains (almost) the same content as Watkins (2000) and will be eliminated in future editions.
• p. 355: In the entry Williams and Seeger (2001) "nystrom" should read "Nyström" (thanks to Jaz Kandola).

Second print (postscript,pdf)

• p. 19, line -12: "... (see Definition A.39)." should read "... (see Definition A.34)." (thanks to Marco Krüger).
• p. 42, equation (2.26): The second case term should read "kr(u,v) + j=1|v| l2kr'(u1u,v[j:|v|])" (thanks to Michael Davy).
• p. 63, line -5: "... is tighter for less sparse solutions." should read "... is tighter for more sparse solutions." (thanks to Diego Andres Alvarez Marin).
• p. 210, line -5: "... we also write Y~Normal(,)." should read "... we also write X~Normal(,)." (thanks to Jian Huang).