 p. xvii, line 3: "Since the book uses a very rigorous notation
systems, ..." should read "Since the book uses a very rigorous notation
system, ..." (thanks to Neil Lawrence). 
 p. 8, line 5: "... reader should refer Sutton and Barto (1998) ..."
should read "... reader is referred to Sutton and Barto (1998) ..."
(thanks to Thore Graepel). 
 p. 19, line 12: "... (see Definition A.39)." should read "... (see
Definition A.34)." (thanks to Marco Krüger). 
 p. 33, line 15: "x ∈ K"
should read "x ∈ X"
(thanks to Arthur Gretton). 
 p. 33, line 3: "... where U=(u'_{1};...;u'_{r})
is ..." should read "... where U=(u_{1},...,u_{n})=(v'_{1};...;v'_{r})
is ...". Accordingly, on page 34, lines 2 and 4, each u is replaced
by a v. Furthermore, the subscript K in line on page 34 is
omitted (thanks to Malte Kuss). 
 p. 34, line 5: "... and a mapping Λ into it ..." should read "... and a
mapping φ into it ..." (thanks to Petra Philips). 
 p. 34, line 9: "the nth mapped object x_{n} is" should read "the point
Σ_{i}U_{i,n}φ(x_{i})=Λ^{½}U'u_{n} is". Accordingly, the following line
changes to "‖Λ^{½}U'u_{n}‖^{2}=u_{n}'UΛU'u_{n}=e_{n}'Λe_{n}=λ_{n}<0"
(thanks to Malte Kuss). 
 p. 42, equation (2.26): The second case term should read "k_{r}(u,v)
+ Σ_{j=1}^{v} λ^{2}·k_{r}'(u1u,v[j:v])"
(thanks to Michael Davy). 
 p. 43, equation (2.30): The term l^{}^{vj}
should read l^{}^{vt}
(thanks to Vikas Sindhwani). 
 p. 63, line 11: "... is tighter for less sparse solutions." should
read "... is tighter for more sparse solutions." (thanks to Diego Andres
Alvarez Marin). 
 p. 77, Example 3.5: "P_{X}=Binomial(n,p)"
should read "P_{X|P=p}=Binomial(n,p)"
(thanks to Jaz Kandola). 
 p. 84, line 5: "C(x,x̃) = ⟨x,x̃⟩ + σ_{t}^{2}I_{x≠x̃} = C(x,x̃) = k(x,x̃) + σ_{t}^{2}I_{x≠x̃}"
should read "C(x,x̃) = ⟨x,x̃⟩ + σ_{t}^{2}I_{x=x̃} = C(x,x̃) = k(x,x̃) + σ_{t}^{2}I_{x=x̃}"
(thanks to Arthur Gretton). 
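The corrected indicator in the p. 84 entry means that the variance term is added only when the two inputs coincide, that is, on the diagonal of a covariance matrix. The following is a minimal sketch of that reading, not code from the book; the function names, the `noise_var` parameter and the choice of a linear kernel are assumptions made for illustration.

```python
def linear_kernel(x, x_tilde):
    """Inner product of two equal-length sequences (assumed kernel choice)."""
    return sum(a * b for a, b in zip(x, x_tilde))


def covariance(x, x_tilde, noise_var, kernel=linear_kernel):
    """Kernel value plus noise_var, where the noise is added only if x equals x_tilde."""
    indicator = 1.0 if tuple(x) == tuple(x_tilde) else 0.0
    return kernel(x, x_tilde) + noise_var * indicator


def covariance_matrix(points, noise_var, kernel=linear_kernel):
    """Evaluate `covariance` on every pair of points."""
    return [[covariance(p, q, noise_var, kernel) for q in points] for p in points]


# Hypothetical inputs: two points in the plane and a noise variance of 0.5.
points = [(1.0, 0.0), (0.0, 2.0)]
C = covariance_matrix(points, noise_var=0.5)
```

With the uncorrected indicator (inequality instead of equality) the noise would land on the off-diagonal entries instead, which is why the erratum matters.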
 p. 86, line 17: "... on the found local maximum." should read "... on
the local maximum found." (thanks to Thore Graepel). 
 p. 86, line 20: "...is of the ability of ..." should read "...is the
ability of ..." (thanks to Thore Graepel). 
 p. 87, Figure 3.2: "... of the 6 observations ..." should read "... of
the 7 observations ...". Furthermore, "This local maxima ..." should read
"This local maximum ..." (thanks to Thore Graepel). 
 p. 87, line 12: "
R^{+}" should read "
(R^{+})^{N}". 
 p. 92, Figure 3.5: "... this liklihood is not normalizable." should
read "... this likelihood is not normalizable." (thanks to Neil Lawrence). 
 p. 99, line 7: "... out be the single weight vector ..." should read
"... out by the single weight vector ..." (thanks to Matthias Heiler). 
 p. 101, line 5: "... we first run learning algorithm to find ..."
should read "... we first run a learning algorithm to find ..." (thanks to
Matthias Heiler). 
 p. 110, line 11: Closing parenthesis at MacKay (1998) missing. 
 p. 110, line 6: Closing parenthesis at Barber and Williams (1997)
missing. 
 p. 122, equation (4.7): The subscript should read "R_{emp}[h,z]
=0" instead of "R_{emp}[h] =0" (thanks to Petra
Philips). 
 p. 123, line 13: "... real-valued loss functions conceptually similar
..." should read "... real-valued loss functions is conceptually similar
..." (thanks to Petra Philips). 
 p. 123, line 8: "... for all δ ∈ (0,1], and all training samples sizes m,
with probability at least 1−δ over the random draw of the training sample
z ∈ Z^{m} we have ..." should read "... for the zero-one loss l_{0−1}
given by equation (2.10) and all ε > 0 we have ..." as in Theorem 4.7
(thanks to Simon Hill). 
 p. 124, line 10: "... (see equation (4.7))
..." should read "... (see equation (4.8)) ..." (thanks to Jürgen
Schweiger). 
 p. 125, line 5: "It we denote the maximum number ..." should read "If
we denote the maximum number ..." (thanks to Petra Philips). 
 p. 126, enumeration 1: "If the function N_{H} fulfills
N_{H}(m)=2^{m} ..." should read "If the function
N_{H} fulfills N_{H}(2m)=2^{2m} ..."
(thanks to Petra Philips). 
 p. 127, line 2: "The best constants that can be achieved are 2 as a
coefficient of and 1 in the exponent of the exponential term,
respectively." should read "The best constants that can be achieved for
the coefficients of the exponent in the exponential term are 2 and 1,
respectively." (thanks to Jaz Kandola). 
 p. 131, line 3: "Clearly, all functions in h are monotonically
..." should read "Clearly, all functions in H are monotonically
..." (thanks to Petra Philips). 
 p. 135, line 1: In the statement, "R[h]" should read "R[h]
 R_{emp}[h,z] " (thanks to Petra
Philips). 
 p. 137, line 12: The function
is typed :N x R x [0,1]
rather than :R x [0,1] because it
depends on the training sample size m. Hence, in the next line,
(L((Z_{1},...,Z_{m}),h),)
should be replaced by (m,L((Z_{1},...,Z_{m}),h),). This
change also appears in lines 18, 6 and 3 on page 137, line 2 on page 138,
line 17 on page 139 and line 7 on page 140 (thanks to Petra Philips!).

 p. 140, line 5: "... in Appendix C.4 that L(z,h)
= _{eff}(z) ..."
should read "... in Appendix C.4 that L(z,h) = v_{H}(z)
..." (thanks to Petra Philips). 
 p. 175, Figure 5.1: "... with (solid line) and without (dashed line)
..." should read "... with (dashed line) and without (solid line) ..."
(thanks to Dongwei Cao). 
 p. 182, Definition 5.20: The update function U maps from Y x X x H -> H
rather than Y x X x Y -> H as stated in the text. This changes the
definition of an online learning algorithm. As a consequence, the sentence
preceding the displayed equation ("... and the prediction of the current
hypothesis h_{j} ∈ H ...") changes to "... and the current hypothesis
h_{j} ∈ H ..." (thanks to Thore Graepel!). 
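The corrected signature can be sketched as follows: the update function receives the current hypothesis itself, not just its prediction, and returns the next hypothesis. This is an illustration only and not the book's definition; the perceptron-style update rule, the representation of a hypothesis as a weight tuple, and the names `predict`, `update` and `run_online` are all assumptions.

```python
def predict(h, x):
    """Classify x with the linear hypothesis h (a tuple of weights)."""
    score = sum(w * xi for w, xi in zip(h, x))
    return 1 if score > 0 else -1


def update(x, y, h):
    """U(x, y, h): map an example and the current hypothesis to a new hypothesis.

    The perceptron rule used here is an assumed example of such an update.
    """
    if predict(h, x) == y:
        return h  # hypothesis is left unchanged on a correct prediction
    return tuple(w + y * xi for w, xi in zip(h, x))


def run_online(sample, h0):
    """Process examples one at a time, passing the current hypothesis to U."""
    h = h0
    for x, y in sample:
        h = update(x, y, h)
    return h


# Hypothetical training sample of (input, label) pairs.
sample = [((1.0, 1.0), 1), ((-1.0, -1.0), -1), ((1.0, -0.5), 1)]
h_final = run_online(sample, h0=(0.0, 0.0))
```

An update with the uncorrected signature would see only the predicted label h(x), which is not enough information to construct the next weight vector in this sketch.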
 p. 183, first equation: Accordingly, "U(x,y,h(x))=h" changes to
"U(x,y,h)=h" (thanks to Thore Graepel). 
 p. 183, line 15: Similar to p. 177, line 5, "(compression function
C_{i})" should read "(compression function C_{i})". 
 p. 184, line 8: The fourth point should read "If all training
examples are correctly classified, it outputs C and classifies according
to (5.18).". Furthermore, the last two sentences of this example are
wrong and should be deleted (thanks to Thore Graepel). 
 p. 187, equation (5.19): This is a definition (similar to (2.14) at
page 29) and not an equation. 
 p. 194, line 19: "preceeding Shawe-Taylor and Williamson (1997,
p. 4)" should read "preceding Shawe-Taylor and Williamson (1997)". 
 p. 210, line 5: "... we also write Y~Normal(,)."
should read "... we also write X~Normal(,)."
(thanks to Jian Huang). 
 p. 211, line 8: "... is a Gaussian measures, ..." should read "... are
Gaussian measures, ..." (thanks to Petra Philips). 
 p. 217, line 5: In the definition of ℓ^{n}_{p} the second case should read
max_{i=1,...,n} |x_{i}| < ∞ rather than max_{i=1,...,n} |x_{i}|
(thanks to Arthur Gretton). 
 p. 222, line 6: "... is the smallest number
> 0 such that ..." should read "...
is the smallest number 0 such
that ..." (thanks to Petra Philips). 
 p. 240, line 5: "a0" should read
"a0" because otherwise
exponentiation of 1+a/x with x invalidates the inequality
(thanks to Peter Bollmann-Sdorra). 
 p. 257, line 6: The braces must not include the summation over t
(thanks to Vikas Sindhwani). 
 p. 282, line 3: In the statement on the lhs, P(A) should read
P_{Z}(A) (thanks to Petra Philips). 
 p. 283, Lemma C.2: There is a major flaw in the proof of this lemma.
At the bottom of this page, it is argued that Theorem A.116 proves that
the probability that a binomially distributed variable with mean of at
least exceeds a value of
m/2 is at least 1/2.
This is wrong. In order to prove this statement we need a different
theorem, which is given in the postscript and PDF versions
of the errata (thanks to Ulrich Kockelkorn, Mingrui Wu and Vu Ha for
pointing out this mistake and helping with the additional proof). Note
that Mingrui Wu has also provided an alternative
proof which makes the fewest assumptions. 
 p. 283, line 13: "... (z) ∧ _{z}(A(z)) = 0 if such a sets exists ..."
should read "... ∧ _{z}(A(z)) = 0 if such a sets exists ..."
(thanks to Petra Philips). 
 p. 285, line 8: "... given in equation (22) and ..." should read "...
given in equation (2.11) and ..." (thanks to Petra Philips). 
 p. 302, Section C.8: The section heading should read "A PAC-Bayesian
Margin Bound" rather than "A PAC-Bayesian Marin Bound"
(thanks to Thore Graepel). 
 p. 303, line 4: In the numerator, the integration is up to
π rather than 2π
(thanks to John Shawe-Taylor). 
 p. 340: The entry Bartlett and Shawe-Taylor (1998) has the wrong year
and is identical to Bartlett and Shawe-Taylor (1999). 
 p. 340: In the entry Bennett (1998) there is one extra "19" in the
paper title (thanks to Jaz Kandola). 
 p. 347: In the entry Lauritzen (1981) "t. n. thiele" should read "T.
N. Thiele" (thanks to Jaz Kandola). 
 p. 349: In the entry Neal (1997b) "Technical Report" is spelt twice
(thanks to Jaz Kandola). 
 p. 350: In the entry Robert (1994) "Ney York" should read "New York"
(thanks to Jaz Kandola). 
 p. 354: The entry Watkins (1998) contains (almost) the same content as
Watkins (2000) and will be eliminated in future editions. 
 p. 355: In the entry Williams and Seeger (2001) "nystrom" should read
"Nyström" (thanks to Jaz Kandola). 