 | p. xvii, line -3: "Since the book uses a very rigorous notation
systems, ..." should read "Since the book uses a very rigorous notation
system, ..." (thanks to Neil Lawrence). |
 | p. 8, line 5: "... reader should refer Sutton and Barto (1998) ..."
should read "... reader is referred to Sutton and Barto (1998) ..."
(thanks to Thore Graepel). |
 | p. 19, line -12: "... (see Definition A.39)." should read "... (see
Definition A.34)." (thanks to Marco Krüger). |
 | p. 33, line 15: "x Є K"
should read "x Є X"
(thanks to Arthur Gretton). |
 | p. 33, line -3: "... where U=(u'1;...;u'r)
is ..." should read "... where U=(u1,...,un)=(v'1;...;v'r)
is ...". Accordingly, in page 34, line 2 and 4 each u is replaced
by a v. Furthermore, the subscript K in line on page 34 is
omitted (thanks to Malte Kuss). |
 | p. 34, line 5: "... and a mapping L
into it ..." should read "...
and a mapping f
into it ..." (thanks to Petra
Philips). |
 | p. 34, line 9: "the nth mapped object
xn is" should read "the point
∑i Ui,nf(xi)=L½U'un
is". Accordingly, the following line changes to "||L½U'un||2=un'ULU'un=en'Len=ln<0"
(thanks to Malte Kuss). |
 | p. 42, equation (2.26): The second case term should read "kr(u,v)
+ Σj=1|v| l2·kr'(u1u,v[j:|v|])"
(thanks to Michael Davy). |
 | p. 43, equation (2.30): The term l|v|-j
should read l|v|-t
(thanks to Vikas Sindhwani). |
 | p. 63, line -11: "... is tighter for less sparse solutions." should
read "... is tighter for more sparse solutions." (thanks to Diego Andres
Alvarez Marin). |
 | p. 77, Example 3.5: "PX=Binomial(n,p)"
should read "PX|P=p=Binomial(n,p)"
(thanks to Jaz Kandola). |
 | p. 84, line -5: "C(x,x)
= áx,xñ
+ σt2Ix¹x=C(x,x)
= k(x,x)
+ σt2Ix¹x"
should read "C(x,x)
= áx,xñ
+ σt2Ix=x=C(x,x)
= k(x,x)
+ σt2Ix=x"
(thanks to Arthur Gretton). |
 | p. 86, line 17: "... on the found local maximum." should read "... on
the local maximum found." (thanks to Thore Graepel). |
 | p. 86, line 20: "...is of the ability of ..." should read "...is the
ability of ..." (thanks to Thore Graepel). |
 | p. 87, Figure 3.2: "... of the 6 observations ..." should read "... of
the 7 observations ...". Furthermore, "This local maxima ..." should read
"This local maximum ..." (thanks to Thore Graepel). |
 | p. 87, line -12: "σ Є
R+" should read "σ Є
(R+)N". |
 | p. 92, Figure 3.5: "... this liklihood is not normalizable." should
read "... this likelihood is not normalizable." (thanks to Neil Lawrence). |
 | p. 99, line -7: "... out be the single weight vector ..." should read
"... out by the single weight vector ..." (thanks to Matthias Heiler). |
 | p. 101, line 5: "... we first run learning algorithm to find ..."
should read "... we first run a learning algorithm to find ..." (thanks to
Matthias Heiler). |
 | p. 110, line -11: Closing parenthesis at MacKay (1998) missing. |
 | p. 110, line -6: Closing parenthesis at Barber and Williams (1997)
missing. |
 | p. 122, equation (4.7): The subscript should read "Remp[h,z]
=0" instead of "Remp[h] =0" (thanks to Petra
Philips). |
 | p. 123, line 13: "... real-valued loss functions conceptually similar
..." should read "... real-valued loss functions is conceptually similar
..." (thanks to Petra Philips). |
 | p. 123, line -8: "... for all δ Є
(0,1], and all training samples sizes m, with probability at least
1-δ over the random draw of the training sample z Є Zm
we have ..." should read "... for the zero-one loss l0-1
given by equation (2.10) and all ε
> 0 we have ..." as in Theorem 4.7 (thanks to Simon Hill). |
 | p. 124, line 10: "... (see equation (4.7))
..." should read "... (see equation (4.8)) ..." (thanks to Jürgen
Schweiger). |
 | p. 125, line 5: "It we denote the maximum number ..." should read "If
we denote the maximum number ..." (thanks to Petra Philips). |
 | p. 126, enumeration 1: "If the function NH fulfills
NH(m)=2m ..." should read "If the function
NH fulfills NH(2m)=22m ..."
(thanks to Petra Philips). |
 | p. 127, line 2: "The best constants that can be achieved are 2 as a
coefficient of and 1 in the exponent of the exponential term,
respectively." should read "The best constants that can be achieved for
the coefficients of the exponent in the exponential term are 2 and 1,
respectively." (thanks to Jaz Kandola). |
 | p. 131, line 3: "Clearly, all functions in h are monotonically
..." should read "Clearly, all functions in H are monotonically
..." (thanks to Petra Philips). |
 | p. 135, line -1: In the statement, "R[h]" should read "R[h]
- Remp[h,z] " (thanks to Petra
Philips). |
 | p. 137, line 12: The function ω
is typed ω:N x R x [0,1]
rather than ω:R x [0,1] because it
depends on the training sample size m. Hence, in the next line,
ω(L((Z1,...,Zm),h),δ)
should be replaced by ω(m,L((Z1,...,Zm),h),δ).This
change also appears in line 18, -6 and -3 on page 137, line 2 on page 138,
line 17 on page 139 and line 7 on page 140 (thanks to Petra Philips!).
|
 | p. 140, line 5: "... in Appendix C.4 that L(z,h)
= -νeff(z) ..."
should read "... in Appendix C.4 that L(z,h) = -vH(z)
..." (thanks to Petra Philips). |
 | p. 175, Figure 5.1: "... with (solid line) and without (dashed line)
..." should read "... with (dashed line) and without (solid line) ..."
(thanks to Dongwei Cao). |
 | p. 182, Definition 5.20: The update function U maps from Y
x X x H
-> H rather than Y x X x Y -> H
as stated in the text. This changes the definition of an online
learning algorithm. As a consequence, the sentence preceding the displayed
equation ("... and the prediction of the current
hypothesis hj Є H ...") changes to "... and the current hypothesis hj
Є H ..." (thanks to Thore
Graepel!). |
 | p. 183, first equation: Accordingly, "U(x,y,h(x))=h" changes to
"U(x,y,h)=h" (thanks to Thore Graepel). |
 | p. 183, line 15: Similar to p. 177, line -5, "(compression function
Ci)" should read "(compression function C|i|)". |
 | p. 184, line-8: The fourth point should read "If all training
examples are correctly classified, it outputs C and classifies according
to (5.18).". Furthermore, the last two sentences of this example are
wrong and should be deleted (thanks to Thore Graepel). |
 | p. 187, equation (5.19): This is a definition (similar to (2.14) at
page 29) and not an equation. |
 | p. 194, line 19: "preceeding Shawe-Taylor and Williamson (1997,
p. 4)" should read "preceding Shawe-Taylor and Williamson (1997)". |
 | p. 210, line -5: "... we also write Y~Normal(μ,Σ)."
should read "... we also write X~Normal(μ,Σ)."
(thanks to Jian Huang). |
 | p. 211, line -8: "... is a Gaussian measures, ..." should read "... are
Gaussian measures, ..." (thanks to Petra Philips). |
 | p. 217, line 5: In the definition of ℓnp
the second case should read maxi=1,...,n
|xi| <
¥ rather than maxi=1,...,n
|xi| (thanks to Arthur Gretton). |
 | p. 222, line 6: "... is the smallest number
ε > 0 such that ..." should read "...
is the smallest number ε ≥ 0 such
that ..." (thanks to Petra Philips). |
 | p. 240, line -5: "a≠0" should read
"a≥0" because otherwise
exponentiation of 1+a/x with x invalidates the inequality
(thanks to Peter Bollmann-Sdorra). |
 | p. 257, line 6: The braces must not include the summation over t
(thanks to Vikas Sindhwani). |
 | p. 282, line 3: In the statement on the lhs. P(A) should read
PZ(A) (thanks to Petra Philips). |
 | p. 283, Lemma C.2: There is a major flaw in the proof of this lemma.
At the bottom of this page, it is argued that Theorem A.116 proves that
the probability that a binomially distributed variable with mean of at
least ε exceeds a value of
mε/2 is at least 1/2.
This is wrong. In order to prove this statement we need a different
theorem which is given in the postscript version/pdf
version
of the errata (thanks to Ulrich Kockelkorn, Mingrui Wu and Vu Ha for
pointing this out mistake and helping with the additional proof). Note
the Mingrui Wu has also provided an alternative
proof/alternative proof which makes the least assumptions. |
 | p. 283, line 13 "... ε(z)
Ù νz(A(z))
= 0 if such a sets exists ..." should read "...
ε Ù
νz(A(z))
= 0 if such a sets exists ..." (thanks to Petra Philips). |
 | p. 285, line -8: "... given in equation (22) and ..." should read "...
given in equation (2.11) and ..." (thanks to Petra Philips). |
 | p. 302, Section C.8: The section heading should read "A PAC-Bayesian
Margin Bound" rather than "A PAC-Bayesian Marin Bound"
(thanks to Thore Graepel). |
 | p. 303, line 4: In the numerator, the integration is up to
p rather than 2p
(thanks to John Shawe-Taylor). |
 | p. 340: The entry Bartlett and Shawe-Taylor (1998) has the wrong year
and is identical to Bartlett and Shawe-Taylor (1999). |
 | p. 340: In the entry Bennett (1998) there is one extra "19" in the
paper title (thanks to Jaz Kandola). |
 | p. 347: In the entry Lauritzen (1981) "t. n. thiele" should read "T.
N. Thiele" (thanks to Jaz Kandola). |
 | p. 349: In the entry Neal (1997b) "Technical Report" is spelt twice
(thanks to Jaz Kandola). |
 | p. 350: In the entry Robert (1994) "Ney York" should read "New York"
(thanks to Jaz Kandola). |
 | p. 354: The entry Watkins (1998) contains (almost) the same content as
Watkins (2000) and will be eliminated in future editions. |
 | p. 355: In the entry Williams and Seeger (2001) "nystrom" should read
"Nyström" (thanks to Jaz Kandola). |