I'm trying to implement the logistic regression algorithm, and part of my MATLAB code follows.
```matlab
for i = 1 : max_itr
    % Calculate mu
    mu = 1.0 ./ (1.0 + exp(-(x * w)));

    % Calculate h(i) to check convergence
    h(i) = sum(-y .* log(mu) - (1 - y) .* log(1 - mu)) + (lambda / 2) * norm(w(2:end))^2;

    % Calculate the gradient and Hessian
    g = lambda .* w;
    g(1) = 0;                    % do not regularize the bias term of the gradient
    l = lambda .* eye(d + 1);
    l(1) = 0;                    % do not regularize the bias term of the Hessian
    grad = (x' * (mu - y)) + g;
    s = diag(mu .* (1 - mu));
    hess = (x' * s * x) + l;     % renamed from h so it doesn't clobber the cost array h(i)

    % Update w
    w = w - hess \ grad;
end
```
Obviously, the value of `mu` cannot be exactly 1, since the exponential can never be 0. However, there are cases where the exponential evaluates to a very small value, such as 1.6629e-05. This causes the `mu` value for that instance to be very close to 1, i.e. 0.999983371689452.
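To make the failure mode concrete, here is a small standalone snippet (separate from the code above; the value 40 is just an example of a large activation):

```matlab
% Standalone demonstration of the failure mode: for a large activation,
% exp(-z) underflows relative to 1 and mu rounds to exactly 1 in doubles.
z  = 40;                       % a large value of x(j,:) * w
mu = 1.0 ./ (1.0 + exp(-z));   % evaluates to exactly 1.0
y  = 1;                        % label of this instance

% log(1 - mu) = log(0) = -Inf, and (1 - y) * (-Inf) = 0 * (-Inf) = NaN
term = -y .* log(mu) - (1 - y) .* log(1 - mu)   % term = NaN
```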
I've run the code iteration by iteration, and the first 4 iterations are fine: `mu` does not contain any such "close-to-1" values. The fifth iteration does, however, and as a result `h(i)` becomes NaN and the algorithm won't converge. I've run the `digits` command and the value is 32, but I don't know what is causing the problem.
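For what it's worth, a common numerical safeguard against this kind of log(0) issue, shown here only as a sketch and not applied in my code above, is to clamp `mu` strictly inside (0, 1) before taking logs:

```matlab
% Sketch: clamp mu away from 0 and 1 so log(mu) and log(1 - mu) stay finite
mu = 1.0 ./ (1.0 + exp(-(x * w)));
mu = min(max(mu, eps), 1 - eps);   % eps is MATLAB's double-precision epsilon
```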
ETA: Here is the updated code, after @rayryeng's suggestion. Below, `d` is the number of features in `x`, and `m` is the number of training samples in `x`.
```matlab
for i = 1 : max_itr
    % Initialize the accumulators
    grad = zeros(d+1, 1);
    h(i) = 0;
    hess = zeros(d+1, d+1);   % renamed from h so it doesn't clobber the cost array h(i)

    for j = 1 : m
        % Calculate mu for this sample
        mu = sigmoid(x(j,:) * w);

        % Calculate h(i) (to check convergence)
        h(i) = h(i) - (1/m) * (y(j) * log(mu) + (1 - y(j)) * log(1 - mu)) ...
               + (lambda / (2 * m)) * norm(w(2:end))^2;

        % Calculate the gradient and Hessian
        g = lambda * w;
        g(1) = 0;                 % do not regularize the bias term of the gradient
        l = lambda * eye(d + 1);  % was eye(numfeatures + 1); d is the feature count
        l(1) = 0;                 % do not regularize the bias term of the Hessian
        grad = grad - (1/m) * ((x(j,:)' * (mu - y(j))) + g);
        s = mu .* (1 - mu);       % scalar for a single sample, so diag() is unnecessary
        hess = hess - (1/m) * ((x(j,:)' * s * x(j,:)) + l);
    end
    %fprintf('h(%d) = %0.5f\n', i, h(i));

    % Update w (grad and hess are both negated, so the signs cancel in the step)
    w = w - hess \ grad;
end
```
I don't get the NaN error anymore, and the `h` values seem to converge after a few iterations.
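A simple way to turn those `h` values into an explicit stopping test, sketched here with an assumed tolerance `tol` that is not part of the code above, would be to add the following inside the outer loop:

```matlab
% Sketch of a stopping test on the cost history; tol is an assumed parameter
tol = 1e-6;
if i > 1 && abs(h(i) - h(i-1)) < tol
    fprintf('Converged after %d iterations\n', i);
    break;
end
```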
In my implementation:

- `x` is the input data, m x n (a column of 1s is added to the initial data: `x = [ones(m, 1) x];`)
- `y` is the output data.
- The initial values of the thetas are all 0 (see the setup sketch below).
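Put together, the setup reads as follows (a sketch consistent with the list above, using the `initial_theta` variable from the `fminunc` call further down):

```matlab
% Setup consistent with the list above
m = size(x, 1);                        % number of training samples
x = [ones(m, 1) x];                    % prepend a column of 1s to the data
initial_theta = zeros(size(x, 2), 1);  % all thetas start at 0
```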
The sigmoid function is defined as follows:
```matlab
function g = sigmoid(z)
    g = 1 ./ (1 + exp(-z));
end
```
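For example, it works element-wise on vectors:

```matlab
% Element-wise on a vector; all outputs lie strictly inside (0, 1)
sigmoid([-5 0 5])
% ans = 0.0067    0.5000    0.9933
```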
The cost function is defined as follows:
```matlab
function [j, grad] = costfunction(theta, x, y)
    m = length(y);
    j = (1/m) * sum(-y .* log(sigmoid(x*theta)) - (1-y) .* log(1-sigmoid(x*theta)));
    grad = (x' * (sigmoid(x*theta) - y)) / m;
end
```
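A quick sanity check, my own suggestion rather than part of the original code: with theta all zeros, `sigmoid(x*theta)` is 0.5 for every sample, so the initial cost should be -log(0.5) ≈ 0.6931 for any y in {0, 1}:

```matlab
% At theta = 0 every prediction is 0.5, so j should be -log(0.5) ~ 0.6931
[j0, grad0] = costfunction(initial_theta, x, y);
fprintf('Initial cost: %f (expected ~0.6931)\n', j0);
```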
In the main code, I use the `fminunc` function to find the optimal theta:
```matlab
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t)(costfunction(t, x, y)), initial_theta, options);
```
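Once `fminunc` returns, predictions can be obtained by thresholding the fitted probabilities; the 0.5 threshold below is the conventional choice, not something fixed by the code above:

```matlab
% Predict labels by thresholding the fitted probabilities at 0.5
p = sigmoid(x * theta) >= 0.5;
fprintf('Training accuracy: %.2f%%\n', mean(double(p == y)) * 100);
```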
In the end this is a different implementation, but it might be useful for finding a solution to the problem.