
Logistic Regression

In this post, I will show how to implement logistic regression with MATLAB. We will use logistic regression to predict whether a student will be admitted to a university. The data come from the Machine Learning course on Coursera, where I did this exercise as an assignment. So, if you are taking that course, you are advised not to copy from this page.

Load Data

The first two columns contain the exam scores, and the third column contains the label.

Assume the data are historical records from previous applicants that we can use as a training set for logistic regression. For each training example, we have the applicant's scores on two exams and the admission decision. Here, we will estimate the probability that an applicant is admitted based on their scores on these two exams.

In [16]:
data = load('ex2data1.txt');
X = data(:, [1, 2]); y = data(:, 3);
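As a quick sanity check (my addition, not part of the original exercise), we can inspect the dimensions and the first few rows of the loaded matrix:

In [ ]:
size(data)     % for this dataset, this should be 100 x 3 (two scores plus a label)
data(1:3, :)   % first three training examples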
Plotting

We start the exercise by first plotting the data to understand the problem we are working with.

In [17]:
% Find Indices of Positive and Negative Examples
pos = find(y==1); neg = find(y == 0);
% Plot Examples
plot(X(pos, 1), X(pos, 2), 'k+','LineWidth', 2, ...
'MarkerSize', 7);

hold on

plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', ...
'MarkerSize', 7);
legend('Admitted', 'Not admitted', 'Location', 'northoutside', 'Orientation', 'horizontal')
legend('boxoff')
xlabel('Exam 1 score','FontSize',8)
ylabel('Exam 2 score','FontSize',8)

The logistic regression hypothesis is defined as:

$h_\theta(x) = g(\theta^T x)$

where $g$ is the sigmoid function, defined as

$g(z) = \frac{1}{1+e^{-z}}$

The function below computes the sigmoid of each value of z (z can be a matrix, vector or scalar).

In [ ]:
function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) returns the sigmoid of z, computed element-wise;
%   z can be a scalar, a vector, or a matrix.

% exp operates element-wise in MATLAB, so no explicit loops are needed
g = 1 ./ (1 + exp(-z));

end
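A quick check of the function (my addition): the sigmoid of 0 should be exactly 0.5, large positive inputs should approach 1, and large negative inputs should approach 0.

In [ ]:
sigmoid(0)            % 0.5000
sigmoid([-10 0 10])   % approximately 0.0000  0.5000  1.0000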
Compute Cost and Gradient

The cost function in logistic regression is given by

$J(\theta)=\frac{1}{m}\sum\limits_{i=1}^{m} \left[-y^{(i)}\log(h_\theta(x^{(i)}))-(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$

and the gradient of the cost is a vector of the same length as $\theta$ where the $j^{th}$ element (for j=0,1,...,n) is defined as follows:

$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum\limits_{i=1}^{m} (h_\theta(x^{(i)})-y^{(i)})x^{(i)}_j $

The function below calculates the cost and its gradient. Although the gradient has the same form as in linear regression, note that the hypothesis is now $h_\theta(x) = g(\theta^Tx)$, so the two are not the same.

In [ ]:
function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. to the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% Hypothesis for all m training examples (m x 1 vector)
h = sigmoid(X * theta);

% Cross-entropy cost, averaged over the training set
J = (1 / m) * sum(-y .* log(h) - (1 - y) .* log(1 - h));

% Gradient of the cost w.r.t. each theta_j (same size as theta)
grad = (1 / m) * (X' * (h - y));


end
In [18]:
%  Set up the data matrix appropriately, and add ones for the intercept term
[m, n] = size(X);

% Add intercept term to X
X = [ones(m, 1) X];

% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);

% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', grad);
Cost at initial theta (zeros): 0.693147
Gradient at initial theta (zeros): 
 -0.100000 
 -12.009217 
 -11.262842
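
This initial cost is exactly what we expect: with $\theta$ set to zeros, the hypothesis outputs $g(0) = 0.5$ for every example, so the cost reduces to $-\log(0.5) \approx 0.693$ regardless of the labels.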
Optimizing using fminunc

Let's use fminunc, MATLAB's built-in unconstrained optimizer, to find the optimal parameters theta.

In [19]:
%  Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost 
[theta, cost] = ...
fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

% Print theta to screen
fprintf('Cost at theta found by fminunc: %f\n', cost);
fprintf('theta: \n');
fprintf(' %f \n', theta);
Local minimum possible.

fminunc stopped because the final change in function value relative to 
its initial value is less than the default value of the function tolerance.

Cost at theta found by fminunc: 0.203506
theta: 
 -24.932995 
 0.204408 
 0.199618
Plot Boundary

The decision boundary is the set of points where $\theta^Tx = 0$; with two features this is the line $x_2 = -(\theta_0 + \theta_1 x_1)/\theta_2$, which is what the function below plots over the data.

In [ ]:
function plotDecisionBoundary(theta, X, y)
%PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with
%the decision boundary defined by theta
%   PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the 
%   positive examples and o for the negative examples. X is assumed to be
%   either
%   1) an Mx3 matrix, where the first column is an all-ones column for the
%      intercept, or
%   2) an MxN matrix, N > 3, where the first column is all-ones.

% Plot Data
plotData(X(:,2:3), y);
hold on

if size(X, 2) <= 3
    % Only need 2 points to define a line, so choose two endpoints
    plot_x = [min(X(:,2))-2,  max(X(:,2))+2];

    % Calculate the decision boundary line
    plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));

    % Plot, and adjust axes for better viewing
    plot(plot_x, plot_y)
    
    % Legend, specific for the exercise
    legend('Admitted', 'Not admitted', 'Decision Boundary')
    axis([30, 100, 30, 100])
else
    % Here is the grid range
    u = linspace(-1, 1.5, 50);
    v = linspace(-1, 1.5, 50);

    z = zeros(length(u), length(v));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            z(i,j) = mapFeature(u(i), v(j))*theta;
        end
    end
    z = z'; % important to transpose z before calling contour

    % Plot z = 0
    % Notice you need to specify the range [0, 0]
    contour(u, v, z, [0, 0], 'LineWidth', 2)
end
hold off

end
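Note that plotDecisionBoundary calls a helper, plotData, which is not listed in this post (it comes with the course materials), and mapFeature, which is only needed in the regularized exercise with polynomial features. A minimal sketch of plotData, mirroring the plotting code from earlier, could look like this:

In [ ]:
function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   (minimal sketch; the course provides its own version)

figure; hold on;
pos = find(y == 1); neg = find(y == 0);
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);
hold off;

end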
In [20]:
plotDecisionBoundary(theta, X, y);

% Put some labels 
hold on;
% Labels and Legend
xlabel('Exam 1 score','FontSize',8)
ylabel('Exam 2 score','FontSize',8)

% Specified in plot order
legend('Admitted', 'Not admitted', 'Location', 'northoutside', 'Orientation', 'horizontal')
Make Predictions and Calculate Accuracy

After learning the parameters, let's use them to predict outcomes on unseen data. In this part, I will use the logistic regression model to predict the probability that a student with a score of 45 on exam 1 and a score of 85 on exam 2 will be admitted. Furthermore, we will compute the training set accuracy of our model.

In [21]:
prob = sigmoid([1 45 85] * theta);
fprintf(['For a student with scores 45 and 85, we predict an admission ' ...
         'probability of %f\n\n'], prob);
For a student with scores 45 and 85, we predict an admission probability of 0.774323

The function below predicts whether the label is 0 or 1 using learned logistic regression parameters theta.

In [ ]:
function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned logistic
%regression parameters theta
%   p = PREDICT(theta, X) computes the predictions for X using a
%   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)

% Compute the probability for every example at once, then threshold;
% the comparison yields logicals, which we convert back to 0/1 doubles
p = double(sigmoid(X * theta) >= 0.5);

end
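As a spot check (my addition), the student with scores 45 and 85 from above should be classified as admitted, since the predicted probability of 0.774 is above the 0.5 threshold:

In [ ]:
predict(theta, [1 45 85])   % returns 1 (admitted)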
In [22]:
% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
Train Accuracy: 89.000000

Summary

In this post, we used logistic regression to predict whether a student with given scores on two exams will be admitted to a university. The training accuracy of our model is 89%. Of course, we would have to calculate other measures, such as sensitivity, specificity, and AUC, to get a fuller picture of the model's performance. We used fminunc to find the optimal values of theta. In the next post, we will see how regularization is applied to logistic regression.
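As a minimal sketch of those additional measures (my addition, reusing the p and y already in the workspace), sensitivity and specificity on the training set can be computed from the confusion counts:

In [ ]:
% Confusion counts on the training set
TP = sum((p == 1) & (y == 1));   % admitted and predicted admitted
TN = sum((p == 0) & (y == 0));   % not admitted and predicted not admitted
FP = sum((p == 1) & (y == 0));   % not admitted but predicted admitted
FN = sum((p == 0) & (y == 1));   % admitted but predicted not admitted

sensitivity = TP / (TP + FN);    % true positive rate
specificity = TN / (TN + FP);    % true negative rate
fprintf('Sensitivity: %f, Specificity: %f\n', sensitivity, specificity);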
