Tuesday, August 17, 2010

Learning Probability via Octave

Probability is a subject which brings everyone sleepless nights. Octave, the MATLAB clone or for some MATLAB wanna-be . GNU Octave is a high-level language, primarily intended for numerical computations. The best way to learn octave is to take a difficult task (or goal) and then start using it.

Octave Basics
Octave can be installed using the following command in Ubuntu.
$ sudo apt-get install octave
Once installed octave can be invoked via the command.
$ octave
Its best to understand the modus operandi of Octave. Like lists are the primary data structures in LISP Language. In octave vectors are at the core.  Vectors are used to store information, manipulate information via vector algebra and display data still using 2D/3D graphics. Applications of octave are huge but we will use probability as a starting point and see how we can learn better probability using the computational and graphical features of octave.

Uniform Probability Distribution
Uncertainty of events can be modeled using concepts of probability. If a set of events is equally likely to occur then we say that the events have equal or uniform probability of happening.
The first step towards developing a mathematical model for probability is to map real-life events to random variables (in our case the random variables will take on values from real numbers). Thus we say that instead of using real-life events we will refer to values 1, 2, 3, 4, .... when talking about events (e.g., Sun rise, Sun Set etc.). Lets use the example of a dice having six faces each one marked with number 1, 2, 3, 4, 5 and 6 respectively. The events of getting a particular face after rolling the dice are getting face marked by either One, Two, Three, Four, Five or Six. Since we cannot say with certainty  which face will come up on  roll of  dice. We say that x which is a random variable which takes on values {1, 2, 3, 4, 5, 6} corresponding to the aforementioned events.
Now this information can be stored as a vector x = [1 2 3 4 5 6] and so lets see the corresponding command in octave by simply typing x=[1 2 3 4 5 6].
octave:1> x=[1 2 3 4 5 6]
x =

   1   2   3   4   5   6
The following values x = and followed by 1 2 3 4 5 6, shows the value stored in the vector x. This display can be avoided by placing a semi-colon (;) at the end of the command. Now that we have declared a variable named x with values lets see how octave stores it. This can be achieved anytime by writing the command whos on the command line of octave.
octave:2> whos
Variables in the current scope:

  Attr     Name        Size                         Bytes       Class
  ==== ====        ====                     =====  =====
                 ans         1x30                             30        char
                   x           1x6                               48        double
The size of variable x is 1x6 which means that its a vector (matrix) of size 1 row and 6 columns, with 48 bytes needed to store it while the class of variables in double.

Now lets see what is the probability of getting a particular face. Since all the faces of the dice have an equal probability of showing up hence the probability of getting a 1 is equal to the probability of getting 2, 3, 4, 5 or 6. Therefore in our model the probability given by p(x) = 1/6 {where 6 represents the total number of possibilities}. Now lets calculate the probability of the events using octave.
octave:3> px=1/6*ones(1,6)
px =

    0.16667    0.16667    0.16667    0.16667    0.16667    0.16667
Here in the above code we have used the function ones which helps to declare a vector of size 1x6 and initialize it with ones. We simply multiply it with 1/6 to get 0.16667 in all the elements. In order to check what is the probability of event 3 simply write the following command.
octave:4> px(3)
ans =  0.16667
Similarly for all the other events.  This was simple because we used the simplest possible probability model.

Normal Probability Distribution
Moving further lets move to a more "realistic" probability model. Normal Probability Distribution is a very practical to model most situations. Special thanks to Guillaume Riflet's website for the equations in this article.
 f(x) = \tfrac{1}{\sqrt{2\pi\sigma^2}}\; e^{ -\frac{(x-\mu)^2}{2\sigma^2} }
Trying out this equation in octave gives the following resulting. Now using the same number of events x=[1 2 3 4 5 6]. The mean (mu)=3.5 and variance(sigma)=1.
octave:5> mu=3.5; sigma=1;
octave:6>  fx=(1/sqrt(2*pi*sigma^2)*exp(-(x-mu).^2/(2*sigma^2)));
fx =

   0.017528   0.129518   0.352065   0.352065   0.129518   0.017528
Since the mean was selected to be 3.5 therefore we can see that the probability of 3 and 4 are maximum here while the probability decreases as we move away from 3.5 on either side. Now this is the classical "bell shaped" or Gaussian  curve. This does not look clear because we took only 6 data points to plot the curve we can improve the points by using the following code and the resulting graph is shown.
octave:8> x=[1:0.1:6];
octave:9>  fx=(1/sqrt(2*pi*sigma^2)*exp(-(x-mu).^2/(2*sigma^2)));
octave:10> plot(x,fx)

Now that the data points have been increased the graph is much more smooth and looks like a bell shaped curve and the probability is now really maximum at point 3.5 as should be since mu (mean)=3.5.

Multivariate Probability Distribution

In order to generate further interest now lets look at multivariate normal distribution. The following equation gives us the multivariate normal distribution here the x and mu are d-dimensional vectors defining the multidimensional event and means while sigma is the co-variance matrix.

p(x)=\frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}}\, e^{ -\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu) }
The above equation requires further constructs like for-loop to be implemented in Octave lets use the example given at wiki to learn.
octave:10> mu=[40, 60]; sigma=[100, 30; 30, 140]; octave:39> isigma = inv(sigma);
octave:11> detsigma = det(sigma);
octave:12> coeff = 1/(2*pi*sqrt(detsigma));
octave:13> for i=1:100
>    for j=1:100
>        x = [i;j];
>        xm = x - mu;
>        p(i,j) = exp(-0.5*xm'*isigma*xm);
>    end
> end
octave:14> p=p*coeff;
octave:15> [X,Y]=meshgrid(1:100,1:100);
octave:16> surf(X,Y,p);

The plot above is a 3-D graph of the multivariate normal probability distribution of p(x,y) depending on two variables x and y. The above methodology helps us learn probability in a fun manner as well as get familiar with octave's computational and graphical features.