
Thursday, March 14, 2013

Video tutorial on reinforcement learning

This video tutorial is based on a talk I gave in my Comp 650 class, and hence assumes that the audience knows a little about Markov Decision Processes and Partially Observable Markov Decision Processes. If you are unfamiliar with them, I recommend that you at least read through their Wikipedia pages before going through this tutorial. Also, I suggest that you view this video on YouTube, because the embedded video given below is cropped.


If you have any suggestions, comments or questions, please feel free to email me :)

PDF slides

Tuesday, October 11, 2011

How to analyze algorithms?

Important assumptions:

  • This process of analysis should be independent of programming language and machine architecture. So you can't just write your algorithm in C++ and compare its running time against that of some other algorithm written in Java.
  • Also, since each algorithm can take an arbitrary data structure as input (e.g. graph, array, binary object, tree, ...), there has to be some standard way to specify it. For the sake of our analysis, we use the input size (n) as the measure: e.g. the length of an array, the number of vertices or edges of a graph, the number of nodes of a tree, etc.
Detailed explanations of these two points are given in Introduction to Algorithms by Cormen, Leiserson, and Rivest (CLR).

Setup:

Input of size n ==> Algorithm that takes T(n) steps ==> some output

The number of steps an algorithm takes often depends on factors other than the input size. For example, quicksort takes a different number of steps on the input [1, 2, 3, 4, 5] than on the input [5, 4, 3, 2, 1], even though both have the same input size (and, in this case, the same elements).
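To make this concrete, here is a small Python sketch: a simplistic quicksort that always picks the first element as its pivot (one pivot strategy, chosen here purely for illustration) and counts the comparisons it makes. With this pivot choice, a sorted input hits the worst case while a shuffled one of the same size does not:

```python
# Count comparisons made by a simple quicksort that always picks the first
# element as the pivot (one pivot strategy, chosen here for illustration).
def quicksort_comparisons(a):
    if len(a) <= 1:
        return 0
    pivot = a[0]
    smaller = [x for x in a[1:] if x < pivot]
    larger = [x for x in a[1:] if x >= pivot]
    # len(a) - 1 comparisons to partition, plus the two recursive costs
    return (len(a) - 1) + quicksort_comparisons(smaller) + quicksort_comparisons(larger)

print(quicksort_comparisons([1, 2, 3, 4, 5]))  # 10 comparisons (worst case for this pivot)
print(quicksort_comparisons([3, 1, 4, 5, 2]))  # 6 comparisons, same input size
```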
Hence, we will try to find three functions (an upper bound, a lower bound and a Theta bound) that capture the effects of these factors: $$ g_1(n), g_2(n) \text{ and } g_3(n) \text{ such that } $$ $$ T(n) \in O(g_1(n)), T(n) \in \Omega(g_2(n)) \text{ and } T(n) \in \Theta(g_3(n))$$ Many authors also write the above as: $$ T(n) = O(g_1(n)), T(n) = \Omega(g_2(n)) \text{ and } T(n) = \Theta(g_3(n))$$

Big-O notations:

Now, let's understand the definitions of the above notations: \begin{align} \text{To prove that } & g_1(n) \text{ is an upper bound of } T(n) \text{, i.e. } T(n) \in O(g_1(n)), \\ \text{find two constants } & c, n_0 \text{ such that } \\ & \forall n \geq n_0, \; T(n) \leq c \, g_1(n) \end{align} For a lower bound, change the ≤ sign to ≥, and for a Theta bound find both a lower and an upper bound: $$ T(n) \in \Theta(g_3(n)) \text{ iff } T(n) \in O(g_3(n)) \text{ and } T(n) \in \Omega(g_3(n))$$
To understand these in more detail, I suggest you solve problems from CLR chapter 3. Here are some examples you can start with: \begin{align} 2n^2 \in O(n^3) & \text{ with } c=1 \text{ and } n_0 = 2 \\ \sqrt{n} \in \Omega(\lg n) & \text{ with } c=1 \text{ and } n_0 = 16 \\ \frac{n^2}{2} - 2n \in \Theta(n^2) & \text{ with } c_1=\frac{1}{4}, c_2=\frac{1}{2} \text{ and } n_0 = 8 \\ \end{align} (Note: a trivially small function can always serve as a lower bound, and a trivially large one as an upper bound. So, it is necessary to supply as tight a bound as possible for any given algorithm. Mathematically, a strict (non-tight) upper bound is written with little-o and a strict lower bound with little-omega, but we will ignore them for the sake of this discussion.)
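If you want to sanity-check the claimed constants numerically, here is a small Python sketch. The ranges checked are finite, so this is a spot-check, not a proof:

```python
import math

# Spot-check the claimed constants over a finite range (a sanity check, not a proof).
# 2n^2 <= 1 * n^3 for all n >= 2, so 2n^2 is in O(n^3)
assert all(2 * n**2 <= n**3 for n in range(2, 1000))

# sqrt(n) >= 1 * lg(n) for all n >= 16, so sqrt(n) is in Omega(lg n)
assert all(math.sqrt(n) >= math.log2(n) for n in range(16, 1000))

# (1/4) n^2 <= n^2/2 - 2n <= (1/2) n^2 for all n >= 8, so n^2/2 - 2n is in Theta(n^2)
assert all(n**2 / 4 <= n**2 / 2 - 2 * n <= n**2 / 2 for n in range(8, 1000))

print("all constant checks passed")
```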

Common measures for algorithms:

The above three notations are often conflated with the terms 'worst case', 'best case' and 'average case' complexity, but the more precise definitions are as follows: \begin{align} \text{Worst case: } & T(n) = \max \{ T(x) \mid x \text{ is an instance of size } n \} \\ \text{Best case: } & T(n) = \min \{ T(x) \mid x \text{ is an instance of size } n \} \\ \text{Average case: } & T(n) = \sum_{\text{input } x} \Pr(x) \, T(x) \end{align} This means that to find the average-case complexity of an algorithm, you need the probability of each input, which you then use to compute the expectation of T(n).
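As a concrete illustration of the average-case formula, here is a Python sketch that computes the expected cost of sequential search, assuming (for this example) that the key is equally likely to be at any of the n positions:

```python
from fractions import Fraction

# Average-case cost of sequential search, assuming the key is equally
# likely to be at any of the n positions (Prob(x) = 1/n for each input x).
def avg_sequential_search_cost(n):
    # finding the element at position i (1-based) costs i comparisons
    return sum(Fraction(1, n) * i for i in range(1, n + 1))

print(avg_sequential_search_cost(5))            # (n + 1) / 2 = 3
print(float(avg_sequential_search_cost(100)))   # 50.5
```

The closed form (n + 1)/2 shows the average case of sequential search is Theta(n), the same order as the worst case but with half the constant.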

Pseudo code to Big-O:

Now, let's address the key question: given pseudo code, how do we find its lower bound (or upper bound, or Theta bound)? Here is one possible way to find those bounds:
  • First, express the algorithm in recursive form
  • Determine appropriate metric for input size 'n'
  • Then, find recurrence relation for the algorithm
  • Solve the recurrence relation using the substitution method, the recursion tree method or the master theorem

For example, the recurrence relations for some popular algorithms are: \begin{align} \text{Binary search: } & T(n) = T(\tfrac{n}{2}) + O(1) \\ \text{Sequential search: } & T(n) = T(n-1) + O(1) \\ \text{Tree traversal: } & T(n) = 2T(\tfrac{n}{2}) + O(1) \\ \text{Selection sort: } & T(n) = T(n-1) + O(n) \\ \text{Merge sort: } & T(n) = 2T(\tfrac{n}{2}) + O(n) \end{align} Try writing the pseudo code for any of the above algorithms and verify the recurrence relations.
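For instance, the binary search recurrence T(n) = T(n/2) + O(1) solves to Theta(log n). The following Python sketch counts the halving steps directly and compares them to log₂ n:

```python
import math

# Empirically count the halving steps of binary search, i.e. the depth of
# the recurrence T(n) = T(n/2) + O(1), whose solution is Theta(log n).
def binary_search_steps(n):
    steps = 0
    while n > 1:
        n //= 2  # each step discards half of the remaining range
        steps += 1
    return steps

for n in [16, 1024, 10**6]:
    print(n, binary_search_steps(n), int(math.log2(n)))  # the two counts match
```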

Master theorem:

The master theorem can only solve recurrence relations of the form T(n) = a T(n/b) + f(n). Here is my modus operandi for the master theorem: \begin{align} \text{If } f(n)=\Theta(n^{\log_{b}a}\lg^{k}n) \text{ for some } k \geq 0, & \text{ then } T(n) = \Theta(n^{\log_{b}a} \lg^{k+1}n) \\ \text{Else if } n^{\log_{b}a} \text{ is polynomially greater than } f(n), & \text{ then } T(n) = \Theta(n^{\log_{b}a}) \\ \text{Else if } f(n) \text{ is polynomially greater than } n^{\log_{b}a}, & \text{ then } T(n) = \Theta(f(n)) \\ \text{Else use the substitution method} & \end{align} (The third case also requires the regularity condition $a f(\frac{n}{b}) \leq c f(n)$ for some constant $c < 1$.)
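Here is a rough Python sketch of this decision procedure for the restricted case f(n) = n^d (a simplification: the real theorem handles a wider class of f(n)):

```python
import math

# A sketch of the master theorem decision for T(n) = a*T(n/b) + n^d,
# restricting f(n) to a plain polynomial n^d for simplicity.
def master_theorem(a, b, d):
    crit = math.log(a, b)  # the critical exponent log_b(a)
    if math.isclose(d, crit):
        return f"Theta(n^{crit:g} lg n)"
    elif d < crit:          # n^(log_b a) polynomially dominates f(n)
        return f"Theta(n^{crit:g})"
    else:                   # f(n) polynomially dominates n^(log_b a)
        return f"Theta(n^{d:g})"

print(master_theorem(2, 2, 1))  # merge sort:       Theta(n^1 lg n)
print(master_theorem(1, 2, 0))  # binary search:    Theta(n^0 lg n), i.e. Theta(lg n)
print(master_theorem(8, 2, 2))  # T(n)=8T(n/2)+n^2: Theta(n^3)
```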
Here are the solutions to two important recurrence relations that cannot be solved using the master theorem: \begin{align} \text{Recurrence relation} & & \text{Complexity} \\ T(n) = T(n-1) + b n^k & \Rightarrow & O(n^{k+1})\\ T(n) = cT(n-1) + b n^k \text{ (for } c > 1) & \Rightarrow & O(c^n) \end{align}
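You can spot-check the first of these solutions numerically: with b = 1 and k = 2 (and T(0) = 0), T(n) = T(n-1) + n² unrolls to the sum of squares, and the ratio T(n)/n³ should settle near a constant:

```python
# Unroll T(n) = T(n-1) + b*n^k with b = 1, k = 2 and T(0) = 0:
# T(n) is just the sum 1^2 + 2^2 + ... + n^2, which is Theta(n^(k+1)) = Theta(n^3).
def T(n, b=1, k=2):
    return b * sum(i**k for i in range(1, n + 1))

for n in [10, 100, 1000]:
    print(n, T(n) / n**3)  # the ratio settles near the constant 1/3
```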
For recurrence relation of other forms, refer to CLR for substitution and recursion tree method.
(In short, for the substitution method, you guess the solution and use induction to find the constants and to prove that the guessed solution is correct.
The recursion tree method is often used to find the guess for the substitution method.)

Some useful approximations:

For non-negative real b, c, d (with b, c > 1 wherever they appear as the base of an exponential):
\begin{align} \text{Polynomial growth rate:}& \sum_{i=1}^n i^c &= \Theta(n^{c+1}) \\ \text{Logarithmic growth rate:}& \sum_{i=1}^n \dfrac{1}{i} &= \Theta(\log(n)) \\ \text{Exponential growth rate:}& \sum_{i=1}^n c^i &= \Theta(c^n) \\ \text{n-Log(n) growth rate:}& \sum_{i=1}^n \log(i)^c &= \Theta(n \log(n)^c) \\ \text{Combination of above:}& \sum_{i=1}^n \log(i)^c i^{d} &= \Theta(n^{d+1} \log(n)^c) \\ &\sum_{i=1}^n \log(i)^c i^d b^{i} &= \Theta(n^{d} \log(n)^c b^{n}) \\ \text{Stirling's approximation:}& n! &= \sqrt{2 \pi n} \left(\frac{n}{e}\right)^n \left(1 + \Theta\left(\frac{1}{n}\right)\right) \end{align}
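Two of these approximations are easy to spot-check in Python (the constant 0.5772... below is the Euler–Mascheroni constant, which the harmonic sum's deviation from ln n converges to):

```python
import math

# Spot-check two of the approximations above.
# Harmonic sum: sum_{i=1}^{n} 1/i ~ ln(n) + 0.5772...
n = 10**6
harmonic = sum(1 / i for i in range(1, n + 1))
print(harmonic - math.log(n))  # approaches the Euler-Mascheroni constant 0.5772...

# Stirling: n! ~ sqrt(2*pi*n) * (n/e)^n, up to a (1 + Theta(1/n)) factor
n = 20
stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
print(math.factorial(n) / stirling)  # close to 1 (roughly 1 + 1/(12n))
```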

Wednesday, November 03, 2010

An Attempt to understand Machine Learning (Part 1)

I still remember the excitement when I took the "Theory of Automata" course in my undergrad and was introduced to the Turing machine. A few days later, after digging a little, I read the Church–Turing thesis, which states that every algorithm can be expressed in terms of a Turing machine. This was like being handed a framework for understanding everything you will eventually learn in your Masters or PhD. However, mapping everything to a Turing machine kept becoming more difficult as I started doing research, particularly when I took a Machine Learning course. Most people come away with a different perception when they take a Machine Learning or Artificial Intelligence course.

Before I begin, let me make you aware that machine "learning" is not the same as human "learning". Humans learn in a way that is not fully understood by the scientific community. Let's say two persons, X and Y, see a mango tree. X has never seen a mango tree before, so he will create a mental picture of it as a medium-sized tree with green leaves. Y, on the other hand, will create a mental image of the tree and associate it with a mental image of a mango, or maybe even with other senses, like "it tastes like mango shake". Any further discussion between X and Y will be based on their mental images and not on reality. Note: the information in a mental image is less than the information in reality, and the reasons for this information loss are:
- We cannot store or process all the information in the world (due to lack of 'infinite' focus).
- Everyone filters information from reality according to their past experiences or beliefs.

Hence, since it is virtually impossible for two people to have exactly the same experiences, it is impossible for two people to have exactly the same mental images or information. If we treat the mental image as an output, or even as an intermediate variable that we use to spit out an output, there is no "common" theory/algorithm that will generate a common output for all humans on the same problem. This huge diversity (which makes life interesting) makes humans difficult to understand.

Most scientists (at least the neurologists/psychiatrists/psychologists) use abstraction (i.e. weed out details and rely on a central theme, i.e. "tree") to specify a theory/algorithm that applies to all. Yet there is another problem with human learning: humans don't learn or process with objects, but with patterns. Furthermore, humans use association (and not exhaustive search through all the patterns) to interpret a situation. This process of association is somewhat random, or at least extremely diverse (i.e. every person associates object A with object B using different sets of patterns, and these patterns change over time). Hence, it is extremely difficult to capture such diversity with a single unifying theory. So, machine learning is not about building a machine that will always pass the "Turing test" for all humans and in all situations (though that is the ultimate goal of Machine Learning). (Btw, if you don't know about the Turing test, read http://en.wikipedia.org/wiki/Turing_test.)

Having said that, it is also very important not to be too humble while defining what Machine Learning is, especially if you compare every machine learning situation to a Turing machine. Remember the assumptions of the Turing machine model (which are not necessarily applicable to all Machine Learning problems):
- The entire input is available at the beginning of the computation.
- No state is maintained between executions of two (same or different) programs.
An especially interesting case where it is very difficult to define the problem in terms of a Turing machine is Reinforcement Learning, which we will study later. The paper by van Leeuwen and Wiedermann, which discusses the Turing machine in comparison to contemporary computing, would also be a nice read.

By now I made two key points (or mental images :)):
1. Machine Learning != Human Learning
2. Machine Learning != Turing machine

References:
1. J. van Leeuwen and J. Wiedermann, "The Turing Machine Paradigm in Contemporary Computing".