Wednesday, November 03, 2010

An Attempt to understand Machine Learning (Part 1)

I still remember the excitement I felt when I took the "Theory of Automata" course as an undergrad and was introduced to the Turing machine. A few days later, after digging a little, I read about the Church-Turing thesis, which states, roughly, that "every algorithm can be expressed in terms of a Turing machine". This was like saying ... well, you have a framework for understanding everything you will eventually learn in your Masters or PhD. However, mapping everything to a Turing machine became more and more difficult once I started doing research, particularly when I took a Machine Learning course. Most people come away with a different perception of the field when they take a Machine Learning or Artificial Intelligence course.

Before I begin, let me make you aware that Machine "Learning" is not the same as human "learning". Humans learn in a way that is not fully understood by the scientific community. Let's say two people, X and Y, see a mango tree. X has never seen a mango tree before, so he will create a mental picture of it as a medium-sized tree with green leaves. Y, on the other hand, will create a mental image of the tree and associate it with a mental image of a mango, or maybe even with other senses, like "it tastes like mango shake". Any further discussion between X and Y will be based on their mental images and not on reality. Note: the information in a mental image is less than the information in reality, and the reasons for this information loss are:
- We cannot store or process all the information in the world (due to a lack of 'infinite' focus).
- Everyone filters information from reality according to their past experiences or beliefs.

Since it is virtually impossible to have two people with exactly the same experiences, it is impossible to have two people with exactly the same mental images or information. If we treat the mental image as an output, or even as an intermediate variable that we use to produce an output, there is no "common" theory/algorithm that will generate the same output for all humans given the same problem. This huge diversity (which makes life interesting) is what makes humans difficult to understand.

Most scientists (at least neurologists, psychiatrists and psychologists) use abstraction (i.e. they weed out the details and rely on the central theme, here "tree") to specify a theory/algorithm that applies to everyone. Yet there is another problem with human learning: humans don't learn or process in terms of objects but in terms of patterns. Furthermore, humans use association (and not an exhaustive search through all the patterns) to interpret a situation. This process of association is somewhat random, or at least extremely diverse (i.e. every person associates object A with object B using a different set of patterns, and these patterns change over time). Hence, it is extremely difficult to capture such diversity with a single unifying theory. So, machine learning is not about building a machine that will always pass the "Turing test" for all humans and in all situations (though that is the ultimate goal of Machine Learning). (Btw, if you don't know about the Turing test, read

Having said that, it is also very important not to be too humble when defining what Machine Learning is, especially if you compare every machine learning situation to a Turing machine. Remember that a Turing machine makes assumptions that do not necessarily apply to all Machine Learning problems:
- Entire input is available at the beginning of computation.
- No state is maintained between executions of two (same or different) programs.
An especially interesting case, where it is very difficult to define the problem in terms of a Turing machine, is Reinforcement Learning, which we will study later. The paper by van Leeuwen and Wiedermann [1], which discusses the Turing machine in comparison to contemporary computing, would also be a nice read.
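
To make the two assumptions above a little more concrete, here is a minimal Python sketch (my own illustration, not taken from the paper). The names `batch_mean` and `OnlineMean` are hypothetical. The first function sees its entire input up front and keeps nothing between calls; the second receives inputs one at a time and carries state from one call to the next, which is closer to how many learning problems are posed.

```python
from typing import List


def batch_mean(values: List[float]) -> float:
    """Batch style: the entire input is available before computation starts,
    and nothing is remembered once the function returns."""
    return sum(values) / len(values)


class OnlineMean:
    """Interactive style (hypothetical example): inputs arrive one at a time
    and the object keeps state between successive calls."""

    def __init__(self) -> None:
        self.count = 0    # number of values seen so far
        self.mean = 0.0   # running estimate, updated incrementally

    def update(self, value: float) -> float:
        # Incremental mean update: no need to store past values.
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


stream = [2.0, 4.0, 6.0]

learner = OnlineMean()
estimates = [learner.update(v) for v in stream]  # one estimate per arriving value

print(batch_mean(stream))
print(estimates)
```

Both versions end up with the same final answer on this tiny stream; the difference is only in when the input becomes available and whether anything persists between calls.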

So far I have made two key points (or mental images :)):
1. Machine Learning != Human Learning
2. Machine Learning != Turing machine

[1] van Leeuwen and Wiedermann, "The Turing Machine Paradigm in Contemporary Computing".