Monday, August 15, 2011

Thinking in Probability

This blogpost is based on a lecture I gave at Jacob Sir's classes yesterday. The intent of my lecture was to urge students to ask questions and read "stuff" beyond the textbook. To get the students interested, I started with 2 examples, which explained dependent and independent events and also the need to stick to mathematical rigor rather than intuition.

The first example was the classic Monty Hall problem, which surprisingly none of my students had heard about. This was fortunate because it meant everyone would have to think about it on their own, rather than provide a prepared answer (thought through by someone else). So, here is the problem:

There are three doors, D1, D2 and D3; behind two of them there is a goat and behind the third there is a car. The objective of the game is to win the car, so you have to guess which door to choose ... Does it make a difference whether you choose D1, D2 or D3? ... The consensus was NO, because Prob(win) = 1/3 ≈ 0.33 for each door. To make the problem interesting, the game show host (who knows which door has the car and which doors have goats) opens one of the two remaining doors, always one that has a goat. He now asks whether, based on this information, you would like to switch your earlier choice. For example, say you chose D1 and the game show host opens D2 and shows that it has a goat. Now you can either stick with D1 or switch to D3. The real question we are interested in is:
Does it make more sense to switch doors, to stay with the same door, or does it not matter whether you choose D1 or D3?

Interesting but incorrect answers:
1. It does not matter whether you choose D1 or D3, because Prob(win) for each door is now 0.5.
2. Staying with the door makes more sense, since we have increased Prob(win) from 0.33 to 0.5.
3. We really don't have a plausible reason to switch, hence stay with the same door, especially since the game show host may want to trick us.

The correct answer is that switching doubles Prob(win), hence switching makes more sense.
Consider the case where you don't switch:
Prob(win) = Prob(D1 = car) = 1/3 ≈ 0.33 (the host can always open a goat door, so his doing so tells you nothing new about your own door)
Now consider the case where you switch:
Prob(win) = Prob(D1 = goat) = 2/3 ≈ 0.67 (if D1 hides a goat, the host must open the other goat door, so the door you switch to hides the car)

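The other way to understand the problem is by enumerating the three equally likely arrangements (C = car, G = goat), one per column, assuming you initially choose D1:
D1 : C G G
D2 : G C G
D3 : G G C
Swap: G C C => Prob(car) = 2/3 ≈ 0.67
Stay: C G G => Prob(car) = 1/3 ≈ 0.33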
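To check the enumeration above, here is a small Python sketch (not part of the original lecture). It counts the three cases exactly and also plays the game many times with a seeded random generator. The function names and the number of rounds are my own choices for illustration, and the host is assumed to pick at random when two goat doors are available.

```python
import random
from fractions import Fraction

DOORS = ("D1", "D2", "D3")

def host_opens(car, pick, rng):
    """Host opens a goat door that is neither the player's pick nor the car."""
    options = [d for d in DOORS if d != pick and d != car]
    return rng.choice(options)

def play(switch, rng):
    """Play one round; return True if the player ends up with the car."""
    car = rng.choice(DOORS)
    pick = rng.choice(DOORS)
    opened = host_opens(car, pick, rng)
    if switch:
        # Move to the only door that is neither the first pick nor the opened one.
        pick = next(d for d in DOORS if d != pick and d != opened)
    return pick == car

def exact_win_probability(switch):
    """Enumerate the three equally likely car positions for a player who picks D1."""
    wins = 0
    for car in DOORS:
        stay_wins = (car == "D1")
        # Switching wins exactly when staying would have lost.
        wins += (not stay_wins) if switch else stay_wins
    return Fraction(wins, len(DOORS))

def simulate(switch, rounds=100_000, seed=0):
    """Estimate the win probability by playing many rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds

if __name__ == "__main__":
    print("exact stay  :", exact_win_probability(switch=False))
    print("exact switch:", exact_win_probability(switch=True))
    print("simulated stay  :", simulate(switch=False))
    print("simulated switch:", simulate(switch=True))
```

The simulated frequencies should land close to the exact fractions; they are estimates, so expect small deviations from run to run if you change the seed.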

For people who chose the third incorrect answer, I asked them not to use information that is not provided in the problem, and to show as much restraint as possible about using intuition in place of logic. Deductive thinking says: go from Step 2 to Step 3 only if there is a valid theorem, axiom, or logical reason (not intuition).

Let's move to the second problem:
Say I have a fair coin. I toss the coin four times and I get {Head, Head, Head, Head}. What will you bet on for the fifth toss?
Some students initially based their answer on the gambler's fallacy, stating that Tail is more likely since they had seen four consecutive heads. I then asked them to sit in a circle, discuss with each other, and answer my question again. Through this exercise, I wanted them to learn the idea of collaborating to get an answer. And voilà ... they came up with the correct answer ... It does not matter whether we bet on head or tail, since the tosses are INDEPENDENT and hence each outcome of the fifth toss is equally likely => Prob(T5 | H1, H2, H3, H4) = Prob(T5) = 0.5
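The same result can be checked by brute force. The sketch below (my own illustration, with a hypothetical function name) lists all 32 equally likely sequences of five fair tosses, keeps those that start with four heads, and computes the fraction whose fifth toss is the requested face.

```python
from fractions import Fraction
from itertools import product

def prob_fifth_given_four_heads(fifth="T"):
    """Prob(fifth toss == `fifth` | first four tosses are all heads), by enumeration."""
    sequences = list(product("HT", repeat=5))  # 2**5 = 32 equally likely outcomes
    four_heads = [s for s in sequences if s[:4] == ("H", "H", "H", "H")]
    matching = [s for s in four_heads if s[4] == fifth]
    return Fraction(len(matching), len(four_heads))

if __name__ == "__main__":
    print("Prob(T5 | H1..H4) =", prob_fifth_given_four_heads("T"))
    print("Prob(H5 | H1..H4) =", prob_fifth_given_four_heads("H"))
```

Only two sequences survive the conditioning (HHHHH and HHHHT), which is why both bets come out the same.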

Finally, I gave them two different explanations of probability:
1. Deterministic view for probability (explained in my earlier blogpost)

After explaining the deterministic view, I emphasized the importance of probability in solving real-life problems. For example, in a coin-tossing experiment, you are able to make some rational predictions even though you don't know the (hidden) parameters such as the initial angle of the coin, its weight, density and shape, its velocity, the air resistance, the characteristics of the surface where it lands, etc ... Similarly, you can build simpler probabilistic models for complex scenarios like weather forecasting, stock prediction, traffic congestion, ...

References:
1. http://niketanblog.blogspot.com/2009/11/god-does-not-play-dice.html
2. http://en.wikipedia.org/wiki/Monty_Hall_problem
3. http://en.wikipedia.org/wiki/Gambler's_fallacy
4. http://en.wikipedia.org/wiki/Measure_(mathematics)

Tuesday, March 15, 2011

Nice post about Sachin Tendulkar after India-SA match

Remember when you failed an examination. How many people recall that: your class, friends, relatives? You failed to make it to the IITs or IIMs. Who remembers? How many times have you had the feeling of being the best in your class, school, university, state…? You failed to get a visa stamped this quarter…, you missed a promotion this year…, how did it feel when your dad told you in your early twenties that you are good for nothing… and now your boss tells you the same...

You keep introspecting and go into a shell when people, most of whom don’t matter a dime in your life, criticize you, backbite you, make fun of you. You are left sad and shattered and you cry when your own kin scoff at you. You say I am feeling low today. It takes a lot from us to come out of these everyday situations and move on. A lot??? really?

Now here’s a man standing on the third man boundary in the last over of a world cup match. The bowler just has to bowl sensibly to win this game. What the man at the boundary sees is 4 rank bad balls bowled without any sense of focus, planning or regret. India loses, yet again, in circumstances where he has done just about everything right.
He does not cry. Does not show any emotion. Just keeps his head down and leaves the field. He has seen these failures for 22 years now. And not just his class, relatives, friends but the whole world has seen these failures. We are too immature to even imagine what goes on in that mind and heart of his. That’s why I would never want to be Sachin.

True, he has single-handedly lifted the moods of this entire nation umpteen times. He has been an inspiration to rise above our mediocrity. Nobody who has ever lifted the willow even comes close to this man’s genius. His dedication and mental strength are unparalleled. This is specially for those people who would have made fun of him again last night when India lost. They are people who are mediocre in their own lives. Who just scoff at others to create cheap fun. Who have lived in a small hole throughout their lives and thought they have seen the oceans.

Think about the man himself. He is 37 years of age. He has been playing almost non-stop for 22 years. The way he was running and diving around the field last night would have put 22 year olds to shame. The way he played the best opening quickies in the world was breathtaking. He just keeps getting better, which is by the way humanly impossible. It’s not for nothing that people call him GOD.
But still I don’t want to be in those shoes. We struggle in keeping our monotonous lives straight, lives which affect a limited number of people. Imagine what would be the magnitude of the inner struggle for him, pain both mental and physical, tears that have frozen with time, knees and ankles and every other joint in the body that is either bandaged or needs to be attended to every night, eyes that don’t sleep before a big game, bats that have scored 99 international tons and still see expectations from a billion people.

And he just converts those expectations into reality. We watch in awe, feel privileged.
Well, I think it’s time that his team realizes that enough is enough. They have an obligation, not towards their country alone but towards Sachin. They need to win this one for him. Rest assured that he himself will still deliver and leave no stone unturned to make sure India wins this cup.
This is not just a game, and he is not just a sportsman. It’s much more than this. Words fail here.....

-- Anonymous

(This post is not written by Harsha Bhogle even though the source page said it was)

Thursday, February 24, 2011

Setting up and running Hadoop 0.20.2

Step 1: Download and extract hadoop code
wget http://mirror.cloudera.com/apache/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
Extract the files and go into that directory

Step 2: Configure Hadoop
vim ~/.bash_profile
export JAVA_HOME=/usr/local/java/vms/java
export HADOOP_HOME=/home/oa/hadoop-asterix/hadoop-0.20.2

vim ${HADOOP_HOME}/conf/masters
asterix-master

vim ${HADOOP_HOME}/conf/slaves
asterix-001
asterix-002
asterix-003


vim ${HADOOP_HOME}/conf/core-site.xml
<property>
<name>fs.default.name</name>
<value>hdfs://10.122.198.195:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>


vim ${HADOOP_HOME}/conf/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>10.122.198.195:54311</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>


vim ${HADOOP_HOME}/conf/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.name.dir</name>
<value>/mnt/hdfs/name_dir/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/mnt/hdfs/data_dir/</value>
</property>
<property>
<name>dfs.datanode.address</name>
<value>0.0.0.0:54325</value>
</property>
<property>
<name>dfs.datanode.http.address</name>
<value>0.0.0.0:54326</value>
</property>


vim ${HADOOP_HOME}/conf/hadoop-env.sh
export JAVA_HOME=/usr/local/java/vms/java
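
Before syncing the configuration to the slaves, it can help to read the site files back and confirm the values. The Python sketch below is my own addition, not part of the original setup. It assumes each site file wraps its <property> entries in a <configuration> root element, and the function name and sample text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of conf/hdfs-site.xml (values taken from the steps above).
SAMPLE = """
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/mnt/hdfs/name_dir/</value>
  </property>
</configuration>
"""

def read_properties(xml_text):
    """Return a dict mapping each <name> to its <value> in a Hadoop-style site file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props

if __name__ == "__main__":
    for name, value in sorted(read_properties(SAMPLE).items()):
        print(name, "=", value)
```

To check a real file, read its contents (for example with open(path).read()) and pass the text to read_properties.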

Make sure the name and data directories are properly set up using the following script. Running it also confirms that you can do passwordless ssh from the master to the slave machines.
vim ${HADOOP_HOME}/commands.sh
cd /mnt
sudo mkdir hdfs
cd hdfs/
sudo mkdir name_dir
sudo mkdir data_dir
sudo chmod 777 -R /mnt/hdfs

# Sync this file
parallel-rsync -p 6 -r -h ${HADOOP_HOME}/conf/slaves ${HADOOP_HOME} ${HADOOP_HOME}

Then, run the above commands.sh on every slave
user_name="ubuntu"
command="sh ${HADOOP_HOME}/commands.sh"
for slaves_ip in $(cat ${HADOOP_HOME}/conf/slaves)
do
ssh ${user_name}@${slaves_ip} ${command}
done


Step 3: Compile hadoop and sync slaves
# Compile source code
${ANT_HOME}/bin/ant
${ANT_HOME}/bin/ant jar
${ANT_HOME}/bin/ant examples

# Sync slaves (see EC2 point 3 for installing parallel ssh)
parallel-rsync -p 6 -r -h ${HADOOP_HOME}/conf/slaves ${HADOOP_HOME} ${HADOOP_HOME}

Step 4: Run hadoop
${HADOOP_HOME}/bin/hadoop namenode -format
${HADOOP_HOME}/bin/stop-all.sh
${HADOOP_HOME}/bin/start-all.sh

Check the datanode and namenode logs under logs/. Also check whether the nodes are up by opening http://master-ip:50070 in links (or any normal browser).
If you get an "incompatible namespace" error in a datanode's log, try deleting the hdfs dir and restarting hdfs

Step 5: Loading the HDFS
If you are loading from:
1. Local filesystem of master, use either copyFromLocal or put
bin/hadoop dfs -copyFromLocal /mnt/wikipedia_input/wikistats/pagecounts/pagecounts* wikipedia_input

2. Some other hdfs, use put

3. Files on some other machine accessible via scp
# Configure the following variables. Separate array items with spaces inside the parentheses (no commas).
# If this script gives an error like '4: Syntax error: "(" unexpected', try bash <script-name>
# If that gives a permission denied error, put the directory names in place of ${directories1[@]}
user_name="ubuntu"
machine_name="my_machine_name_or_ip"
file_prefix="pagecounts*"
hdfs_dir="wikipedia_input"
directories1=( "/mnt/data/sdb/space/oa/wikidata/dammit.lt/wikistats/archive/2010/09" "/mnt/data/sdc/space/oa/wikidata/dammit.lt/wikistats/archive/2010/10" "/mnt/data/sdd/space/oa/wikidata/dammit.lt/wikistats/archive/2010/11" )
for dir1 in "${directories1[@]}"
do
echo "--------------------------------------------"
# List the matching files on the remote machine (the glob is expanded remotely)
cmd="ssh ${user_name}@${machine_name} 'ls ${dir1}/${file_prefix}'"
echo "Reading files using: $cmd"
for file1 in $(eval "$cmd")
do
# Strip the directory part to get the bare file name
file_name1=${file1##*/}
echo -n "$file_name1 "
# Copy from the same machine the listing came from, load into HDFS, then clean up
scp "${user_name}@${machine_name}:${file1}" .
bin/hadoop dfs -copyFromLocal "$file_name1" "$hdfs_dir"
rm "$file_name1"
done
done

Step 6: Run your mapreduce program
${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/build/hadoop-hop-0.2-examples.jar wordcount tpch_input tpch_output

Some tips for amazon EC2:
1. A lot of the time, due to its resource allocation policies, EC2 shuts down your virtual machine (and hence the assigned network), and the master/slaves/namenodes/datanodes go into fault tolerance mode and restart the jobs. You can set a very long heartbeat time to tackle this error (this works because EC2 restarts your virtual machine after some time and there is no "real" failure, just temporary lags) by adding the following to ${HADOOP_HOME}/conf/hdfs-site.xml
<property>
<name>dfs.heartbeat.interval</name>
<value>6000</value>
<description>Determines datanode heartbeat interval in seconds.</description>
</property>
<property>
<name>dfs.heartbeat.recheck.interval</name>
<value>6000</value>
<description>Determines how often the namenode rechecks for expired datanode heartbeats.</description>
</property>
<property>
<name>heartbeat.recheck.interval</name>
<value>6000</value>
<description>Fallback name, in case the dfs.-prefixed property above does not work</description>
</property>
<property>
<name>dfs.socket.timeout</name>
<value>180000</value>
<description>dfs socket timeout</description>
</property>
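
To get a feel for what these values mean, here is a rough Python sketch of how long the namenode waits before declaring a datanode dead. The formula is an assumption based on my reading of the Hadoop 0.20 namenode (expiry = 2 x recheck interval + 10 x heartbeat interval, with the heartbeat interval in seconds and the recheck interval in milliseconds), so verify it against your version. The function name is made up for illustration.

```python
def datanode_expiry_ms(heartbeat_interval_s, recheck_interval_ms):
    """Assumed Hadoop 0.20 rule: 2 * recheck (ms) + 10 * heartbeat (s, converted to ms)."""
    return 2 * recheck_interval_ms + 10 * heartbeat_interval_s * 1000

if __name__ == "__main__":
    # Commonly cited defaults: 3 s heartbeat, 5 min recheck (assumption)
    default_ms = datanode_expiry_ms(3, 5 * 60 * 1000)
    # Values from the hdfs-site.xml snippet above
    tuned_ms = datanode_expiry_ms(6000, 6000)
    print("default expiry: %.1f minutes" % (default_ms / 60000.0))
    print("tuned expiry  : %.1f hours" % (tuned_ms / 3600000.0))
```

Under that assumption the tuned values push the expiry from minutes to many hours, which is the "very long" timeout the tip is after. Note that a 6000-second heartbeat interval also means datanodes report in very rarely.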

2. Log in to the EC2 machine:
EC2_KEYPAIR_DIR="/home/np6/EC2"
printf "\nEnter Public DNS of Master\n"
read AMAZON_PUBLIC_DNS
echo "If this does not work, try (exec ssh-agent bash) and then this command again"
# Start the agent and export its environment into this shell
eval "$(ssh-agent)"
ssh-add ${EC2_KEYPAIR_DIR}/ec2-keypair.pem
ssh ubuntu@$AMAZON_PUBLIC_DNS

3. Setting up java and other programs on slaves from master
# First install parallel ssh
user_name="ubuntu"
command="sudo apt-get install -y pssh"
for slaves_ip in $(cat ${HADOOP_HOME}/conf/slaves)
do
ssh ${user_name}@${slaves_ip} ${command}
done

# Then install java (if you don't prefer openjdk)
vim ${HADOOP_HOME}/commands.sh
sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
sudo apt-get update
sudo apt-get install sun-java6-jdk
sudo update-java-alternatives -s java-6-sun
echo "export JAVA_HOME=/usr/lib/jvm/java-6-sun" >> ~/.bashrc
echo "export HADOOP_HOME=/home/ubuntu/hadoop" >> ~/.bashrc
# Use "." rather than "source", since this file is run with sh
. ~/.bashrc
echo "Check if the version of java is correct:"
java -version

# Sync this file
parallel-rsync -p 6 -r -h ${HADOOP_HOME}/conf/slaves ${HADOOP_HOME} ${HADOOP_HOME}

Then, run the above commands.sh on every slave
user_name="ubuntu"
command="sh ${HADOOP_HOME}/commands.sh"
for slaves_ip in $(cat ${HADOOP_HOME}/conf/slaves)
do
ssh ${user_name}@${slaves_ip} ${command}
done

4. Setting up hadoop master and slaves for the lazy (I would recommend you follow the above steps instead)
cd ${HADOOP_HOME}/conf
printf "\nEnter Public DNS of Master\n"
read AMAZON_PUBLIC_DNS
sed -e "s/<name>mapred.job.tracker<\/name> <value>[-[:graph:]./]\{1,\}<\/value>/<name>mapred.job.tracker<\/name> <value>${AMAZON_PUBLIC_DNS}<\/value>/" hadoop-site.xml > a1.txt
sed -e "s/<name>fs.default.name<\/name> <value>[-[:graph:]./]\{1,\}<\/value>/<name>fs.default.name<\/name> <value>hdfs:\/\/${AMAZON_PUBLIC_DNS}:9001<\/value>/" a1.txt > hadoop-site.xml
echo ${AMAZON_PUBLIC_DNS} > masters
printf "\nEnter Slave string separated with spaces (eg: domU-12-31-39-09-A0-84.compute-1.internal domU-12-31-39-0F-7E-61.compute-1.internal)\n"
read SLAVE_STR
# Put every slave on its own line (replace all spaces, not just the first)
echo "$SLAVE_STR" | tr ' ' '\n' > slaves

Some other neat tricks:
1. Replace default java temp directory:
export JAVA_OPTS="-Djava.io.tmpdir=/mnt/java_tmp"

2. Setting number of open file limit to 99999
sudo vi /etc/security/limits.conf
ubuntu soft nofile 99999
ubuntu hard nofile 99999
* soft nofile 99999
* hard nofile 99999
sudo sysctl -p
# Limits from limits.conf apply to new login sessions, so log in again before checking
ulimit -Hn

3. Checking the machines on the network
cat /etc/hosts

or naming machines as masters and slaves: vim /etc/hosts
10.1.0.1 asterix-master
127.0.0.1 localhost
10.0.0.1 asterix-001
10.0.0.2 asterix-002

Checking a machine's IP address
/sbin/ifconfig

4. Configuring password-less ssh of master to slaves
slave_user_name="ubuntu"
for slaves_ip in $(cat ${HADOOP_HOME}/conf/slaves)
do
ssh-copy-id -i $HOME/.ssh/id_rsa.pub ${slave_user_name}@${slaves_ip}
done

For a more detailed step-by-step example, see
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/

Saturday, January 29, 2011

Life's wonderful and unexpected moments (in no particular order)

1. Life changing speech from Dad or anyone you respect.
2. Laughing so hard your face hurts.
3. Helping someone when they need you the most.
4. Falling in love.
5. Hearing your favorite song on the radio while driving and singing it unmelodiously.
6. Lying in bed listening to the rain outside, sometimes even waking up to watch it with a hot cup of coffee.
7. Mom's hot tea, birda, chicken, ... well almost anything she makes.
8. A good conversation.
9. Watching sunset/sunrise on the beach.
10. Laughing at yourself.
11. Laughing for absolutely no reason at all.
12. Laughing at an inside joke.
13. Being sincerely happy for someone.
14. Midnight phone calls that last for hours.
15. Having someone tell you that you're handsome/sexy/intelligent.
16. Accidentally overhearing someone say something nice about you.
17. Waking up and realizing you still have a few hours left to sleep.
18. Your first kiss.
19. Making new friends or spending time with old ones.
20. Playing with a new puppy.
21. Acting like teens once in a while.
22. Having someone play with your hair.
23. Having a wonderful dream.
24. Road trips with friends.
25. A nice Swedish massage.
26. Watching a really good movie cuddled up on a couch.
27. Going to a really good concert.
28. Going to a (football/cricket) game and shouting/dancing/cheering the whole time.
29. Getting butterflies in your stomach every time you see that one person.
30. Making eye contact with a cute stranger.
31. Winning a really competitive game.
32. Running into an old friend and realizing that some things (good or bad) never change.
33. A long distance phone call.
34. Taking a drive on a pretty road.
35. The feeling you get just before you think you are going to get into trouble, and especially the one after you narrowly escape it.
36. Playing "pretend games" with kids.
37. Visiting a temple/church/mosque or any place of worship.
38. Hugging the person you love.
39. Getting a creative idea which keeps you awake all night.
40. Boating/Kayaking/Canoeing
41. Camping/Trekking in a nature park or on a beach.
42. Having sex.
43. Lying on the grass and watching the sky through the leaves.
44. Lying on the beach and trying to count the stars.
45. Watersports: Para-sailing, snorkelling, jet-skiing, scuba-diving.
46. Swimming in the sea or lake (especially if you are not a good swimmer).
47. Browsing through books in libraries or bookstores, especially in fields other than your research/work.
48. Reading a thought-provoking quote/paragraph.
49. Learning and appreciating a new word.
50. Learning a new language.
51. Trying something for the first time.
52. Going through the pain barrier in the gym.
53. A hot shower.
54. Trying new food/restaurants.
55. Travelling to new places.
56. Getting drunk or handling a drunk friend.
57. Donating your blood/time/food/money/organ.
58. Pursuing any one topic till you attain excellence in it.
59. Doing something against rationality and totally from the gut, especially when people around you don't believe in you.
60. Teaching a child something new.
61. Forgiving someone.
62. Buying something you have wanted for several years.
63. Dancing senselessly in a party/marriage/procession.
64. Realizing you are loved and respected by a few very important people in your life.
65. Jogging outdoors while listening to your favorite music in nice weather.
66. Playing with colors during Holi.
67. Being a part of a human pyramid during Dahi-Handi.
68. Burning fire-crackers during Diwali.
69. Sitting/Standing by the door in the train.
70. Riding a horse.
71. Learning to play a musical instrument.
72. Being thanked for a nice gesture.
73. Being heartbroken.
74. Running through sprinklers or walking in the rain.
75. Playing with the snow.
76. Feeling of calm in solitude or while meditating.
77. Blood-rush to the brain while doing yoga.
78. Dressing funny for a costume party.
79. Feeling of levitation while jumping or catching a frisbee.
80. Take-off and landing of an airplane.
81. Sitting in the cockpit of Boeing 777.
82. Buying your first car/house.
83. Sailing on a yacht.
84. Being married to an amazing person.
85. Attending a major sporting event: the World Cup (Cricket/Soccer), Super Bowl, the Olympics, the U.S. Open.
86. Throwing a huge party and inviting every one of your friends.
87. Going to Disneyland/Seaworld and other theme parks.
88. Skydiving/Bungee jumping.
89. Having your portrait/caricature painted.
90. Watching the launch of the space shuttle.
91. Spending a whole day eating junk food without feeling guilty.
92. Performing in front of a large crowd.
93. Telling someone the story of your life, sparing no details.
94. Rollerblading/Skating/Ice-skating/Paintball/shooting-range/go-karting
95. Fishing in the sea.
96. Being someone's mentor.
97. Showering in a waterfall.
98. Painting the walls of your own house.
99. Being proud of someone else's achievement.
100. Getting a lapdance from the cutest girl you have seen.
101. Having an uncontrollable giggling fit at the worst possible moment.