## Tuesday, February 17, 2015

### Plotting as a useful debugging tool

While writing code for Bayesian modeling, you will have to test your distributions (prior, likelihood, or posterior). Here are two common scenarios that you might encounter:
1. You want to test a function that generates random deviates. For example, you have derived a conjugate formula for a parameter of your model and want to check whether it is correct.
2. You want to test a probability density function. For example, a likelihood or posterior function on which you might want to run a rejection sampler.

In both cases, start by making sure that the properties of the distribution (for example, range, mean, and variance) are correct. For example, if the parameter you are sampling is a variance, then you will have “assert(returnedVariance > 0)” in your code. The next obvious test is visual inspection (trust me, it has helped me catch more bugs than traditional debugging techniques and tools). This means you plot the distribution and see whether the output of your code makes sense.

We will simplify the two cases above by assuming a standard normal distribution. So, in the first case, we have access to R's “rnorm” function, and in the second case, to its “dnorm” function.

Case 1: In this case, we first collect random deviates (in “vals”) and then use ggplot to plot them:
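```
library(ggplot2)
# Draw 10,000 random deviates from the standard normal distribution
vals = rnorm(10000, mean=0, sd=1)
df = data.frame(xVals=vals)
# Plot a kernel density estimate of the deviates
ggplot(df, aes(x=xVals)) + geom_density()
```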

The output of the above R script should look something like the familiar bell curve of the standard normal density, centered at 0.

Case 2: In this case, we have to assume a bounding box and sample inside it to get “x_vals” and “y_vals” (just like rejection sampling):
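```
library(ggplot2)
# Sample x uniformly inside the bounding box [-4, 4]
x_vals=runif(10000,min=-4,max=4)
# Evaluate the density function at each sampled point
y_vals=dnorm(x_vals, mean=0, sd=1)
df = data.frame(xVals=x_vals, yVals=y_vals)
ggplot(df, aes(x=xVals, y=yVals)) + geom_line()
```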

The output of the above R script should again trace the standard normal density curve, this time over the interval from -4 to 4.

Just as a teaser for a later post, a script similar to that of case 1 can be used to study the characteristics of a distribution.

### How to setup “passwordless ssh” on Amazon EC2 cluster

Often, for running distributed applications, you may want to set up a new cluster or tweak an existing one (running on Amazon EC2) to support passwordless ssh. For creating a new cluster from scratch, there are a lot of cluster management tools (which are beyond the scope of this blog post). However, if all you want to do is set up “passwordless ssh” between nodes, then this post might be worth your read.

The script below assumes that you have completed the following three steps:

Step 1. Created an RSA key pair on each of the machines:
```
cd ~
ssh-keygen -t rsa
```

Do not enter any passphrase; instead, just press [enter].

Step 2. Suppressed warning flags in the ssh config file:
```
sudo vim /etc/ssh/ssh_config
# then add the following two lines to the file:
StrictHostKeyChecking no
UserKnownHostsFile=/dev/null
```

Step 3. Copied the key pair file “MyKeyPair.pem” to master’s home directory:
`scp -i /local-path/MyKeyPair.pem /local-path/MyKeyPair.pem ubuntu@ec2-master-public-address.compute-1.amazonaws.com:~`
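
If editing the system-wide `/etc/ssh/ssh_config` feels too broad, the two settings from Step 2 could instead be scoped to the cluster's hosts in a per-user config file. The sketch below is one possible way to do that, not part of the original setup: the function name `write_cluster_ssh_config`, the host pattern in the example call, and the default path `~/.ssh/config` are all assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical helper: append a Host block that relaxes host-key checking
# only for hosts matching the given pattern (not for every host).
write_cluster_ssh_config() {
  local pattern="$1"                          # e.g. "*.compute-1.amazonaws.com" (assumed)
  local config_file="${2:-$HOME/.ssh/config}" # assumed default location

  mkdir -p "$(dirname "$config_file")"
  # Do nothing if a block for this exact pattern is already present.
  if [ -f "$config_file" ] && grep -qxF "Host $pattern" "$config_file"; then
    return 0
  fi
  cat >> "$config_file" <<EOF
Host $pattern
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null
EOF
  chmod 600 "$config_file"
}

# Example call (commented out so that sourcing this file changes nothing):
# write_cluster_ssh_config "*.compute-1.amazonaws.com"
```

Because the function checks for an existing `Host` line first, calling it repeatedly should leave a single block in the file. Note that the slave-to-master checks run on the slaves, so the same settings would be needed there as well.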

Assuming that the above three steps have been completed, run this script on the master to enable passwordless ssh between master-slave and/or slave-slave nodes:
```
#!/bin/bash
# Author: Niketan R. Pansare

# Color codes used in the prompts below
green='\033[0;32m'
endColor='\033[0m'

# Make sure a key pair has been generated on the master
if [ ! -f ~/.ssh/id_rsa.pub ]; then
  echo "Expects ~/.ssh/id_rsa.pub to be created. Run ssh-keygen -t rsa from home directory"
  exit
fi

# Make sure you have transferred your key-pair to master
if [ ! -f ~/MyKeyPair.pem ]; then
  echo "For enabling password-less ssh, transfer MyKeyPair.pem to master's home folder (e.g.: scp -i /local-path/MyKeyPair.pem /local-path/MyKeyPair.pem ubuntu@master_public_dns:~)"
  exit
fi

echo -n -e "${green}Public${endColor} dns address of master:"
read MASTER_IP
echo ""

# Assumption here is that you want to create a small cluster ~ 10 nodes
echo -n -e "${green}Public${endColor} dns addresses of slaves (separated by space):"
read SLAVE_IPS
echo ""

echo -n -e "Do you want to enable password-less ssh between ${green}master-slaves${endColor} (y/n):"
read ENABLE_PASSWORDLESS_SSH
echo ""
if [ "$ENABLE_PASSWORDLESS_SSH" == "y" ]; then
  # Copy master's public key to itself
  #cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  for SLAVE_IP in $SLAVE_IPS
  do
    # Assume passwordless ssh is not set up until both directions succeed
    IS_PASSWORD_LESS="n"
    echo "Checking passwordless ssh between master -> "$SLAVE_IP
    ssh -o PasswordAuthentication=no $SLAVE_IP /bin/true
    if [ $? -eq 0 ]; then
      echo "Passwordless ssh has been setup between master -> "$SLAVE_IP
      echo "Now checking passwordless ssh between "$SLAVE_IP" -> master"

      ssh $SLAVE_IP 'ssh -o PasswordAuthentication=no' $MASTER_IP '/bin/true'
      if [ $? -eq 0 ]; then
        echo "Passwordless ssh has been setup between "$SLAVE_IP" -> master"
        IS_PASSWORD_LESS="y"
      fi
    fi

    if [ "$IS_PASSWORD_LESS" == "n" ]; then
      # ssh-copy-id gave me lot of issues, so will use below commands instead
      echo "Enabling passwordless ssh between master and "$SLAVE_IP

      # Copy master's public key to slave
      cat ~/.ssh/id_rsa.pub | ssh -i ~/MyKeyPair.pem "ubuntu@"$SLAVE_IP 'mkdir -p ~/.ssh ; cat >> ~/.ssh/authorized_keys'
      # Copy slave's public key to master
      ssh -i ~/MyKeyPair.pem "ubuntu@"$SLAVE_IP 'cat ~/.ssh/id_rsa.pub' >> ~/.ssh/authorized_keys
      # Copy slave's public key to itself
      ssh -i ~/MyKeyPair.pem "ubuntu@"$SLAVE_IP 'cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys'
    fi
  done

  echo ""
  echo "---------------------------------------------"
  echo "Testing password-less ssh on master -> slave"
  for SLAVE_IP in $SLAVE_IPS
  do
    ssh "ubuntu@"$SLAVE_IP uname -a
  done

  echo ""
  echo "Testing password-less ssh on slave -> master"
  for SLAVE_IP in $SLAVE_IPS
  do
    ssh "ubuntu@"$SLAVE_IP 'ssh ' $MASTER_IP 'uname -a'
  done
  echo "---------------------------------------------"
  echo "Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program."
  echo -n -e "Do you see error or something fishy in above block (y/n):"
  read IS_ERROR1
  echo ""
  if [ "$IS_ERROR1" == "y" ]; then
    echo "I am sorry to hear this script didn't work for you :("
    echo "Hint1: It's quite possible that the slave does not contain ~/MyKeyPair.pem"
    echo "Hint2: sudo vim /etc/ssh/ssh_config and add StrictHostKeyChecking no and UserKnownHostsFile=/dev/null to it"
    exit
  fi
fi

echo -n -e "Do you want to enable password-less ssh between ${green}slave-slave${endColor} (y/n):"
read ENABLE_PASSWORDLESS_SSH1
echo ""
if [ "$ENABLE_PASSWORDLESS_SSH1" == "y" ]; then
  if [ "$ENABLE_PASSWORDLESS_SSH" == "n" ]; then
    echo -n -e "In this part, the key assumption is that password-less ssh between ${green}master-slave${endColor} is enabled. Do you still want to continue (y/n):"
    read ANS1
    if [ "$ANS1" == "n" ]; then
      exit
    fi
    echo ""
  fi

  for SLAVE_IP1 in $SLAVE_IPS
  do
    for SLAVE_IP2 in $SLAVE_IPS
    do
      if [ "$SLAVE_IP1" != "$SLAVE_IP2" ]; then
        # Checking assumes passwordless ssh has already been setup between master and slaves
        echo "[Warning:] Skipping checking passwordless ssh between "$SLAVE_IP1" -> "$SLAVE_IP2
        IS_PASSWORDLESS_SSH_BETWEEN_SLAVE_SET="n"

        # This will be true because ssh $SLAVE_IP1 is true
        #ssh $SLAVE_IP1 ssh -o PasswordAuthentication=no $SLAVE_IP2 /bin/true
        #if [ $? -eq 0 ]; then

        if [ "$IS_PASSWORDLESS_SSH_BETWEEN_SLAVE_SET" == "n" ]; then
          echo "Enabling passwordless ssh between "$SLAVE_IP1" and "$SLAVE_IP2
          # Note: you are on the master now, which we assume has ~/MyKeyPair.pem
          ssh -i ~/MyKeyPair.pem $SLAVE_IP1 'cat ~/.ssh/id_rsa.pub' | ssh -i ~/MyKeyPair.pem $SLAVE_IP2 'cat >> ~/.ssh/authorized_keys'
        else
          echo "Passwordless ssh has been setup between "$SLAVE_IP1" -> "$SLAVE_IP2
        fi
      fi
    done
  done

  echo "---------------------------------------------"
  echo "Testing password-less ssh on slave <-> slave"
  for SLAVE_IP1 in $SLAVE_IPS
  do
    for SLAVE_IP2 in $SLAVE_IPS
    do
      # Also, test password-less ssh on the current slave machine
      ssh $SLAVE_IP1 'ssh ' $SLAVE_IP2 'uname -a'
    done
  done
  echo "---------------------------------------------"
  echo "Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program."
  echo -n -e "Do you see error or something fishy in above block (y/n):"
  read IS_ERROR1
  echo ""
  if [ "$IS_ERROR1" == "y" ]; then
    echo "I am sorry to hear this script didn't work for you :("
    echo "Hint1: It's quite possible that the slave does not contain ~/MyKeyPair.pem"
    echo "Hint2: sudo vim /etc/ssh/ssh_config and add StrictHostKeyChecking no and UserKnownHostsFile=/dev/null to it"
    exit
  fi
fi
```

Here is a sample output obtained by running the above script (src code available via link):
```
ubuntu@ip-XXX:~$ ./enablePasswordlessSSH.sh

Public dns addresses of slaves (separated by space):ec2-XXX-191.compute-1.amazonaws.com ec2-XXX-240.compute-1.amazonaws.com ec2-XXX-215.compute-1.amazonaws.com ec2-XXX-192.compute-1.amazonaws.com ec2-XXX-197.compute-1.amazonaws.com

Do you want to enable password-less ssh between master-slaves (y/n):y

Checking passwordless ssh between master -> ec2-XXX-YYY.compute-1.amazonaws.com
bunch of warning and possibly few "Permission denied (publickey)." (Ignore this !!!)

---------------------------------------------
Testing password-less ssh on master -> slave
Don't ignore any error here !!!
---------------------------------------------
Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program.
Do you see error or something fishy in above block (y/n):n

Do you want to enable password-less ssh between slave-slave (y/n):y

---------------------------------------------
Testing password-less ssh on slave <-> slave
Don't ignore any error here !!!
---------------------------------------------

Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program.
Do you see error or something fishy in above block (y/n):n

```
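
The manual "do you see an error" check could be complemented by an automated one. Here is a minimal sketch, assuming OpenSSH's `BatchMode=yes` option, which makes ssh fail instead of prompting for a password. The function names `check_passwordless` and `report_passwordless` and the `SSH_BIN` override variable are hypothetical, introduced only so the helper can be tried without a real cluster.

```shell
#!/bin/bash
# SSH_BIN can be pointed at a stub for dry runs; it defaults to the real ssh client.
SSH_BIN="${SSH_BIN:-ssh}"

# Returns 0 if running "true" on the host succeeds without any interactive prompt.
check_passwordless() {
  local host="$1"
  "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}

# Prints one status line per host and returns the number of hosts that failed.
report_passwordless() {
  local failures=0
  local host
  for host in "$@"; do
    if check_passwordless "$host" >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}
```

On the master, something like `report_passwordless $SLAVE_IPS` would then list which slaves still need their keys copied. This only covers the master -> slave direction; the reverse direction would need the same check to be run on each slave.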

Warning: The enablePasswordlessSSH.sh script does not check whether passwordless ssh is already enabled in the slave-slave configuration; it sets it up blindly every time. This might not be a big deal if you call the script a few times, but it won’t be ideal if the cluster needs to be modified dynamically over and over again. Still, to modify this behavior, look at the line: IS_PASSWORDLESS_SSH_BETWEEN_SLAVE_SET="n"
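
Because the script appends keys with `>>`, rerunning it adds duplicate lines to `authorized_keys`. One way to avoid that is to append a public key only when it is not already present. The sketch below assumes a hypothetical helper named `append_key_once` and uses `grep -qxF` for an exact, whole-line match. It works on local files, so on a cluster it would have to run on the target node (for example, through ssh).

```shell
#!/bin/bash
# Hypothetical helper: add the public key stored in file $1 to the
# authorized_keys file $2, unless an identical line is already there.
append_key_once() {
  local pubkey_file="$1"
  local auth_file="${2:-$HOME/.ssh/authorized_keys}"
  local key
  key="$(cat "$pubkey_file")" || return 1

  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  chmod 600 "$auth_file"

  # -x: match whole lines only, -F: treat the key as a fixed string
  if grep -qxF -- "$key" "$auth_file"; then
    echo "Key already present in $auth_file"
  else
    echo "$key" >> "$auth_file"
    echo "Key added to $auth_file"
  fi
}
```

Calling `append_key_once ~/.ssh/id_rsa.pub` twice should leave a single copy of the key in `~/.ssh/authorized_keys`, which keeps the file from growing when the cluster is reconfigured repeatedly.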