Monday, October 16, 2017

Git notes for SystemML committers

Replace niketanpansare below with your git username:

1. Fork and then Clone the repository

2. Update the configuration file: systemml/.git/config
[core]
symlinks = false
repositoryformatversion = 0
filemode = false
logallrefupdates = true
[remote "origin"]
url =
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master
[remote "upstream"]
url =
fetch = +refs/heads/*:refs/remotes/upstream/*
[remote "apache"]
url =
fetch = +refs/heads/*:refs/remotes/apache/*

The above configuration states that: "origin" points to your fork and "upstream" points to the github mirror of the main repository, which is referred to as "apache".
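As a rough illustration of how these sections fit together, the Python sketch below parses a config of the same shape and lists its remotes. It is only a sketch under stated assumptions: the URLs are hypothetical placeholders (the real ones are not shown above), list_remotes is a made-up helper name, and Python's configparser only approximates git's config syntax.

```python
import configparser

# Hypothetical config text: the URLs below are placeholders, not the real ones.
SAMPLE_CONFIG = """
[core]
repositoryformatversion = 0
filemode = false
[remote "origin"]
url = https://example.invalid/your-username/systemml.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
remote = origin
merge = refs/heads/master
[remote "upstream"]
url = https://example.invalid/mirror/systemml.git
fetch = +refs/heads/*:refs/remotes/upstream/*
[remote "apache"]
url = https://example.invalid/main/systemml.git
fetch = +refs/heads/*:refs/remotes/apache/*
"""


def list_remotes(config_text):
    """Return {remote name: url} for every [remote "..."] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    remotes = {}
    prefix = 'remote "'
    for section in parser.sections():
        if section.startswith(prefix) and section.endswith('"'):
            name = section[len(prefix):-1]
            remotes[name] = parser[section].get("url", "")
    return remotes


for name, url in sorted(list_remotes(SAMPLE_CONFIG).items()):
    print(name, "->", url)
```

In practice `git remote -v` gives the same information directly; the sketch is only meant to show that "origin", "upstream" and "apache" are three independent named remotes.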

3. Now, let's say you created a branch 'foo' with 3 commits. You can create a PR via the GitHub website.

4. Once the PR is approved, and if you are a committer, you can merge the PR:
git checkout foo
git reset --soft HEAD~3 && git commit
git pull --rebase apache master
git checkout master
git pull apache master
git merge foo
git log apache/master..
git commit --amend --date="$(date +%s)"
git push apache master
Let's discuss the above commands in more detail:
- git checkout foo: ensures that you are working on the right branch.
- git reset --soft HEAD~3 && git commit: squashes all your commits into one single commit in preparation for the merge.
- git pull --rebase apache master: rebases your branch foo (now a single squashed commit) on top of the latest apache master.
- git checkout master: switches to the local master branch.
- git pull apache master: brings the local master up to date with apache before merging.
- git merge foo: merges the branch foo into the local master.
- gitk: to double-check visually that the local and remote branches are in sync and that the commit is not creating weird loops.
- git log apache/master..: double-check that the log makes sense, that you have added "Closes #PR." at the end of the log message, and that the header carries the correct JIRA number.
- git commit --amend --date="$(date +%s)": if not, this lets you modify your log message (and sets the commit date to the current time).
- git push apache master: go ahead and push your commit to apache :)
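The log check described above can be partly automated. The following Python sketch validates a commit message before pushing. It assumes, hypothetically, that the JIRA header looks like "[SYSTEMML-123] ..." and that the last non-empty line is exactly "Closes #<PR number>."; adjust both patterns to whatever convention the project actually uses. The function name check_commit_message is made up for this example.

```python
import re

# Assumed conventions (not spelled out in the notes above):
#   header:    "[SYSTEMML-<number>] <summary>"
#   last line: "Closes #<PR number>."
JIRA_HEADER = re.compile(r"^\[SYSTEMML-\d+\]\s+\S")
CLOSES_LINE = re.compile(r"^Closes #\d+\.$")


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if none)."""
    lines = [line.strip() for line in message.strip().splitlines()]
    lines = [line for line in lines if line]  # ignore blank lines
    problems = []
    if not lines or not JIRA_HEADER.match(lines[0]):
        problems.append("header does not start with a JIRA id like [SYSTEMML-123]")
    if not lines or not CLOSES_LINE.match(lines[-1]):
        problems.append('last line is not "Closes #<PR number>."')
    return problems


good = "[SYSTEMML-123] Fix foo\n\nLonger description.\n\nCloses #45."
bad = "Fix foo"
print(check_commit_message(good))
print(check_commit_message(bad))
```

The message text itself could be obtained with `git log -1 --format=%B`; that step is left out to keep the sketch self-contained.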

5. If you have modified the docs/ folder, you can update the documentation:
git subtree push --prefix docs apache gh-pages

Before updating the documentation, you can sanity-test on your own fork by:
git subtree push --prefix docs origin gh-pages

If the above command fails, you can delete the gh-pages branch and re-execute the command:
git push origin --delete gh-pages

Advanced commands:
1. Cherry-picking:
git log --pretty=format:"%h - %an, %ar : %s" | grep Niketan
git checkout gpu
git cherry-pick

Notes from other committers:

Friday, November 13, 2015

Using Google's TensorFlow for Kaggle competition

Recently, the Google Brain team released their neural network library 'TensorFlow'. Since Google has a state-of-the-art Deep Learning system, I wanted to explore TensorFlow by trying it out for my first Kaggle submission (Digit Recognition). After spending a few hours getting to know its jargon/APIs, I modified their multilayer convolutional neural network and came 140th (with 98.4% accuracy) ... not too shabby for my first submission :D.

If you are interested in outscoring me, apply cross-validation to the Python code below or maybe consider using ensembles:
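import numpy
import pandas
from input_data import *

# Empty container that holds the train/validation/test splits
class DataSets(object):
  pass

# VALIDATION_SIZE is not defined in the original listing; 5000 is an assumed value
VALIDATION_SIZE = 5000

mnist = DataSets()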
df = pandas.read_csv('train.csv')

train_images = numpy.multiply(df.drop('label', 1).values, 1.0 / 255.0)
train_labels = dense_to_one_hot(df['label'].values)

# Add MNIST data from Yann LeCun's website for better accuracy. We hold out test, just for accuracy's sake, but could have easily added it :)

mnist2 = read_data_sets("/tmp/data/", one_hot=True)
train_images = numpy.concatenate((train_images, mnist2.train._images), axis=0)
train_labels = numpy.concatenate((train_labels, mnist2.train._labels), axis=0)

validation_images = train_images[:VALIDATION_SIZE]
validation_labels = train_labels[:VALIDATION_SIZE]
train_images = train_images[VALIDATION_SIZE:]
train_labels = train_labels[VALIDATION_SIZE:]

mnist.train = DataSet([], [], fake_data=True)
mnist.train._images = train_images
mnist.train._labels = train_labels

mnist.validation = DataSet([], [], fake_data=True)
mnist.validation._images = validation_images
mnist.validation._labels = validation_labels

df1 = pandas.read_csv('test.csv')
test_images = numpy.multiply(df1.values, 1.0 / 255.0)
numTest = df1.shape[0]
test_labels = dense_to_one_hot(numpy.repeat([1], numTest))
mnist.test = DataSet([], [], fake_data=True)
mnist.test._images = test_images
mnist.test._labels = test_labels

import tensorflow as tf
sess = tf.InteractiveSession()
x = tf.placeholder("float", [None, 784])   # x is input features
W = tf.Variable(tf.zeros([784,10]))           # weights 
b = tf.Variable(tf.zeros([10]))                    # bias
y_ = tf.placeholder("float", [None,10])   # y' is input labels
#Weight Initialization
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)
def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
# Convolution and Pooling
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# First Convolutional Layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,28,28,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second Convolutional Layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Densely Connected Layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Readout Layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
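# Initialize variables before training (this call is missing from the original
# listing; tf.initialize_all_variables is the API of the 2015 TensorFlow release)
sess.run(tf.initialize_all_variables())
for i in range(20000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})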

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist2.test.images, y_: mnist2.test.labels, keep_prob: 1.0}))
prediction = tf.argmax(y,1).eval(feed_dict={x: mnist.test.images, keep_prob: 1.0})

s1='\n'.join(str(x) for x in prediction)
The above script assumes that you have downloaded train.csv and test.csv from Kaggle's website and input_data.py from TensorFlow's website, and installed pandas/TensorFlow.
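The script holds out a single validation slice. The cross-validation suggested earlier could start from a k-fold index split such as the sketch below. This is plain Python with no TensorFlow dependency; k_fold_indices is a hypothetical helper, not part of input_data.py, and the training and evaluation of each fold are left out.

```python
import random


def k_fold_indices(num_examples, k, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


for fold, (train_idx, val_idx) in enumerate(k_fold_indices(10, 5)):
    print(fold, len(train_idx), len(val_idx))
```

Each pair of index lists could then be used to slice train_images and train_labels before building the DataSet objects, with the model retrained once per fold.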

Here are a few observations based on my experience playing with TensorFlow:
  1. TensorFlow does not have an optimizer
    1. TensorFlow statically maps a high-level expression (for example "matmul") to a predefined low-level operator (for example: matmul_op.h) based on whether you are using CPU- or GPU-enabled TensorFlow. On the other hand, SystemML compiles a matrix multiplication expression (X %*% y) into one of many matrix-multiplication-related physical operators using a sophisticated optimizer that adapts to the underlying data and cluster characteristics.
    2. Other popular open-source neural network libraries are Caffe, Theano and Torch
  2. TensorFlow is a parallel, but not a distributed system:
    1. It does parallelize its computation (across CPU cores and also across GPUs).
    2. Google has released only the single-node version and kept the distributed version in-house. This means that the open-sourced version does not have a parameter server.
  3. TensorFlow is easy to use, but difficult to debug:
    1. I like TensorFlow's Python API and if the script/data/parameters are all correct, it works absolutely fine :)
    2. But, if something fails, the error messages thrown by TensorFlow are difficult to decipher. This is because the error messages point to a generated physical operator (for example: tensorflow.python.framework.errors.InvalidArgumentError: ReluGrad input), not to the line of code in the Python program.
  4. TensorFlow is slow to train and not yet robust enough:
    1. Here are some initial numbers by Alex Smola comparing TensorFlow to other open-source deep learning systems:

Tuesday, February 17, 2015

Plotting as a useful debugging tool

While writing the code for Bayesian modeling, you will have to test the distribution (prior, likelihood or posterior). Here are two common scenarios that you might encounter:
  1. You want to test the function that generates random deviates. For example: you have derived a conjugate formula for a parameter of your model and want to test whether it is correct or not.
  2. You want to test a probability density function. For example: a likelihood or posterior function that you might want to run a rejection sampler on.
In both these cases, you will start by making sure the properties of the distribution (for example: range, mean, variance) are correct. For example: if the parameter you are sampling is a variance, then you will have "assert(returnedVariance > 0)" in your code. Then, the next obvious test should be visual inspection (trust me, it has helped me catch more bugs than I would have with traditional debugging techniques/tools). This means you will plot the distribution and see if the output of your code makes sense.
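The property checks that come before plotting can be automated too. The sketch below uses Python rather than R and only the standard library. The names check_sampler, normal_pdf and integrate are illustrative, the tolerance of 0.1 is an arbitrary choice, and the standard normal stands in for whatever distribution you are actually testing.

```python
import math
import random
import statistics


def check_sampler(sampler, n, expected_mean, expected_var, tol):
    """Draw n deviates and assert that the sample moments look plausible."""
    draws = [sampler() for _ in range(n)]
    mean = statistics.mean(draws)
    var = statistics.variance(draws)
    assert var > 0
    assert abs(mean - expected_mean) < tol
    assert abs(var - expected_var) < tol
    return mean, var


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of the normal distribution (stand-in for the function under test)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def integrate(density, lo, hi, steps=10000):
    """Trapezoid rule; a proper density should integrate to roughly 1."""
    h = (hi - lo) / steps
    total = 0.5 * (density(lo) + density(hi))
    for i in range(1, steps):
        total += density(lo + i * h)
    return total * h


rng = random.Random(42)
mean, var = check_sampler(lambda: rng.gauss(0.0, 1.0), 10000, 0.0, 1.0, tol=0.1)
area = integrate(normal_pdf, -8.0, 8.0)
print(mean, var, area)
```

Checks like these catch gross errors (a wrong sign, a missing normalizing constant); the plots are still what reveal subtler problems in the shape.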
We will start by simplifying the above two cases by assuming a standard normal distribution. So, in the first case, we have access to the "rnorm" function and in the second case, we have access to the "dnorm" function of R.
Case 1: In this case, we first collect random deviates (in “vals”) and then use ggplot to plot them:
library(ggplot2)
vals = rnorm(10000, mean=0, sd=1)
df = data.frame(xVals=vals)
ggplot(df, aes(x=xVals)) + geom_density()

The output of above R script will look something like this:
Case 2: In this case, we have to assume a bounding box and sample inside that to get “x_vals” and “y_vals” (just like rejection sampling):
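library(ggplot2)
x_vals = runif(10000, min=-4, max=4)  # assumed bounding box; the original bounds are not shown
y_vals = dnorm(x_vals, mean=0, sd=1)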
df = data.frame(xVals=x_vals, yVals=y_vals)
ggplot(df, aes(x=xVals, y=yVals)) + geom_line()

The output of above R script will look something like this:
Just as a teaser to a post that I will post later, we can use the script somewhat similar to that of case 1 to study the characteristics of a distribution:

How to setup “passwordless ssh” on Amazon EC2 cluster

Often, for running distributed applications, you may want to set up a new cluster or tweak an existing one (running on Amazon EC2) to support passwordless ssh. For creating a new cluster from scratch, there are a lot of cluster management tools (which are beyond the scope of this blogpost). However, if all you want to do is set up "passwordless ssh" between nodes, then this post might be worth your read.
The script below assumes that you have completed following three steps:

Step 1. Created an RSA key pair on each of the machines:
cd ~
ssh-keygen -t rsa

Do not enter any passphrase; instead, just press [enter].

Step 2. Suppressed the host-key warnings by adding this line to the ssh config file:
sudo vim /etc/ssh/ssh_config
StrictHostKeyChecking no
Step 3. Copied the key pair file "MyKeyPair.pem" to master's home directory:
scp -i /local-path/MyKeyPair.pem /local-path/MyKeyPair.pem ubuntu@master_public_dns:~

Assuming that above three steps have been completed, run this script on the master to enable passwordless ssh between master-slave and/or slave-slave nodes:

    # Author: Niketan R. Pansare
    # Password-less ssh
    # NOTE: lines marked "(assumed)" restore control flow that is missing from
    # this listing. The key file name after ~/.ssh/ is also missing and is left as-is.
    # Make sure you have transferred your key-pair to master
    if [ ! -f ~/.ssh/ ]; then
      echo "Expects ~/.ssh/ to be created. Run ssh-keygen -t rsa from home directory"
      exit 1  # (assumed)
    fi
    if [ ! -f ~/MyKeyPair.pem ]; then
      echo "For enabling password-less ssh, transfer MyKeyPair.pem to master's home folder (e.g.: scp -i /local-path/MyKeyPair.pem /local-path/MyKeyPair.pem ubuntu@master_public_dns:~)"
      exit 1  # (assumed)
    fi
    echo "Provide following ip-addresses"
    echo -n -e "${green}Public${endColor} dns address of master:"
    read MASTER_IP
    echo ""
    # Assumption here is that you want to create a small cluster ~ 10 nodes
    echo -n -e "${green}Public${endColor} dns addresses of slaves (separated by space):"
    read SLAVE_IPS
    echo ""
    echo -n -e "Do you want to enable password-less ssh between ${green}master-slaves${endColor} (y/n):"
    read ENABLE_PASSWORDLESS_SSH  # (assumed)
    echo ""
    if [ "$ENABLE_PASSWORDLESS_SSH" == "y" ]; then
      # Copy master's public key to itself
      #cat ~/.ssh/ >> ~/.ssh/authorized_keys
      for SLAVE_IP in $SLAVE_IPS
      do
        IS_PASSWORD_LESS="n"  # (assumed)
        echo "Checking passwordless ssh between master -> "$SLAVE_IP
        ssh -o PasswordAuthentication=no $SLAVE_IP /bin/true
        if [ $? -eq 0 ]; then
          echo "Passwordless ssh has been setup between master -> "$SLAVE_IP
          echo "Now checking passwordless ssh between "$SLAVE_IP" -> master"
          ssh $SLAVE_IP 'ssh -o PasswordAuthentication=no' $MASTER_IP '/bin/true'
          if [ $? -eq 0 ]; then
            echo "Passwordless ssh has been setup between "$SLAVE_IP" -> master"
            IS_PASSWORD_LESS="y"  # (assumed)
          fi
        fi
        if [ "$IS_PASSWORD_LESS" == "n" ]; then
          # ssh-copy-id gave me lot of issues, so will use below commands instead
          echo "Enabling passwordless ssh between master and "$SLAVE_IP
          # Copy master's public key to slave
          cat ~/.ssh/ | ssh -i ~/MyKeyPair.pem "ubuntu@"$SLAVE_IP 'mkdir -p ~/.ssh ; cat >> ~/.ssh/authorized_keys'
          # Copy slave's public key to master
          ssh -i ~/MyKeyPair.pem "ubuntu@"$SLAVE_IP 'cat ~/.ssh/' >> ~/.ssh/authorized_keys
          # Copy slave's public key to itself
          ssh -i ~/MyKeyPair.pem "ubuntu@"$SLAVE_IP 'cat ~/.ssh/ >> ~/.ssh/authorized_keys'
        fi
      done
      echo ""
      echo "---------------------------------------------"
      echo "Testing password-less ssh on master -> slave"
      for SLAVE_IP in $SLAVE_IPS
      do
        ssh "ubuntu@"$SLAVE_IP  uname -a
      done
      echo ""
      echo "Testing password-less ssh on slave -> master"
      for SLAVE_IP in $SLAVE_IPS
      do
        ssh "ubuntu@"$SLAVE_IP 'ssh ' $MASTER_IP 'uname -a'
      done
      echo "---------------------------------------------"
      echo "Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program."
      echo -n -e "Do you see error or something fishy in above block (y/n):"
      read IS_ERROR1
      echo ""
      if [ "$IS_ERROR1" == "y" ]; then
        echo "I am sorry to hear this script didn't work for you :("
        echo "Hint1: Its quite possible, slave doesnot contain ~/MyKeyPair.pem"
        echo "Hint2: sudo vim /etc/ssh/ssh_config and add StrictHostKeyChecking no and UserKnownHostsFile=/dev/null to it"
      fi
    fi
    echo -n -e "Do you want to enable password-less ssh between ${green}slave-slave${endColor} (y/n):"
    read ENABLE_PASSWORDLESS_SSH1  # (assumed; missing from this listing)
    echo ""
    if [ "$ENABLE_PASSWORDLESS_SSH1" == "y" ]; then
      if [ "$ENABLE_PASSWORDLESS_SSH" == "n" ]; then
        echo -n -e "In this part, the key assumption is that password-less ssh between ${green}master-slave${endColor} is enabled. Do you still want to continue (y/n):"
        read ANS1
        if [ "$ANS1" == "n" ]; then
          exit 1  # (assumed; missing from this listing)
        fi
        echo ""
      fi
      for SLAVE_IP1 in $SLAVE_IPS
      do
        for SLAVE_IP2 in $SLAVE_IPS
        do
          if [ "$SLAVE_IP1" != "$SLAVE_IP2" ]; then
            # Checking assumes passwordless ssh has already been setup between master and slaves
            echo "[Warning:] Skipping checking passwordless ssh between "$SLAVE_IP1" -> "$SLAVE_IP2
            IS_PASSWORDLESS_SSH_BETWEEN_SLAVE_SET="n"
            # This will be true because ssh $SLAVE_IP1 is true
            #ssh $SLAVE_IP1 ssh -o PasswordAuthentication=no $SLAVE_IP2 /bin/true
            #if [ $? -eq 0 ]; then
            if [ "$IS_PASSWORDLESS_SSH_BETWEEN_SLAVE_SET" == "n" ]; then
              echo "Enabling passwordless ssh between "$SLAVE_IP1" and "$SLAVE_IP2
              # Note you are on master now, which we assume to have 
              ssh -i ~/MyKeyPair.pem $SLAVE_IP1 'cat ~/.ssh/' | ssh -i ~/MyKeyPair.pem $SLAVE_IP2 'cat >> ~/.ssh/authorized_keys'
              echo "Passwordless ssh has been setup between "$SLAVE_IP1" -> "$SLAVE_IP2
            fi
          fi
        done
      done
      echo "---------------------------------------------"
      echo "Testing password-less ssh on slave <-> slave"
      for SLAVE_IP1 in $SLAVE_IPS
      do
        for SLAVE_IP2 in $SLAVE_IPS
        do
          # Also, test password-less ssh on the current slave machine
          ssh $SLAVE_IP1 'ssh ' $SLAVE_IP2 'uname -a'
        done
      done
      echo "---------------------------------------------"
      echo "Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program."
      echo -n -e "Do you see error or something fishy in above block (y/n):"
      read IS_ERROR1
      echo ""
      if [ "$IS_ERROR1" == "y" ]; then
        echo "I am sorry to hear this script didn't work for you :("
        echo "Hint1: Its quite possible, slave doesnot contain ~/MyKeyPair.pem"
        echo "Hint2: sudo vim /etc/ssh/ssh_config and add StrictHostKeyChecking no and UserKnownHostsFile=/dev/null to it"
      fi
    fi

Here is a sample output obtained by running the above script (src code available via link):
ubuntu@ip-XXX:~$ ./
Provide following ip-addresses

Public dns address of

Public dns addresses of slaves (separated by space)

Do you want to enable password-less ssh between master-slaves (y/n):y

Checking passwordless ssh between master ->
bunch of warning and possibly few "Permission denied (publickey)." (Ignore this !!!)

Testing password-less ssh on master -> slave
Don't ignore any error here !!!
Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program.
Do you see error or something fishy in above block (y/n):n

Do you want to enable password-less ssh between slave-slave (y/n):y

Testing password-less ssh on slave <-> slave
Don't ignore any error here !!!

Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program.
Do you see error or something fishy in above block (y/n):n

Warning: The above code does not check for passwordless ssh in the slave-slave configuration and sets it up blindly even if passwordless ssh is already enabled. It might not be a big deal if you call this script a few times, but it won't be ideal if the cluster needs to be dynamically modified over and over again. To modify this behavior, look at the line: IS_PASSWORDLESS_SSH_BETWEEN_SLAVE_SET="n"
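To get a feel for how much work the blind slave-slave setup does, the sketch below (in Python, not part of the original script) enumerates the ordered slave pairs that the nested loop visits and builds the check command from the commented-out line in the script. The helper names slave_pairs and check_command and the host names are hypothetical. The sketch only builds the commands; actually running them and interpreting the exit status is left out.

```python
from itertools import permutations


def slave_pairs(slave_ips):
    """Return every ordered (source, destination) pair of distinct slaves."""
    hosts = list(dict.fromkeys(slave_ips.split()))  # drop duplicates, keep order
    return list(permutations(hosts, 2))


def check_command(src, dst):
    """Argument list mirroring the commented-out check in the script."""
    return ["ssh", src, "ssh", "-o", "PasswordAuthentication=no", dst, "/bin/true"]


pairs = slave_pairs("slave1 slave2 slave3")  # hypothetical host names
for src, dst in pairs:
    print(" ".join(check_command(src, dst)))
print(len(pairs), "pairs")
```

For n slaves there are n*(n-1) ordered pairs, so every rerun of the script appends that many keys to authorized_keys files again; checking each pair first would avoid the duplicates.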

Saturday, March 08, 2014

Notes on entrepreneurship (Part 3)

A few updates since the last time I wrote on this blog:
1. At the end of the entrepreneurship course, the website and an iOS app were implemented and I got an A grade ... yaay !!!

2. Since I was out-of-country during the final presentation to judges (VCs, Startup founders, etc), I created a youtube video to motivate this project and my teammate explained the remaining logistics of our project [10].

3. The Vigilant team won 3rd place in the HackRice 2014 competition for developing an Android version along with an SVM classifier [3] to predict the threat level based on location and time of day. We used 5 years of crime data from Houston PD, so yes, this feature only worked for Houston :(

4. I was lucky enough to get selected to participate in Ignite Conference 2014 [4]. To get the ball rolling on the first day, we were asked to build as tall a structure as possible in a given time frame, with a marshmallow on the top, using noodle sticks, tape and thread [5]. The next day, we got to visit Khosla Ventures and Square (and a few other startups). On the last two days, we had talks from several guest entrepreneurs, who spoke about their entrepreneurship journeys. Here are a few of the lessons I learnt (directly/indirectly from this conference):
4.a What ideas/projects to work on ?
- One of the most common pieces of advice you will receive is "fail fast, fail often and fail cheap". However, I suggest not taking it verbatim; understand that the emphasis is not on failing but on learning. So, you should not go in with the mindset that "since I am planning to fail early anyway, let me start with a dumb idea". In fact, the mindset should be "test early/cheaply; if you do fail, don't get hung up on it, instead learn quickly why you failed; and then adapt". Let me put this in a much broader context, in terms of one of the most important lessons I have learnt in research and entrepreneurship: though most people will judge you by your bank balance and your citations, you (as a researcher or entrepreneur) should try to evaluate yourself by your ability to keep failures small and useful.
- Don't hesitate or hold yourself back if you are passionate about an idea that appears to be a "black swan". But make sure you "know what you know and know what you don't know". Then, evaluate whether the things you don't know will halt the project; if so, find an expert who can help you with that.
- Before you think of quitting your job and starting a venture, make a list of the non-glamorous/routine stuff about a startup that you will have to do. Though many of these might be solvable by capital, if you do intend to go the lean startup way, ask yourself: are you mentally flexible enough to get over your ego and do these tasks if need be?
4.b Building a team and network:
- Build a team that is smarter than you. And make sure you give back and focus on their growth too, else you will soon face a talent retention problem.
- Here are some of the key points to build and keep great teams: minimize unnecessary bureaucracy [14], exercise a merit-based promotion strategy, treat others with respect and, most importantly, with respect to improving your startup, encourage everyone to pitch in their ideas rather than giving everyone a list of to-do items [15].
- Hire the best sales team you can find. This is especially true in internet technology, though it has scalability challenges. Why? People would much rather deal with a sales person than an internet widget when they are paying huge money for the given service. This also means don't wait until your product testing phase to hire a sales team. Rather, start with an integrated sales-development team and use the sales part of the team to validate/refine the hypothesis by communicating with customers during the product development phase. Best case scenario, the founder is that person ... who first assumes the role of salesman, talks to clients/customers, figures out requirements, then by himself or by hiring a really good team develops the product, markets it, and then keeps on iterating the whole process again [8].
- Networking is extremely critical. It usually takes time, else it seems inauthentic/insincere. So, don't use the number of LinkedIn connections/visiting cards in your wallet as a measure of networking; instead, count the number of people who genuinely want to help you and vice versa. Another way to put it: networking is not about how many people you are acquainted with but rather how many mentors, benefactors and friends you have.
4.c Managing yourself:
- It is extremely important to find ways to manage stress and take care of your health. Also, if your better half and family understand the challenges of startups and support/love you, the stress/difficulties reduce exponentially.
- Especially in the software development phase, use productivity rather than hours worked as the measure.
- Know the difference between persistence and stubbornness. Remember persistence (without stubbornness) requires learning and pivoting given new information/lessons, rather than doing the same things over and over again expecting different results [6].
- When working in a team, your idea is only good to the degree to which you can explain it. If your team doesn't understand it, you can almost be sure that your customer won't be able to understand it. This skill will probably require an entirely new blogpost (which I might write later), but for now here are three things you can do:
a. Read books about presentation/explanation/UI design: Design of everyday things, Don't make me think, Back of Napkin, Art of explanation, Whiteboard selling, slide:ology and 100 things every designer needs to know.
b. Learn empathy (i.e. relating to team/developers/sales and especially customers).
c. Be specific. If you don't break up tasks into smaller items and clarify the purpose/big picture, people (either your team or investors) might feel overwhelmed and can sometimes get defensive. One particular thing that every member of your team needs to be crystal clear about is the "minimum viable product".

Let me recap the high-level steps [16] described until now in this blog series:

Step 0: Define your means: "what I know",  "what do I have" and "whom do I know".

Step 1: Ideation: See part 1 of this series for high-level tips.
Step 1.1: Being little more specific about step 1: Create/Refine 60 second elevator pitch: The key idea is if you meet a key investor/evangelist/strategic partner in an elevator and if he asks you what are you working on, you should be able to summarize your business idea before the elevator reaches certain floor and that person has to leave. Here are few suggestions about creating your elevator pitch:
- Sit down and write your elevator pitch without too much thinking. First iteration will always be awful.
- Iterate, rehearse, iterate again and then rehearse again (and keep doing this) until your elevator pitch becomes second nature to you.
- Talk to your customers and experts in that field and try to incorporate their feedback.
- Be honest and don't promise "world peace". Remember, your words represent your thoughts/character and the investor is investing in you as much as he/she is investing in the idea.
- There is no one way to do this, so learn-by-example; aka do youtube/google search for "elevator pitch" and listen to them [7].
- Use analogy to simplify the problem.
- If possible, explain how the problem affects you, your family or someone you know.
- The first and last sentence are extremely important and connecting them helps to create a more effective delivery.
Here is one of the iterations of my elevator pitch:
Studies show that in the US alone, one out of four women gets sexually assaulted in her lifetime. This number is much higher when they travel to developing countries like India. Though this problem is highly complex, we believe that a fraction of it can be solved by technology-based deterrents that act as a first line of defense when in danger. We present one such solution: Vigilant. Vigilant is an affordable and hassle-free personal safety solution that connects you to friends, family and authorities during an emergency with just the click of a button. The goal of Vigilant is to make everyone's life safer with the help of smarter software and hardware.
Step 2: Opportunity evaluation:
Step 2.1: Use high-level checklist described in part 1 of this series.
Step 2.2: Create a business model canvas: Before you read anything further, you should definitely watch this 2 min video describing the business model canvas. Also, read the book Business Model Generation to understand why and how to create a business model canvas. If you don't have time to read through the book, here are a few websites that let you create one with step-by-step instructions: zoomstra, launchpad central, turbostart, wordpress plugin, collaboration website. You can also go through the following lectures/tutorials to learn how to create one: Steve Blank's udacity class, Alex's blog.
Step 2.3: Early customer validation/requirement gathering using one or more of the feedback/analytics tools (some are applicable only at later stages): These tools help you quantify/find out: what people think they want, how much they think they want to pay and how much they will really pay.
- False buy page or dummy (but fully functional) e-commerce page for buying clickers [1].
- Contact form plugin in wordpress.
- Feedback button in app.
- Site Traffic widget/plugin in wordpress.
- Facebook likes.
- Google forms for suggestions from early adopters: Here are a few suggestions for coming up with good questions (ones that I didn't know earlier): Kevin's blog, Stanford's videos for customer discovery, Lean startup customer development templates, Alex's blog, Steve's suggestions, Kaushik's suggestions.
- Other tools that you can use for market research are Google's customer survey tool, Amazon's mechanical turk, Qualtrics.
- Pamphlet with google url shortener[2] and QR codes across campus.
- Other tools that allow you to do customer validation: validately, foundersuite.

Step 3: Build a great team !!!
- See suggestions given above in point 4.b and 4.c.
- There are a few websites that help you find your co-founders: FoundersDating, CoFoundersLab; or attend a meetup, a hackathon or a startup weekend.

Step 4 (or 5): Implementation, marketing, validation, refining business model canvas and keep on iterating.
This step requires above mentioned feedback/analytics tool as well as marketing/branding tools:
- Checkout fiverr for really cool and cheap marketing ideas.
- Make sure you install SEO plugin for your wordpress. I am using this plugin, but there are equally good plugins in the market.
- Create a marketing message/keywords based on the above feedback tools or google trends or google keyword tool. Then keep refining it with respect to location based on google insights. Similarly, there are several analytics tools that will help you in marketing/branding, like Google Analytics, Amazon Analytics, Heap Analytics, AppAnnie, Distimo, etc.
- Advertise on Google, Facebook, Bing or LinkedIn. If you prefer traditional marketing tools, you can look at srds.
- Read Art of explanation and make a user-friendly video about your product. You can use one of the following tools to create the video: Sparkol, Explainify, LooseKeys, Amazon's Elastic Transcoder, Camtasia.
For exhaustive list of tools, refer to Steve Blank's list, YCombinator's list, Startup Weekend's list, PBWork's list, HBS list.

Step 5 (or 4): Funding your startup:
Step 5.1: Research on which type of investor you want to approach: angel investors [12], venture capitalist, crowdsourcing or super angels. Also, be perfectly clear about amount of funding you need and valuation of your company as this will determine amount of equity (or other form of compensation) you will have to give to the investor. There are other parameters to consider as well like how much involvement you want from investor. Here is small figure to understand the startup funding lifecycle [11]:

Step 5.2: Now research on the specific investors you want to target and classify them as either "ideal/ambitious" or "can live with/moderate". Start with investor's website and understand their funding philosophy. Then look into ventures they have funded and see how they are doing [13].
Step 5.3: Prepare your pitch deck:
- See tips from my previous blogpost about presentation.
- Here is a really good 2-min video about pitching from Guy Kawasaki. Note the 10 slides at 10min 4 seconds.
Step 5.4: Remember there are other things more important than just capital for your startup. Ask for advice and hopefully gain a mentor or a board member. Sometimes they can also refer you to another funding agency.

Since this is often the most confusing aspect of a startup for a techie, let me try my best to explain and emphasize the difference between validation and planning [8, 9]. Usually, planning involves deep thinking so as to perform forecasting, whereas validation starts with brainstorming the hypothesis and then validating it through a rapid customer feedback loop. Planning, which is associated with a waterfall-like model, starts with defining the specific goal and then coming up with strategies and detailed steps to move towards that goal. Validation, on the other hand, is often associated with agile development/effectuation/lean startup methodology and involves pivoting the goal based on customer/market feedback. Planning assumes that you know precisely what the customer requirements are beforehand; validation doesn't. Since customer requirements are known, planning is suitable for large-scale established companies where the emphasis is on "execution" of those requirements. On the other hand, since precise customer requirements are unknown in startups, the emphasis is on "searching" for the requirements through validation of hypotheses [8]. This is why many entrepreneurs prefer a short business model canvas (which is a hypothesis generating and validating tool) over more comprehensive business models.

[1] Implementing an e-commerce page on WordPress took me no more than 30 minutes using the WooCommerce plugin. It allows you to track orders, generate reports, and even manage coupons.
[2] Google's URL shortener (and many similar free services) helps you track the traffic through a given link. Here is a useful hack: create multiple shortened URLs for your webpage and use a different one on the pamphlets for each geographical area. Then use the analytics to run a much more targeted ad campaign :)
[3] For ML geeks, I must confess that the model suffers from selection bias, as it only contains data for when a crime occurred, not for when a crime didn't occur. Since the rules of the hackathon require you to finish the project in a day, most of which was spent on data gathering/cleaning, I had no time to work on this :(
[4] One thing that surprised me was that even though this conference was in Silicon Valley, I was the only Computer Science graduate student there; all the others were MBAs, MDs, MD-PhDs, or BioTech students.
[5] My team came second and built a structure that was 26 inches tall. The idea is to start the first structure with the marshmallow and then keep building the lower structure. It is also important to reinforce the structure (adding supporting vertical sticks and using multiple sticks for the lower pillars) so that it does not collapse. Most of the teams that failed started by building as tall a structure as they could without the marshmallow and then tried to put the marshmallow on top, which caused the entire structure to collapse under the weight. So the take-away point for entrepreneurs from this exercise is to do bottom-up incremental iterations/updates with customer involvement so as to mitigate risk. Here is the pic of our structure:
[6] From Mark Otero, CEO & Co-founder, Klicknation.
[7] Daniel Pink, in his book "To Sell Is Human", suggests the following template: "Once upon a time ___. Every day, ___. One day ___. Because of that ___. Because of that ___. Until finally ___".
[8] Read Steve Blank's blog for more detail.
[10] In the video, the clip where people have duct tape on their mouths is from another YouTube video about raising your voice against sexual violence, i.e. it was not created by me, so please don't give me credit if you want to cite it. I used the clip because it resonates with the motivation of my project :)
[12] It might also be worthwhile to look into websites like AngelList.
[13] Websites like CrunchBase and TechCrunch help here. Another useful website you should definitely visit is Glassdoor, to gauge the talent-retention ratio as well as the willingness of upper management to experiment with new ideas. Why? It can sometimes be a metric to evaluate the founders and, indirectly, the investors. There are a few ranking websites based on different criteria, one of them being the Forbes Midas List.
[14] For example: the Rice Computer Science Graduate Student Association (CS GSA) is a flat organization with coordinators that have responsibilities/duties. Everyone can pitch in and help others, but when it comes to arbitration/point of contact/failure responsibility, the assigned coordinator is the one in charge. For people outside the organization, that person is the Overall Coordinator; think of it as the President of the club, and for the current year, that's me (thank you very much) :)
[15] Sure, some ideas might not align with the vision of the company and may be dropped, but that has to be done in the form of a discussion rather than in an authoritative manner.
[16] Note that these are high-level steps, even though you might find the list of tools overwhelming. For detailed step-by-step instructions on the startup process, please read one of the several startup-related books available on Amazon, listen to startup lectures on Udacity/Coursera/YouTube, or, better, attend an incubator program near you.