Friday, November 13, 2015

Using Google's TensorFlow for Kaggle competition

Recently, the Google Brain team released its neural network library, TensorFlow. Since Google has a state-of-the-art deep learning system, I wanted to explore TensorFlow by trying it out for my first Kaggle submission (Digit Recognition). After spending a few hours getting to know its jargon/APIs, I modified their multilayer convolutional neural network and placed 140th (with 98.4% accuracy) ... not too shabby for my first submission :D.

If you are interested in outscoring me, try applying cross-validation to the Python code below, or maybe consider using ensembles:
from input_data import *  # input_data.py from the TensorFlow MNIST tutorial
import numpy
import pandas

class DataSets(object):
  pass

mnist = DataSets()
df = pandas.read_csv('train.csv')

train_images = numpy.multiply(df.drop('label', axis=1).values, 1.0 / 255.0)
train_labels = dense_to_one_hot(df['label'].values)

# Add the MNIST data from Yann LeCun's website for better accuracy. The MNIST test set is held out only to measure accuracy, but it could easily have been added too :)

mnist2 = read_data_sets("/tmp/data/", one_hot=True)
train_images = numpy.concatenate((train_images, mnist2.train._images), axis=0)
train_labels = numpy.concatenate((train_labels, mnist2.train._labels), axis=0)

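VALIDATION_SIZE = 5000  # assumed value; input_data.py only defines it inside read_data_sets()
validation_images = train_images[:VALIDATION_SIZE]
validation_labels = train_labels[:VALIDATION_SIZE]
train_images = train_images[VALIDATION_SIZE:]
train_labels = train_labels[VALIDATION_SIZE:]

# fake_data=True skips DataSet's reshaping of 4-D image arrays; the real arrays are assigned directly
mnist.train = DataSet([], [], fake_data=True)
mnist.train._images = train_images
mnist.train._labels = train_labels
# fake_data=True also fixes _num_examples at 10000, which would make next_batch()
# keep only 10000 training examples after the first epoch, so reset it
mnist.train._num_examples = train_images.shape[0]

mnist.validation = DataSet([], [], fake_data=True)
mnist.validation._images = validation_images
mnist.validation._labels = validation_labels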

df1 = pandas.read_csv('test.csv')
test_images = numpy.multiply(df1.values, 1.0 / 255.0)
numTest = df1.shape[0]
# Kaggle's test.csv is unlabeled, so dummy one-hot labels are used as placeholders
test_labels = dense_to_one_hot(numpy.repeat([1], numTest))
mnist.test = DataSet([], [], fake_data=True)
mnist.test._images = test_images
mnist.test._labels = test_labels

import tensorflow as tf
sess = tf.InteractiveSession()
x = tf.placeholder("float", [None, 784])   # x is input features
W = tf.Variable(tf.zeros([784,10]))           # weights 
b = tf.Variable(tf.zeros([10]))                    # bias
y_ = tf.placeholder("float", [None,10])   # y' is input labels
#Weight Initialization
def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)
def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)
# Convolution and Pooling
def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
# First Convolutional Layer
W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
x_image = tf.reshape(x, [-1,28,28,1])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_2x2(h_conv1)

# Second Convolutional Layer
W_conv2 = weight_variable([5, 5, 32, 64])
b_conv2 = bias_variable([64])
h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
h_pool2 = max_pool_2x2(h_conv2)

# Densely Connected Layer
W_fc1 = weight_variable([7 * 7 * 64, 1024])
b_fc1 = bias_variable([1024])
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

# Dropout
keep_prob = tf.placeholder("float")
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

# Readout Layer
W_fc2 = weight_variable([1024, 10])
b_fc2 = bias_variable([10])
y=tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
cross_entropy = -tf.reduce_sum(y_*tf.log(y))

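train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
sess.run(tf.initialize_all_variables())  # initialize weights (and Adam's slot variables)
for i in range(20000):
  batch = mnist.train.next_batch(50)
  train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})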

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print("test accuracy %g" % accuracy.eval(feed_dict={x: mnist2.test.images, y_: mnist2.test.labels, keep_prob: 1.0}))
prediction = tf.argmax(y,1).eval(feed_dict={x: mnist.test.images, keep_prob: 1.0})

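# Kaggle expects a CSV with an "ImageId,Label" header and 1-based ImageIds
s1 = 'ImageId,Label\n' + '\n'.join('%d,%d' % (i + 1, p) for i, p in enumerate(prediction))
with open('submission.csv', 'w') as f:  # output file name is arbitrary
  f.write(s1 + '\n')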
The above script assumes that you have downloaded train.csv and test.csv from Kaggle's website and input_data.py from TensorFlow's website, and that you have installed pandas and TensorFlow.
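If you want to follow the cross-validation/ensemble suggestion, here is a minimal sketch in plain NumPy, assuming you already have arrays like train_images/train_labels. The helper names k_fold_indices and majority_vote are hypothetical (they are not part of TensorFlow or input_data.py): you would train one network per fold, check its accuracy on that fold's held-out rows, and then combine the fold models' predictions on test.csv by majority vote.

```python
import numpy as np


def k_fold_indices(num_examples, k, seed=0):
    """Yield (train_idx, val_idx) index arrays for k-fold cross-validation (k >= 2)."""
    rng = np.random.RandomState(seed)
    folds = np.array_split(rng.permutation(num_examples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]


def majority_vote(predictions):
    """Combine label predictions of shape [num_models, num_examples] into one
    label per example; ties go to the smallest label (argmax picks the first max)."""
    predictions = np.asarray(predictions)
    num_classes = predictions.max() + 1
    # votes[c, n] = how many models predicted class c for example n
    votes = np.stack([(predictions == c).sum(axis=0) for c in range(num_classes)])
    return votes.argmax(axis=0)


# Usage: train one model on train_images[train_idx], validate on train_images[val_idx],
# then vote over the per-fold predictions for the test set.
for train_idx, val_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(val_idx))  # prints "8 2" for each of the 5 folds
print(majority_vote([[1, 2, 3], [1, 2, 0], [4, 2, 0]]))  # [1 2 0]
```

Voting on hard labels keeps the combination step independent of how each network was trained; averaging the softmax outputs instead is another common choice.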

Here are a few observations based on my experience playing with TensorFlow:
  1. TensorFlow does not have an optimizer (in the compiler/database sense, as opposed to training optimizers such as AdamOptimizer):
    1. TensorFlow statically maps a high-level expression (for example "matmul") to a predefined low-level operator (for example: matmul_op.h) based on whether you are using the CPU- or GPU-enabled version of TensorFlow. On the other hand, SystemML compiles a matrix multiplication expression (X %*% y) into one of many matrix-multiplication physical operators, using a sophisticated optimizer that adapts to the underlying data and cluster characteristics.
    2. Other popular open-source neural network libraries are Caffe, Theano and Torch.
  2. TensorFlow is a parallel, but not a distributed, system:
    1. It does parallelize its computation (across CPU cores and also across GPUs).
    2. Google has released only the single-node version and kept the distributed version in-house. This means that the open-sourced version does not have a parameter server.
  3. TensorFlow is easy to use, but difficult to debug:
    1. I like TensorFlow's Python API and if the script/data/parameters are all correct, it works absolutely fine :)
    2. But, if something fails, the error messages thrown by TensorFlow are difficult to decipher. This is because the error messages point to a generated physical operator (for example: tensorflow.python.framework.errors.InvalidArgumentError: ReluGrad input), not to the line of code in the Python program.
  4. TensorFlow is slow to train and not yet robust enough:
    1. Alex Smola posted some initial numbers comparing TensorFlow with other open-source deep learning systems.
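To make the contrast in point 1 concrete, here is a toy sketch of data-dependent operator selection. It is not SystemML's or TensorFlow's code; the function names, the CSR layout and the 10% density threshold are assumptions for illustration. The same logical expression X.dot(y) is dispatched either to a dense kernel or to a sparse kernel that only touches the non-zeros.

```python
import numpy as np


def to_csr(X):
    """Compress a dense 2-D array into CSR arrays (values, column indices, row pointers)."""
    values, cols, row_ptr = [], [], [0]
    for row in X:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        cols.extend(nz)
        row_ptr.append(len(values))
    return np.array(values, dtype=float), np.array(cols, dtype=int), np.array(row_ptr)


def sparse_matvec(csr, y):
    """Physical operator for sparse X: touch only the stored non-zeros."""
    values, cols, row_ptr = csr
    out = np.zeros(len(row_ptr) - 1)
    for i in range(len(out)):
        start, end = row_ptr[i], row_ptr[i + 1]
        out[i] = np.dot(values[start:end], y[cols[start:end]])
    return out


def choose_matvec(X, density_threshold=0.1):
    """Pick a physical operator for the logical expression X.dot(y) based on the data."""
    density = np.count_nonzero(X) / float(X.size)
    if density < density_threshold:
        csr = to_csr(X)
        return "sparse", lambda y: sparse_matvec(csr, y)
    return "dense", lambda y: X.dot(y)


X = np.zeros((1000, 1000))
X[::100, ::50] = 1.0  # 200 non-zeros, i.e. 0.02% density
kind, matvec = choose_matvec(X)
y = np.ones(1000)
print(kind, np.allclose(matvec(y), X.dot(y)))  # sparse True
```

A real optimizer would also weigh memory budgets, cluster size and cost estimates; the point here is only that the choice depends on the data rather than being fixed per device.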

Tuesday, February 17, 2015

Plotting as a useful debugging tool

When writing code for Bayesian modeling, you will often have to test a distribution (prior, likelihood or posterior). Here are two common scenarios that you might encounter:
  1. You want to test the function that generates random deviates. For example, you have derived a conjugate formula for a parameter of your model and want to test whether it is correct.
  2. You want to test a probability density function. For example, a likelihood or posterior function on which you might want to run a rejection sampler.
In both cases, you will start by making sure that the properties of the distribution (for example: range, mean, variance) are correct. For example, if the parameter you are sampling is a variance, then you will have “assert(returnedVariance > 0)” in your code. The next obvious test should be visual inspection (trust me, it has helped me catch more bugs than traditional debugging techniques/tools have). This means you plot the distribution and check whether the output of your code makes sense.
We will start by simplifying the above two cases by assuming a standard normal distribution. So, in the first case, we have access to the “rnorm” function of R, and in the second case, to the “dnorm” function.
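The property checks described above can be automated before any plotting. Here is a minimal sketch in Python/NumPy (the posts here use R for plotting; check_sampler and its 5% tolerance are illustrative assumptions): draw many deviates and compare the sample mean and variance with what your derivation predicts. The usage part shows it catching a common bug, passing a variance where NumPy's `normal` expects a standard deviation.

```python
import numpy as np


def check_sampler(sample_fn, expected_mean, expected_var, n=100000, tol=0.05):
    """Cheap sanity checks on a random-deviate generator: finite draws,
    positive variance, and sample mean/variance close to the derived values."""
    draws = np.asarray(sample_fn(n), dtype=float)
    assert np.all(np.isfinite(draws)), "sampler returned NaN or inf"
    mean, var = draws.mean(), draws.var()
    assert var > 0, "variance must be positive"
    assert abs(mean - expected_mean) < tol * max(1.0, abs(expected_mean)), \
        "mean looks wrong: %g" % mean
    assert abs(var - expected_var) < tol * max(1.0, expected_var), \
        "variance looks wrong: %g" % var
    return mean, var


rng = np.random.RandomState(42)
check_sampler(lambda n: rng.normal(0.0, 1.0, n), 0.0, 1.0)  # passes silently
# A classic bug: NumPy's `scale` argument is the standard deviation, so passing
# a variance of 4 as `scale` produces a variance of 16 and the check fails.
try:
    check_sampler(lambda n: rng.normal(0.0, 4.0, n), 0.0, 4.0)
except AssertionError as err:
    print("caught:", err)
```

These checks only catch gross errors; a wrong shape with the right first two moments still needs the visual inspection described above.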
Case 1: In this case, we first collect random deviates (in “vals”) and then use ggplot to plot them:
library(ggplot2)
vals = rnorm(10000, mean=0, sd=1)
df = data.frame(xVals=vals)
ggplot(df, aes(x=xVals)) + geom_density()

The output of the above R script should be a bell-shaped density curve centered at 0.
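Eyeballing the curve works well interactively. If you also want a check that can run unattended, one option is to compare a normalized histogram of the deviates against the analytic density, which is the same overlay you would otherwise inspect visually. This is a sketch in Python/NumPy; the function names, the 40 bins and the [-4, 4] range are assumptions.

```python
import math
import numpy as np


def normal_pdf(x, mean=0.0, sd=1.0):
    """Analytic normal density (the role dnorm plays in R)."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def max_density_gap(samples, pdf, bins=40, lo=-4.0, hi=4.0):
    """Largest absolute gap between a normalized histogram of `samples`
    (restricted to [lo, hi]) and the analytic `pdf` at the bin centers."""
    heights, edges = np.histogram(samples, bins=bins, range=(lo, hi), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return np.max(np.abs(heights - pdf(centers)))


rng = np.random.RandomState(0)
good = max_density_gap(rng.normal(0.0, 1.0, 100000), normal_pdf)
bad = max_density_gap(rng.normal(0.0, 2.0, 100000), normal_pdf)  # wrong scale
print(good, bad)  # `good` should be a few hundredths at most, `bad` much larger
```

The threshold you accept for the gap depends on the sample size and bin width, so treat it as a smoke test rather than a formal goodness-of-fit test.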
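Case 2: In this case, we assume a bounding box, sample “x_vals” inside it and evaluate the density at those points to get “y_vals” (just like in rejection sampling):
x_vals = runif(10000, min=-4, max=4)  # assumed bounding box [-4, 4] on the x-axis
y_vals=dnorm(x_vals, mean=0, sd=1)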
df = data.frame(xVals=x_vals, yVals=y_vals)
ggplot(df, aes(x=xVals, y=yVals)) + geom_line()

The output of the above R script should be a smooth bell curve that peaks at about 0.4 (= 1/sqrt(2π)) at x = 0.
As a teaser for a later post: a script similar to the one in case 1 can also be used to study the characteristics of a distribution.
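Since case 2 borrows the bounding-box idea from rejection sampling, here is a minimal sketch of the sampler itself in Python/NumPy, assuming the box [-4, 4] x [0, 0.4] for the standard normal (its peak density is 1/sqrt(2π) ≈ 0.399, so 0.4 covers it). The names and box bounds are assumptions. The returned samples can be fed into the same density plot as case 1 to check the sampler visually.

```python
import math
import numpy as np


def normal_pdf(x):
    """Standard normal density (the role dnorm plays in R)."""
    return np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)


def rejection_sample(pdf, n, lo=-4.0, hi=4.0, height=0.4, seed=0):
    """Draw n samples from `pdf` by proposing points uniformly in the box
    [lo, hi] x [0, height] and keeping the x of points that fall under the curve."""
    # The box must cover the density, otherwise the samples are silently wrong.
    grid = np.linspace(lo, hi, 1001)
    assert np.all(pdf(grid) <= height), "bounding box is lower than the density"
    rng = np.random.RandomState(seed)
    accepted, count = [], 0
    while count < n:
        x = rng.uniform(lo, hi, size=n)
        y = rng.uniform(0.0, height, size=n)
        keep = x[y < pdf(x)]
        accepted.append(keep)
        count += keep.size
    return np.concatenate(accepted)[:n]


samples = rejection_sample(normal_pdf, 50000)
print(samples.mean(), samples.var())  # close to 0 and 1 (tails beyond +/-4 are cut off)
```

The grid check on the box height is itself an instance of the "check the properties first" advice: a box that is too low does not crash, it just produces a distorted distribution.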

How to set up “passwordless ssh” on an Amazon EC2 cluster

Often, for running distributed applications, you may want to set up a new cluster or tweak an existing one (running on Amazon EC2) to support passwordless ssh. For creating a new cluster from scratch, there are a lot of cluster management tools (which are beyond the scope of this blog post). However, if all you want to do is set up “passwordless ssh” between nodes, then this post might be worth your read.
The script below assumes that you have completed the following three steps:

Step 1. Created an RSA key pair on each machine:
cd ~
ssh-keygen -t rsa

Do not enter any passphrase; instead, just press [enter].

Step 2. Suppressed the host-key warning prompts by adding the following line to the ssh config file:
sudo vim /etc/ssh/ssh_config
StrictHostKeyChecking no
Step 3. Copied the key pair file “MyKeyPair.pem” to the master’s home directory:
scp -i /local-path/MyKeyPair.pem /local-path/MyKeyPair.pem ubuntu@master_public_dns:~

Assuming that the above three steps have been completed, run this script on the master to enable passwordless ssh between master-slave and/or slave-slave nodes:

    #!/bin/bash
    # Author: Niketan R. Pansare
    # Password-less ssh
    # Make sure you have transferred your key-pair to master
    # Color codes used in the prompts below
    green='\e[0;32m'
    endColor='\e[0m'
    if [ ! -f ~/.ssh/ ]; then
      echo "Expects ~/.ssh/ to be created. Run ssh-keygen -t rsa from home directory"
    fi
    if [ ! -f ~/MyKeyPair.pem ]; then
      echo "For enabling password-less ssh, transfer MyKeyPair.pem to master's home folder (e.g.: scp -i /local-path/MyKeyPair.pem /local-path/MyKeyPair.pem ubuntu@master_public_dns:~)"
      exit 1
    fi
    echo "Provide following ip-addresses"
    echo -n -e "${green}Public${endColor} dns address of master:"
    read MASTER_IP
    echo ""
    # Assumption here is that you want to create a small cluster ~ 10 nodes
    echo -n -e "${green}Public${endColor} dns addresses of slaves (separated by space):"
    read SLAVE_IPS
    echo ""
    echo -n -e "Do you want to enable password-less ssh between ${green}master-slaves${endColor} (y/n):"
    read ENABLE_PASSWORDLESS_SSH
    echo ""
    if [ "$ENABLE_PASSWORDLESS_SSH" == "y" ]; then
      # Copy master's public key to itself
      #cat ~/.ssh/ >> ~/.ssh/authorized_keys
      for SLAVE_IP in $SLAVE_IPS
      do
        IS_PASSWORD_LESS="n"
        echo "Checking passwordless ssh between master -> "$SLAVE_IP
        ssh -o PasswordAuthentication=no $SLAVE_IP /bin/true
        if [ $? -eq 0 ]; then
          echo "Passwordless ssh has been setup between master -> "$SLAVE_IP
          echo "Now checking passwordless ssh between "$SLAVE_IP" -> master"
          ssh $SLAVE_IP 'ssh -o PasswordAuthentication=no' $MASTER_IP '/bin/true'
          if [ $? -eq 0 ]; then
            echo "Passwordless ssh has been setup between "$SLAVE_IP" -> master"
            IS_PASSWORD_LESS="y"
          fi
        fi
        if [ "$IS_PASSWORD_LESS" == "n" ]; then
          # ssh-copy-id gave me lot of issues, so will use below commands instead
          echo "Enabling passwordless ssh between master and "$SLAVE_IP
          # Copy master's public key to slave
          cat ~/.ssh/ | ssh -i ~/MyKeyPair.pem "ubuntu@"$SLAVE_IP 'mkdir -p ~/.ssh ; cat >> ~/.ssh/authorized_keys'
          # Copy slave's public key to master
          ssh -i ~/MyKeyPair.pem "ubuntu@"$SLAVE_IP 'cat ~/.ssh/' >> ~/.ssh/authorized_keys
          # Copy slave's public key to itself
          ssh -i ~/MyKeyPair.pem "ubuntu@"$SLAVE_IP 'cat ~/.ssh/ >> ~/.ssh/authorized_keys'
        fi
      done
      echo ""
      echo "---------------------------------------------"
      echo "Testing password-less ssh on master -> slave"
      for SLAVE_IP in $SLAVE_IPS
      do
        ssh "ubuntu@"$SLAVE_IP uname -a
      done
      echo ""
      echo "Testing password-less ssh on slave -> master"
      for SLAVE_IP in $SLAVE_IPS
      do
        ssh "ubuntu@"$SLAVE_IP 'ssh ' $MASTER_IP 'uname -a'
      done
      echo "---------------------------------------------"
      echo "Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program."
      echo -n -e "Do you see error or something fishy in above block (y/n):"
      read IS_ERROR1
      echo ""
      if [ "$IS_ERROR1" == "y" ]; then
        echo "I am sorry to hear this script didn't work for you :("
        echo "Hint1: It's quite possible that the slave does not contain ~/MyKeyPair.pem"
        echo "Hint2: sudo vim /etc/ssh/ssh_config and add StrictHostKeyChecking no and UserKnownHostsFile=/dev/null to it"
      fi
    fi
    echo -n -e "Do you want to enable password-less ssh between ${green}slave-slave${endColor} (y/n):"
    read ENABLE_PASSWORDLESS_SSH1
    echo ""
    if [ "$ENABLE_PASSWORDLESS_SSH1" == "y" ]; then
      if [ "$ENABLE_PASSWORDLESS_SSH" == "n" ]; then
        echo -n -e "In this part, the key assumption is that password-less ssh between ${green}master-slave${endColor} is enabled. Do you still want to continue (y/n):"
        read ANS1
        if [ "$ANS1" == "n" ]; then
          exit 0
        fi
        echo ""
      fi
      for SLAVE_IP1 in $SLAVE_IPS
      do
        for SLAVE_IP2 in $SLAVE_IPS
        do
          if [ "$SLAVE_IP1" != "$SLAVE_IP2" ]; then
            IS_PASSWORDLESS_SSH_BETWEEN_SLAVE_SET="n"
            # Checking assumes passwordless ssh has already been setup between master and slaves
            echo "[Warning:] Skipping checking passwordless ssh between "$SLAVE_IP1" -> "$SLAVE_IP2
            # This will be true because ssh $SLAVE_IP1 is true
            #ssh $SLAVE_IP1 ssh -o PasswordAuthentication=no $SLAVE_IP2 /bin/true
            #if [ $? -eq 0 ]; then
            if [ "$IS_PASSWORDLESS_SSH_BETWEEN_SLAVE_SET" == "n" ]; then
              echo "Enabling passwordless ssh between "$SLAVE_IP1" and "$SLAVE_IP2
              # Note you are on master now, which we assume to have 
              ssh -i ~/MyKeyPair.pem $SLAVE_IP1 'cat ~/.ssh/' | ssh -i ~/MyKeyPair.pem $SLAVE_IP2 'cat >> ~/.ssh/authorized_keys'
              echo "Passwordless ssh has been setup between "$SLAVE_IP1" -> "$SLAVE_IP2
            fi
          fi
        done
      done
      echo "---------------------------------------------"
      echo "Testing password-less ssh on slave <-> slave"
      for SLAVE_IP1 in $SLAVE_IPS
      do
        for SLAVE_IP2 in $SLAVE_IPS
        do
          # Also, test password-less ssh on the current slave machine
          ssh $SLAVE_IP1 'ssh ' $SLAVE_IP2 'uname -a'
        done
      done
      echo "---------------------------------------------"
      echo "Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program."
      echo -n -e "Do you see error or something fishy in above block (y/n):"
      read IS_ERROR1
      echo ""
      if [ "$IS_ERROR1" == "y" ]; then
        echo "I am sorry to hear this script didn't work for you :("
        echo "Hint1: It's quite possible that the slave does not contain ~/MyKeyPair.pem"
        echo "Hint2: sudo vim /etc/ssh/ssh_config and add StrictHostKeyChecking no and UserKnownHostsFile=/dev/null to it"
      fi
    fi

Here is a sample output obtained by running the above script:
ubuntu@ip-XXX:~$ ./
Provide following ip-addresses

Public dns address of master:

Public dns addresses of slaves (separated by space):

Do you want to enable password-less ssh between master-slaves (y/n):y

Checking passwordless ssh between master ->
bunch of warnings and possibly a few "Permission denied (publickey)." messages (Ignore this !!!)

Testing password-less ssh on master -> slave
Don't ignore any error here !!!
Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program.
Do you see error or something fishy in above block (y/n):n

Do you want to enable password-less ssh between slave-slave (y/n):y

Testing password-less ssh on slave <-> slave
Don't ignore any error here !!!

Sorry, prefer to keep this check manual to avoid headache in Hadoop or any other distributed program.
Do you see error or something fishy in above block (y/n):n

Warning: The above code does not check for passwordless ssh in the slave-slave configuration and sets it up blindly, even if passwordless ssh is already enabled. This might not be a big deal if you run this script a few times, but it won’t be ideal if the cluster needs to be modified dynamically over and over again. To modify this behavior, look at the line: IS_PASSWORDLESS_SSH_BETWEEN_SLAVE_SET="n"
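One way to address this warning is to make the append idempotent: add a public key to authorized_keys only if that exact line is not already there. Below is a sketch in Python; the function name is hypothetical, and in the cluster setting it would have to run on each target node (for example, invoked over ssh). It compares whole key lines, so the same key with a different comment field counts as a different line.

```python
import os
import tempfile


def add_authorized_key(pub_key, authorized_keys_path):
    """Append `pub_key` to an authorized_keys file unless that exact line is
    already present. Returns True if the key was added, False otherwise."""
    pub_key = pub_key.strip()
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as f:
            if pub_key in {line.strip() for line in f}:
                return False  # already set up: nothing to do
    directory = os.path.dirname(authorized_keys_path)
    if directory:
        os.makedirs(directory, mode=0o700, exist_ok=True)
    # Start on a fresh line if the existing file lacks a trailing newline
    prefix = ""
    if os.path.exists(authorized_keys_path) and os.path.getsize(authorized_keys_path) > 0:
        with open(authorized_keys_path, "rb") as f:
            f.seek(-1, os.SEEK_END)
            if f.read(1) != b"\n":
                prefix = "\n"
    with open(authorized_keys_path, "a") as f:
        f.write(prefix + pub_key + "\n")
    # sshd's StrictModes check rejects group/world-writable key files
    os.chmod(authorized_keys_path, 0o600)
    return True


# Usage against a throwaway file (the key string is a made-up placeholder)
path = os.path.join(tempfile.mkdtemp(), ".ssh", "authorized_keys")
key = "ssh-rsa AAAAexamplekey ubuntu@slave1"
print(add_authorized_key(key, path))  # True: key appended
print(add_authorized_key(key, path))  # False: already present, file unchanged
```

With this check in place, rerunning the setup for a dynamically changing cluster no longer piles up duplicate lines in authorized_keys.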

Saturday, March 08, 2014

Notes on entrepreneurship (Part 3)

A few updates since the last time I wrote on this blog:
1. At the end of the entrepreneurship course, the website and an iOS app were implemented and I got an A grade ... yaay !!!

2. Since I was out of the country during the final presentation to the judges (VCs, startup founders, etc.), I created a YouTube video to motivate this project and my teammate explained the remaining logistics of our project [10].

3. The Vigilant team won 3rd place in the HackRice 2014 competition for developing an Android version along with an SVM classifier [3] that predicts the threat level based on location and time of day. We used five years of crime data from the Houston PD, so yes, this feature only worked for Houston :(

4. I was lucky enough to be selected to participate in the Ignite Conference 2014 [4]. To get the ball rolling on the first day, we were asked to build as tall a structure as possible within a given time frame, with a marshmallow on top, using noodle sticks, tape and thread [5]. The next day, we got to visit Khosla Ventures and Square (and a few other startups). On the last two days, we had talks from several guest entrepreneurs about their entrepreneurial journeys. Here are a few of the lessons I learnt (directly or indirectly) from this conference:
4.a What ideas/projects to work on?
- One of the most common pieces of advice you will receive is "fail fast, fail often and fail cheap". However, I suggest not taking it literally: the emphasis is not on failing but on learning. So, you should not go in with the mindset "if I am planning to fail early anyway, let me start with a dumb idea". Instead, the mindset should be "test early and cheaply; if you do fail, don't get hung up on it, instead quickly learn why you failed and then adapt". Let me put this in a much broader context using one of the most important lessons I have learnt in research and entrepreneurship: though most people will judge you by your bank balance and your citations, you (as a researcher or entrepreneur) should try to evaluate yourself by your ability to keep failures small and useful.
- Don't hesitate or hold yourself back if you are passionate about an idea that appears to be a "black swan". But make sure you "know what you know and know what you don't know". Then, evaluate whether the things you don't know will halt the project; if so, find an expert who can help you with them.
- Before you think of quitting your job and starting a venture, make a list of the unglamorous/routine tasks you will have to do at a startup. Though many of these might be solvable with capital, if you intend to go the lean startup way, ask yourself: are you mentally flexible enough to get over your ego and do these tasks if need be?
4.b Building a team and network:
- Build a team that is smarter than you. And make sure you give back and focus on their growth too, else you will soon face a talent-retention problem.
- Here are some of the key points for building and keeping great teams: minimize unnecessary bureaucracy [14], use a merit-based promotion strategy, treat others with respect and, most importantly, when it comes to improving your startup, encourage everyone to pitch in their ideas rather than handing everyone a list of to-do items [15].
- Hire the best sales team you can find. This is especially true in internet technology, even though it has scalability challenges. Why? People would much rather deal with a salesperson than with an internet widget when they are paying a lot of money for a service. This also means you shouldn't wait until your product-testing phase to hire a sales team. Rather, start with an integrated sales-development team and use the sales part of the team to validate/refine the hypothesis by communicating with customers during the product development phase. In the best-case scenario, the founder is that person: the one who first assumes the role of salesman, talks to clients/customers, figures out the requirements, then develops the product (alone or by hiring a really good team), markets it, and keeps iterating over the whole process [8].
- Networking is extremely critical. It takes time; otherwise it seems inauthentic/insincere. So, don't use the number of LinkedIn connections or business cards in your wallet as a measure of networking; instead, count the number of people who genuinely want to help you and vice versa. Put another way: networking is not about how many people you are acquainted with but rather how many mentors, benefactors and friends you have.
4.c Managing yourself:
- It is extremely important to find ways to manage stress and take care of your health. Also, if your better half and family understand the challenges of startups and support/love you, the stress and difficulties reduce dramatically.
- Especially in the software development phase, use productivity rather than hours worked as the measure.
- Know the difference between persistence and stubbornness. Remember that persistence (without stubbornness) requires learning and pivoting given new information/lessons, rather than doing the same things over and over again expecting different results [6].
- When working in a team, your idea is only as good as your ability to explain it. If your team doesn't understand it, you can be almost sure that your customers won't understand it either. This skill probably deserves an entirely new blog post (which I might write later), but for now here are three things you can do:
a. Read books about presentation/explanation/UI design: Design of everyday things, Don't make me think, Back of Napkin, Art of explanation, Whiteboard selling, slide:ology and 100 things every designer needs to know.
b. Learn empathy (i.e. relating to team/developers/sales and especially customers).
c. Be specific. If you don't break tasks up into smaller items and clarify the purpose/big picture, people (either your team or investors) might feel overwhelmed and can sometimes get defensive. One particular thing that every member of your team needs to be crystal clear about is the "minimum viable product".

Let me recap the high-level steps [16] described until now in this blog series:

Step 0: Define your means: "what do I know", "what do I have" and "whom do I know".

Step 1: Ideation: See part 1 of this series for high-level tips.
Step 1.1: Being a little more specific about step 1: create/refine a 60-second elevator pitch. The key idea is that if you meet a key investor/evangelist/strategic partner in an elevator and he or she asks what you are working on, you should be able to summarize your business idea before the elevator reaches that person's floor. Here are a few suggestions for creating your elevator pitch:
- Sit down and write your elevator pitch without too much thinking. First iteration will always be awful.
- Iterate, rehearse, iterate again and then rehearse again (and keep doing this) until your elevator pitch becomes second nature to you.
- Talk to your customers and experts in that field and try to incorporate their feedback.
- Be honest and don't promise "world peace". Remember, your words represent your thoughts/character, and the investor is investing in you as much as in the idea.
- There is no one way to do this, so learn by example, i.e. do a YouTube/Google search for "elevator pitch" and listen to a few [7].
- Use analogy to simplify the problem.
- If possible, explain how the problem affects you, your family or someone you know.
- The first and last sentences are extremely important, and connecting them helps create a more effective delivery.
Here is one iteration of my elevator pitch:
Studies show that in the US alone, one out of four women gets sexually assaulted in her lifetime. This number is much higher when they travel to developing countries like India. Though this problem is highly complex, we believe that a fraction of it can be solved by technology-based deterrents that act as a first line of defense when in danger. We present one such solution: Vigilant. Vigilant is an affordable and hassle-free personal safety solution that connects you to friends, family and authorities during an emergency with just a click of a button. The goal of Vigilant is to make everyone's life safer with the help of smarter software and hardware.
Step 2: Opportunity evaluation:
Step 2.1: Use high-level checklist described in part 1 of this series.
Step 2.2: Create a business model canvas: Before you read anything further, you should definitely watch this 2-min video describing the business model canvas. Also, read the book Business Model Generation to understand why and how to create a business model canvas. If you don't have time to read through the book, here are a few websites that let you create one with step-by-step instructions: zoomstra, launchpad central, turbostart, wordpress plugin, collaboration website. You can also go through the following lectures/tutorials to learn how to create one: Steve Blank's udacity class, Alex's blog.
Step 2.3: Early customer validation/requirement gathering using one or more of the following feedback/analytics tools (some are applicable only at later stages). These tools help you find out and quantify what people think they want, how much they think they would pay and how much they will really pay.
- False buy page or dummy (but fully functional) e-commerce page for buying clickers [1].
- Contact form plugin in wordpress.
- Feedback button in app.
- Site Traffic widget/plugin in wordpress.
- Facebook likes.
- Google Forms for suggestions from early adopters: here are a few suggestions for coming up with good questions (ones that I didn't know earlier): Kevin's blog, Stanford's videos for customer discovery, Lean startup customer development templates, Alex's blog, Steve's suggestions, Kaushik's suggestions.
- Other tools that you can use for market research are Google's customer survey tool, Amazon's Mechanical Turk and Qualtrics.
- Pamphlets with Google URL shortener links [2] and QR codes across campus.
- Other tools that let you do customer validation: validately, foundersuite.

Step 3: Build a great team !!!
- See suggestions given above in point 4.b and 4.c.
- There are a few websites that help you find co-founders: FoundersDating, CoFoundersLab; or you can attend a meetup, a hackathon or Startup Weekend.

Step 4 (or 5): Implementation, marketing, validation, refining business model canvas and keep on iterating.
This step requires the above-mentioned feedback/analytics tools as well as marketing/branding tools:
- Check out fiverr for really cool and cheap marketing ideas.
- Make sure you install SEO plugin for your wordpress. I am using this plugin, but there are equally good plugins in the market.
- Create a marketing message/keywords based on the above feedback tools, Google Trends or the Google Keyword Tool. Then keep refining it by location based on Google Insights. Similarly, there are several analytics tools that will help you with marketing/branding, like Google Analytics, Amazon Analytics, Heap Analytics, AppAnnie, Distimo, etc.
- Advertise on Google, Facebook, Bing or LinkedIn. If you prefer traditional marketing tools, you can look at srds.
- Read Art of explanation and make a user-friendly video about your product. You can use one of the following tools to create the video: Sparkol, Explainify, LooseKeys, Amazon's Elastic Transcoder, Camtasia.
For an exhaustive list of tools, refer to Steve Blank's list, YCombinator's list, Startup Weekend's list, PBWork's list, HBS list.

Step 5 (or 4): Funding your startup:
Step 5.1: Research which type of investor you want to approach: angel investors [12], venture capitalists, crowdfunding or super angels. Also, be perfectly clear about the amount of funding you need and the valuation of your company, as these determine the amount of equity (or other form of compensation) you will have to give to the investor. There are other parameters to consider as well, like how much involvement you want from the investor. Here is a small figure to understand the startup funding lifecycle [11].
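To make the valuation point in step 5.1 concrete: in a simple priced round, the investor's share is commonly computed as investment / post-money valuation, where post-money = pre-money + investment. Here is a small Python sketch with made-up numbers; the function names are hypothetical, and it deliberately ignores option pools, convertible notes, liquidation preferences and the other terms that real deals include.

```python
def investor_stake(investment, pre_money_valuation):
    """Fraction of the company the investor receives in a priced round,
    using post-money valuation = pre-money valuation + investment."""
    post_money = pre_money_valuation + investment
    return investment / float(post_money)


def founder_stake_after(rounds, initial_stake=1.0):
    """Founders' remaining fraction after a sequence of (investment, pre_money) rounds,
    assuming each new round dilutes every existing holder proportionally."""
    stake = initial_stake
    for investment, pre_money in rounds:
        stake *= 1.0 - investor_stake(investment, pre_money)
    return stake


# Hypothetical numbers: $500k at a $2M pre-money, then $2M at an $8M pre-money
print(investor_stake(500000, 2000000))                                 # 0.2
print(founder_stake_after([(500000, 2000000), (2000000, 8000000)]))  # about 0.64
```

Running a few scenarios like this before meeting investors shows how a lower valuation or a larger raise compounds across rounds.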

Step 5.2: Now research the specific investors you want to target and classify them as either "ideal/ambitious" or "can live with/moderate". Start with the investor's website and understand their funding philosophy. Then look into the ventures they have funded and see how they are doing [13].
Step 5.3: Prepare your pitch deck:
- See tips from my previous blog post about presentations.
- Here is a really good 2-min video about pitching from Guy Kawasaki. Note the 10 slides at 10min 4 seconds.
Step 5.4: Remember that there are things more important than just capital for your startup. Ask for advice and hopefully gain a mentor or board member. Sometimes they can also refer you to another funding agency.

Since this is often the most confusing aspect of a startup for a techie, let me try my best to explain and emphasize the difference between validation and planning [8, 9]. Planning involves deep thinking in order to forecast, whereas validation starts with brainstorming hypotheses and then validating them through a rapid customer feedback loop. Planning, which is associated with the waterfall model, starts by defining a specific goal and then coming up with strategies and detailed steps to move towards that goal. Validation, on the other hand, is often associated with agile development/effectuation/lean startup methodology and involves pivoting the goal based on customer/market feedback. Planning assumes that you know precisely what the customer requirements are beforehand; validation doesn't. Since customer requirements are known, planning is suitable for large, established companies where the emphasis is on "execution" of those requirements. On the other hand, since precise customer requirements are unknown in startups, the emphasis is on "searching" for the requirements through validation of hypotheses [8]. This is why many entrepreneurs prefer a short business model canvas (which is a hypothesis generating and validating tool) over more comprehensive business models.

[1] Implementing e-commerce page on the wordpress took me no more than 30 minutes using the plugin WooCommerce. It allows you to track orders, generate reports, and even manage coupons.
[2] Google's URL shortener (and many similar free services) helps you track the traffic through a given link. Here is a useful hack: create multiple shortened URLs for your webpage and use different ones for pamphlets in different geographical areas. Then use the analytics to run a much more targeted ad campaign :)
[3] For ML geeks, I must confess that the model suffers from selection bias, as it only contains data on when crimes occurred, not when they didn't. Since the hackathon rules required finishing the project in a day, most of which was spent on data gathering/cleaning, I had no time to work on this :(
[4] One thing that surprised me was that even though this conference was in Silicon Valley, I was the only Computer Science graduate student there, all others were either MBAs or MDs or MD-PhDs or BioTech students.
[5] My team came second and built a structure that was 26 inches tall. The idea is to start with a first structure that already carries the marshmallow and keep building the lower structure. Also, it is important to reinforce the structure (adding supporting vertical sticks and using multiple sticks for the lower pillars) so that it does not collapse. Most of the teams that failed started by building as tall a structure as they could without the marshmallow and then tried to put the marshmallow on top, which caused the entire structure to collapse under its weight. So the take-away point for entrepreneurs from this exercise is to do bottom-up incremental iterations/updates with customer involvement so as to mitigate risk. Here is the pic of our structure:
[6] From Mark Otero, CEO & Co-founder, Klicknation.
[7] Daniel Pink, in his book "To Sell Is Human", suggests the following template: "Once upon a time ___. Every day, ___. One day ___. Because of that ___. Because of that ___. Until finally ___".
[8] Read Steve Blank's blog for more detail.
[10] In the video, the clip where people have duck-tape on their mouth is from another youtube video about raising voice against sexual violence, i.e. it was not created by me and hence don't give me credit if you want to cite it. I used the clip because it resonates with the motivation of my project :)
[12] It might also be worthwhile to look into websites like angellist.
[13] Websites like CrunchBase and Techcrunch helps here. Other useful website you should definitely visit is Glassdoor to gauge the talent-retention ratio as well as willingness of upper-management to experiment new ideas. Why? It sometimes can be a metric to evaluate the founders and indirectly the investors. There are few ranking websites based on different criteria, one of them being Forbes Midas' list.
[14] For example: Rice Computer Science Graduate Student Association (CS GSA) is a flat organization and there are coordinators that have responsibilities/duties. Everyone can pitch in and help others, but when it comes to arbitration/point of contact/failure responsibility, the assigned coordinator is the one in-charge. For people outside the organization, that person is Overall coordinator, think of it as a President of the club and for the current year, that's me (thank you very much) :)
[15] Sure, some ideas might not align with the vision of the company and it may be dropped, but it has to be done in form of discussion rather than in an authoritative manner.
[16] Note, these are high-level steps, even though you might find the list of tools overwhelming. For detailed step-by-step instructions regarding startup process, please read one of several startup related books available on amazon or listen to startup lectures on udacity/coursera/youtube or better attend an incubator program near you.  

Saturday, October 12, 2013

Notes on entrepreneurship (Part 2)

In the first class, we divided ourselves into random groups of four people (mostly from different backgrounds). The idea was to pitch a startup idea within 10 minutes based on the means of our group members. This exercise was repeated with a different set of people. One thing I noticed was that even though it seems like a nice idea to start with your means and build a startup on them, people usually revert to the idea they are absolutely passionate about, irrespective of their means [8].

As an assignment that week, we were supposed to take $5 and, in the span of two hours, make as much money as we could. My group decided to go with "Grocery delivery service for professors", and each of us was supposed to ask the professors in our own department to sign up. I got to speak to only a few professors in the Computer Science department (as it was Friday), and only Swarat was on board with the idea. My teammates were unable to get any professors from their respective departments, so they decided to go with another idea: "Personalized cards and their delivery". We made $10 in tips with the first idea and ~$13 with the second. The lesson from this exercise was that sales is hard, but probably the most important part of a startup [9].

In the next class, each of us gave an elevator pitch of our startup idea. Let me describe my idea in terms of the framework from the previous post:
1. Means:
- What do I know? Background in software development, research experience in building large-scale systems and knowledge of embedded systems.
- What do I have? Very little capital ... student salary :(
- Whom do I know? Software professionals in India and the US (from my bachelor's, master's and job experience), marketing/advertising professionals in India (my father's advertising firm and my MBA friends), a trustworthy partner in India (my best friend and brother Rohan), and research scientists (my advisor, my colleagues at Rice University [3], my collaborators, contacts from internships and from my experience as President of the Rice Computer Science Graduate Student Association). With a couple of exceptions, I have so far been lucky enough to be surrounded by really nice people, which is why I believe this is my strongest means [4].

2. Ideation:
In the previous blog post, I vented my frustration over sexual assault cases in India and offered high-level suggestions (which, I must admit, I had no control over). So I decided to use my means to develop something that might help improve the situation, in whatever small way possible.
Unlike in the US, the commonly accepted safety net in India is not the government/police but the social structure (i.e. your friends and family). Many personal safety solutions like pepper spray, tasers or guns require state licenses in many countries and are even prohibited in a few, such as Canada, China, Bangladesh, Singapore, Belgium, Denmark, Finland, Greece and Hungary. Not to mention, they are expensive and can often escalate the situation. This is why many people follow a self-imposed curfew: they either don't leave home after dark or have someone accompany them.
With the advent of smartphones, an obvious solution seems to be "have a personal safety app". Most of these apps have cool features, such as sending your GPS location to friends and family, but they have a major flaw that makes them non-functional: they require you to take out your phone, open the app and press an SOS button in it. Clearly, this is not what an average person would do in a moment of danger. Here is some of the feedback on such apps [1]:
- I think a "dangerous environment" is the last place I would want an expensive item like an iPhone prominently displayed.
- ... app (might not be) readily available to access, on screen #8 somewhere on the 2nd or third row ...
- Pulling out your iPhone in the face of an attacker? That's one sure way to escalate the situation. And he now knows you have a nice phone too.
Reality check: Not only am I passionate about this problem, but there is also no existing solution that works satisfactorily (i.e. there are real "pain points") ... to me, the existing apps are merely cosmetic, the kind you buy as a keepsake but that serve no real purpose for society.

Before discussing the next point, let me take a small detour and tell you what metric I use to determine the success of this startup [2]: by the end of the course (or maybe a few months after that), I am able to build an app that works seamlessly in real-life situations and that I can recommend to my loved ones without any hesitation.

3. Opportunity evaluation:
Like many computer programmers, I have a habit of developing software that does exactly what I want but not what my audience needs. To make sure that didn't happen, I sent out a survey asking people what they would like to have in times of emergency. The demographics of the participants were as follows:
- 53% male and 47% female
- 82% of participants were between 20 and 30 years old, and 14% between 30 and 40 years old.
- 59% were from the US, 16% from India and 8% from China.
- 24% had a Bachelor's degree, 47% had a Master's degree, and 27% were PhD students.
- 32% earned between $10K and $50K, 28% earned between $50K and $100K, 16% earned above $100K, and 20% were not currently earning.

Here is the summary of responses:

The figure above shows that the largest share of respondents (46%) wanted an external device, like a Bluetooth clicker, that the victim can press when he/she is in danger. This was a good indicator that I should go ahead and spend some time building such an app.

The next step was to understand what features a user would want/need in times of emergency:

Since some of these features require backend services (for sending emails/SMS and managing accounts/the app) that cost money (I am not rich enough to maintain this kind of service on my own), I asked how much people would be willing to pay for it. It was clear that most people preferred a buy-once model:

Now, the opportunity evaluation checklist:
a. Unique value proposition of my idea: an "affordable" and "hassle-free" way to connect to your friends, family and authorities in times of emergency ... with just the click of a button.
b. Is it defensible? Nope, anyone with strong programming experience can replicate the features of my app. In fact, in the long run, that is exactly what I want ... lots of good apps and competition that eventually help reduce the number of sexual assaults and make the world a little safer.
c. Is it profitable/sustainable? I really don't know :( ... It could be a product like Dropbox, which people never thought they would want ... but once they got it, they couldn't imagine their lives without it ... or it could be a total flop. The only way to be completely sure is to implement it :)
d. Clearly defined customer: Young women traveling abroad or working late, senior citizens and frequent travelers.
e. Is it feasible and scalable? Yes, I used my experience in building large-scale, cost-effective systems to build the backend. Also, my experience in C/C++ programming, my knowledge of design patterns and Apple's user-friendly documentation helped me (a) learn the basics of Objective-C in less than a week and (b) build version 1.0 of the iOS app in less than a month.

Finally, I must reiterate the core principle of opportunity evaluation for a startup: it is not possible to know a priori whether an idea will turn into a good business. So I decided to stick to the "affordable loss" principle and develop the app at as low a cost as I could. A few of the hacks I used to ensure an "affordable loss" are as follows:
- Moving most of the computation to the clients rather than the server (so as not to buy overly expensive servers).
- Using pay-as-you-go services wherever I felt they were absolutely necessary (for example: Amazon Web Services).
- Using GIMP to design my own logo (which I had to learn, btw :P) rather than hiring a designer [6].
- Using WordPress for the startup website rather than spending days perfecting the CSS to make it mobile-compatible or hiring a web developer.
- Focusing on a minimum viable product (using Texas Instruments' SensorTag) rather than prematurely buying tons of Bluetooth clickers from China [12].
- Using in-app purchases rather than setting up a credit-card system on my website to provide features that other services charge me for (for example: SMS/email) [10].
- Not chasing patents early on in the venture [5].
- Buying a ready-made icon set rather than designing the icons yourself [7].
- Choosing a hosting plan that has no hidden fees and that supports your choice of backend services.
- Using Gmail as the support email (rather than the one provided by the hosting service) and adding a feedback button in the app as well as a contact form on the website (with some kind of captcha [13]). One tip I have for new developers is to minimize the number of clicks/typing needed to send feedback from the app, for example: pre-fill the "To" address as well as the "Subject" line.
- Utilizing the membership benefits of the Apple/Google developer programs, i.e. off-loading testing [11], advertising/cross-promotion, expert feedback, reliable delivery of your software and version management.
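The pre-fill tip can be illustrated with a mailto: link, which most mail clients open with the recipient and subject already filled in. The sketch below is hypothetical: the address and subject are placeholders, `feedback_mailto` is a made-up helper, and the actual app was written in Objective-C, not Python. It only shows how such a link is assembled and percent-encoded with Python's standard library:

```python
from urllib.parse import quote


def feedback_mailto(to_address: str, subject: str, body: str = "") -> str:
    """Build a mailto: link with the recipient and subject pre-filled.

    Hypothetical helper for illustration; the recipient is assumed to be
    a plain address that needs no encoding.
    """
    # safe="" makes quote() percent-encode every reserved character
    # (spaces, parentheses, "&", "/") so it cannot break the query string.
    params = [f"subject={quote(subject, safe='')}"]
    if body:
        params.append(f"body={quote(body, safe='')}")
    return f"mailto:{to_address}?" + "&".join(params)


link = feedback_mailto("support@example.com", "App feedback (v1.0)")
print(link)  # mailto:support@example.com?subject=App%20feedback%20%28v1.0%29
```

The user taps the feedback button, the mail client opens with both fields filled in, and all that is left is typing the message.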

References / footnotes:
[2] Rather than using metrics like expected profit/revenue after the first year, a public offering, or something along similar lines.
[3] I have already got 4 other PhD students from Rice University on board to develop Android/Windows versions of the app. This is in line with another principle of Effectual Entrepreneurship: form partnerships.
[4] The intent of this statement is not flattery, just an observation. Here is why I think so: though I never deliberately tried to enforce it, my circle of influence consists of three types of people: those who are genuinely nice, those who are smarter than me, and those who are not self-destructive. Of course, the categories are not mutually exclusive; in fact, people who are smarter are usually genuinely nice (probably because they focus only on uplifting themselves, not on pulling others down). Here is a layman's example most people can relate to: even in a course with relative grading, a smart person knows that he/she is a student of a global class and is not threatened by his/her classmates' progress ... which is probably why you will rarely find such a student shying away from group discussions (to gain an advantage) or deliberately spreading false information to sabotage others' grades. Maybe things are not so black-and-white in other fields, which is one of the reasons why I absolutely love research and development.
[5] A patent attorney who attended our class thought one of the features of our app is patentable, which might be true. But there are two problems with going that way: (a) the real cost of a patent is not in getting it but in defending it, and (b) it would limit the features that other programmers who are smarter than me can introduce in their personal safety apps (which goes against my success criterion for this particular problem). To be completely honest, there are ways around the low-capital issue for startups that really want a patent: (1) file a provisional patent yourself for under $200 (just read up on how to define the scope of your patent and also about court dates), (2) ask your parent organization or an angel investor to file a patent for you in exchange for royalties or a stake in the startup (for example: Rice University's OTT office), (3) contact a patent troll to defend your patent.
[6] For people with a little more budget, there are websites like and that allow you to hire freelance designers/developers. A similar website for building mock prototypes of your products to show to investors is
[7] There is always a tradeoff between time and money; you just have to figure out your exchange rate for time ;) ... for example: if someone provides a service that saves you 1 hour, how much are you willing to pay for it?
[8] Whether building your startup "based on your existing means" is better than building it "based on your passion and expanding your means as you go", or vice versa, I really don't know. Both have obvious advantages, and both have obvious disadvantages when the respective principle is pushed to an extreme.
[9] Even though you might think an idea is pretty good (and will benefit the customer), people are not willing to pay as much as you think ... probably because they suspect you are trying to dupe them, or because you valued the product/market incorrectly, or something else.
[10] Though it might seem simple to just plug in an existing credit-card library and set up a PHP page with a MySQL backend, things get a little complicated when you start thinking about transactional semantics and the fact that people can buy new devices, delete the app, and so on.
[11] In a traditional company, a developer might be evaluated on stupid metrics, such as the number of bugs assigned, solved, etc. Though they seem apt on paper, they can have harmful side effects for a startup, such as spending too much time perfecting a feature without any customer validation/feedback. So, instead of spending significant resources on testing, submit your app to Apple as and when you add a new feature and indirectly ask them to test it :) ... Another way to test cheaply is to use crowd-sourcing websites such as Amazon's Mechanical Turk (which I will ignore for this post).
[12] Since the SensorTag took a while to be delivered to my home address (thanks to my apartment complex rejecting the package), I decided to use an accessory I already owned (i.e. headphones) for version 1.0 and introduce the Bluetooth feature in the next version.
[13] There are lots of WordPress plugins that let you add a captcha to your website in just a few minutes.
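To make the "transactional semantics" point in footnote [10] concrete, here is a minimal, hypothetical sketch of one common approach: an idempotent server-side ledger keyed by the store's transaction ID. It is not what my app actually does. `PurchaseLedger`, its methods and the credit-based model are all made up for illustration, and a real backend would also need persistent storage with atomic writes and receipt validation with the app store, which the sketch leaves out:

```python
from dataclasses import dataclass, field


@dataclass
class PurchaseLedger:
    """Hypothetical in-memory ledger of SMS/email credit purchases."""

    # transaction_id -> (user_id, credits); the store's transaction ID is the key
    transactions: dict = field(default_factory=dict)
    # user_id -> credits currently owned
    balances: dict = field(default_factory=dict)

    def record_purchase(self, user_id: str, transaction_id: str, credits: int) -> bool:
        """Grant credits once per transaction; return False for a replayed receipt."""
        if transaction_id in self.transactions:
            # Same receipt sent again (network retry, reinstall, restore on a
            # new device): acknowledge it, but never credit the user twice.
            return False
        self.transactions[transaction_id] = (user_id, credits)
        self.balances[user_id] = self.balances.get(user_id, 0) + credits
        return True

    def restore(self, user_id: str) -> int:
        """A new device or reinstalled app asks the server, not local storage."""
        return self.balances.get(user_id, 0)


ledger = PurchaseLedger()
ledger.record_purchase("alice", "txn-001", 10)  # True: first time this receipt is seen
ledger.record_purchase("alice", "txn-001", 10)  # False: duplicate receipt ignored
print(ledger.restore("alice"))  # 10
```

Keying on the transaction ID rather than on the device is what lets a user delete the app or switch phones without losing, or duplicating, what they paid for.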