Wednesday, January 31, 2007

What is Research?

READING:


Books are not scrolls.

Scrolls must be read like the Torah from one end to the other.

Books are random access -- a great innovation over scrolls.

Make use of this innovation! Do NOT feel obliged to read a book from beginning to end.

Permit yourself to open a book and start reading from anywhere.

In the case of mathematics or physics or anything especially hard, try to find something anything that you can understand.

Read what you can.

Write in the margins. (You know how useful that can be.)

Next time you come back to that book, you'll be able to read more.

You can gradually learn extraordinarily hard things this way.

Consider writing what you read as you read it.

This is especially true if you're intent on reading something hard.

I remember a professor of Mathematics at MIT,

name of BERTRAM KOSTANT,

who would keep his door open whenever he was in his office, and he would always be at his desk writing.

Writing. Always writing.

Was he writing up his research? Maybe.

Writing up his ideas? Maybe.

I personally think he was reading, and writing what he was reading.

At least for me, writing what I read is one of the most enjoyable and profitable ways to learn hard material.



STUDYING:

You are all computer scientists.

You know what FINITE AUTOMATA can do.

You know what TURING MACHINES can do.

For example, Finite Automata can add but not multiply.

Turing Machines can compute any computable function.

Turing machines are incredibly more powerful than Finite Automata.

Yet the only difference between a FA and a TM is that

the TM, unlike the FA, has paper and pencil.

Think about it.

It tells you something about the power of writing.

Without writing, you are reduced to a finite automaton.

With writing you have the extraordinary power of a Turing machine.

THINKING:

CLAUDE SHANNON once told me that as a kid, he remembered being stuck on a jigsaw puzzle.

His brother, who was passing by, said to him:

"You know: I could tell you something."

That's all his brother said.

Yet that was enough hint to help Claude solve the puzzle.

The great thing about this hint... is that you can always give it to yourself !!!

I advise you, when you're stuck on a hard problem,

to imagine a little birdie or an older version of yourself whispering

"... I could tell you something..."

I once asked UMESH VAZIRANI how he was able,

as an undergraduate at MIT,

to take 6 courses each and every semester.

He said that he knew he didn't have the time to work out his answers the hard way.

He had to find a shortcut.

You see, Umesh understood that problems often have short clever solutions.



There will come a time when you work on a problem long and hard but UNsuccessfully :(

And then you learn that someone else found a solution.

See this as the GREAT opportunity it is to learn something important.

Don't let it pass you by.

Ask yourself: "How SHOULD I have been thinking to solve that problem?"

I have found that doing so is a powerful exercise.

Danny Sleator tells me that BOB FLOYD independently recommended exactly this exercise to his students.

He would lead them into asking themselves:

"How COULD I have led myself to that answer?"

Take the time to think it through.

It's worth it.

There will come a time when you work on a problem long and hard and SUCCESSFULLY :)

And then you learn that someone else already published. :(

Hard as that may be for you to take, you must view this too as a great opportunity.

Don't turn off. Read what got published.

You will be surprised how often the published paper turns out to be different in some significant way. Roughly

50% of the time, it is NOT at all the same as what you did.

25% of the time, it is the same but not as good.

25% of the time it is better.

This means that 50% of the time or more, you can still publish.

And what about the 25% time that what got published is better than your own?

In that case, you have a great opportunity to learn.

Ask yourself: "How SHOULD I have been thinking to solve the problem in this fine way?"

This is how I discovered, as a young engineer, that I should learn something enormously powerful called "Modern Algebra."

It's one reason I switched from Electrical Engineering as an undergraduate major to Mathematics as a Graduate major.

Of course, this was before there existed anything called Computer Science.



Still on THINKING...

The importance of PARADOX and CONTRADICTION.

When you can prove that a statement S is true,

and you can prove that the same statement S is false,

then you KNOW that that you're on to something:

Something is wrong somewhere.

Never underestimate the power of a contradiction.

It is one of our most potent sources of knowledge.

Examples include the Liar Paradox "This statement is false." with its applications to Set Theory and our understanding of language.

There are the seeming paradoxes of countability and uncountability,

In CS, there is the apparent paradox that leads to The Halting Problem.

Physics has lots of paradoxical material:

Quantum Theory. The Einstein-Rosen-Podulsky Paradox.

The relativistically speeding Twins.

The wave and particle nature of matter.

Here's an ASIDE on my current work, also based on paradox:

I am personally interested in the Paradox of consciousness. Compare the following two views:

1. the view that the human is a MECHANISM, an automaton

with substantial but finite internal memory, programmed like

any computer to do whatever it does, and/or

2. the view that the human is a thoughtful observant

creature with a God-like free will; that it is a CONSCIOUS

ENTITY at the controls of a highly complex highly capable

mechanism, choosing what to do from among options served up

by/from its vast unconscious below.

In my view, both these views are correct. How can that be?

In his "Life of Johnson," James Boswell quotes Samuel Johnson

as saying:

"All theory is against the freedom of the will; all experience

is for it."

Johnson was 18 years old when Newton (age 85) was buried.

Johnson knew that F=MA implied that humans are mechanisms.

"All theory is against the freedom of the will; all experience

is for it."

This ends my ASIDE.

Make a list for yourself of good ways to pursue a problem.

My own favorite is to try small examples. By comparison,

DAVID GRIES's favorite is to put himself in the middle

of a (presumed) solution. An example is his coffee can problem:

Given a can of black and white coffee beans, do the following: Pull out two beans: if both are the same color, replace them with a white bean. If the two are different, replace them with a black bean. What color is the last bean?

Or try out the two methods on the Hershey Bar problem

[Give an optimal algorithm to break an mxn Hershey bar into

1x1 pieces. At each step, you can choose a single rectangle

of chocolate and crack it along one of its vertical or

horizontal lines. A single crack counts one step. You are to

make the fewest number of cracks]

Brains are muscles.

They grow strong with exercise.

And even if they're strong, they grow weak without it.

In the months before Kasparov lost to Deep Blue,

his mother came after him.

She was worried that he wasn't spending enough time

exercising himself (on chess).

Her worries proved well-founded.

THE PhD: GETTING STARTED

I remember a great summer job I once had at IVIC

(Instituto Venezolano de Investigaciones Cientificos).

A top neurophysiologist, name of Svaetichin, gave me a splendid problem... one that I unfortunately could not solve.

The problem was to find a way to focus light on a single cell

of a goldfish retina so that the light would not spill over

onto any of the adjacent cells.

Svaetichin had tried making a pinhole in a sheet of black tin,

and shining his light thru the hole. This worked for moderate size holes, but failed for really small holes, which caused the light to diverge, to form diffraction patterns.

Since Svaetichin couldn't solve the problem, I decided I couldn't. Or perhaps it's that I thought his problem physically unsolvable. In retrospect, I should have taken out books on physics, especially optics, read as much as I could, talked to others and kept on talking to him.

Svaetichin would have helped me if I had shown him I was reading thinking working.

Don't expect your thesis advisor to give you a problem that he or she can answer. Of course, she might.

* She might give you a problem to which she already knows an answer.

* She might give you a problem that she thinks is answerable,

but that she hasn't actually answered.

* She might give you a problem that is deadly hard.

* If the problem she gives you is hard enough,

I suggest you look for a NONSTANDARD answer.

More on this later after I get done cooking the thesis advisor.

Your thesis advisor may encourage you to work in an area

that she feels completely comfortable in... in which case you can rely on her for sage advice and sound guidance.

Or she may encourage you to work on something she knows little or nothing about, in which case it will be up to you to inform and teach her.

In the latter case, you will have to learn all you can for yourself...

You will have to learn from other faculty, from courses, from books, from journals. from peers.

Both kinds of advisors can work out for you.

I don't know that one is necessarily better than the other.

But you should know which you got.

Whatever you do, you got to like doing it....

You got to like it so much that you're willing to think about it, work on it, long after everyone else has moved on.

THE PhD: DEEP IN THE MIDDLE OF IT.

There's a wonderful quote from ANATOLE FRANCE:

"A University Student"

-- and this is especially true for a PhD Student --

"should know something about everything

and everything about something."

You know the jokes about PhD's...

A PhD knows more and more about less and less

until he knows everything about nothing.

When working on a PhD, you must focus on a topic so narrow that you can understand it completely.

It will seem at first that you're working on the proverbial needle, a tiny fragment of the world, a minute crystal,

beautiful but in the scheme of things, microscopic.

Work with it. And the more you work with it, the more you penetrate it, the more you will come to see that your work, your subject, encompasses the world.

In time, you will come to see the world in your grain of sand.

To see a world in a grain of sand

Or a heaven in a wild flower,

Hold infinity in the palm of your hand

And eternity in an hour.

WILLIAM BLAKE (1757-1827)

This gorgeous quartet is followed by a large number of sometimes deep sometimes questionable aphorisms, which I see much like the occasionally grinding work of a PhD thesis.



There's all kinds of research you can do.

There's research to prove what you know to be true.

There's research -- maybe better called SEARCH -- to figure out what is true.

Some of the best such search succeeds in DISPROVING

what you initially believed to most certainly be true.

For example, Sir Fred Hoyle is said to have coined the phrase "Big Bang" at a time when he was looking to disprove it.

For a relatively minor but personal example:

When I was working on the MEDIAN problem,

my goal was to prove that any deterministic algorithm to find the MEDIAN of n integers must necessarily make roughly as many comparisons as it takes to sort n integers, i.e. n log n comparisons.

I was shocked to discover that the median of n integers can be found with just O(n) comparisons.

When working on proving some statement S true,

you should spend at least some time trying to prove it false.

Even if it's true, trying to prove it false can give insight.

And in any case, too often, our intuition is dead wrong.

There is yet another sense in which, when working on a hard problem, you may find that the answer is NOT what you expected.

You may be looking for a YES or a NO; it may be something else.

Some years ago, JOHN HOPCROFT gave one of his PhD students the problem of deciding the Equivalence of Free Boolean Forms.

The specifics don't matter.

The problem appeared as an open problem in Garey and Johnson.

The question was: Is the Equivalence problem NP-complete?

Or is it solvable in poly time?

Chandra, Wegman and I found a randomizing algorithm for this problelm. At the time this seemed to beg the question entirely. Only after writing it up did we really understand that we had given an efficient albeit randomizing algorithm to solve it, This shows, by the way, that the problem is not NP-complete if NP <> RP (Randomizing Poly-time), as seems likely.

Of course, this brings up the question whether P = NP.

The question of our time: Are NP-complete problems solvable in poly time?

Could anything I have said today be useful for so hard a problem as that?

Probably not. Nevertheless...

LEONID LEVIN believes as I do that whatever the answer to the P=NP? problem, it won't be like anything you think it should be. And he has given some wonderful examples.

For one, he has given a FACTORING ALGORITHM that is proVably optimal, up to a multiplicative constant.

He proves that if his algorithm is exponential,

then every algorithm for FACTORING is exponential.

Equivalently, if any algorithm for factoring is poly-time,

then his algorithm is poly-time.

But we haven't been able to tell the running time of his algorithm because, in a strong sense, it's running time is unanalyzable.

Maybe as STEVEN RUDICH suggests, the P=NP? problem is undecidable in the standard formalization of Mathematics.

The point is that the answer may not lie where you expect it. Here's a poem I wrote when I wondered at the fact that we must sometimes be dragged kicking and screaming in the right direction.

It's a comparison to the blind spot in our eyes,

which isn't really blind but makes things up for us.

It questions whether there might not be other things in this world that our brains, our minds, by their very nature, make up for us:

Blind Spots mb 15-MAY-96

All men have Blind Spots in their eyes,

That manufacture visions of their vale.

And shape that void where light's unregistered,

With bold-faced unrepentant tales.

What other blind spots shape our minds and thoughts?

What other tales do won'dring minds unfurl

To woo us unbeguiled we would believe

To strange and nonexistent worlds?



ABOUT WRITING:

Here is the one quote that I have found most helpful and wise:

"First have something to say,

Second say it,

Third stop when you have said it,

and

Finally give it an accurate title."

JOHN SHAW BILLINGS [1838-1913]

MY ADVICE TO YOU:

Don't expect your thesis advisor to read your thesis. Some thesis advisors can and do give good feedback, but not all.

Still, make sure that SOMEBODY reads your thesis...

I especially recommend that you ask your peers.

Here's another piece of advice that I have often had to give myself, and I here give you...

When you send a paper off to be published,

and it gets rejected...

Don't be turned off by the mindless cretinous feedback

that you get to your well-thought-out beautifully-written work!

Be a MENSCH. Use the feedback to improve your paper!

Make it better. And send it back.



Finally, it is my most earnest wish that you should know something that is honestly amazingly true of you... That you are each of you UNIQUE and SPECIAL in some glorious way.

I wrote a poem to capture this, which I now use to end my sermon. It's called "Fundamentals."

FUNDAMENTALS


mb 05-JUN-96

Bird must soar. Skunk must stink.

Cat must prowl. Man must think.

What sets man apart from beast is his engine of

thought. His mind. His

BRAIN

makes him unique

and gives him his greatest pleasure.

But fundamental as is thought for human beings,

there is stuff more basic still that underlies and

DRIVES

not only man

but all great beasts,

And that is nature's call to each of us... to be special.

To be distinguished in some way. To be

UNIQUE.

To BE something, to DO something, BETTER than everyone else.

Like the leather nosed chimpanzee,

dragging noisy cans and branches,

frightening peers into submission,

One does not have to be brilliant, a genius, to be special.

To do something better than anyone/everyone else. To be

UNMATCHED,

One has only to choose an END

any END

that MATTERS

that INSPIRES

YOU

And then DO IT.

By, Manuel Blum.

(http://www.cs.cmu.edu/~mblum/research/pdf/grad.html)

Manuel Blum (born 26 April 1938) is a computer scientist who received the Turing Award in 1995 "In recognition of his contributions to the foundations of computational complexity theory and its application to cryptography and program checking". Blum attended MIT, where he received his Bachelor's degree in 1959, his Master's degree in 1961, and his PhD in 1964. He worked as a professor of computer science at the University of California, Berkeley until 2000. He is currently the Bruce Nelson Professor of Computer Science at Carnegie Mellon University.

Wednesday, January 17, 2007

Code Obfuscator

One of the key architectural features of Microsoft .NET is that all its languages compile to .NET assemblies containing CPU-independent instructions. These instructions, known as Microsoft Intermediate Language (MSIL), contrast with many other languages that generate CPU-specific instructions for the target CPU. .NET assemblies are comprised both of MSIL instructions and also metadata that describe types, members, and code references from other assemblies. At runtime, the MSIL instructions are converted to CPU-specific instructions by a just-in-time (JIT) compiler.

The use of this architecture has several huge benefits for the .NET developer. For instance, it makes possible easy interoperability for code written in differing languages, and it makes it easy for an assembly to specify exactly the version of another assembly with which it will work. But there is one major drawback for some developers as well: the MSIL and metadata in an assembly provide enough information to recover the original code. The .NET framework ships with a tool, ILDASM that can disassemble MSIL into assembly language and other utilities can carry the process even further, translating a .NET assembly back into a high-level language such as C# or Visual Basic .NET.

This is a drawback because it is very difficult to protect the intellectual property in an application if anyone can read the source code for the application. Developers, who have spent months or years coming up with complex algorithms or workarounds for bugs in the .NET Framework, or other components, often prefer to have their methods remain secret from their competitors.

This is where obfuscators come in. The purpose of an obfuscator is to apply one or more transformations to a .NET assembly without affecting the proper functioning of the assembly, but that make it difficult or impossible to understand any source code recovered from the assembly. As a simple example, every obfuscator offers some level of member renaming. Source code where all of the objects are named things like A, B, and C instead of Customer, ClientData, and ServiceUpdate, is substantially more difficult to understand.

How to obfuscate?
Simplest way – Use Macro preprocessors to create hard to read code by masking the standard language syntax and grammar from the main body of code.

Pitfalls:
Although obfuscation is a valuable technology in many circumstances, you do need to be aware of some potential pitfalls:
1. Obfuscation can break code that depends on reflection, serialization, or remoting.
2. Obfuscation can make diagnosing and debugging problems in your code more difficult.
3. Obfuscation adds another step and another potential error source to your build process.

Obfuscation in Visual Studio .NET:
If your obfuscation needs are minimal, and you're a Visual Studio .NET user you may not need to purchase a product at all. That's because Visual Studio .NET includes a copy of the Community Edition of Dotfuscator for .NET, an obfuscator from PreEmptive Solutions. This obfuscator is targeted at students and freeware authors, and supports basic entity renaming and removal of unused metadata, but no advanced obfuscation features.

Recreational obfuscation:
Code is sometimes obfuscated deliberately for recreational purposes. There are programming contests which reward the most creatively obfuscated code: the International Obfuscated C Code Contest, Obfuscated Perl Contest, International Obfuscated Ruby Code Contest and Obfuscated PostScript Contest.
There are many varieties of interesting obfuscations ranging from simple keyword substitution, use/non-use of whitespace to create artistic effects, clever self-generating or heavily compressed programs, or programs that are valid and operate similarly in multiple programming languages.

Points to note:
No obfuscator known today provides any guarantees on the difficulty of reverse engineering: See http://www.math.ias.edu/~boaz/Papers/obf_informal.html
Hackers can use this technique to foil anti-virus programs that rely upon a virus having a static signature. The technique also allows spammers to hide their intentions. See: http://www.itbusiness.ca/it/client/en/Home/News.asp?id=41807&bSearch=True

Tuesday, January 09, 2007

ClickOnce Deployment

Visual Studio provides two different strategies for deploying Windows-based applications: publishing an application using ClickOnce technology, or deploying it with a traditional Setup using Windows Installer technology. With ClickOnce deployment, you publish the application to a centralized location and the user installs or runs the application from that location. With Windows Installer deployment, you package the application in a setup.exe file and distribute that file to users, who run the Setup.exe file to install the application.

ClickOnce deployment greatly simplifies the process of installing and updating an application, but you do not get the power of Windows Installer deployment, which provides you with greater flexibility.

ClickOnce deployed applications are self-updating and are the best choice for applications that require frequent changes. Although ClickOnce applications can be initially installed by means of a CD-ROM, users must have network connectivity to take advantage of the update capabilities.


With Windows Installer, you add a Setup project to your solution to create a setup file that is distributed to users; the user runs the setup file and steps through a wizard to install the application.


Creating ClickOnce application:

  1. Create normal windows or console application.
  2. Then, right click on it and click on Publish.
  3. This will prompt you to specify a Web site or network file share.
  4. The user installs and launches the application directly from that location in a single step.


However, you don't have flexibility of custom install in ClickOnce application. This is crucial feature that I miss the most. However, updating functionality is great.


If your application does not require custom install actions and if you have hosting space, I will advise you to try this feature.

Monday, January 08, 2007

Some tips about creating an installer

I am working on custom installer for deploying DTSX packages. It also includes few custom actions like creating databases, populating data, attaching dtsConfig files, etc...

Few Golden rules:
1. “Uninstall/Remove/Delete all resources during uninstall and rollback, you install using custom action”.
2. Suppress all exceptions in Rollback.
3. Validate “ALL” parameters supplied by user.
4. Create a separate file that stores all strings used by Installer for user interaction.
5. Always have a log file.
6. Not recommended: You can use MessageBox to debug your installer.

How to implement:
1. Uninstall:
Create a text file “uninstall.log”. This should ideally have list of all resources that has been installed using custom install method. It should also have additional parameters (eg username, password) that will be useful for uninstalling or removing the resources.
Rollback:
Create a flag that states whether a resources has been deployed or not. Add all parameters required in state server:
Eg: stateSaver.Add("someParameter", ParameterValue); -- in Install method
if(savedState["someParameter"] != null) // Required
string someParameterInRollback = savedState["someParameter"].ToString(); -- in Rollback method

2. Keep a general try / catch loop. This will allow you to do maximum damage control J in rollback phase.

public override void Rollback(System.Collections.IDictionary savedState)
{
base.Rollback(savedState);
try
{
//
//Rollback Action
//
}
catch (Exception exception)
{
//Suppress all exceptions
}
}

3. Keep a separate method for this functionality say “private void ValidateUserParameters(System.Collections.IDictionary stateSaver)”. It should check:
a. Whether parameter is entered or not -Context.Parameters.ContainsKey("SOMEPARAMETER")
b. Whether entered parameter is null or empty - String.IsNullOrEmpty(Context.Parameters["SOMEPARAMETER"])
c. Whether entered parameter is in valid range
Always report “all” invalid parameters. i.e. have some flag that signifies invalid parameters and an errorMessage string that you can append when you get an invalid parameters and display at end.

4. This will helpful in change management. Eg:
public const string InstallerUsage = "Usage: msiexec /i UrMSI.msi SOMEPARAMETER=\"Some Paramter Name\" ";
5. Log file will help user know whether Installer was successful or not. Also, it can help him trace the action done by installer. You can have a text log file or you can log it in application log (use System.Diagnostics.EventLog.WriteEntry()).

6. Add reference to "System.Window.Forms” and you can use all windows based forms. But remember to remove the reference while Release.

SqlConnection, SqlCommand -- using them efficiently

SqlConnection and SqlCommand implement IDisposable, which means they could have unmanaged resources to cleanup and it is our job, the developers, to make sure Dispose() gets called on these classes after we are finished with them.

To avoid all mess, use following C# syntax :

using (SqlConnection sqlConnection = new SqlConnection(connectionString))
{
using (SqlCommand sqlCommand = new SqlCommand(commandString, sqlConnection))
{
sqlConnection.Open();
sqlCommand.ExecuteNonQuery();
}
}

This is essentially equivalent to :

SqlConnection sqlConnection = null;
SqlCommand sqlCommand = null;
try
{
sqlConnection = new SqlConnection(connectionString);
sqlCommand = new SqlCommand(commandString, sqlConnection);
sqlConnection.Open();
sqlCommand.ExecuteNonQuery();
}
finally
{
if (null != sqlCommand);
sqlCommand.Dispose();
if (null != sqlConnection)
sqlConnection.Dispose();
}

If you don't use above code and if you forget to close connection, you cannot drop a database. It will throw an error saying that database is still in use.

However, there is a way out (though not advisable) for dropping the database:

USE MASTER
GO
ALTER DATABASE DBNAME
SET SINGLE_USER WITH ROLLBACK IMMEDIATE
DROP DATABASE DBNAME

Note: above code is used to force database to delete all connection (which might cause loss of data in multi-user environment).