Wednesday, September 19, 2007

Mixing DOS and Unix

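I wrote a Java program in Notepad on Windows, and when I opened it in the vi editor on a Unix machine, I got a ^M character at the end of each line. Windows ends each line with a carriage return plus a line feed (CR+LF), while Unix uses a line feed alone; vi shows the leftover carriage return as ^M.

E.g.: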
import java.io.*;^M
^M
/**^M
*^M
*This class is used to manage the o/p stream of the application.^M
* ^M
*@author Niketan Pansare^M
*@version 1.0 ^M
* ^M
*/^M
public class OutputManager^M
{^M
OutputStream outputStream; ^M
^M
/**^M
* This constructor uses standard i/o as default stream.^M
*/^M


and so on ...
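
To see where the ^M comes from, here is a small Python sketch. The sample bytes are made up for illustration (they mimic the first lines of my file as Notepad would save them); the point is only that splitting on the Unix line ending leaves a carriage return on every line.

```python
# Hypothetical sample: three lines saved with DOS (CR+LF) line endings.
dos_text = b"import java.io.*;\r\n\r\npublic class OutputManager\r\n"

# A Unix tool splits on LF only, so the CR stays attached to each line.
unix_view = dos_text.split(b"\n")

# Count the lines that still carry a trailing carriage return.
crlf_lines = sum(1 for line in unix_view if line.endswith(b"\r"))

print(unix_view[0])   # first line, still ending in \r
print(crlf_lines)     # number of lines with a leftover CR
```

The carriage return is character 13, and M is the 13th letter of the alphabet, which is why it is displayed as ^M (CTRL+M).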

You can remove all the extra ^M characters in the vi editor as follows:
1. Go to command mode (press Esc; if you are already in command mode, you will hear a beep).
2. Then type
:%s/^M$//g
Don't copy and paste the line above. To enter ^M, press CTRL+V followed by CTRL+M, i.e. ^V then ^M.
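
If you would rather fix the text outside vi, the same idea can be sketched in Python. This is only an illustration, not what vi does internally: the function name strip_trailing_cr is my own, and Python's re module stands in for vi's pattern matching.

```python
import re

def strip_trailing_cr(text: str) -> str:
    """Remove a carriage return only where it sits at the end of a line."""
    # With re.MULTILINE, "$" matches just before every line feed and at the
    # very end of the text, so r"\r$" plays the role of ^M$ in the vi command.
    return re.sub(r"\r$", "", text, flags=re.MULTILINE)

cleaned = strip_trailing_cr("public class OutputManager\r\n{\r\n")
print(repr(cleaned))
```

A carriage return in the middle of a line is left alone, just as with the vi command. When reading a real file, open it with newline="" (or in binary mode) so that Python does not translate the line endings before you see them.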

What does the command :%s/^M$//g mean?
For substitution, you use the following command:
:[range]s/[pattern]/[string]/[options]

's' here means substitute: each match of the pattern is replaced with the string.

The range can be:
{number}   an absolute line number
.          the current line
$          the last line in the file
%          equal to 1,$ (the entire file)

You can also use # instead of / as the separator.
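
To make the role of the range concrete, here is a rough Python model of the substitute command. The helper substitute and its parameters are hypothetical names of mine, the pattern uses Python's regex syntax rather than vi's, and re.sub replaces every match on a line (which corresponds to giving the g option).

```python
import re

def substitute(lines, first, last, pattern, replacement):
    """Rough model of :first,lasts/pattern/replacement/g (1-based, inclusive)."""
    result = list(lines)
    for index in range(first - 1, last):
        result[index] = re.sub(pattern, replacement, result[index])
    return result

lines = ["one\r", "two\r", "three\r"]

# Like :%s/^M$//g, where % stands for the range 1,$ (the entire file).
whole_file = substitute(lines, 1, len(lines), r"\r$", "")

# Like :2s/^M$//g, where 2 is an absolute line number.
only_second = substitute(lines, 2, 2, r"\r$", "")

print(whole_file)
print(only_second)
```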

Technically, the Vim help defines a pattern as follows:
The definition of a pattern: *search_pattern*

Patterns may contain special characters, depending on the setting of the
'magic' option.

*/bar* */\bar*
1. A pattern is one or more branches, separated by "\|". It matches anything
that matches one of the branches. Example: "foo\|beep" matches "foo" and
"beep".

2. A branch is one or more pieces, concatenated. It matches a match for the
first, followed by a match for the second, etc. Example: "foo[0-9]beep",
first match "foo", then a digit and then "beep".

3. A piece is an atom, possibly followed by:
   magic   nomagic
   *       \*      matches 0 or more of the preceding atom              */star* */\star*
   \+      \+      matches 1 or more of the preceding atom {not in Vi}  */\+*
   \=      \=      matches 0 or 1 of the preceding atom {not in Vi}     */\=*

   Examples:
   .*      .\*     matches anything, also empty string
   ^.\+$   ^.\+$   matches any non-empty line
   foo\=   foo\=   matches "fo" and "foo"


4. An atom can be:
- One of these five:
  magic   nomagic
  ^       ^       at beginning of pattern, matches start of line    */^*
  $       $       at end of pattern or in front of "\|",            */$*
                  matches end of line
  .       \.      matches any single character                      */.* */\.*
  \<      \<      matches the beginning of a word                   */\<*
  \>      \>      matches the end of a word                         */\>*
  \i      \i      matches any identifier character (see             */\i*
                  'isident' option) {not in Vi}
  \I      \I      like "\i", but excluding digits {not in Vi}       */\I*
  \k      \k      matches any keyword character (see                */\k*
                  'iskeyword' option) {not in Vi}
  \K      \K      like "\k", but excluding digits {not in Vi}       */\K*
  \f      \f      matches any file name character (see              */\f*
                  'isfname' option) {not in Vi}
  \F      \F      like "\f", but excluding digits {not in Vi}       */\F*
  \p      \p      matches any printable character (see              */\p*
                  'isprint' option) {not in Vi}
  \P      \P      like "\p", but excluding digits {not in Vi}       */\P*
  \e      \e      <Esc>                                             */\e*
  \t      \t      <Tab>                                             */\t*
  \r      \r      <CR>                                              */\r*
  \b      \b      <BS>                                              */\b*
  ~       \~      matches the last given substitute string          */~* */\~*
  \(\)    \(\)    A pattern enclosed by escaped parentheses         */\(\)*
                  (e.g., "\(^a\)") matches that pattern
  x       x       A single character, with no special meaning,
                  matches itself
  \x      \x      A backslash followed by a single character,       */\*
                  with no special meaning, matches the single
                  character
  []      \[]     A range. This is a sequence of characters         */[]*
                  enclosed in "[]" or "\[]". It matches any         */\[]*
                  single character from the sequence. If the
                  sequence begins with "^", it matches any
                  single character NOT in the sequence. If two
                  characters in the sequence are separated by '-', this
                  is shorthand for the full list of ASCII characters
                  between them. E.g., "[0-9]" matches any decimal
                  digit. To include a literal "]" in the sequence, make
                  it the first character (following a possible "^").
                  E.g., "[]xyz]" or "[^]xyz]". To include a literal
                  '-', make it the first or last character.
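
If you know regular expressions from another language, it may help to see a few of the patterns above next to their Python equivalents. This is only a sketch for comparison: Python's re module is not vi, and its spelling differs (no backslash before |, + and the optional marker). The variable names are mine.

```python
import re

# Vim "foo\|beep": two branches. Python writes the alternation as a bare "|".
branches = re.compile(r"foo|beep")

# Vim "foo[0-9]beep": three pieces in a row; "[0-9]" is a range atom.
pieces = re.compile(r"foo[0-9]beep")

# Vim "foo\=": the last "o" may be absent. Python writes \= as "?".
optional = re.compile(r"foo?")

# Vim "^.\+$": a non-empty line. Python writes \+ as a bare "+".
non_empty = re.compile(r"^.+$")

# Vim "[^]xyz]": any single character except "]", "x", "y" or "z".
not_in_set = re.compile(r"[^]xyz]")

for regex, sample in [(branches, "beep"), (pieces, "foo7beep"),
                      (optional, "fo"), (non_empty, ""), (not_in_set, "]")]:
    print(regex.pattern, repr(sample), regex.fullmatch(sample) is not None)
```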

If the 'ignorecase' option is on, the case of letters is ignored.

It is impossible to have a pattern that contains a line break.

Examples:
^beep(                  Probably the start of the C function "beep".

[a-zA-Z]$               Any alphabetic character at the end of a line.

\<\I\i*         or
\(^\|[^a-zA-Z0-9_]\)[a-zA-Z_]\+[a-zA-Z0-9_]*
                        A C identifier (will stop in front of it).

\(\.$\|\. \)            A period followed by end-of-line or a space.
                        Note that "\(\. \|\.$\)" does not do the same,
                        because '$' is not end-of-line in front of '\)'.
                        This was done to remain Vi-compatible.

[.!?][])"']*\($\|[ ]\)  A search pattern that finds the end of a sentence,
                        with almost the same definition as the ")" command.

Technical detail:
<Nul> characters in the file are stored as <NL> in memory. In the display
they are shown as "^@". The translation is done when reading and writing
files. To match a <Nul> with a search pattern you can just enter CTRL-@ or
"CTRL-V 000". This is probably just what you expect. Internally the
character is replaced with a <NL> in the search pattern. What is unusual is
that typing CTRL-V CTRL-J also inserts a <NL>, thus also searches for a <Nul>
in the file. {Vi cannot handle <Nul> characters in the file at all}
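
Three of the example patterns can be tried out in Python as well. Again this is only an approximation under the assumption that Python's re module is an acceptable stand-in: the patterns are respelled in Python's syntax, and the sample strings and variable names are made up by me.

```python
import re

# Vim "^beep(": in Python the "(" must be escaped to stay a literal character.
c_function_start = re.compile(r"^beep\(")

# Vim "[a-zA-Z]$": spelled the same way in Python.
letter_at_end = re.compile(r"[a-zA-Z]$")

# Vim "\(\.$\|\. \)": Python groups with bare parentheses and a bare "|".
period_then_break = re.compile(r"(\.$|\. )")

samples = ["beep(int n)", "Done.", "Done. Next", "3.14"]
for text in samples:
    print(repr(text),
          bool(c_function_start.search(text)),
          bool(letter_at_end.search(text)),
          bool(period_then_break.search(text)))
```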


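Seems complex?
Let us stick to our example for the time being.

$ means end of line.
Therefore, ^M$ means a ^M character at the end of a line.
It is replaced by nothing, since the string in our example is empty.

The g at the end means "global": replace every match on a line rather than only the first. Since ^M$ can match at most once per line, the g is harmless here but not required.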
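
The difference the g option makes can be sketched in Python, where the count argument of re.sub plays a similar role (count=1 replaces only the first match, the default replaces all of them). The sample lines are made up, and Python's re module is only a stand-in for vi here.

```python
import re

line = "foo boo"

# Like :s/o/0/ (no g): only the first match on the line is replaced.
first_only = re.sub("o", "0", line, count=1)

# Like :s/o/0/g: every match on the line is replaced.
every_match = re.sub("o", "0", line)

# For a pattern anchored at the end of the line there is at most one match,
# so both variants give the same result.
dos_line = "OutputStream outputStream; \r"
once = re.sub(r"\r$", "", dos_line, count=1)
globally = re.sub(r"\r$", "", dos_line)

print(first_only, "|", every_match)
print(once == globally)
```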