One of the top things I get “help me joda!” emails about is people trying to run my utilities and running into issues with strings with spaces and other special characters. These strings are handled in special ways by the command interpreter and when people think my programs are screwing up, it is actually just that my program is not seeing what the person running the program is intending…
What do I mean by this… The command interpreter CMD.EXE is just a program; it takes the things you type and processes them. There are special characters that mean special things to CMD. For example, the space character that you get by pressing that wide button on the bottom of your keyboard tells CMD that you are ending the last word (or more accurately token) and starting a new one. This is generally used to separate an executable name from parameters or parameters from each other.
For example
md
is a single token md.
md joerocks
is two tokens, md and joerocks, where joerocks is the parameter to be “passed” or “handed over” to md to work with. They are separated by the space. You look at that and you know what it means, everyone[0] knows what it means of course. Create a directory called joerocks.
Well the computer doesn’t just know ANYTHING, it KNOWS by following rules programmed into it that it has to follow to a logical (generally) conclusion…
So what if you do…
md joe rocks
That is three tokens – md followed by joe followed by rocks. The MD program[1] will create a directory called joe and then it will create a directory called rocks. Let’s see this in action:
G:\blogfodder>dir
Volume in drive G is GDrive
Volume Serial Number is 5C71-92C4
Directory of G:\blogfodder
03/26/2008 06:26 PM <DIR> .
03/26/2008 06:26 PM <DIR> ..
0 File(s) 0 bytes
2 Dir(s) 303,191,179,264 bytes free
Now you run the command
md joerocks
and you get
G:\blogfodder>dir
Volume in drive G is GDrive
Volume Serial Number is 5C71-92C4
Directory of G:\blogfodder
03/26/2008 06:27 PM <DIR> .
03/26/2008 06:27 PM <DIR> ..
03/26/2008 06:27 PM <DIR> joerocks
0 File(s) 0 bytes
3 Dir(s) 303,191,179,264 bytes free
but if instead you run
md joe rocks
you get
[Wed 03/26/2008 18:28:48.88]
G:\blogfodder>dir
Volume in drive G is GDrive
Volume Serial Number is 5C71-92C4
Directory of G:\blogfodder
03/26/2008 06:28 PM <DIR> .
03/26/2008 06:28 PM <DIR> ..
03/26/2008 06:28 PM <DIR> joe
03/26/2008 06:28 PM <DIR> rocks
0 File(s) 0 bytes
4 Dir(s) 303,191,179,264 bytes free
Do you see that? Does it make sense?
Oh my gosh… what if you really want a directory called joe rocks though? Personally I recommend against directories with spaces in them for the token parsing reason[2], but let’s say you just want to do it because it looks prettier… Damn the functionality and issues… How do you do it… Most command line folks, or even folks that occasionally use the command line, will immediately say… “quote the string” of course! And yes that is it, you “quote the string” with the spaces (and/or other special characters) like so
md "joe rocks"
which gives you something like
[Wed 03/26/2008 18:36:05.00]
G:\blogfodder>dir
Volume in drive G is GDrive
Volume Serial Number is 5C71-92C4
Directory of G:\blogfodder
03/26/2008 06:36 PM <DIR> .
03/26/2008 06:36 PM <DIR> ..
03/26/2008 06:36 PM <DIR> joe rocks
0 File(s) 0 bytes
3 Dir(s) 303,191,179,264 bytes free
What the quotes did is to make the command interpreter treat the entire string joe rocks as a single token.
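If it helps to see that rule spelled out, here is a toy sketch in Python of the splitting described above: break on spaces, unless you are between a pair of quotes. The function name tokenize is made up for this illustration, and this is a simplified model of the idea, not what CMD actually runs internally.

```python
def tokenize(command_line):
    """Toy model: split on spaces, except inside a pair of double quotes."""
    tokens = []
    current = ""
    in_quotes = False
    started = False  # True once the current token has begun
    for ch in command_line:
        if ch == '"':
            # A quote flips the "inside quotes" state and is not kept
            in_quotes = not in_quotes
            started = True
        elif ch == " " and not in_quotes:
            # An unquoted space ends the current token, if there is one
            if started:
                tokens.append(current)
            current = ""
            started = False
        else:
            current += ch
            started = True
    if started:
        tokens.append(current)
    return tokens


print(tokenize('md joe rocks'))    # ['md', 'joe', 'rocks']
print(tokenize('md "joe rocks"'))  # ['md', 'joe rocks']
```

Three tokens without the quotes, two with them, which lines up with the two directory listings above.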
Fabulous!
Now let’s say you want to see if there is anything in that new directory (or folder if you prefer you Windows Explorer types…)
you type
dir joe rocks
and you get…
G:\blogfodder>dir joe rocks
Volume in drive G is GDrive
Volume Serial Number is 5C71-92C4
Directory of G:\blogfodder
Directory of G:\blogfodder
File Not Found
Err…. whoops. Have to quote that too…
G:\blogfodder>dir "joe rocks"
Volume in drive G is GDrive
Volume Serial Number is 5C71-92C4
Directory of G:\blogfodder\joe rocks
03/26/2008 06:36 PM <DIR> .
03/26/2008 06:36 PM <DIR> ..
0 File(s) 0 bytes
2 Dir(s) 303,191,179,264 bytes free
Fabulous! Again, the quotes did what? They tell the command interpreter to treat the string joe rocks as a single token.
But what does that really do for the program? Why does that help? Why can’t the program just figure this schtuff out, I want what I want, I don’t want to guess what the program thinks I might want and then try to guess how to tell it what I want…
Ok so for a bit of a techy bit on the programming side. If you are a limey git living in Southern Florida with a poorly faked English accent who commutes to New York on a regular basis, you probably want to skip this bit because you will get confused, the rest of you, yes including you mom, can continue reading…
When you launch a program from the command line, the parameters are passed to the program. Depending on some specific details about the program, the parameters are presented to the program code in various ways. I write in C or C++, so I use the standard argc / argv mechanism (see main function – http://en.wikipedia.org/wiki/Main_function_%28programming%29) for getting the parameters. Basically this means you get passed two variables: argc, which is the count of parameters passed in, and argv, which is an array or listing of those parameters. So say you have the following command
programname1 arg1 arg2 arg3 arg4
The argc variable will have a value of 5 and the argv array will look like
- programname1.exe
- arg1
- arg2
- arg3
- arg4
The first value will be the name of the executable itself, and then an array entry for every parameter passed.
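You can poke at this yourself without a C compiler. The sketch below uses Python, where sys.argv plays the same role as argv in C and its length is the equivalent of argc. The helper name describe is just something made up for this example.

```python
import sys


def describe(argv):
    """Return the count and each entry of an argv-style list as text lines."""
    lines = [f"argc = {len(argv)}"]
    for index, value in enumerate(argv):
        lines.append(f"argv[{index}] = {value}")
    return lines


if __name__ == "__main__":
    # sys.argv[0] is the script name, followed by one entry per parameter
    for line in describe(sys.argv):
        print(line)
```

Save it under any name you like and run it with a few parameters, some quoted and some not, and you can watch how the entries get split up before the program ever sees them.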
So say you call adfind with
adfind -default -f name=joe -dn
The argv array will look like
- adfind.exe
- -default
- -f
- name=joe
- -dn
My parameter handling routine knows that the -f switch will usually have a modifier so it will look at the array entry following the entry with the switch to get that info. As you can see here, that info is there for the program to use. But what happens if the name has a space in it? Say like:
adfind -default -f name=joe rocks -dn
The argv array will then look like
- adfind.exe
- -default
- -f
- name=joe
- rocks
- -dn
So when AdFind sends the filter to the server, it will send name=joe and not name=joe rocks. Obviously that is incorrect. To correct this issue, you quote the string that has the special character…
adfind -default -f "name=joe rocks" -dn
The argv array will then look like
- adfind.exe
- -default
- -f
- name=joe rocks
- -dn
Much better.
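To make the “look at the next array entry” idea concrete, here is a rough sketch in Python. To be clear, this is NOT AdFind’s actual parameter handling code; the names parse_switches and SWITCHES_WITH_VALUE are invented for this illustration, and the only switch treated as taking a modifier is -f.

```python
# Hypothetical: switches that expect a value in the next argv entry
SWITCHES_WITH_VALUE = {"-f"}


def parse_switches(argv):
    """Walk an argv-style list; return (options, leftovers)."""
    options = {}
    leftovers = []
    i = 1  # skip argv[0], the program name
    while i < len(argv):
        arg = argv[i]
        if arg in SWITCHES_WITH_VALUE and i + 1 < len(argv):
            # The modifier is whatever sits in the very next entry
            options[arg] = argv[i + 1]
            i += 2
        elif arg.startswith("-"):
            options[arg] = True
            i += 1
        else:
            # Nothing claimed this entry
            leftovers.append(arg)
            i += 1
    return options, leftovers


unquoted = ["adfind.exe", "-default", "-f", "name=joe", "rocks", "-dn"]
quoted = ["adfind.exe", "-default", "-f", "name=joe rocks", "-dn"]
print(parse_switches(unquoted))
print(parse_switches(quoted))
```

With the unquoted array the filter comes out as name=joe and rocks is left orphaned; with the quoted array the filter is name=joe rocks and nothing is left over.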
So as you can see, the quotes can make a huge difference in what my or anyone else’s programs see and consequently how they respond.
DEAN YOU CAN START READING AGAIN…
Now the reason I started writing this blog entry though was because of an email that basically said:
Hey poser, this command isn’t working… why?
cpau -u mydomain\myuserid -p mypassword -ex "msiexec /i "c:\temp \pearl echo 7.0 workstation.msi" /q" -lwop
Let me show you what the argv array looks like for this command:
- cpau.exe
- -u
- mydomain\myuserid
- -p
- mypassword
- -ex
- msiexec /i c:\temp
- \pearl
- echo
- 7.0
- workstation.msi /q
- -lwop
Do you think that is what was intended? CPAU is going to look at that and think…
Ok so the userid I am supposed to use is mydomain\myuserid
The password is mypassword
The thing to run is msiexec /i c:\temp
The logon locally without a profile option is enabled
And finally there are some random items in the array that may or may not be worth something.
The problem is due to the quotes. Quotes go together – start and end. They don’t just automatically “nest” for you. Meaning if you saw something like “joe said that quotes don’t “nest” properly and you need to do something about it” you could probably read it and ascertain the meaning. You would likely realize that the quote characters around the word nest weren’t the end of the first quoted item seen in the string in front of joe and the start of a new quoted string that ends after it. The computer isn’t that intelligent, remember computers run on 0’s and 1’s, on or off, true or false (or if you prefer TRUE or FALSE)…. There is no “well I think it might be this”. It needs to follow specific rules and the rule with quotes is if you see a quote, keep everything after in a single token until you encounter another quote and then you can stop doing that and go back to normal processing. So going back to that string above
"msiexec /i "c:\temp \pearl echo 7.0 workstation.msi" /q"
it is looked at as the following strings
"msiexec /i "c:\temp
\pearl
echo
7.0
workstation.msi" /q"
You may wonder, wait shouldn’t the first token end after the first “end quote” so c:\temp would be a new string… Nope, the command interpreter splits up the command line into tokens based on the space character (primarily). Since there is no space after the quote character, it keeps on going until it hits one. Ditto for the end of the string.
The proper way to handle nesting quotes is to use the backslash character (\) to “escape” the quote character, or in other words to pass it through the command interpreter (i.e. make CMD ignore it). Like so
cpau -u mydomain\myuserid -p mypassword -ex "msiexec /i \"c:\temp \pearl echo 7.0 workstation.msi\" /q" -lwop
The extra backslash characters will cause argv to be constructed as
- cpau.exe
- -u
- mydomain\myuserid
- -p
- mypassword
- -ex
- msiexec /i "c:\temp \pearl echo 7.0 workstation.msi" /q
- -lwop
Note that the quote characters that were escaped were passed right through to the executable, which in this case is EXACTLY what we wanted.
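For anyone who wants to replay both cpau command lines, here is a small Python sketch of the rules as described in this post: spaces split tokens, a quote switches “keep everything together” mode on or off, and a backslash directly in front of a quote passes the quote through as a plain character. The name tokenize is made up, and this is a simplified model under those three rules only; the real parsing has more corner cases (runs of backslashes, for instance) that are deliberately left out.

```python
def tokenize(command_line):
    """Simplified model: spaces split, quotes group, backslash-quote is literal."""
    tokens = []
    current = []
    in_quotes = False
    started = False
    i = 0
    while i < len(command_line):
        ch = command_line[i]
        nxt = command_line[i + 1] if i + 1 < len(command_line) else ""
        if ch == "\\" and nxt == '"':
            # Escaped quote: keep the quote, drop the backslash
            current.append('"')
            started = True
            i += 2
            continue
        if ch == '"':
            # Bare quote: toggle grouping, do not keep the character
            in_quotes = not in_quotes
            started = True
        elif ch == " " and not in_quotes:
            if started:
                tokens.append("".join(current))
            current = []
            started = False
        else:
            # Everything else, including a lone backslash, is kept as-is
            current.append(ch)
            started = True
        i += 1
    if started:
        tokens.append("".join(current))
    return tokens


BROKEN = r'cpau -u mydomain\myuserid -p mypassword -ex "msiexec /i "c:\temp \pearl echo 7.0 workstation.msi" /q" -lwop'
FIXED = r'cpau -u mydomain\myuserid -p mypassword -ex "msiexec /i \"c:\temp \pearl echo 7.0 workstation.msi\" /q" -lwop'

for label, line in (("broken", BROKEN), ("fixed", FIXED)):
    print(label)
    for token in tokenize(line):
        print("  -", token)
```

Apart from the first entry, which shows up here as cpau rather than cpau.exe because the sketch only sees the text that was typed, the two listings it prints should match the two argv arrays shown above.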
Hope this helps out. 🙂
joe
[0] I am using everyone in an incredibly relativistic[0.1] way here…
[0.1] Ok, I don’t mean as in Theory of Relativity way either. Don’t be pedantic.
[1] Yes I know it is part of CMD and not its own program, bear with me here…
[2] I have always had a special love for Microsoft and Program Files or Documents And Settings or should I say PROGRA~1 and DOCUM~1… Wankers.
Who are you and why are you stealing my country’s insults?
Another country heard from…
Using a forward slash to delimit characters is all well and good if you know that sensei joeda has a C mindset.
In the rest of the Windows world, you’d want to work around what the CMD interpreter or the Windows Shell will delimit, by using a quote or a percent character with a caret.
sample.exe This parameter list includes ^"embedded^" quote characters in the 5th argument.
Poor, downtrodden admins such as myself have to learn both methods, because otherwise, we’d be all confusticated when we tell ADFind to delimit with a tab, by telling it to use a \t character.
Andrew from Vancouver
Ah yes, the \t is a whole other thing, that isn’t about the command interpreter, that is about how adfind accepts that value as a parameter and that is definitely c/perl’ish.
The caret does weird things on the escape… I don’t like it… In the example given for instance, while the second quote would not be used to terminate the first quote (i.e. it would nest), the quote characters won’t be passed into the exe, which in that example would be bad because the quotes are needed for the EXE that cpau was passing parameters to.
Great post, Joe. Any UNIX user will know this stuff intimately, but I’m glad you’re covering it here. I’ll put up a post myself and link here … I’ve had the same sort of questions and I like your use of argc and argv to explain.
Fred: No problem, glad to help. 🙂