Unix commands: The best tool for the job
You can't live very long with a general purpose handyman without hearing how important it is to have the proper tool for nearly any task you're about to undertake. You don't use a knife when you need a saw and you don't use a hammer when you need a punch.
Well, I've written far too many scripts that used awkward syntax when an easier way was available. Here are some examples where one command is clearly the right choice for the task at hand.
If you need to count how many times a particular string appears in a file, your gut impulse might be to do something like this:
boson> grep "my string" myfile | wc -l
37
This works very well, but you can use a simpler syntax and maybe even save yourself some precious milliseconds. If you use the -c argument with grep, you can get grep to do the counting for you:
boson> grep -c "my string" myfile
37
The "grep -c" might even save a little time on particularly large files, though for smaller files, the times are likely to be about the same.
boson> time grep -c logo.gif access_log
6528
real 0m0.592s
user 0m0.370s
sys 0m0.140s
boson> time grep logo.gif access_log | wc -l
6528
real 0m0.639s
user 0m0.440s
sys 0m0.190s
There are some particularly nice advantages when you're looking through a number of files for your string. For example:
boson> grep -c "file system full" messages*
messages:848
messages.1:155
messages.2:7
Now that's handy. We see that we've been getting "file system full" messages over the span of several messages files and how many times the messages have appeared in each. Try doing that with a pipe to wc -l. It's a lot more trouble.
boson> grep "file system full" messages* | wc -l
1010
No, that's not right.
boson> for file in `ls mess*`
> do
> grep "file system full" $file | wc -l
> done
848
155
7
That's closer, but not quite right.
boson> for file in `ls mess*`
> do
> count=`grep "file system full" $file | wc -l`
> echo $file: $count
> done
messages: 848
messages.1: 155
messages.2: 7
That's right, but that's a lot of trouble. The grep -c approach makes a lot more sense for counting lines across a number of files.
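One wrinkle worth knowing: grep -c prints the file name only when it is given more than one file. The sketch below is one way around that, using a hypothetical helper named count_per_file and made-up sample data. It passes /dev/null as an extra file so the name always appears, even if the wildcard matches a single file.

```shell
#!/bin/sh
# count_per_file is a hypothetical helper, not a standard command.
# Usage: count_per_file PATTERN FILE...
count_per_file() {
    pattern=$1
    shift
    # /dev/null is an extra, empty file; it forces grep to print names.
    # The second grep drops the "/dev/null:0" line it contributes.
    grep -c -- "$pattern" "$@" /dev/null | grep -v '^/dev/null:'
}

# Demo with made-up data in a scratch directory.
dir=${TMPDIR:-/tmp}/grepc.$$
mkdir "$dir" || exit 1
printf 'file system full\nok\nfile system full\n' > "$dir/messages"
count_per_file "file system full" "$dir"/messages*
rm -r "$dir"
```

With only one messages file present, this still prints the name followed by a colon and the count, in the same name:count form as the multi-file output.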
The similar -c option for the uniq command gets my vote as the right tool on many occasions. Any time I need to figure out how many unique values I have in a list and how many of each, I do something like this:
awk '{print $NF}' access_log | sort -n | uniq -c
35707 -
6 2
6991 43
400 46
6113 49
11176 52
51 62
18056 64
17512 66
25 67
10 85
389 103
391 125
1 177
1 201
7 225
...
Here, we're looking at the sizes of objects (files, etc.) returned by a web server to its clients. If we only want to know how many of these records end with a hyphen, we could use grep -c like this:
# grep -c -- "-$" access_log
35707
The "--" in that command marks the end of the options, which keeps grep from treating the search string as an option because it starts with a hyphen. The $ matches the end of the line.
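Here is a small sketch of the two usual ways to hand grep a pattern that starts with a hyphen. The sample log lines and the function names are made up for illustration. The -e option explicitly marks the next argument as a pattern, which has the same effect as "--" here.

```shell
#!/bin/sh
# Both functions count lines ending in a hyphen; the names are illustrative.
count_trailing_hyphen_dashdash() {
    # "--" ends option parsing, so '-$' is read as the pattern.
    grep -c -- '-$' "$1"
}

count_trailing_hyphen_e() {
    # -e says the next argument is a pattern, whatever it starts with.
    grep -c -e '-$' "$1"
}

# Made-up log lines: two of the three end with a hyphen.
log=${TMPDIR:-/tmp}/hyphen.$$
printf 'GET /a 200 43\nGET /b 304 -\nGET /c 304 -\n' > "$log"
count_trailing_hyphen_dashdash "$log"
count_trailing_hyphen_e "$log"
rm -f "$log"
```

Both calls should report the same count. Single quotes are used around the pattern so the shell leaves the $ alone.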

The Best Tool For The Job
In your example you state:
> If you need to count how many times a particular string
> appears in a file . . .
but actually neither of your examples answers the question
exactly as lines can contain multiple occurrences of the
same string. Your example only shows the number of lines
that contain the string, not the number of strings. This
is a much more complex problem. I have not found a simple
command to do this, but I have written a small script to
do so. It is not very elegant, but here it is:
---
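#!/bin/ksh
# Usage: scriptname pattern file
# Counts every occurrence of the pattern, not just the matching lines.
integer count=0
# Start with the number of lines holding at least one occurrence.
count=`grep -c "$1" "$2"`
# Work on a scratch copy so the original file is left untouched.
cp "$2" "$2.$$"
while
grep "$1" "$2.$$" > /dev/null 2>&1
do
# Delete the first occurrence on every matching line of the copy.
ed - "$2.$$" <<-EOF
g/$1/s///
w
q
EOF
# Lines that still match hold at least one more occurrence each.
count=$count+`grep -c "$1" "$2.$$"`
done
rm "$2.$$"
echo $count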
---
This works on my SUN Solaris 10 system.
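For comparison, here is a shorter sketch of the same count. It assumes an awk that supports gsub and -v; on older Solaris systems that may mean nawk rather than the default awk. The helper name count_occurrences and the sample data are made up. The pattern is treated as an extended regular expression, and overlapping matches are not counted.

```shell
#!/bin/sh
# count_occurrences is a hypothetical helper name.
# Usage: count_occurrences PATTERN FILE
count_occurrences() {
    # gsub returns how many replacements it made on the line; replacing
    # each match with itself ("&") leaves the text unchanged.
    awk -v pat="$1" '{ n += gsub(pat, "&") } END { print n + 0 }' "$2"
}

# Made-up sample: "full" appears three times on two matching lines.
sample=${TMPDIR:-/tmp}/occ.$$
printf 'full full\nnot here\nfull\n' > "$sample"
count_occurrences full "$sample"
rm -f "$sample"
```

On this sample grep -c would report 2 matching lines, while the awk version counts all 3 occurrences.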