Friday, July 27, 2007

Using awk to analyze logs

AWK is a neat tool that comes with most *nix environments. In this small article I show you how to use awk to analyze a log file such as an Apache access_log or even a sendmail log. This can be really useful for analyzing big log files and, in some cases, finding someone hammering your server with huge traffic (a DoS attack).
What is AWK?

awk Command

Definition: awk is a powerful Unix command. It allows the user to manipulate files that are structured as columns of data and strings.

Once you understand the basics of awk you will find that it is surprisingly useful. You can use it to automate things in ways you have never thought about. It can be used for data processing and for automating the application of Unix commands. It also has many spreadsheet-type functionalities.

There are two ways to run awk. A simple awk command can be run from the command line. More complex tasks should be written as awk programs ("scripts") to a file. Examples of each are provided below.
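For the second form, here is a minimal sketch: save a tiny program in a file (the name count.awk is made up for this example; it sums the numbers in the first column) and run it with the -f option:

% cat count.awk
{ total += $1 }
END { print total }

% awk -f count.awk input-file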

Example: % awk 'pattern {action}' input-file > output-file

meaning: take each line of the input file; if the line contains the pattern, apply the action to the line and write the resulting line to the output-file.
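For instance, if the pattern is the regular expression /error/ and the action prints the whole line, only the lines containing "error" end up in the output file (the file names here are just placeholders):

% awk '/error/ {print $0}' input-file > output-file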

If the pattern is omitted, the action is applied to all lines:

% awk '{action}' input-file > output-file

By default, awk works on files that have columns of numbers or strings that are separated by white space (tabs or spaces), but the -F option can be used if the columns are separated by another character. awk refers to the first column as $1, the second column as $2, etc. The whole line referred to as $0.
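For example, /etc/passwd uses a colon as its separator, so printing just the first column (the user names) looks like this:

% awk -F: '{print $1}' /etc/passwd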

Back to my example

Let's say we want to find how many times a specific IP address has hit your web server.

In this example we are assuming your Apache access_log is located in /usr/local/apache/logs.

In the access_log, the client IP address is the first column, so the command to pull it out of every request would be:

awk '{print $1}' /usr/local/apache/logs/access_log

This small command is really powerful, give it a try!
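If you want to go one step further and count the hits per address (handy for spotting a DoS source), a common trick is to pipe the output through sort and uniq -c; this is just a sketch assuming the same log location as above:

% awk '{print $1}' /usr/local/apache/logs/access_log | sort | uniq -c | sort -rn | head -10

The addresses with the most requests come out on top, so anyone hammering the server stands out right away.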
