AWK practical tips for parsing access logs

#1 Find top 10 IPs from an access log

Now pipe the output through sort and head -10 to find the top 10 IPs.
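A minimal sketch of the idea, assuming the client IP is the first whitespace-separated field (as in the common Apache/Nginx log formats). The function name top_ips and the file name access.log are placeholders, not taken from the original post:

```shell
# top_ips FILE [N]: print the N most frequent values of field 1
# (assumed to be the client IP), most frequent first.
top_ips() {
    # awk counts hits per IP; sort ranks by count (descending),
    # then by IP so that ties come out in a stable order.
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -k1,1nr -k2,2 |
        head -n "${2:-10}"
}
```

Called as `top_ips access.log`, it prints one "count IP" pair per line.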

#2 Suppose you have an access log whose timestamps are in epoch seconds, like the following, and you want to print the hourly QPS count

Now you know how to get the data for one hour and how to pass a bash variable into awk. So, using a for loop, you can get the data for every hour separately.
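As a rough sketch of that loop, assuming the epoch timestamp is the first field of each line, that hours are counted in UTC, and that the file covers a single day. The function names are made up for illustration:

```shell
# hourly_count FILE HOUR: count log lines whose epoch timestamp
# (field 1, assumed) falls in the given hour of the day (0-23, UTC).
hourly_count() {
    awk -v hour="$2" '
        # seconds since midnight UTC, cut into 3600-second buckets
        int(($1 % 86400) / 3600) == hour { n++ }
        END { print n + 0 }
    ' "$1"
}

# hourly_report FILE: print "HH requests average-qps" for every hour.
hourly_report() {
    for h in $(seq 0 23); do
        count=$(hourly_count "$1" "$h")
        awk -v h="$h" -v c="$count" 'BEGIN { printf "%02d %d %.2f\n", h, c, c / 3600 }'
    done
}
```

The shell variable reaches awk through -v, which avoids quoting problems that come with splicing it into the awk program text.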

Notes:

  • I tested this with awk version 3.1.7. You can find your awk version using awk --version
  • This assumes the log is rotated every day

See the official AWK page for more information.

ctime, mtime and atime – the Linux timestamps

Although timestamps are a filesystem-specific implementation detail, the following are the main timestamps that all Linux filesystems provide.

  • ctime – The ctime (change time) is the time when the file’s inode (owner, permissions, etc.) was last changed. The ctime is also updated when the contents of a file change. You can view the ctime with the ls -lc command
  • atime – The atime (access time) is the time when the data of a file was last accessed. Displaying the contents of a file or executing a shell script will update a file’s atime, for example. You can view the atime with the ls -lu command
  • mtime – The mtime (modify time) is the time when the actual contents of a file were last modified. This is the time displayed in a long directory listing (ls -l)

For more clarity on timestamps:

The following system calls retrieve information about a file:

  • stat()
  • lstat()
  • fstat()

These system calls differ only in how the file is specified. stat() returns information about the named file. lstat() does the same, but if the named file is a symbolic link, it returns information about the link itself rather than the file the link points to. fstat() returns information about a file referred to by an open file descriptor.
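From the shell, the stat command exposes the same three timestamps. The sketch below assumes GNU coreutils stat, where -c takes a format string and %X, %Y and %Z stand for atime, mtime and ctime in epoch seconds. The function name show_times is a placeholder:

```shell
# show_times FILE: print atime, mtime and ctime as epoch seconds.
# Assumes GNU stat (-c FORMAT); BSD/macOS stat uses different options.
show_times() {
    stat -c 'atime=%X mtime=%Y ctime=%Z' "$1"
}
```

Running it before and after a chmod on the same file should show ctime moving while mtime stays put, since only the inode changed.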

The ext4 filesystem implements a few more timestamps, which are the following:

  • dtime – deletion time
  • crtime – creation time

You can read more about ext4 timestamps at the following link:

Difference between $* and $@ in bash?

$* and $@ are bash special variables that both expand to the positional parameters, starting from the first one.

Without double quotes, the two variables behave the same (they expand the positional parameters in the same way). Inside double quotes, they expand the positional parameters differently.

$* within double quotes ("$*") expands to a single word containing all the positional parameters, separated by the first character of the IFS variable.

For example, if IFS is ":", the expansion of "$*" will be "$1:$2:$3:…"

$@ within a pair of double quotes ("$@") is equivalent to the list of positional parameters as separate words, i.e., "$1" "$2".."$N". In other words, it is equivalent to the list of positional parameters with each parameter individually double quoted.
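A small illustration of the difference (this is not the star_and_at.sh script, just a sketch with made-up function names):

```shell
# join_star ARGS...: expand "$*" with IFS set to ":".
# The body runs in a subshell, so the caller's IFS is untouched.
join_star() ( IFS=:; printf '%s\n' "$*" )

# count_words ARGS...: report how many words were received.
count_words() { echo "$#"; }

# How many words does each quoted form hand to another command?
count_star() { count_words "$*"; }   # always one word
count_at()   { count_words "$@"; }   # one word per parameter
```

With the arguments a "b c" d, join_star prints a:b c:d, count_star prints 1, and count_at prints 3.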

For the sake of better understanding, I wrote a script named star_and_at.sh and pushed it to my public GitHub repo.

You can clone my public bash GitHub repository directly using the following command

Find out which filesystems are supported by the Linux kernel

/proc is special in that it is a virtual filesystem. It is sometimes referred to as a process information pseudo-filesystem. In /proc there is a file named filesystems. As the name says, it lists the filesystems that are supported by the running kernel.
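A short sketch for reading that file. Each line holds the filesystem name in its last column, optionally preceded by the word nodev for filesystems that are not backed by a block device. The function names are placeholders, and the optional file argument exists only so the functions can be tried on a sample file:

```shell
# list_filesystems [FILE]: print every filesystem name the kernel reports.
list_filesystems() {
    awk '{ print $NF }' "${1:-/proc/filesystems}"
}

# list_disk_filesystems [FILE]: only those not flagged "nodev",
# i.e. filesystems that need a block device.
list_disk_filesystems() {
    awk 'NF == 1 { print $1 }' "${1:-/proc/filesystems}"
}
```

Filesystems built as modules that are not loaded yet will not appear in the list until the module is loaded.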

mount: could not find any free loop device

One day I got 35 ISO images and wrote a script to mount them all. But after mounting 8 loop devices I could not mount any more. I kept getting the error mount: could not find any free loop device
So I decided to increase the number of loop devices on the system.
Below are the steps I took to increase the number of loop devices on my SUSE_11.4-x86_64 box

1. Created a file named loop.conf in /etc/modprobe.d/

2. Added the following entry to loop.conf

3. Reloaded the loop module
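The three steps could be sketched as below. The loop module's max_loop parameter sets the number of devices; the value 64 and the function name write_loop_conf are assumptions for illustration, and the reload only works when loop is built as a loadable module rather than compiled into the kernel:

```shell
# write_loop_conf COUNT [DIR]: write the module option file.
# DIR defaults to /etc/modprobe.d, which needs root to write to.
write_loop_conf() {
    printf 'options loop max_loop=%s\n' "$1" > "${2:-/etc/modprobe.d}/loop.conf"
}

# Typical use as root (64 is an assumed value, pick what you need):
#   write_loop_conf 64
#   # unmount anything still using a loop device, then reload the module:
#   rmmod loop && modprobe loop
#   ls /dev/loop*        # check how many devices exist now
```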

Import a CSV file into a MySQL table

Using the LOAD DATA INFILE SQL statement we can import data into a MySQL table. Suppose I have a table named from_csv in the database test_csv.

And I have a CSV file with the following content

The following query will import this CSV file into the MySQL table from_csv
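A hedged sketch of such a query, generated from the shell so it can be piped to the mysql client. It assumes comma-separated fields, optional double quotes around values, and newline-terminated rows. The path /tmp/data.csv and the function name load_csv_sql are hypothetical:

```shell
# load_csv_sql CSV_PATH TABLE: print a LOAD DATA INFILE statement for a
# comma-separated file whose fields may be wrapped in double quotes.
load_csv_sql() {
    cat <<EOF
LOAD DATA INFILE '$1'
INTO TABLE $2
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
EOF
}
```

It could then be run as `load_csv_sql /tmp/data.csv from_csv | mysql test_csv`. If the file starts with a header row, adding IGNORE 1 LINES before the semicolon skips it, and LOAD DATA LOCAL INFILE reads the file from the client machine instead of the server.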

After the query has run, the contents of the table are:

Compile apache with suphp on Ubuntu

One day my client asked me to set up a web server on an Ubuntu server without a control panel. He needed PHP to run as suPHP. The following are the steps I followed during the web server setup.

Downloaded all necessary sources:

1. First, compile Apache from source with the following configure options

Match a pattern and delete n lines after it

Suppose you have a file named file.txt with the following content. Delete 2 lines (including the matching line itself) starting at each line that matches the pattern “Class”.

I got this trick from SERVERFAULT.COM 🙂
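One way to sketch this in portable awk, with a made-up function name and the line count passed as a parameter:

```shell
# delete_after_match PATTERN N FILE: drop each line matching PATTERN plus
# the lines after it, N lines in total (the matching line is the first).
delete_after_match() {
    awk -v pat="$1" -v n="$2" '
        $0 ~ pat { skip = n }        # open (or restart) a deletion window
        skip > 0 { skip--; next }    # swallow lines while the window is open
        { print }
    ' "$3"
}
```

`delete_after_match Class 2 file.txt` prints the file without the matched lines and leaves file.txt itself unchanged. With GNU sed, `sed '/Class/,+1d' file.txt` does the same thing using the addr,+N address form.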

bash: /bin/rm: Argument list too long

bash: /bin/rm: Argument list too long. If you’re trying to delete files inside a directory and the following command is not working:

In this case you can delete all files using find with appropriate switches:

If you want to delete files in verbose mode:
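A sketch of both variants, assuming the failing command was a shell glob such as rm -f * and that only regular files directly inside the directory should go. The function name delete_files is a placeholder:

```shell
# delete_files DIR [verbose]: remove regular files directly inside DIR.
# find removes each file itself, so no long argument list is ever built.
delete_files() {
    if [ "${2:-}" = "verbose" ]; then
        # -print lists each path just before -delete removes it
        find "$1" -maxdepth 1 -type f -print -delete
    else
        find "$1" -maxdepth 1 -type f -delete
    fi
}
```

The error comes from the kernel's limit on the total size of a command's arguments: the shell expands * into every file name before rm starts. Dropping -maxdepth 1 would make find descend into subdirectories as well.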

rsync – a fast, versatile, remote (and local) file-copying tool

Suppose you have a lot of files and folders that you don’t want to back up. For this case rsync has an option: list all the files and folders you want to exclude in a single file and make rsync read from it.

  • Copying a file from one location to another (within the same filesystem hierarchy)
  • Copying a file from one machine to another, remote machine
  • Copying all files except one
  • Deleting files from the destination folder that are no longer required (i.e. they have been deleted from the folder being backed up)

Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.

Some of the additional features of rsync are:

  • support for copying links, devices, owners, groups, and permissions.
  • exclude and exclude-from options.
  • can use any transparent remote shell, including ssh or rsh.
  • does not require super-user privileges.
  • support for anonymous or authenticated rsync daemons (ideal for mirroring).

I think the rsync command is best studied through examples.
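A few illustrative wrappers around common invocations. The function names are made up, and every path, host or pattern passed to them is up to the caller. The options used (-a, -e, --exclude, --exclude-from, --delete) are standard rsync options:

```shell
# copy_local SRC DEST: archive-mode copy on the same machine.
copy_local() { rsync -a "$1" "$2"; }

# copy_remote SRC USER@HOST:DEST: the same copy, carried over ssh.
copy_remote() { rsync -a -e ssh "$1" "$2"; }

# copy_except PATTERN SRC_DIR DEST_DIR: copy everything except names
# matching PATTERN.
copy_except() { rsync -a --exclude="$1" "$2/" "$3/"; }

# mirror_dir SRC_DIR DEST_DIR [EXCLUDE_FILE]: make DEST_DIR match SRC_DIR,
# deleting destination files that no longer exist in the source.
# EXCLUDE_FILE, if given, lists one exclude pattern per line.
mirror_dir() {
    if [ -n "${3:-}" ]; then
        rsync -a --delete --exclude-from="$3" "$1/" "$2/"
    else
        rsync -a --delete "$1/" "$2/"
    fi
}
```

The trailing slash on the source directory matters: with it rsync copies the directory's contents, without it rsync creates the directory itself inside the destination. Adding -n (dry run) together with -v is a cheap way to preview what --delete would remove.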