AWK practical tips for parsing access logs

#1 Find top 10 IPs from an access log

awk '{ array[$1]++ } END { for (ip in array) print array[ip], ip }' access.log

Now you can pipe the output to sort -rn and head -10 to find the top 10 IPs, right?
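Put together, the whole pipeline might look like the sketch below. The function name top_ips is only for illustration, and it assumes the client IP is the first whitespace-separated field of each line.

```shell
# top_ips (hypothetical helper): print the busiest client IPs in a log.
#   $1 = log file, $2 = how many IPs to show (defaults to 10)
# Assumption: the client IP is the first field of every line.
top_ips() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -rn |              # highest request count first
        head -n "${2:-10}"      # keep only the top N lines
}
```

Call it as top_ips access.log for the top 10, or top_ips access.log 3 for the top 3.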

#2 Suppose you have an access log with timestamps in epoch seconds, like the following, and you want an hourly request count

[~]$ cat access.log
127.0.0.1 1409741581 "GET / HTTP/1.0" 200 1634 "-" "Mozilla/4.0..."
151.2XX.2X.136 1409745842 "GET /static/icon.png HTTP/1.0" 304 0 "-" "Web..."
4.XX.5X.78 1409751488 "GET / HTTP/1.0" 200 1638 "-" "Mozilla/5.0..."
XX.0X.42.1X6 1409752561 "GET / HTTP/1.0" 200 1636 "-" "Mozilla/5.0..."
10.0.0.2 1409758082 "GET /static/icon.png HTTP/1.0" 304 0 "-" "Web..."
10.0.0.2 1409758860 "GET / HTTP/1.0" 200 1636 "-" "Mozilla/5.0..."
127.0.0.1 1409765527 "GET /static/icon.png HTTP/1.0" 304 0 "-" "Mozilla/4.0..."
XX.X03.X2.X36 1409765942 "GET / HTTP/1.0" 200 1628 "-" "Mozilla/5.0..."
127.0.0.1 1409766000 "GET /static/icon.png HTTP/1.0" 304 0 "-" "Mozilla/4.0..."

[~]$ HOUR=13; awk '{ if (strftime("%H",$2,1) == "'"$HOUR"'") print $1 }' access.log

Now you know how to get the data for one hour (pipe it to wc -l for a count) and how to pass a bash variable into awk. So you can figure out how to use this in a for loop, right?
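If you would rather not loop over the hours, the sketch below counts every hour in a single pass. It swaps in a different technique from the command above: instead of strftime (which is not part of POSIX awk), it derives the UTC hour arithmetically from the epoch value. The name hourly_counts is made up for this example, and it assumes the epoch timestamp is the second field and that the file covers a single day.

```shell
# hourly_counts (hypothetical helper): requests per UTC hour, in one pass.
#   $1 = log file with epoch seconds in the second field
hourly_counts() {
    awk '{
             hour = int(($2 % 86400) / 3600)   # seconds since UTC midnight -> hour 0..23
             n[hour]++
         }
         END {
             for (hour = 0; hour < 24; hour++)
                 if (hour in n) printf "%02d %d\n", hour, n[hour]
         }' "$1"
}
```

Each output line is a two-digit hour followed by the number of requests seen in that hour; hours without requests are skipped.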

Notes:

  • I tested this with awk version 3.1.7 (strftime is not part of POSIX awk, so other awk implementations may lack it). You can find your awk version using awk --version
  • This assumes the log is rotated every day


ctime, mtime and atime – the Linux timestamps

Although timestamps are a filesystem-specific implementation detail, the following are the main timestamps that all Linux filesystems have.

  • ctime – The ctime (change time) is the time when the file’s inode (owner, permissions, etc.) was last changed. The ctime is also updated when the contents of a file change. You can view the ctime with the ls -lc command
  • atime – The atime (access time) is the time when the data of a file was last accessed. Displaying the contents of a file or executing a shell script will update a file’s atime, for example (mount options such as relatime or noatime can reduce or disable these updates). You can view the atime with the ls -lu command
  • mtime – The mtime (modify time) is the time when the actual contents of a file were last modified. This is the time displayed in a long directory listing (ls -l)

For more clarity on timestamps:

cat file # file's atime is updated
chmod 755 file # file's ctime is updated
echo "new contents" >> file # file's ctime and mtime are updated
vi file # if you add/delete some lines ctime and mtime will get updated
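To watch the three timestamps side by side, something like the following sketch can help. It assumes GNU coreutils (stat -c and touch -d @epoch); show_times is a name invented for this example.

```shell
# show_times (hypothetical helper): print a file's three timestamps.
# Requires GNU stat: %X = atime, %Y = mtime, %Z = ctime (epoch seconds).
show_times() {
    stat -c 'atime=%X mtime=%Y ctime=%Z' "$1"
}

demo=$(mktemp)
touch -d @978307200 "$demo"   # backdate atime and mtime to 2001-01-01 00:00:00 UTC
chmod 644 "$demo"             # inode change: ctime is updated, atime and mtime are not
show_times "$demo"
```

The atime and mtime stay at the backdated value, while ctime shows the current time, because ctime cannot be set by hand and follows every inode change.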

The following system calls retrieve information about a file:

  • stat()
  • lstat()
  • fstat()

These system calls differ only in how the file is specified. stat() returns information about the named file. lstat() does the same, but if the named file is a symbolic link, it returns information about the link itself rather than the file the link points to. fstat() returns information about a file referred to by an open file descriptor.
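The stat()/lstat() difference is easy to see from the shell as well. In this sketch (GNU coreutils assumed; the two function names are invented for illustration), plain stat describes the link itself, while stat -L follows it:

```shell
# link_info / target_info are hypothetical helpers around GNU stat.
link_info()   { stat -c '%F' "$1"; }      # like lstat(): describes the link itself
target_info() { stat -L -c '%F' "$1"; }   # like stat(): follows the link to its target
```

For a symbolic link that points to a non-empty regular file, link_info reports "symbolic link" and target_info reports "regular file".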

The ext4 filesystem implements a few more timestamps:

  • dtime – deletion time
  • crtime – creation time

You can read more about ext4 timestamps at the following link:

Read More

How to make Google Drive your automated backup location using Python

Before reading the rest of this article, I recommend watching the following YouTube video, which covers what is described here:


Read the following links to learn more about the authentication mechanism the Google API uses:
https://developers.google.com/accounts/docs/OAuth2
https://developers.google.com/api-client-library/python/guide/aaa_oauth

As Claudio (Google) mentioned in the above video (http://youtu.be/zJVCKvXtHtE?t=12m18s), we need some logic (the client library) to store your credentials for reuse. Otherwise, every execution of the ‘quickstart.py’ script needs human interaction to obtain tokens. I added 6 lines to Google’s ‘quickstart.py’ to keep the authorization for reuse. So as long as the user has not revoked the access initially granted to the application, you don’t need any human interaction.

For more details: https://developers.google.com/accounts/docs/OAuth2InstalledApp#refresh

You can get the modified version of Google’s quickstart.py using the following command

git clone git@github.com:sukujgrg/google_drive.git

or

Go to: https://github.com/sukujgrg/google_drive

So, now you have an idea of how to deal with the Google API and authenticate to Google Drive without human interaction. You just need to adapt the script to your environment and then schedule it with a cron job.


Difference between $* and $@ in bash?

$* and $@ are bash special variables that both expand to the positional parameters, starting from the first one.

Without double quotes, the two behave the same (they expand the positional parameters in the same way). Inside double quotes, they expand the positional parameters differently.

$* within double quotes ("$*") expands to a single word: the positional parameters joined by the first character of the IFS variable.

For example, if IFS is ":", then "$*" expands to "$1:$2:$3:…"

And $@ within double quotes ("$@") expands to the positional parameters as separate words, i.e., "$1" "$2" … "$N". In other words, it is equivalent to the list of positional parameters with each parameter individually double-quoted.
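A quick way to see the difference is to count the words each form produces. This is a small sketch, separate from the star_and_at.sh script; count_args and demo are names made up for it.

```shell
#!/usr/bin/env bash
# count_args (hypothetical helper): print how many arguments it received.
count_args() { echo "$#"; }

demo() {
    local IFS=':'                                  # "$*" joins with the first character of IFS
    echo "star joined: $*"                         # all parameters as one string
    echo "star count:  $(count_args "$*")"         # "$*" is always a single word
    echo "at count:    $(count_args "$@")"         # "$@" keeps one word per parameter
}

demo "a b" c d
```

With the three parameters "a b", c and d, "$*" is the single word a b:c:d, while "$@" still passes three separate arguments, with the space inside the first one preserved.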

For better understanding, I wrote a script named star_and_at.sh and pushed it to my public GitHub.

You can clone my public bash repository directly using the following command

git clone git@github.com:sukujgrg/bash.git

Or you can copy and paste the script from the following URL
https://github.com/sukujgrg/bash/blob/master/star_and_at.sh


Subnetting

In this article I explain IPv4 subnetting in the following two situations.

1. Subnetting when given a required number of networks [Example 1]
2. Subnetting when given a required number of hosts [Example 2]

Note: You need a basic understanding of IPv4 address types and of subnetting

Example 1 :

You have a Class C network range 210.20.1.0 and need to break it into 20 separate networks.

Note: Here you are given the number of networks you need.

Solution : 

(more…)
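As a rough sketch of the arithmetic behind Example 1 (not the author's worked solution, which is in the full post), the snippet below finds how many host bits must be borrowed to get at least 20 networks out of a /24. The function name subnet_bits is invented for this example, and a default Class C prefix of /24 is assumed.

```shell
# subnet_bits (hypothetical helper): smallest b such that 2^b >= the number of networks needed.
subnet_bits() {
    local need=$1 bits=0
    while [ $((1 << bits)) -lt "$need" ]; do
        bits=$((bits + 1))
    done
    echo "$bits"
}

bits=$(subnet_bits 20)                      # bits borrowed from the host part
prefix=$((24 + bits))                       # Class C default prefix is /24
hosts=$(( (1 << (32 - prefix)) - 2 ))       # minus network and broadcast addresses
echo "borrow $bits bits -> /$prefix, $((1 << bits)) subnets, $hosts usable hosts each"
```

For 20 networks this borrows 5 bits, because 2^4 = 16 is too few and 2^5 = 32 is enough, which gives a /29 prefix with 6 usable hosts per subnet.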


“test” operators in Bash

Bash Conditional Expressions

Here I am trying to list almost all available bash “tests” (file tests, string tests, arithmetic tests)

file operators                description
-e <FILENAME>                 True, if <FILENAME> exists
-f <FILENAME>                 True, if <FILENAME> exists and is a regular file
-d <FILENAME>                 True, if <FILENAME> exists and is a directory
-c <FILENAME>                 True, if <FILENAME> exists and is a character special file
-b <FILENAME>                 True, if <FILENAME> exists and is a block special file
-p <FILENAME>                 True, if <FILENAME> exists and is a named pipe (FIFO)
-S <FILENAME>                 True, if <FILENAME> exists and is a socket file
-L <FILENAME>                 True, if <FILENAME> exists and is a symbolic link
-h <FILENAME>                 True, if <FILENAME> exists and is a symbolic link
-g <FILENAME>                 True, if <FILENAME> exists and has sgid bit set
-u <FILENAME>                 True, if <FILENAME> exists and has suid bit set
-r <FILENAME>                 True, if <FILENAME> exists and is readable
-w <FILENAME>                 True, if <FILENAME> exists and is writable
-x <FILENAME>                 True, if <FILENAME> exists and is executable
-s <FILENAME>                 True, if <FILENAME> exists and has size bigger than 0
-t <fd>                       True, if file descriptor <fd> is open and refers to a terminal
<FILENAME1> -nt <FILENAME2>   True, if <FILENAME1> is newer than <FILENAME2> (mtime)
<FILENAME1> -ot <FILENAME2>   True, if <FILENAME1> is older than <FILENAME2> (mtime)
<FILENAME1> -ef <FILENAME2>   True, if <FILENAME1> and <FILENAME2> refer to the same file (same device and inode, e.g. hard links)

 

string operators         description
-z <STRING>              True, if <STRING> is empty
-n <STRING>              True, if <STRING> is not empty (this is the default operation)
<STRING1> = <STRING2>    True, if the strings are equal
<STRING1> != <STRING2>   True, if the strings are not equal
<STRING1> < <STRING2>    True, if <STRING1> sorts before <STRING2> (write \< inside [ ])
<STRING1> > <STRING2>    True, if <STRING1> sorts after <STRING2> (write \> inside [ ])

 

arithmetic operators        description
<INTEGER1> -eq <INTEGER2>   True, if the integers are equal
<INTEGER1> -ne <INTEGER2>   True, if the integers are NOT equal
<INTEGER1> -le <INTEGER2>   True, if the first integer is less than or equal to the second one
<INTEGER1> -ge <INTEGER2>   True, if the first integer is greater than or equal to the second one
<INTEGER1> -lt <INTEGER2>   True, if the first integer is less than the second one
<INTEGER1> -gt <INTEGER2>   True, if the first integer is greater than the second one
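Here is a minimal sketch that combines a few of the file tests above. The function name describe_path is hypothetical. It checks -L first because -e follows symbolic links and would report a dangling link as missing.

```shell
# describe_path (hypothetical helper): classify a path using file test operators.
describe_path() {
    local path=$1
    if [ -L "$path" ]; then
        echo "symlink"                 # checked first: -e follows links
    elif [ ! -e "$path" ]; then
        echo "missing"
    elif [ -d "$path" ]; then
        echo "directory"
    elif [ -f "$path" ] && [ -s "$path" ]; then
        echo "non-empty file"          # regular file with size bigger than 0
    elif [ -f "$path" ]; then
        echo "empty file"
    else
        echo "other"                   # device, FIFO, socket, ...
    fi
}
```

The string and arithmetic tests are used the same way, for example [ -n "$name" ] or [ "$count" -gt 0 ].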


Differences between the InnoDB and MyISAM storage engines of MySQL

Data in MySQL is stored in files (or memory) using a variety of different techniques. Each of these techniques employs different storage mechanisms, indexing facilities and locking levels, and ultimately provides a range of different functions and capabilities. By choosing a different technique you can gain additional speed or functionality benefits that will improve the overall functionality of your application.

Comparison between the MyISAM and InnoDB storage engines of MySQL.

1. InnoDB: row-level locking
   MyISAM: table-level locking

2. InnoDB: supports transactions
   MyISAM: does not support transactions

3. InnoDB: foreign key constraints
   MyISAM: no foreign key constraints

4. InnoDB: row count is not stored internally and so slow COUNT(*)s
   MyISAM: row count is stored internally and so fast COUNT(*)s

5. InnoDB: automatic crash recovery
   MyISAM: no automatic crash recovery, but it does offer repair table functionality

6. InnoDB: stores both data and indexes in one file
   MyISAM: stores indexes in one file and data in another

7. InnoDB: uses a buffer pool (innodb_buffer_pool_size) to cache both data and indexes
   MyISAM: uses key buffers (key_buffer) for caching indexes and leaves the data caching management to the operating system

8. InnoDB: ACID (Atomicity, Consistency, Isolation and Durability) compliant
   MyISAM: not ACID compliant


Boot Hiren Boot CD via Network

In this article I explain the steps for booting Hiren’s Boot CD over the network.

Note: Booting over the network requires a motherboard with network boot enabled.

Environment:

1. A pfsense gateway with DHCP enabled.

2. Network subnet is 192.168.3.0/24.

3. Ubuntu machine as TFTP server.

Basically, PXE booting needs a DHCP server and a TFTP server. Here we already have a DHCP server (the pfsense DHCP server).

Step 1: DHCP and PXE Server

(more…)
