What does it do?
This Perl script summarizes your web logs from Apache, Lighttpd, thttpd or any other web server that writes its logs in the standard Apache format. The output is designed to be mailed to web admins and others concerned with web operations, and it is friendly to the small screens of devices like the BlackBerry.

The report includes:
- Top 10 requesters
- Top 10 by volume downloaded
- Top 10 URLs requested
- Top 10 URLs per host
- Number of requests per status class
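
Each "Top 10" section comes down to counting occurrences in a hash and then sorting the keys by count. Here is a minimal Perl sketch of that pattern. The helper name top_n and the sample addresses are made up for illustration and do not appear in the script itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return up to $n keys of a count hash, highest count first.
# Ties are broken alphabetically so the order is stable.
sub top_n {
    my ($counts, $n) = @_;
    my @sorted = sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts;
    $n = @sorted if $n > @sorted;    # avoid undef entries when fewer than $n keys exist
    return @sorted[0 .. $n - 1];
}

# Made-up request counts keyed by client address.
my %requests = ('192.0.2.1' => 5, '192.0.2.2' => 12, '192.0.2.3' => 1);

printf "%6d requests  %s\n", $requests{$_}, $_ for top_n(\%requests, 2);
```

The same idea covers the per-host volume and URL sections; only the hash being counted changes.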
What does the report look like?
Let's take a look at the output of the report so you can see what it is about before you get into the simple install.

From: root@YOUR_HOST.com (your name)
To: root@YOUR_HOST.com
Date: Mon, 10 Jan 2010 10:20:30 -0600 (EDT)
Subject: your_hostname.domain.net web report

Analysis of log records between:
Mon Jan 01 10:20:30 2010 and
Wed Jan 07 20:30:40 2010

Top 10 requesters:
       19,482 requests (free-stuuf.grepper.com)
        7,902 requests (someguy.fredom.com)
 134.235.27 704 requests (noone.tester.net)
          546 requests (who.what.org)
          508 requests (telus.nt.net)
          288 requests (dhcp.freeisp.com)
          248 requests (crawl-66-249-67-20.googlebot.com)
          146 requests (tgresde.domain.info)
           92 requests (ATuileries-153-1-99-194.w90-24.abo.wanadoo.fr)
           86 requests (mx.superinfo.com)

Top 10 by volume downloaded:
  102,607,182 bytes (free-stuuf.grepper.com)
   61,172,188 bytes (someguy.fredom.com)
   11,699,030 bytes (who.what.org)
    4,105,354 bytes (greg.desrtr.net)
      793,926 bytes (crawl-66-249-67-20.googlebot.com)
      760,164 bytes (mx.superinfo.com)
      536,782 bytes (def92-12-88-177-248-14.fbx.prexad.net)
      528,950 bytes (lovejay.w3.org)
      523,726 bytes (ATuileries-123-1-99-194.w90-24.abo.wanadoo.fr)
      507,112 bytes (per87-1-88-167-12-47.fbx.prexad.net)

Top 10 URLs requested:
        9,086 /favicon.ico
        4,902 /your.css
        3,656 /some_pic.jpg
        3,618 /another_pic.jpg
        2,996 /happy.jpg
        2,794 /
        1,040 /big_file.html
          694 /frdfg.html
          338 /wonder.html
          332 /grted.html

Top 10 URLs per host:
 2214 /favicon.ico (someguy.fredom.com)
 1186 /your.css (someguy.fredom.com)
 1124 /some_pic.jpg (who.what.org)
 1116 /another_pic.jpg (who.what.org)
 1094 /happy.jpg (who.what.org)
  994 /happy.jpg (lovejay.w3.org)
  958 /wonder.html (mx.superinfo.com)
  932 / (mx.superinfo.com)
  910 /wonder2.html (per87-1-88-167-12-47.fbx.prexad.net)
  724 / (def92-12-88-177-248-14.fbx.prexad.net)

Number of requests per status class:
 200          528,678
 300           01,408
 400              140
 500               10
If the output above looks like something you can use, then let's get started on setting it up for your environment. It takes three steps and about five minutes of your time.
Starting the Install
Step 1: Get the script and look at the options. You can download calomel_web_report.pl as a file, by doing a "save as" or by clicking on the link and choosing download, and you can also browse the same script in the listing below. Both are provided so you can easily review the Perl script. Before using the script, take a look at it below, or download it and look at the options.

Calomel.org web_report.pl
#!/usr/bin/perl
#
#######################################################
###  Calomel.org web_report.pl  BEGIN
#######################################################

use Time::Local;

my $logdir = '/var/log/web_server';

opendir D,$logdir or die "Could not open $logdir ($!)";
@logfiles = sort grep /^access.log/, readdir D;
closedir D;

# Just use the 6 most recently archived log files.
shift @logfiles while @logfiles > 6;

my (%host, %url, %status, %urlsperhost);
my ($mintime,$maxtime) = (10_000_000_000, 0);
my %mon = qw/Jan 0 Feb 1 Mar 2 Apr 3 May 4 Jun 5 Jul 6 Aug 7 Sep 8 Oct 9 Nov 10 Dec 11/;

foreach my $f (@logfiles,'access.log'){
  # The live access.log is read from this directory, not from the one set above.
  $logdir = '/var/log/lighttpd' if $f eq 'access.log';
  open F,"$logdir/$f" or die "Could not open $logdir/$f ($!)";
  while(<F>){
    my ($host, $ident_user, $auth_user, $day,$mon,$year, $hour,$min,$sec,
        $time_zone, $method, $url, $protocol, $status, $bytes, $referer, $agent) =
    /                   # regexp begins
      ^                 # beginning-of-string anchor
      (\S+)             # assigned to $host
      \                 # literal space
      (\S+)             # assigned to $ident_user
      \                 # literal space
      (\S+)             # assigned to $auth_user
      \                 # literal space
      \[                # literal left bracket
      (\d\d)            # assigned to $day
      \/                # literal solidus
      ([A-Z][a-z]{2})   # assigned to $mon
      \/                # literal solidus
      (\d{4})           # assigned to $year
      :                 # literal colon
      (\d\d)            # assigned to $hour
      :                 # literal colon
      (\d\d)            # assigned to $min
      :                 # literal colon
      (\d\d)            # assigned to $sec
      \                 # literal space
      ([^\]]+)          # assigned to $time_zone
      \]\ "             # literal string '] "'
      (\S+)             # assigned to $method
      \                 # literal space
      (.+?)             # assigned to $url
      \                 # literal space
      (\S+)             # assigned to $protocol
      "\                # literal string '" '
      (\S+)             # assigned to $status
      \                 # literal space
      (\S+)             # assigned to $bytes
      \                 # literal space
      "([^"]+)"         # assigned to $referer
      \                 # literal space
      "([^"]+)"         # assigned to $agent
      $                 # end-of-string anchor
    /x                  # regexp ends, with x modifier
    or next;

    $host eq '::1' and next; # Ignore Apache generated requests from localhost.

    $bytes =~ /^\d+$/ or $bytes = 0;
    $host{$host}++;
    $bytesperhost{$host} += $bytes;
    $url{$url}++;
    $status_class = int($status/100) . '00';
    $status{$status_class}++;
    $urlsperhost{"$host $url"}++;

    # Parse the $time_zone variable.
    my $tz = 0;
    my ($tzs,$tzh,$tzm) = $time_zone =~ /([\-+ ])(\d\d)(\d\d)/;
    if(defined $tzs){
      $tzs = $tzs eq '-' ? 1 : -1;
      $tz = $tzs * (3600*$tzh + 60*$tzm);
    }
    my $time = timegm($sec,$min,$hour,$day,$mon{$mon},$year-1900) + $tz;
    $mintime = $time if $time < $mintime;
    $maxtime = $time if $time > $maxtime;
  }
  close F;
}

my $start = localtime $mintime;
my $end = localtime $maxtime;
print "Analysis of log records between:\n$start and\n$end\n\n";

my %dns;
my @toprequestors = (sort { $host{$b} <=> $host{$a} } keys %host)[0..9];
print "Top 10 requesters:\n";
foreach my $host (@toprequestors){
  my $name = dns($host);
  printf " %-15s %12s requests$name\n",$host,add_commas($host{$host});
}
print "\n";

my @topvolume = (sort { $bytesperhost{$b} <=> $bytesperhost{$a} } keys %bytesperhost)[0..9];
print "Top 10 by volume downloaded:\n";
foreach my $host (@topvolume){
  my $name = dns($host);
  printf " %-15s %16s bytes$name\n",$host,add_commas($bytesperhost{$host});
}
print "\n";

my @topurls = (sort { $url{$b} <=> $url{$a} } keys %url)[0..9];
print "Top 10 URLs requested:\n";
foreach my $url (@topurls){
  printf " %12s $url\n",add_commas($url{$url});
}
print "\n";

my @topurlsperhost = (sort { $urlsperhost{$b} <=> $urlsperhost{$a} } keys %urlsperhost)[0..9];
print "Top 10 URLs per host:\n";
foreach my $hosturl (@topurlsperhost){
  my ($host,$url) = split " ",$hosturl;
  my $name = dns($host);
  printf " %4d %-15s $url$name\n",$urlsperhost{$hosturl},$host;
}
print "\n";

print "Number of requests per status class:\n";
foreach my $class (sort {$a <=> $b} keys %status){
  printf "%4d %16s\n",$class,add_commas($status{$class});
}

sub dns{
  my $ip = shift;
  return $dns{$ip} if defined $dns{$ip} && $dns{$ip};
  my $lookup = `/usr/sbin/host $ip 2>/dev/null`;
  my $name;
  if($lookup =~ /NXDOMAIN/
     or $lookup =~ /SERVFAIL/
     or $lookup =~ /timed out/
  ){
    $name = '';
  }
  else{
    $name = (split ' ',$lookup)[-1];
    $name =~ s/\.$//;
    $name = " ($name)";
  }
  $dns{$ip} = $name if $name;
  $name;
}

sub add_commas{
  # Add commas to a number string (e.g. 1357924683 => 1,357,924,683)
  my $num = reverse shift;
  $num =~ s/(...)/$1,/g;
  chop $num if $num =~ /,$/;
  $num = reverse $num;
}

#######################################################
###  Calomel.org calomel_web_report.pl  END
#######################################################
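
The timestamp handling is the least obvious part of the script, so it may help to see it in isolation. The sketch below is a simplified illustration, not the script's own code: the sub name log_time_to_epoch is hypothetical, it parses only the bracketed timestamp rather than the whole log line, and it passes a four-digit year to timegm instead of $year-1900. A negative zone such as -0600 means local time is behind UTC, so the offset is added to get UTC.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my %mon = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
           Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: convert an Apache-style timestamp such as
# "10/Jan/2010:10:20:30 -0600" into epoch seconds (UTC).
# Returns nothing if the string does not match.
sub log_time_to_epoch {
    my ($stamp) = @_;
    my ($day, $mon, $year, $hour, $min, $sec, $sign, $tzh, $tzm) =
        $stamp =~ m{^(\d\d)/([A-Z][a-z]{2})/(\d{4}):(\d\d):(\d\d):(\d\d) ([-+])(\d\d)(\d\d)$}
        or return;
    # "-0600" is six hours behind UTC, so add the offset; "+0100" is subtracted.
    my $offset = ($sign eq '-' ? 1 : -1) * (3600 * $tzh + 60 * $tzm);
    return timegm($sec, $min, $hour, $day, $mon{$mon}, $year) + $offset;
}

my $epoch = log_time_to_epoch('10/Jan/2010:10:20:30 -0600');
print scalar(gmtime($epoch)), " UTC\n" if defined $epoch;
```

Because every record is normalized to UTC this way, the earliest and latest times can be compared directly even if the logs mix time zones.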
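Step 2: The main option in the script is the location of your log directory. In our example the archived logs are in the /var/log/web_server directory, which contains all of the access.log files we are looking for. Note that the script also reads the live access.log from /var/log/lighttpd inside its main loop, so adjust that path as well if your current log lives elsewhere. This is the ninth (9th) line from the top of the script, the one you are looking for: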
my $logdir = '/var/log/web_server';
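
Before scheduling the report, it can be useful to confirm which files the script would pick up from that directory. The following is a small stand-alone sketch under a few assumptions: the sub name list_logs is hypothetical, the directory is passed as the first command-line argument, and the dot in the file name pattern is escaped (the script itself uses /^access.log/). It applies the same "sort, then keep the last six" selection.

```perl
use strict;
use warnings;

# Hypothetical helper: return the log files in $dir whose names start with
# "access.log", sorted by name, keeping only the last six entries.
sub list_logs {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Could not open $dir ($!)";
    my @files = sort grep { /^access\.log/ } readdir $dh;
    closedir $dh;
    shift @files while @files > 6;
    return @files;
}

# Print the selection when a directory is given on the command line.
if (@ARGV) {
    print "$_\n" for list_logs($ARGV[0]);
}
```

For example, saving this as check_logs.pl (an arbitrary name) and running "perl check_logs.pl /var/log/web_server" lists the files that match. The sort is a plain string sort, as in the script, so a name ending in .10 sorts before one ending in .2.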
Step 3: Now that you have the script and have edited the $logdir variable to tell the script where to look for the logs, it is time to set up a cron job to run it. You may find that a cron job run once in the morning and once before the end of the working day is most beneficial. This is an example cron job that runs the calomel_web_report.pl script, located in /tools, at 8am and 5pm every day and mails the output to root.
#minute (0-59)
#|   hour (0-23)
#|   |    day of the month (1-31)
#|   |    |   month of the year (1-12 or Jan-Dec)
#|   |    |   |   day of the week (0-6 with 0=Sun or Sun-Sat)
#|   |    |   |   |   commands
#|   |    |   |   |   |
#### Calomel.org Web Report (cron job)
00   8,17 *   *   *   /tools/calomel_web_report.pl | mail -s "`hostname` web report" root
