Network traffic was growing, and it was getting hard to really "see" what was sucking up the capacity. It could be spam, or YouTube, viruses, hack attempts, or anything. Often, the quick fix is to buy more capacity, but that's messy. It was better to analyze the traffic.
The problem is, net analysis tools are expensive. We had a Sonicware SOHO gateway. It's pretty good, though expensive to maintain, but it stops enough bad traffic that it's worth something. (Yes, I know how to make a Linux router.) It turned out that the device could send syslog events to another computer.
Sonicwall and a couple other vendors even sell analysis tools. That's cool, but, it's also instructive to build it yourself. It's also cheaper.
So, first things first, you have to capture these syslog events. The tool I used wasn't syslog, but a more flexible system called rsyslog. It's in the ubuntu repositories.
Set it up to accept syslog messages. This is set up in /etc/rsyslog.conf (it's not the "-r" switch that syslogd uses).
The log lines look like this and there are many of them:
Jul 7 16:19:55 id=firewall sn=0006B123EA98 time="2009-07-07 16:19:55" fw=75.38.0.113 pri=6 c=262144 m=98 msg="Connection Opened" n=54135 src=192.168.1.135:4694:LAN dst=208.82.236.208:80:WAN proto=tcp/http Jul 7 16:19:55 id=firewall sn=0006B123EA98 time="2009-07-07 16:19:55" fw=75.38.0.113 pri=6 c=262144 m=98 msg="Connection Opened" n=54135 src=192.168.1.135:4695:LAN dst=208.82.236.208:80:WAN proto=tcp/http Jul 7 16:19:55 id=firewall sn=0006B123EA98 time="2009-07-07 16:19:55" fw=75.38.0.113 pri=6 c=262144 m=98 msg="Connection Opened" n=54135 src=192.168.1.135:4695:LAN dst=208.82.236.208:80:WAN proto=tcp/http
You don't want this mingling with the other log lines, so, you filter by the "hostname". I got real confused about this, but, what they mean is the field after the date. In our case, it's "id=firewall". Yeah, weird.
rsyslogd supports a feature called "blocks" in the config. If you add a line that starts with "+" it filters in hosts:
+id=firewall
If you add a line that starts with "-" it filters out hosts:
-id=firewall
Arrange these accordingly so you dump all the firewall logs into a file.
rsyslogd has a way to send the lines to a named pipe, so you could run a daemon that will insert the log lines into a database. The problem with this, is, what happens if there's a surge of activity and inserts take long. You might bog the computer down.
I opted for the "run a cron to load the database" method.
It turns out that the firewall spits out a lot of data. Several megabytes per hour during business hours. My fix is to save the data into a database, and delete the log.
This is better than compression, because we retain the data in a usable format. Not only that, the data is "live" so you can query it. Here's a schema that will hold the data:
CREATE TABLE IF NOT EXISTS `logs` (
`datetime` int(11) NOT NULL,
`pri` smallint(2) NOT NULL,
`m` smallint(3) NOT NULL,
`src` int(11) NOT NULL,
`srcport` mediumint(5) NOT NULL,
`srcint` enum('','WAN','LAN','OPT') collate ascii_bin NOT NULL,
`dst` int(11) NOT NULL,
`dstport` mediumint(5) NOT NULL,
`dstint` enum('','WAN','LAN','OPT') collate ascii_bin NOT NULL,
`proto` enum('tcp','udp','icmp') collate ascii_bin NOT NULL,
`sent` int(11) NOT NULL,
`recd` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=ascii COLLATE=ascii_bin;
The trick's to store numeric data as numbers, not as strings. Enums take up a single byte. Dates are converted to unix timestamps. Each line takes 34 bytes, which is around 1/6 the size of a regular log line. There's some overhead in the database, and it seems like moving log lines to the database ends up consuming 1/3 of the space. Not bad.
There are a few situations where you lose data, but, we can fix that later.
If you add indexes, it'll increase the size of the db. Adding a few obvious ones doubled the size, so, away they went. Even with around 900k, on a P3 desktop, queries with aggregation, sorting and grouping were fast. This would cover around 1/3 data of data. So indexes aren't necessary for queries for short-term data.
It's probably fast because, while we have to do the linear scan, we don't have to dereference the index, and we are comparing integers. Once the data is cached, it's like looping over lists of numbers.
Below is a script that will read the log file, and move it into the database. The file's deleted when it's done. You can run this, as root, from cron.
#! /usr/bin/perl
# vim:ts=4:sw=4:ai:
use Date::Parse;
use DBI;
use Socket;
$dsn = "DBI:mysql:database=firewall;host=localhost";
$dbh = DBI->connect( $dsn, 'firewall', '' );
$path = '/home/johnk/Sonicwall';
rename "$path/log", "$path/log.tmp";
system('/etc/init.d/rsyslog restart');
open LOG,"<$path/log.tmp";
while (my $line = <LOG>)
{
chomp $line;
my @parts = split /\s+(\w+)=/, $line;
shift @parts;
%hash = @parts;
my ($ip, $srcport, $srcint) = split /:/,$hash{src};
$hash{src} = unpack('l',inet_aton($ip));
my ($ip, $dstport, $dstint) = split /:/,$hash{dst};
$hash{dst} = unpack('l', inet_aton($ip));
$hash{'time'} = str2time(substr($hash{'time'},1,-1));
$proto = $hash{proto};
$proto = 'tcp' if ($proto =~ /tcp/);
$proto = 'udp' if ($proto =~ /udp/);
$proto = 'icmp' if ($proto =~ /icmp/);
$sent = $hash{sent} + 0;
$recd = $hash{recd} + 0;
$sql = "INSERT INTO logs VALUES ($hash{time},$hash{pri},$hash{m},$hash{src},$srcport,'$srcint',$hash{dst},$dstport,'$dstint','$proto',$sent,$recd)";
$sth = $dbh->prepare($sql);
if (! $sth->execute() )
{
print $line;
print "\n";
print $sql;
print "\n";
exit;
}
}
close LOG;
unlink "$path/log.tmp";
Okay, that's not quite enough to do it all, but it's getting there. It's continued in Creating a Firewall Log Analysis Tool for a Sonicwall DSL Router, Part 2.