Know your enemies - spamstat shows all
Every day my mail server sends me an email containing a list of all spam hosts trying to send me messages I don't want. This way I can see which systems failed in sending me their crap and how often they failed. Among other stuff in this informative email, the spammer list goes as follows:
Checking for rejected mail hosts:
15 [89.43.145.161]
15 189.27.73.118.adsl.gvt.net.br
13 hst85-28-237-241.real.kamchatka.ru
12 [201.230.150.61]
9 ppp91-122-159-139.pppoe.avangard-dsl.ru
9 ppp-124.120.106.84.revip2.asianet.co.th
9 gprs25.vodafone.hu
9 dynamic-87-105-191-65.ssp.dialog.net.pl
9 cdma-149-78-94.msk.skylink.ru
9 acwx148.neoplus.adsl.tpnet.pl
What you can see here is the number of rejects and each spamming system's host name, or its IP address if there's no DNS entry for it. Since I get such an email every day, there's plenty of data to play with. Basically, we want a database to store all this data in and a mechanism that reads the emails and extracts the list of rejected mail hosts. I'm assuming that you're running a system with Perl 5.8 installed and a simple database system (here we'll be using MySQL). A few Perl modules from CPAN will also be needed because we won't reinvent the wheel. I'm further assuming that we're using the IMAP protocol to access our own mail server; if you must use POP3, you'll have to change the script below accordingly.
1. The database.
This is not enterprise software, so let's keep it simple. We need fields for the spamming host information, the number of rejects and the date of the event. Here's the schema:
CREATE TABLE `spamstat`.`HL_HostList` (
  `Index` int(10) unsigned NOT NULL auto_increment,
  `HL_Date` date NOT NULL default '0000-00-00',
  `HL_Hostname` varchar(255) default 'N/A',
  `HL_HostIP` varchar(15) default '0.0.0.0',
  `HL_NumRejects` int(10) unsigned default '0',
  PRIMARY KEY (`Index`),
  KEY `Hostname` (`HL_Hostname`),
  KEY `HostIP` (`HL_HostIP`),
  KEY `Rejects` (`HL_NumRejects`,`HL_HostIP`),
  KEY `Date` (`HL_Date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COMMENT='List of hosts trying to spam me';
'Index' is just a counter. 'HL_Date' stores the date of the email. 'HL_Hostname' and 'HL_HostIP' hold the spam mail host's name and its IP address (though filling in the IP address is left for later). Finally, 'HL_NumRejects' stores the number of rejects a spam host experiences before it gives up.
Create this table in a new database called spamstat.
2. The script.
Our script has a lot of things to do: call the mail server using your mail account, read the right emails, extract the relevant information and save it into the database. By using the right Perl modules this is astonishingly easy: we'll need Net::IMAP::Simple, Email::Simple, DBD::mysql, and Date::Manip. If you're using a database other than MySQL, you'll have to use a different DBD module. But don't worry: we're just using one INSERT SQL statement here.
After including the four modules from above, we're opening the database connection:
my $Dbh;
$Dbh = DBI->connect("DBI:mysql:database=spamstat;host=localhost", "dbuser", "", {
    AutoCommit => 1,
    PrintError => 1,
    RaiseError => 1
});
It's good practice to do this before the real work starts, because connecting is quite an expensive operation and we don't want it to happen inside a for loop. The disconnect() will happen at the very end.
After this, we're ready to read our emails:
# open a connection to the IMAP server
my $server = Net::IMAP::Simple->new('my.mailserver.tld') ||
    die "Unable to connect to IMAP: $Net::IMAP::Simple::errstr\n";
# login
if (!$server->login('username', 'password')) {
    print STDERR "Login failed: " . $server->errstr . "\n";
    exit(64);
}
# select the desired folder
my $number_of_messages = $server->select('serverstats');
Opening the connection to the IMAP server is straightforward. Replace 'my.mailserver.tld' with the name of your own server, and 'username' and 'password' with your own credentials.
If you're sorting your emails directly on the server (like me), you'll have to select a folder other than 'INBOX'. My server status mails are bundled together in a folder called 'serverstats', so that's the one we select.
Now for the interesting part:
# catch all emails
for (my $i = 1; $i <= $number_of_messages; $i++) {
    my $es = Email::Simple->new(join '', @{ $server->get($i) });
    if ($es->header('Subject') eq "mymailserver.tld daily run output") {
        print "Mail from " . $es->header('Date') . "\n";
        my $date = ParseDate($es->header('Date'));
        if (!$date) {
            print "Bad date in " . $es->header('Date') . "\n";
        } else {
            my ($year, $month, $day) = UnixDate($date, "%Y", "%m", "%d");
            print "E-Mail from $day/$month/$year\n";
            print $es->header('Subject') . "\n";
            # collect (rejects, host) pairs as one flat list
            my @Entries = ($es->body =~ /\s+(\d+)\s(\S+)\r\n/gi);
            for (my $j = 0; $j < scalar(@Entries); $j += 2) {
                # host name first, then the number of rejects
                &WriteRecord($day, $month, $year, $Entries[$j + 1], $Entries[$j]);
            }
        }
    }
}
As you can easily see, this is a for loop in which we fetch every single email in the selected folder, one by one. $number_of_messages was returned when we selected the 'serverstats' folder. $i is the index of the email we're currently reading. $server->get returns a reference to a list of the lines of the fetched mail. We dereference it into an array and join its lines into one big string, which Email::Simple turns into the object stored in $es.
Email::Simple's header and body methods then allow easy access to all relevant parts of the email. First we check whether it's the right mail by comparing its subject line. If it matches, we read the date from the Date header, parse it with Date::Manip's ParseDate and split it into year, month and day with UnixDate. After this, we extract the entries from the body of the message with a small regular expression that looks for whitespace, a run of digits, one blank, a string of non-blank characters, then a carriage return and a linefeed. This results in the @Entries array that carries all information from the list above: the number of rejects followed by the spam server's name, followed by the next number of rejects and the next spam server's name and so on. For this reason we have to raise the index of the inner for loop (where we use the @Entries entries) by 2.
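To see what the regular expression hands back, here is a small self-contained sketch. It assumes the report body uses CRLF line endings; the sample string is made up for illustration and reuses three lines from the list at the top.

```perl
use strict;
use warnings;

# Made-up sample body with CRLF line endings, as the pattern expects
my $body = "Checking for rejected mail hosts:\r\n"
         . " 15 [89.43.145.161]\r\n"
         . " 15 189.27.73.118.adsl.gvt.net.br\r\n"
         . "  9 gprs25.vodafone.hu\r\n";

# In list context, /g returns all captures as one flat list:
# rejects, host, rejects, host, ...
my @Entries = ($body =~ /\s+(\d+)\s(\S+)\r\n/g);

# Walk the flat list two elements at a time
for (my $j = 0; $j < @Entries; $j += 2) {
    print "$Entries[$j + 1] was rejected $Entries[$j] times\n";
}
```

The header line "Checking for rejected mail hosts:" is skipped because no run of digits follows any of its blanks.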
What's missing is the access to the database. It's happening in WriteRecord:
sub WriteRecord {
    my ($day, $month, $year, $Hostname, $NumRejects) = @_;
    print "$day.$month.$year $Hostname.$NumRejects\n";
    # placeholders let DBI quote the values; HL_HostIP stays empty for now
    my $SQLStatement = "INSERT INTO HL_HostList VALUES (NULL, ?, ?, '', ?)";
    $Dbh->do($SQLStatement, undef, "$year-$month-$day", $Hostname, $NumRejects);
}
As you can see, we're just using an INSERT statement to save all data. The rest of our script is just tidying up and closing all connections after the last mail has been read. One thing is still missing, besides the empty HL_HostIP field: if you run the script a second time, it reads the same emails again and clutters your database with duplicate rows. So delete the emails (or move them elsewhere) before running the script again.
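As a first step towards filling HL_HostIP, the entries that arrive as a bracketed IP address could be separated from those that carry a host name. The following is only a sketch: split_host_entry is a hypothetical helper that is not part of the script, and it falls back to the column defaults from the schema ('N/A' and '0.0.0.0'). It does not resolve host names to addresses.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns (hostname, ip)
# for one entry of the rejected-hosts list.
sub split_host_entry {
    my ($entry) = @_;

    # Hosts without a DNS entry show up as a bracketed IP, e.g. [89.43.145.161]
    if ($entry =~ /^\[(\d{1,3}(?:\.\d{1,3}){3})\]$/) {
        return ('N/A', $1);
    }

    # Everything else is treated as a host name; the IP keeps the schema default
    return ($entry, '0.0.0.0');
}

my ($name, $ip) = split_host_entry('[201.230.150.61]');
print "$name / $ip\n";
```

The two return values could then be handed to the INSERT statement in place of the host name and the empty string.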
If you now browse through your data you'll soon discover hosts and providers that continually try to spam you. If you have any ideas on what to do with these data, just let me know!
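One simple thing to do with the data is to add up the rejects per host over several days. Here is a minimal sketch of that idea in plain Perl. The rows are hypothetical sample values (in practice they would be read from HL_HostList), and total_rejects is an illustrative helper, not part of the script.

```perl
use strict;
use warnings;

# Hypothetical sample rows as [host, rejects]
my @rows = (
    ['gprs25.vodafone.hu',            9],
    ['[89.43.145.161]',              15],
    ['gprs25.vodafone.hu',            7],
    ['acwx148.neoplus.adsl.tpnet.pl', 9],
);

# Illustrative helper: sums the rejects per host
sub total_rejects {
    my (@rows) = @_;
    my %total;
    $total{ $_->[0] } += $_->[1] for @rows;
    return %total;
}

my %total = total_rejects(@rows);

# Most persistent hosts first; ties are ordered by host name
for my $host (sort { $total{$b} <=> $total{$a} || $a cmp $b } keys %total) {
    printf "%4d %s\n", $total{$host}, $host;
}
```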
The complete script:
use Net::IMAP::Simple;
use Email::Simple;
use DBD::mysql;
use Date::Manip;
use strict;
my $Dbh;
$Dbh = DBI->connect("DBI:mysql:database=spamstat;host=localhost", "root", "", {
    AutoCommit => 1,
    PrintError => 1,
    RaiseError => 1
});
# open a connection to the IMAP server
my $server = Net::IMAP::Simple->new('mymailserver.tld') ||
    die "Unable to connect to IMAP: $Net::IMAP::Simple::errstr\n";
# login
if (!$server->login('user', 'password')) {
    print STDERR "Login failed: " . $server->errstr . "\n";
    exit(64);
}
# select the desired folder
my $number_of_messages = $server->select('serverstats');
# catch all emails
for (my $i = 1; $i <= $number_of_messages; $i++) {
    my $es = Email::Simple->new(join '', @{ $server->get($i) });
    if ($es->header('Subject') eq "mymailserver.tld daily run output") {
        print "Mail from " . $es->header('Date') . "\n";
        my $date = ParseDate($es->header('Date'));
        if (!$date) {
            print "Bad date in " . $es->header('Date') . "\n";
        } else {
            my ($year, $month, $day) = UnixDate($date, "%Y", "%m", "%d");
            print "E-Mail from $day/$month/$year\n";
            print $es->header('Subject') . "\n";
            # collect (rejects, host) pairs as one flat list
            my @Entries = ($es->body =~ /\s+(\d+)\s(\S+)\r\n/gi);
            for (my $j = 0; $j < scalar(@Entries); $j += 2) {
                # host name first, then the number of rejects
                &WriteRecord($day, $month, $year, $Entries[$j + 1], $Entries[$j]);
            }
        }
    }
}
sub WriteRecord {
    my ($day, $month, $year, $Hostname, $NumRejects) = @_;
    print "$day.$month.$year $Hostname.$NumRejects\n";
    # placeholders let DBI quote the values; HL_HostIP stays empty for now
    my $SQLStatement = "INSERT INTO HL_HostList VALUES (NULL, ?, ?, '', ?)";
    $Dbh->do($SQLStatement, undef, "$year-$month-$day", $Hostname, $NumRejects);
}
$Dbh->disconnect();
$server->quit;