jamin on October 21st, 2002

A few years ago, my friend Adam wrote an awk script to parse UNIX mbox mail files and report new mail. The script has been very helpful to me and I’ve been using it for ages, particularly at work where I have it running automatically every 90 seconds to show me in which folders I have new mail. Thanks, Adam!

The only problem has been that it seems slow. Knowing that in many cases Perl is a lot faster than alternatives like Tcl and awk, I decided to write a Perl version. Last night I wrote a first draft. I can optimize further, but even without optimization it seems to run in a fraction of the time the awk script took. Perl++. Here's the script.


#!/usr/bin/perl -l
open(RCFILE, "$ENV{HOME}/.newmailrc") or die "cannot open .newmailrc: $!";
while (<RCFILE>) {
	my ($name, $file) = split;
	my ($mail, @emails);
	push @order, $name;
	{
		local $/;
		open(MAIL, $file);
		$mail = <MAIL>;
		@emails = split /^From /m, $mail;
	}
	shift @emails;
	for (@emails) {
		$mailcount{$name}++ unless /^status:\s*[rod]/im;
	}
	close MAIL;
}
close RCFILE;
for (@order) {
	if (my $count = $mailcount{$_}) {
		print "$_: $count";
	}
}
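
To make the new-mail test concrete, here's a minimal sketch that runs the same split-and-match logic on an in-memory mbox string instead of real files. The sample messages, the addresses, and the `count_new` helper name are made up for illustration; they aren't part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample mbox: three messages, the first already read (Status: RO).
my $mbox = <<'END_MBOX';
From alice@example.com Mon Oct 21 09:00:00 2002
Subject: first
Status: RO

Already read.

From bob@example.com Mon Oct 21 09:05:00 2002
Subject: second

No Status header yet, so this one counts as new.

From carol@example.com Mon Oct 21 09:10:00 2002
Subject: third

Also new.
END_MBOX

# count_new is a hypothetical helper name used only in this sketch.
sub count_new {
    my ($text) = @_;
    # Every message starts with "From " at the beginning of a line.
    my @emails = split /^From /m, $text;
    shift @emails;    # drop the empty chunk before the first "From "
    my $count = 0;
    for (@emails) {
        # A message is new unless it has a Status line starting with R, O, or D.
        $count++ unless /^status:\s*[rod]/im;
    }
    return $count;
}

print "new: ", count_new($mbox), "\n";    # prints "new: 2"
```

The `shift` matters because the mbox begins with `From `, so `split` returns an empty first element ahead of the real messages.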

5 Responses to “newmail”

  1. Forgot to include the format of the .newmailrc. Here’s my newmailrc:

    in /var/spool/mail/jamin
    junk /home/jamin/imap/junk.in
    47society /home/jamin/imap/Lists/47society.in
    announce /home/jamin/imap/Lists/announce.in
    cvs /home/jamin/imap/Lists/cvs.in
    desktop-devel /home/jamin/imap/Lists/desktop-devel.in
    foundation /home/jamin/imap/Lists/foundation.in
    gabber /home/jamin/imap/Lists/gabber.in
    galeon /home/jamin/imap/Lists/galeon.in
    gnome-1.4 /home/jamin/imap/Lists/gnome-1.4.in
    gnome-2.0 /home/jamin/imap/Lists/gnome-2.0.in
    gnome2-release /home/jamin/imap/Lists/gnome2-release.in
    gnomecc /home/jamin/imap/Lists/gnomecc.in
    gnome-devel /home/jamin/imap/Lists/gnome-devel.in
    gnome-hackers /home/jamin/imap/Lists/gnome-hackers.in
    gnome /home/jamin/imap/Lists/gnome.in
    gaim /home/jamin/imap/Lists/gaim.in
    gnome-love /home/jamin/imap/Lists/gnome-love.in
    mono /home/jamin/imap/Lists/mono.in
    nautilus /home/jamin/imap/Lists/nautilus.in
    SLUUG /home/jamin/imap/Lists/SLUUG.in
    usability /home/jamin/imap/Lists/usability.in
    ximian /home/jamin/imap/Lists/ximian.in
    perl-quiz /home/jamin/imap/Lists/perl-quiz.in

  2. As Adam pointed out, I can do the printing from within the main loop. Originally I thought the script would be slow enough to make the printing choppy, but it doesn’t look like that’s an issue.

    #!/usr/bin/perl -l
    
    open(RCFILE, "$ENV{HOME}/.newmailrc") or die "cannot open .newmailrc: $!";
    while (<RCFILE>) {
    	my ($name, $file) = split;
    	my ($mail, @emails);
    	{
    		local $/;
    		open(MAIL, $file);
    		$mail = <MAIL>;
    		@emails = split /^From /m, $mail;
    	}
    	shift @emails;
    	for (@emails) {
    		$mailcount{$name}++ unless /^status:\s*[rod]/im;
    	}
    	close MAIL;
    	print "$name: $mailcount{$name}" if $mailcount{$name};
    }
    close RCFILE;
    
  3. Adam also pointed out that I can cleverly use a newline to avoid the split…

    #!/usr/bin/perl -l
    
    open(RCFILE, "$ENV{HOME}/.newmailrc") or die "cannot open .newmailrc: $!";
    while (<RCFILE>) {
    	my ($name, $file) = split;
    	my @emails;
    	{
    		local $/ = "\nFrom ";
    		open(MAIL, $file);
    		@emails = <MAIL>;
    	}
    	for (@emails) {
    		$mailcount{$name}++ unless /^status:\s*[rod]/im;
    	}
    	close MAIL;
    	print "$name: $mailcount{$name}" if $mailcount{$name};
    }
    close RCFILE;
    
  4. And the final version after some optimizations:

    #!/usr/bin/perl -l
    open(RCFILE, "$ENV{HOME}/.newmailrc") or die "cannot open .newmailrc: $!";
    while (<RCFILE>) {
    	my ($name, $file) = split;
    	my $count;
    	{
    		local $/ = "\nFrom ";
    		open(MAIL, $file) or die "cannot open $file: $!";
    		while (<MAIL>) {
    			$count++ unless /^status:\s*[rod]/im;
    		}
    	}
    	print "$name: $count" if $count;
    }
    
  5. I’ve been looking for something exactly like this for a bit. It seems to work great until it hits a mailbox containing messages with rather large attachments; then it grinds to a halt. For instance, on one mailbox totaling 7MB, with a message containing a 5MB attachment, it takes 20 seconds to run.
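
One possible cause of that slowdown (a guess, not something measured here) is that the Status regex gets run across the whole message, attachment included. Below is a rough sketch of a variant that reads one line at a time and only looks at header lines. The `count_new_in_file` helper name and the sample mailbox are made up for the demo. Note that it behaves slightly differently from the versions above: a `Status:` line inside a message body is ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# count_new_in_file is a hypothetical helper, not part of the scripts above.
# It only tests lines inside a message's header block, so body and
# attachment lines never reach the Status regex.
sub count_new_in_file {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "cannot open $file: $!";
    my ($count, $in_header, $is_new) = (0, 0, 0);
    while (my $line = <$fh>) {
        if ($line =~ /^From /) {
            $count++ if $is_new;               # settle the previous message
            ($in_header, $is_new) = (1, 1);    # new message, assume unread
        }
        elsif ($in_header) {
            if ($line =~ /^\r?$/) {
                $in_header = 0;                # blank line ends the headers
            }
            elsif ($line =~ /^status:\s*[rod]/i) {
                $is_new = 0;                   # already read, old, or deleted
            }
        }
    }
    $count++ if $is_new;                       # settle the last message
    close $fh;
    return $count;
}

# Made-up sample mbox: one read message and two new ones, the last with
# filler lines standing in for a large attachment.
my ($tmp, $path) = tempfile(UNLINK => 1);
print $tmp <<'END_MBOX';
From alice@example.com Mon Oct 21 09:00:00 2002
Subject: one
Status: RO

Already read.

From bob@example.com Mon Oct 21 09:05:00 2002
Subject: two

New message.

From carol@example.com Mon Oct 21 09:10:00 2002
Subject: three

END_MBOX
print $tmp "QUJDREVGR0hJSktMTU5PUA==\n" x 1000;
close $tmp or die "cannot close $path: $!";

print "new: ", count_new_in_file($path), "\n";    # prints "new: 2"
```

To use it with a .newmailrc, the inner block of the final version could call this helper with `$file` and print `$name` alongside the returned count.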