For When You Can't Have The Real Thing

Spam Statistics

Created by dave. Last edited by dave, 14 years and 119 days ago.

Date          Total Messages  Probable Spam  Definite Spam  Escaped Spam  Viruses  Delivered Normally
18 Sept 2004           23393          14567           1991             -      664                6159
27 Nov 2004            27113          16531           2208             -      727                7635
13 Dec 2004            34513          19695           2549             -      729                9707
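
To put the raw counts in perspective, here is a minimal sketch that turns each snapshot into percentages. The figures are the ones in the table above as I read them; the column split is an assumption, since the numbers run together in the original, and the function name `summarize` is made up for illustration.

```python
# Snapshot rows from the table above: date, total, probable spam,
# definite spam, viruses, delivered normally.
# Escaped spam is left out because it was not recorded for these dates.
ROWS = [
    ("18 Sept 2004", 23393, 14567, 1991, 664, 6159),
    ("27 Nov 2004", 27113, 16531, 2208, 727, 7635),
    ("13 Dec 2004", 34513, 19695, 2549, 729, 9707),
]


def summarize(rows):
    """Return (date, spam_pct, delivered_pct) for each snapshot.

    spam_pct counts probable and definite spam together; both values
    are percentages of the total message count for that snapshot.
    """
    summary = []
    for date, total, probable, definite, viruses, delivered in rows:
        spam_pct = 100.0 * (probable + definite) / total
        delivered_pct = 100.0 * delivered / total
        summary.append((date, spam_pct, delivered_pct))
    return summary


if __name__ == "__main__":
    for date, spam_pct, delivered_pct in summarize(ROWS):
        print(f"{date:<12}  spam {spam_pct:5.1f}%  delivered {delivered_pct:5.1f}%")
```

On these figures the delivered share comes out at roughly 26% for the September row and roughly 28% for the two later rows.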


  • In early December 2004 I joined the high-volume Fedora Users mailing list (about 400 messages a day on average, most of them crap). As a result the message counts start to skyrocket, and so do the number and proportion of non-spam messages.
  • On 12 December 2004 the process that updates the Bayesian learner blew out its database, so the filter now has to be retrained. The 'Escaped Spam' field will therefore only be filled in from that point forward. Perhaps I should reset the statistics so we can watch the learner in action.
OK, so I did it.
  • I reset the logs, removed my corpus, and deleted the one-day-old Bayesian dictionaries.
  • I'll keep the spam and ham for a while, and we can see how things progress.
  • For now let's also say I'll train the Bayesian filter and update the virus scanner weekly.
  • I'll also set up a cron job to capture statistics every Monday or so, and we can see how fast the Bayesian filter learns.
Date          Total Messages  Probable Spam  Definite Spam  Escaped Spam  Viruses  Delivered Normally
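
The weekly capture could look something like the sketch below. It is not the script actually in use: the label strings, the function names `tally` and `format_row`, and the idea that each message ends up with exactly one label are all assumptions made for illustration. The total is simply the sum of the five columns, which may not match how the real counts are kept.

```python
from collections import Counter
from datetime import date

# Column order follows the statistics table; the label strings are hypothetical.
COLUMNS = ["probable_spam", "definite_spam", "escaped_spam", "viruses", "delivered"]


def tally(labels):
    """Count messages per category, given one label per message."""
    counts = Counter(labels)
    return {name: counts.get(name, 0) for name in COLUMNS}


def format_row(day, counts):
    """Render one table row: date, total, then each column in table order."""
    total = sum(counts[name] for name in COLUMNS)  # assumption: total = sum of columns
    cells = [day.strftime("%d %b %Y"), str(total)]
    cells += [str(counts[name]) for name in COLUMNS]
    return "  ".join(cells)


if __name__ == "__main__":
    # Tiny made-up sample, only to show the shape of a row.
    sample = ["delivered", "delivered", "viruses"]
    print(format_row(date.today(), tally(sample)))
```

A crontab schedule of `0 6 * * 1` would run such a script every Monday at 06:00; the time of day is arbitrary.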
This is a collection of technical information, much of it learned the hard way. Consider it a lab book or a /info directory. I doubt much of it will be of use to anyone else.

Copyright 2000-2002 Matthias L. Jugel and Stephan J. Schmidt