+----------+
| Deleatur | Version 3.2.3 (November 2004)
+----------+
Deleatur© is a program for recognizing spam mail based solely on
the counted frequency of words (Bayes filter). The name comes from
Latin and means "it is to be deleted". It is pronounced like the
English construct "daylayartour" - with the stress on "ar".
Deleatur is (according to user opinions) the first spam filter
program which really eases work for a user: While other spam
filters download spam mail completely and move it to a special
folder, Deleatur deletes spam directly on the POP server after
having received only a part of it.
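The idea of fetching only the beginning of each mail and deleting
spam directly on the server can be sketched as follows. This is
only an illustration in Python, not Deleatur's PL/I code. It assumes
that the POP3 command TOP is used for the partial download, and the
callback is_spam is a hypothetical stand-in for the rating.

```python
def filter_mailbox(server, is_spam, body_lines=50):
    """Fetch header plus the first body lines of each mail; delete spam.

    `server` is any object offering the poplib.POP3 methods stat(),
    top() and dele(). `is_spam` is a hypothetical callback which gets
    the fetched lines as a list of strings.
    Returns the numbers of the mails marked for deletion.
    """
    count, _size = server.stat()
    deleted = []
    for number in range(1, count + 1):
        # TOP delivers the header and at most `body_lines` body lines.
        _response, lines, _octets = server.top(number, body_lines)
        text = [line.decode("latin-1") for line in lines]
        if is_spam(text):
            # DELE only marks the mail; the server removes it at QUIT.
            server.dele(number)
            deleted.append(number)
    return deleted

# Possible use with the standard library (server name is made up):
#   import poplib
#   server = poplib.POP3("pop.example.org")
#   server.user("me"); server.pass_("secret")
#   filter_mailbox(server, my_rating_function)
#   server.quit()
```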
Use Deleatur to check for new mail. Only if regular mail is left
do you start your own mail program. Deleatur runs in a command
window. Ideally - even when 50 mails are to be processed - you only
need to press the enter key a few times. In automatic mode Deleatur
checks for mail at fixed intervals and deletes spam mail
automatically.
Deleatur learns from the mail it reads by remembering, for every
word found, how often it occurs in regular mail and how often in
spam mail. Deleatur can also be trained using mail files if
available. The word base supplied with the program also gives good
results if you receive English spam mail on the one hand and German
as well as English regular mail on the other hand.
Deleatur offers the opportunity to carry on the dialog in a
language of your choice provided that you have modified a text file
accordingly. At this time German, English, Italian (thanks to
Roberto Sozzani!) and Portuguese (thanks to Carlos Caramori!) are
supplied.
Innovations /*********************************************************/
Deleatur 3.2.2: The parameter specifications c=iso, f=0, h=1, i=1,
n=1, q=1, and w=0 have been added. Statistics are
presented at the end regarding read and deleted
mail. One error has been eliminated which inhibited
deletion of mail on a certain POP server. Now
Quicksort is used during reduction of the word
base. Following newer tactics of spammers (sending
many words) the default assumptions for automatic
rating have been increased. One error in the
reduction of the word base has been corrected.
Deleatur 3.2.3: It is now possible to run Deleatur permanently
(deleting spam automatically). In the case of
many mails you can interrupt your rating and
continue at a later date. The algorithm has been
enhanced to enable faster changing of ratings.
Minus and plus files have been introduced
similarly to bonus and malus. You can now add
a port number after the server name. Besides German
and English, Italian and Portuguese dialogs have
been added. The parameter specification g=1 helps
in case of a faulty POP server. The parameter y
allows inclusion of further header lines in the
rating process. (Please, read again parameters e,
g, s, y, and z.)
The Windows version has one known error: After about 55 mails some
empty areas of the window are painted in the wrong color for obscure
reasons.
The Algorithm /*******************************************************/
There are two articles of mine in the journal of the
"Zentrum für Informationsverarbeitung" (Center for Information
Processing, i. e. computing center) of the University of Münster:
http://www.uni-muenster.de/ZIV/inforum/2003-1/a08.html
http://www.uni-muenster.de/ZIV/inforum/2003-2/a02.html
The following articles inspired me to program Deleatur:
- http://www.paulgraham.com/spam.html
- http://radio.weblogs.com/1010454/stories/2002/09/16/spamDetection.html
The program remembers for every word of the first 50 lines of each
mail (plus a few header lines) how often the word has been found
in spam and how often in regular mail.
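The counting described above can be pictured with a small Python
sketch. It is not the PL/I code of Deleatur: the class WordBase, the
function words_of and the way a line is split into words are
assumptions made only for this illustration.

```python
import re
from collections import Counter


def words_of(lines, max_lines=50):
    """Split the first `max_lines` lines into lower-case words.

    The splitting rule (runs of letters, digits and underscores) is
    an assumption; the real rule is not documented here.
    """
    words = []
    for line in lines[:max_lines]:
        words.extend(re.findall(r"\w+", line.lower()))
    return words


class WordBase:
    """Hypothetical word base: one counter for spam, one for regular mail."""

    def __init__(self):
        self.spam = Counter()
        self.regular = Counter()

    def learn(self, lines, is_spam):
        """Add every word occurrence of one mail to the matching counter."""
        target = self.spam if is_spam else self.regular
        target.update(words_of(lines))


base = WordBase()
base.learn(["Cheap pills, cheap!"], is_spam=True)
base.learn(["Meeting at noon"], is_spam=False)
```

After these two calls the word "cheap" has been counted twice in
spam and never in regular mail.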
As the word base grows the program decides more and more
automatically whether a mail should be considered spam or not and
no longer asks. At a word base size of 3 MB all mail with a
probability below 10 % is accepted without inquiry and mail above
80 % is discarded without inquiry.
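A sketch of such a decision in Python follows. The limits 10 and 80
are the ones named above. The function combine uses the combination
formula from Paul Graham's article cited above. Whether Deleatur
combines word probabilities in exactly this way is an assumption,
and both function names are made up for the example.

```python
from math import prod


def combine(word_probabilities):
    """Combine per-word spam probabilities (each between 0 and 1,
    exclusive) into one probability, as in Graham's article."""
    ps = list(word_probabilities)
    spam = prod(ps)
    regular = prod(1.0 - p for p in ps)
    return spam / (spam + regular)


def decide(spam_percent, accept_limit=10.0, delete_limit=80.0):
    """Return "accept", "delete" or "ask" for a probability in percent."""
    if spam_percent < accept_limit:
        return "accept"   # accepted without inquiry
    if spam_percent > delete_limit:
        return "delete"   # discarded without inquiry
    return "ask"          # the user has to decide


# Two words which are both 90 % "spammy" give about 98.8 %.
print(decide(100.0 * combine([0.9, 0.9])))   # prints "delete"
```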
It may be better not to use the word base file which has been
supplied in this package but to build it yourself within two weeks
or so: Spam mail is always similar but everybody gets different
regular mail.
You can find the words relevant for the decision in the protocol
file. There is a protocol file YYYYMMDD.log for every day where
YYYY denotes the year, MM the month, and DD the current day. On
every call to Deleatur more lines will be appended at the end of
the file. By default the protocol file of the day before today will
be kept, older ones will be deleted.
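For those who process the protocol files in their own scripts, the
naming and retention rule can be written down like this in Python.
The function names are made up, and the sketch only mirrors the
rule described above, not the actual implementation.

```python
from datetime import date, timedelta


def log_name(day):
    """Name of the protocol file for a given day, e.g. 20041115.log."""
    return day.strftime("%Y%m%d") + ".log"


def logs_to_delete(existing, today, keep_days=1):
    """Return the daily protocol files which are older than
    `keep_days` days before `today` (default: keep yesterday's)."""
    oldest_kept = log_name(today - timedelta(days=keep_days))
    return sorted(
        name for name in existing
        # Only names of the form YYYYMMDD.log are candidates;
        # this leaves a file like deleatur.log alone.
        if len(name) == 12 and name[:8].isdigit()
        and name.endswith(".log") and name < oldest_kept
    )


files = ["20041113.log", "20041114.log", "20041115.log", "deleatur.log"]
print(logs_to_delete(files, date(2004, 11, 15)))   # ['20041113.log']
```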
If your mail is marked by the rule based spam filter SpamAssassin
you can influence the decisions of Deleatur such that no mail is
automatically deleted or accepted if SpamAssassin's opinion differs
from Deleatur's. You will, however, find that this only gives you
more to do. :-)
I think indeed that somebody who sends me a mail containing words
which otherwise only exist in spam should not be surprised if
that mail is deleted by a spam filter. Nevertheless Deleatur
has the facility to specify in a bonus file and in a malus file
how to prevent automatic deleting and accepting of mail. This is
only if you couldn't sleep at night otherwise. Really important
messages are sent on paper! :-)
Installation and Handling /*******************************************/
There is no real installation procedure. Unzip the file
deleatur.zip (if you haven't done it already) in a folder of your
choice - ready!
After starting deleatur.exe the first time (for example by
double clicking) you have to decide which language you want to
use: 1 means German, 2 enables an English dialog.
After that you will be asked which POP server you want to use and
what your user ID and password are. The program stores this
information in the parameter file deleatur.prm; the password is
stored in encrypted form.
There are the following replies to inquiries of the program:
+: Mail should be accepted.
-: Mail should be considered Spam.
=: Leave the word base untouched, don't rate the mail.
?: Show the first 50 lines of the mail (after header lines).
Use the enter key directly to accept the program's suggestion.
At the end the program always asks whether deletion should really
take place. Up to this point no mail has actually been deleted
(this is also true if you press Ctrl and c simultaneously). In
detail there are the following responses to the concluding inquiry
of the program:
Enter key:
This is the normal way to end Deleatur. Ratings are stored,
spam mail is deleted (if existent).
a: All ratings are saved to the word base file but no mail is
deleted. The program ends.
b: The word base should be unchanged, no mail is to be deleted.
The program ends.
mail number(s):
If you enter one or more numbers in arbitrary order (delimited
by one or more blanks) these mails will be presented again
for decision.
You can specify in the files deleatur.bon and deleatur.mal,
respectively, which text in which header lines of a mail leads to a
bonus or malus given to the mail in question. A bonus will prevent
a mail from being deleted automatically, and a malus will prevent a
mail from being accepted automatically. In that case the program
will ask for a decision.
If you don't want to throw away mail coming from a certain
professor you can specify in the bonus file:
from:higgins
So if "higgins" occurs anywhere in the "from" header line
the mail gets a bonus. The same is possible for all other header
lines, for example subject or recipient (to:):
subject:***spam***
to:my_newsgroup
Please, avoid blanks before and after the colon. If a word of the
bonus or the malus file resp. is found the text or
resp. appears in the rating line.
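How entries of the form header:text can be matched against the
header lines of a mail is sketched below in Python. The two
function names are invented, and ignoring upper and lower case is
an assumption of this sketch, not a documented property of
Deleatur.

```python
def parse_rules(text):
    """Turn lines like "from:higgins" into (header, text) pairs."""
    rules = []
    for line in text.splitlines():
        header, colon, needle = line.partition(":")
        if colon and header and needle:
            rules.append((header.lower(), needle.lower()))
    return rules


def matches(rules, headers):
    """True if the text of any rule occurs in the header line it names.

    `headers` is a list of (name, value) pairs of one mail.
    """
    for header, needle in rules:
        for name, value in headers:
            if name.lower() == header and needle in value.lower():
                return True
    return False


bonus = parse_rules("from:higgins\nto:my_newsgroup\n")
mail = [("From", "Prof. Higgins <higgins@example.org>")]
print(matches(bonus, mail))   # prints True
```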
In a similar way you can specify in a file deleatur.min, which
mail has to be rated as spam, and in a file deleatur.plu, which
mail has to be rated as regular.
Documentation of the Parameter File /*********************************/
The parameter file will be created automatically by Deleatur and is
named deleatur.prm. The file can have any name if you pass that
name as a parameter when starting deleatur.exe. If you specify a
parameter beginning with a minus sign this readme file will be
displayed. (On AIX the file deleatur.readme is searched for.)
If somebody wants to tune something using an editor - which is not
necessary in most cases - here is the documentation. All entries
must start in column 1 (otherwise they will be regarded as
comments) and there must be no blanks around the equal sign:
s= This is the name of the POP3 server (e. g. pop.uni-muenster.de).
After a colon you can add a port number.
k=
p= These are user ID (k) and password (p) for the POP server; the
password is encrypted.
u=
o= This is the lower (u) and upper limit (o) of the word base (in
KB). If the upper limit is reached Deleatur asks for removal of
words from the word base until the lower limit is reached.
Words which have been encountered only once are deleted first,
the oldest ones before the others. By default the limits are
o=5000 and u=3000. Don't mix up u and o! Additionally all words
which occurred in only one mail are deleted.
l=
a= These are percentage values for deletion limit (l) and
acceptance limit (a). These values are computed automatically
from the size of the word base if not specified (see above),
maximal l=80 and a=10.
v= This is the directory for personal files as there are:
- deleatur.bas (word base for rating the spam)
- deleatur.alt (old word base from the latest run)
- deleatur.srv (list of undeleted mail so far)
- deleatur.txt (plain text list of multilingual text)
- deleatur.spr (coded text of the chosen language)
as well as the protocol files YYYYMMDD.log and so on, and the
index file deleatur.log. By default this is the current
directory.
x=SpamAssassin
Here you can specify whether header lines added by the rule
based spam filter SpamAssassin should be taken into account. If
a mail is rated as spam by SpamAssassin Deleatur will not accept
it without inquiry; if rated as regular by SpamAssassin Deleatur
will not delete it without inquiry. This setting gives more
safety but usually leads to superfluous inquiries. When x has
been specified there will appear two numbers in the rating line
added by SpamAssassin: The first is the spam rating value, the
second is the threshold value of SpamAssassin. If nothing was
delivered you will see [].
z= Here you can specify the number of lines for the command window.
After displaying these Deleatur will stop to allow checking
the output. By default the program asks after 24 lines whether
to continue. Daring users may set z to 1000 and a to 0. Then
Deleatur can sweep through and there is a guarantee that regular
mail will not be rated as spam. "z=0" means that lines are not
to be counted. If you enter the letter x before pressing the
enter key Deleatur quits reading mail and proceeds to rating.
This is helpful after returning from holidays.
d=0
This is only for testing and guarantees that no mail is deleted.
r=0
In this way you can prevent any inquiries. Mail having a
spam probability above the deletion limit (l=) is deleted, all
other mail will not be rated and kept.
e=min
Here you specify the length of the interval Deleatur waits
before it looks for mail again. Deleatur runs until you
cancel it (for example by pressing Ctrl-c).
b= Here you can specify the number of days the protocol file
YYYYMMDD.log should be kept. The default value is 1.
m=mail, m=spam, m=ok
This parameter turns Deleatur into learning mode. Mail files are
read from a folder, not from a mail server. If the name of the
folder is "mail" the user has to rate the mail files in the usual
way. If the name is "spam" all mail will be regarded as spam
automatically. If the name is "ok" all mail is rated as
regular mail. Only one folder can be specified. Whereas Deleatur
normally reads the first 50 lines after the header lines, in
learning mode all lines are read.
f=0
This specification may be of interest if you call
Deleatur several times in a script: The concluding question
"Close window?" will not be asked.
h=1
If you want to do any manipulations on the protocol file you
can get all header lines of each mail in the protocol.
c=iso
Normally the text of the command window is shown using codepage
850 (both in OS/2 and in Windows). If you prefer an ISO
codepage instead, this setting gives you readable text
(especially if you use a European language).
w=0
This is another specification (besides f=0) which is
suitable for scripts. It prevents the question "Continue?" if no
mail is found on the POP server.
i=1
This specification is also intended for scripts (besides f
and w). Its effect is that a beep sounds if mail is found.
n=1
This specification can be used in conjunction with "r=0"
(automatic deletion of mails above the deletion limit). Then
all rating relevant lines of each mail not deleted are stored
in the folder "mail", from where a separate rating can be done
(in conjunction with "m=mail").
q=1
This specification is for users who like to leave their mail on
the server. It prevents the display of mail which has not been
rated again.
y= You can enter several y-lines. After y= you can specify a
keyword of the mail header. The so identified lines will be
included when rating words for spam.
g=1
The somewhat strange effect of this parameter is implemented
for users of erroneous POP servers: Only the right 8 characters
are used as UID.
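The layout rules given at the beginning of this section (entries
start in column 1, no blanks around the equal sign, everything else
is a comment) can be checked with a few lines of Python. This is
only a sketch under the assumption that every key is a single
letter, as in the list above. The function names are invented; 110
is the standard POP3 port.

```python
def parse_parameters(text):
    """Return a dict from key letter to value; "y" may occur several
    times and is therefore collected in a list."""
    params = {"y": []}
    for line in text.splitlines():
        # An entry is a letter in column 1 directly followed by "=".
        if len(line) < 2 or not line[0].isalpha() or line[1] != "=":
            continue   # everything else is regarded as a comment
        key, value = line[0], line[2:]
        if key == "y":
            params["y"].append(value)
        else:
            params[key] = value
    return params


def split_server(value, default_port=110):
    """Split the s= value into server name and port number."""
    host, colon, port = value.partition(":")
    return host, int(port) if colon else default_port


example = "s=pop.uni-muenster.de\n d=0 is a comment here\nr=0\n"
print(parse_parameters(example))
```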
If you invoke Deleatur from within a script you will perhaps
make use of the following event codes:
0 - there is no mail on the server
1 - there was mail rated as Spam
2 - there was mail rated as regular
4 - there was mail left unrated
8 - there was mail but it was not looked at
16 - there was mail deleted as spam
32 - (reserved)
64 - language file has different version number
128 - Deleatur is already running
256 - connection to POP server failed
512 - authentication failed
1024 - internal error in Deleatur
After execution of Deleatur the sum of the codes of all conditions
that occurred is available as the return code.
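Because every code is a power of two, a script can take the sum
apart again by testing single bits. A small Python sketch (the
names EVENTS and decode are made up for the example):

```python
EVENTS = {
    1: "there was mail rated as Spam",
    2: "there was mail rated as regular",
    4: "there was mail left unrated",
    8: "there was mail but it was not looked at",
    16: "there was mail deleted as spam",
    32: "(reserved)",
    64: "language file has different version number",
    128: "Deleatur is already running",
    256: "connection to POP server failed",
    512: "authentication failed",
    1024: "internal error in Deleatur",
}


def decode(return_code):
    """List the conditions contained in a return code."""
    if return_code == 0:
        return ["there is no mail on the server"]
    return [text for bit, text in EVENTS.items() if return_code & bit]


# 19 = 16 + 2 + 1
for text in decode(19):
    print(text)
```

The example prints the texts for the codes 1, 2 and 16.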
Documentation of the Text File /**************************************/
Every time Deleatur doesn't find the file deleatur.spr it searches
for the file deleatur.txt. After that it asks the user which
language is to be used of the ones contained in this file.
The structure of the text file is explained below. Lines
beginning with a blank are comments and may be interspersed freely.
The first line must contain the version number of Deleatur.
After the version line there are as many lines as there are
languages. These lines are presented to the user at the first
program start for choosing the dialog language.
Following are text blocks each starting with a line containing
three number signs. The text in this line is irrelevant but should
not be altered. After this line there are as many lines as there
are languages, beginning with the language number followed by the
text in the particular language.
The order of the text blocks is relevant and must not be changed.
If you want, for instance, to introduce French as another language
the beginning of the text file may look like the following
(disregarding the fact that this text block doesn't exist :-):
3.2.2
1 deutsch
2 English
3 francais
### satz_ich_nicht_franz
1 Ich spreche nicht französisch!
2 I don't speak french!
3 Je ne parle pas francais!
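If you want to check a modified text file before using it, the
structure described above can be read with a short Python sketch
like the following. The function name is invented, and the sketch
assumes that empty lines may be skipped and that the language lines
end at the first line with three number signs.

```python
def parse_text_file(text):
    """Return (version, languages, blocks) of a deleatur.txt-like text.

    languages maps language number to language name; blocks is a list
    with one dict (language number -> text) per text block, in order.
    """
    # Lines beginning with a blank are comments.
    lines = [l for l in text.splitlines() if l and not l.startswith(" ")]
    version = lines[0]
    languages = {}
    blocks = []
    for line in lines[1:]:
        if line.startswith("###"):
            blocks.append({})          # a new text block begins
            continue
        number, _blank, rest = line.partition(" ")
        target = blocks[-1] if blocks else languages
        target[int(number)] = rest
    return version, languages, blocks


sample = "\n".join([
    "3.2.2",
    "1 deutsch",
    "2 English",
    "3 francais",
    " this line is a comment",
    "### satz_ich_nicht_franz",
    "1 Ich spreche nicht französisch!",
    "2 I don't speak french!",
    "3 Je ne parle pas francais!",
])
version, languages, blocks = parse_text_file(sample)
print(version, len(languages), blocks[0][3])
```

A block whose dict has fewer entries than there are languages
points to a missing translation.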
Example Scenarios /***************************************************/
Scenario 1: one mail address /*--------------------------------------*/
This is the normal case. The parameter file deleatur.prm will be
generated automatically and needn't be altered.
Scenario 2: several mail addresses /*--------------------------------*/
If you have several mail addresses Deleatur may be invoked several
times in a script, each time using a different parameter file:
deleatur parm1
deleatur parm2
You can specify either always the same word base in the parameter
file or a different one. If different languages are expected
different word bases lead to better recognition results. In the
latter case you should use the "v="-parameter.
Scenario 3: Using a server /*----------------------------------------*/
You can place the program file deleatur.exe and the language files
deleatur.txt and deleatur.spr on a file server. Thus users can
always access the most current version of Deleatur, only providing
space on their computer for the personal files. The "v=" entry
in the parameter file shows the path to the word base (which is,
of course, not in common to all users!).
Scenario 4: automatically during holidays /*-------------------------*/
If you are anxious that your mailbox will overflow during your
holidays you can run Deleatur permanently and let it look for mail
by specifying for instance "e=86400" (that means "daily"). You
should also specify "r=0" and "z=1000000" in the parameter file.
The effect is that Deleatur will delete every mail the spam
probability of which is greater than the deletion limit (l=) and
assigns the status "not regarded" to all others - without
including their ratings into the word base. After return from the
holidays you remove "r" and "e" and rate the remaining mail in the
usual way.
Scenario 5: AIX version for all operating systems /*-----------------*/
If you have - as a university, for instance - a central AIX server
you can run Deleatur there, in a telnet or ssh session. Thus also
MAC and Linux users get the benefit of Deleatur (as long as there
is no PL/I compiler for these operating systems :-).
Scenario 6: Learning by hand /*--------------------------------------*/
If you have gathered a folder full of mail of pre-Deleatur times
you can trigger Deleatur via the parameter "m=mail" to read mail
not from a server but from the folder "mail". You have to reply
to inquiries of the program as usual.
Scenario 7: Learning automatically /*--------------------------------*/
If you already have folders containing only spam or only
regular mail you can specify folder "spam" and folder "ok" after
the parameter "m=" in two program runs. Deleatur will then learn
without inquiries.
Notice /**************************************************************/
This program has been written by me privately and has been tested
in my institute. I want to thank my colleagues in the Center for
Information Processing (ZIV) of the University of Münster for their
help and especially my boss Dr. Held for encouraging me to
publish Deleatur (after testing it himself, of course).
This program can freely be used by everyone. If you want to give a
donation to me out of gratitude you can transfer an amount of your
choice to my account
3 200 372 966 at Postbank Germany, BLZ 20 11 00 22.
(I've heard people would voluntarily pay for good software. How
much do you pay for a dinner? :-)
If you want an account with IBAN and BIC, here is one:
IBAN: DE43 2542 0800 9185 0673 18
BIC: BHWKDE21
Deleatur is available not only for Windows but also for OS/2
(eComStation resp.) and AIX. All versions share the same source
code. Deleatur is written in the programming language PL/I: "buffer
overflows" are therefore not to be expected. :-)
All rights belong to the author. Usage of Deleatur is at your own
risk; in no case can I be held liable for anything!
Please, send suggestions and errors found to
deleatur@eberhard-sturm.de
Extensions of the language file are most welcome, too, and would be
included in the next version of Deleatur.
/*---------------------------------------------------------------*/
/* Eberhard Sturm Tel: +49-251-83-31679 */
/* ZIV (Universitaetsrechenzentrum) Fax: +49-251-83-31555 */
/* Roentgenstr. 9-13 */
/* D-48149 Muenster e-mail: sturm@uni-muenster.de */
/* http://www.uni-muenster.de/ZIV/Mitarbeiter/EberhardSturm.html */
/*---------------------------------------------------------------*/