 +----------+
 | Deleatur | Version 3.2.3 (November 2004)
 +----------+

 Deleatur© is a program that recognizes spam mail solely by the
 counted frequency of words (Bayes filter). The name comes from
 Latin and means "it is to be deleted". It is pronounced like the
 English construct "daylayartour" - with the stress on "ar".

 Deleatur is (according to user opinions) the first spam filter
 program which really eases work for a user: While other spam
 filters download spam mail completely and move it to a special
 folder, Deleatur deletes spam directly on the POP server after
 having received only a part of it.

 Deleatur should be used to check for mail. You start your own mail
 program only if regular mail is left. Deleatur runs in a command
 window. Ideally - even when 50 mails are to be processed - you
 only need to press the enter key a few times. In automatic mode
 Deleatur checks for mail at fixed intervals and deletes spam mail
 automatically.

 Deleatur learns from the mail it reads by remembering for every
 word found how often it occurs in regular mail and how often in
 spam mail. Deleatur can also be trained using mail files if
 available. The word base supplied with the program also leads to
 good results if you receive English spam mail on the one hand and
 German as well as English regular mail on the other hand.

 Deleatur offers the opportunity to carry on the dialog in a
 language of your choice provided that you have modified a text file
 accordingly. At this time German, English, Italian (thanks to
 Roberto Sozzani!) and Portuguese (thanks to Carlos Caramori!) are
 supplied.


 Innovations /*********************************************************/

 Deleatur 3.2.2: The parameter specifications c=iso, f=0, h=1, i=1,
                 n=1, q=1, and w=0 have been added. Statistics are
                 presented at the end regarding read and deleted
                 mail. One error has been eliminated which inhibited
                 deletion of mail on a certain POP server. Now
                 Quicksort is used during reduction of the word
                 base. Following newer tactics of spammers (sending
                 many words) the default assumptions for automatic
                 rating have been increased. One error in the
                 reduction of the word base has been corrected.
 Deleatur 3.2.3: It's now possible to run Deleatur permanently
                 (deleting spam automatically). If there are many
                 mails you can interrupt your rating and continue
                 at a later date. The algorithm has been enhanced
                 to enable faster changing of ratings. Minus and
                 plus files have been introduced, similar to the
                 bonus and malus files. You can now add a port
                 number after the server name. Besides German and
                 English, Italian and Portuguese dialogs have been
                 added. The parameter specification g=1 helps in
                 case of a faulty POP server. The parameter y
                 allows inclusion of further header lines in the
                 rating process. (Please reread the descriptions
                 of parameters e, g, s, y, and z.)

 The Windows version has one known error: After about 55 mails some
 empty areas of the window are painted in the wrong color for
 obscure reasons.


 The Algorithm /*******************************************************/

 There are two articles of mine in the journal of the
 "Zentrum für Informationsverarbeitung" (Center for Information
 Processing, i. e. computing center) of the University of Münster:

    http://www.uni-muenster.de/ZIV/inforum/2003-1/a08.html
    http://www.uni-muenster.de/ZIV/inforum/2003-2/a02.html

 The following articles inspired me to program Deleatur:

 - http://www.paulgraham.com/spam.html
 - http://radio.weblogs.com/1010454/stories/2002/09/16/spamDetection.html

 The program remembers for every word of the first 50 lines of each
 mail (plus a few header lines) how often the word has been found
 in spam and how often in regular mail.

 As the word base grows the program decides more and more often by
 itself whether a mail should be considered spam or not, without
 asking. Once the word base has reached a size of 3 MB, mail with a
 spam probability below 10 % is accepted without inquiry and mail
 above 80 % is discarded without inquiry.
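
 The counting idea can be sketched as follows, in Python rather
 than in the PL/I of Deleatur itself. The way the word values are
 combined follows the Paul Graham article cited above; whether
 Deleatur uses exactly this formula is an assumption, and all names
 (WordBase, learn, spam_probability, decide) are hypothetical.

```python
import re
from collections import Counter


class WordBase:
    """Counts how often each word was seen in spam and in regular mail."""

    def __init__(self):
        self.spam = Counter()
        self.regular = Counter()
        self.n_spam = 0      # number of spam mails learned
        self.n_regular = 0   # number of regular mails learned

    @staticmethod
    def words(text, max_lines=50):
        # Only the first 50 lines are looked at, as described above.
        head = "\n".join(text.splitlines()[:max_lines])
        return re.findall(r"[A-Za-z0-9$'-]+", head.lower())

    def learn(self, text, is_spam):
        counts = self.spam if is_spam else self.regular
        counts.update(self.words(text))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_regular += 1

    def word_probability(self, word):
        # Relative frequency in spam versus regular mail, clamped so
        # that a single word can never decide alone.
        s = self.spam[word] / max(self.n_spam, 1)
        r = self.regular[word] / max(self.n_regular, 1)
        if s + r == 0:
            return 0.5   # unknown word: neutral (assumption)
        return min(0.99, max(0.01, s / (s + r)))

    def spam_probability(self, text):
        # Use the 15 words farthest from neutral (as in Graham's article).
        probs = sorted((self.word_probability(w) for w in set(self.words(text))),
                       key=lambda p: abs(p - 0.5), reverse=True)[:15]
        p_spam, p_regular = 1.0, 1.0
        for p in probs:
            p_spam *= p
            p_regular *= 1.0 - p
        return p_spam / (p_spam + p_regular)


def decide(probability, accept_limit=0.10, delete_limit=0.80):
    """Apply the 10 % / 80 % limits; in between the user is asked."""
    if probability > delete_limit:
        return "delete"
    if probability < accept_limit:
        return "accept"
    return "ask"
```

 With one spam and one regular mail learned, a text made only of
 spam words ends up far above the deletion limit, and a text made
 only of regular words far below the acceptance limit.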

 It may be better not to use the word base file supplied in this
 package but to build your own within two weeks or so: Spam mail
 is always similar but everybody gets different regular mail.

 You can find the words relevant for the decision in the protocol
 file. There is a protocol file YYYYMMDD.log for every day where
 YYYY denotes the year, MM the month, and DD the current day. On
 every call to Deleatur more lines will be appended at the end of
 the file. By default the protocol file of the day before today will
 be kept, older ones will be deleted.
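
 The naming scheme can be sketched in Python. The function names
 are hypothetical, and the reading that today's file plus the given
 number of earlier days is kept is an assumption.

```python
from datetime import date, timedelta


def log_name(day):
    """Protocol file name for a given day, e.g. 20041105.log."""
    return day.strftime("%Y%m%d") + ".log"


def logs_to_keep(today, keep_days=1):
    """Names of today's protocol file and the keep_days files before it."""
    return {log_name(today - timedelta(days=n)) for n in range(keep_days + 1)}
```

 Any file named YYYYMMDD.log that is not in the returned set would
 be a candidate for deletion.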

 If you find your mail marked by the rule based spam filter
 SpamAssassin you can influence the decisions of Deleatur so that
 no mail is automatically deleted or accepted if SpamAssassin's
 opinion differs from Deleatur's. You will, however, find that
 this only gives you more to do. :-)

 I do think that somebody who sends me a mail containing words
 which otherwise only exist in spam should not be surprised if
 that mail is deleted by a spam filter. Nevertheless Deleatur
 lets you specify in a bonus file and in a malus file which mail
 must not be deleted or accepted automatically. This is meant
 only for those who couldn't sleep at night otherwise. Really
 important messages are sent on paper! :-)


 Installation and Handling /*******************************************/

 There is no real installation procedure. Unzip the file
 deleatur.zip (if you haven't done it already) in a folder of your
 choice - ready!

 After starting deleatur.exe for the first time (for example by
 double clicking) you have to decide which language you want to
 use: 1 means German, 2 enables an English dialog.
 After that you will be asked which POP server you want to use and
 for your user ID and password. The program stores this information
 in the parameter file deleatur.prm; the password is stored
 encrypted.

 You can reply to inquiries of the program as follows:

    +: Mail should be accepted.
    -: Mail should be considered Spam.
    =: Leave the word base untouched, don't rate the mail.
    ?: Show the first 50 lines of the mail (after header lines).

 Use the enter key directly to accept the program's suggestion.

 At the end the program always asks whether deletion should
 really take place. Up to this point no mail has actually been
 deleted (this is also true if you press Ctrl and c
 simultaneously). In detail there are the following responses to
 the concluding inquiry of the program:

    Enter key:
       This is the normal way to end Deleatur. Ratings are stored,
       spam mail is deleted (if existent).

    a: All ratings are saved to the word base file but no mail is
       deleted. The program ends.

    b: The word base is left unchanged and no mail is deleted.
       The program ends.

    mail number(s):
       If you enter one or more numbers in arbitrary order (delimited
       by one or more blanks) these mails will be presented again
       for decision.

 You can specify in the files deleatur.bon and deleatur.mal,
 respectively, which text in which header lines of a mail leads to a
 bonus or malus given to the mail in question. A bonus will prevent
 a mail from being deleted automatically, and a malus will prevent a
 mail from being accepted automatically. In such a case the
 program will ask for a decision.

 If you don't want to throw away mail coming from a certain
 professor you can specify in the bonus file:

 from:higgins

 So if "higgins" occurs anywhere in the "from" header line
 the mail gets a bonus. The same is possible for all other header
 lines, for example subject or recipient (to:):

 subject:***spam***
 to:my_newsgroup

 Please avoid blanks before and after the colon. If a word of the
 bonus or the malus file is found, a corresponding bonus or malus
 marker appears in the rating line.
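
 The matching of such entries against the header lines of a mail
 might look like the following Python sketch. The function names
 are hypothetical, and ignoring upper and lower case is an
 assumption; the text above only says that the entry must occur
 somewhere in the named header line.

```python
def parse_rules(text):
    """Parse lines such as 'from:higgins' into (header, needle) pairs."""
    rules = []
    for line in text.splitlines():
        header, sep, needle = line.strip().partition(":")
        if sep and header and needle:
            rules.append((header.lower(), needle.lower()))
    return rules


def matches(rules, headers):
    """True if the text of any rule occurs in the header line of that name.

    headers is a list of (name, value) pairs taken from the mail header.
    """
    for header, needle in rules:
        for name, value in headers:
            if name.lower() == header and needle in value.lower():
                return True
    return False
```

 A mail for which matches() is true with the rules of the bonus
 file would not be deleted without inquiry; with the rules of the
 malus file it would not be accepted without inquiry.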

 In a similar way you can specify in a file deleatur.min which
 mail has to be rated as spam, and in a file deleatur.plu which
 mail has to be rated as regular.


 Documentation of the Parameter File /*********************************/

 The parameter file will be created automatically by Deleatur and is
 named deleatur.prm. The file can have any name if you pass that
 name as a parameter when starting deleatur.exe. If you specify a
 parameter beginning with a minus sign this readme file will be
 displayed. (On AIX the file deleatur.readme is searched for.)

 If somebody wants to tune something using an editor - which is not
 necessary in most cases - here is the documentation. All entries
 must start in column 1 (otherwise they will be regarded as
 comments) and there must be no blanks around the equal sign:

 s= This is the name of the POP3 server (e. g. pop.uni-muenster.de).
    After a colon you can add a port number.

 k=
 p= These are user ID (k) and password (p) for the POP server; the
    password is encrypted.

 u=
 o= This is the lower (u) and upper limit (o) of the word base (in
    KB). If the upper limit is reached Deleatur asks for removal of
    words from the word base until the lower limit is reached.
    Words which have been encountered only once are deleted
    first, the oldest ones first of all. By default the limits are
    o=5000 and u=3000. Don't mix up u and o! Additionally all words
    are deleted which existed only in one mail.

 l=
 a= These are percentage values for deletion limit (l) and
    acceptance limit (a). If not specified, these values are
    computed automatically from the size of the word base (see
    above), the extreme values being l=80 and a=10.

 v= This is the directory for personal files, namely:
    - deleatur.bas (word base for rating the spam)
    - deleatur.alt (old word base from the latest run)
    - deleatur.srv (list of undeleted mail so far)
    - deleatur.txt (plain text list of multilingual text)
    - deleatur.spr (coded text of the chosen language)
    as well as the protocol files YYYYMMDD.log and so on, and the
    index file deleatur.log. By default this is the current
    directory.

 x=SpamAssassin
    Here you can specify whether header lines added by the rule
    based spam filter SpamAssassin should be taken into account. If
    a mail is rated as spam by SpamAssassin Deleatur will not accept
    it without inquiry; if rated as regular by SpamAssassin Deleatur
    will not delete it without inquiry. This setting gives more
    safety but usually leads to superfluous inquiries. When x has
    been specified two numbers supplied by SpamAssassin appear in
    the rating line: The first is the spam rating value, the
    second is the threshold value of SpamAssassin. If nothing was
    delivered you will see [].

 z= Here you can specify the number of lines for the command window.
    After displaying these Deleatur will stop to allow checking
    the output. By default the program asks after 24 lines whether
    to continue. Daring users may set z to 1000 and a to 0. Then
    Deleatur can sweep through and there is a guarantee that regular
    mail will not be rated as spam. "z=0" means that lines are not
    to be counted. If you enter the letter x before pressing the
    enter key Deleatur quits reading mail and proceeds to rating.
    This is helpful after returning from holidays.

 d=0
    This is only for testing and guarantees that no mail is deleted.

 r=0
    In this way you can prevent any inquiries. Mail having a
    spam probability above the deletion limit (l=) is deleted, all
    other mail will not be rated and kept.

 e=min
    You specify the length of the interval Deleatur should wait
    before checking for mail again. Deleatur runs until you
    cancel it (for example by pressing Ctrl-c).

 b= Here you can specify the number of days the protocol file
    YYYYMMDD.log should be kept. The default value is 1.

 m=mail, m=spam, m=ok
    This parameter turns Deleatur into learning mode. Mail files are
    read from a folder, not from a mail server. If the name of the
    folder is "mail" the user has to rate the mail files in the usual
    way. If the name is "spam" all mail will be regarded as spam
    automatically. If the name is "ok" all mail is rated as
    regular mail. Only one folder can be specified. Whereas Deleatur
    normally reads the first 50 lines after the header lines, in
    learning mode all lines are read.

 f=0
    This specification may be of interest if you call
    Deleatur several times in a script: The concluding question
    "Close window?" will not be asked.

 h=1
    If you want to do any manipulations on the protocol file you
    can get all header lines of each mail in the protocol.

 c=iso
    Normally the text of the command window is shown using codepage
    850 (both in OS/2 and in Windows). If you prefer an ISO
    codepage instead, this setting will give you readable text
    (especially if you use a European language).

 w=0
    This is another specification (besides f=0) which is
    suitable for scripts. It prevents the question "Continue?" if no
    mail is found on the POP server.

 i=1
    This specification is also intended for scripts (besides f
    and w). Its effect is that a beep sounds if mail is found.

 n=1
    This specification can be used in conjunction with "r=0"
    (automatic deletion of mails above the deletion limit). Then
    all rating relevant lines of each mail not deleted are stored
    in the folder "mail", from where a separate rating can be done
    (in conjunction with "m=mail").

 q=1
    This specification is for users who like to leave their mail on
    the server. It prevents mail which has not been rated from
    being displayed again.

 y= You can enter several y-lines. After y= you can specify a
    keyword of the mail header. The header lines identified in
    this way will be included when rating words for spam.

 g=1
    The somewhat strange effect of this parameter is implemented
    for users of faulty POP servers: Only the rightmost 8
    characters are used as UID.
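
 For those who want to process the parameter file in their own
 scripts, the rules stated above (entry in column 1, no blanks
 around the equal sign, several y-lines allowed) can be sketched in
 Python. The function names are hypothetical, the restriction to
 single-letter keys is an assumption derived from the entries
 documented here, and 110 is merely the standard POP3 port, not
 necessarily what Deleatur uses when no port is given.

```python
def parse_parameters(text):
    """Return the entries of a parameter file as a dictionary."""
    params = {}
    for line in text.splitlines():
        # An entry is a single letter in column 1 directly followed by "=".
        if len(line) < 2 or line[1] != "=" or not line[0].isalpha():
            continue   # everything else is regarded as a comment
        key, value = line[0], line[2:].rstrip()
        if key == "y":
            params.setdefault("y", []).append(value)   # y may occur several times
        else:
            params[key] = value
    return params


def server_and_port(s_value, default_port=110):
    """Split the s= value into server name and optional port number."""
    host, sep, port = s_value.partition(":")
    return host, int(port) if sep else default_port
```

 A line such as " r=0" with a leading blank is skipped, which makes
 it easy to switch an entry off temporarily.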

 If you invoke Deleatur from within a script you may want to
 make use of the following event codes:

      0 - there is no mail on the server
      1 - there was mail rated as spam
      2 - there was mail rated as regular
      4 - there was mail left unrated
      8 - there was mail but it was not looked at
     16 - there was mail deleted as spam
     32 - (reserved)
     64 - language file has different version number
    128 - Deleatur is already running
    256 - connection to POP server failed
    512 - authentication failed
   1024 - internal error in Deleatur

 After execution of Deleatur the sum of the codes of all conditions
 that occurred is available as the return code.
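
 Since every code is a power of two, the sum can be taken apart
 again bit by bit. A small Python sketch (the names EVENTS and
 decode are hypothetical; the texts are copied from the table
 above, the reserved code 32 is left out):

```python
EVENTS = {
    1: "there was mail rated as spam",
    2: "there was mail rated as regular",
    4: "there was mail left unrated",
    8: "there was mail but it was not looked at",
    16: "there was mail deleted as spam",
    64: "language file has different version number",
    128: "Deleatur is already running",
    256: "connection to POP server failed",
    512: "authentication failed",
    1024: "internal error in Deleatur",
}


def decode(return_code):
    """List the conditions contained in a Deleatur return code."""
    if return_code == 0:
        return ["there is no mail on the server"]
    return [text for bit, text in EVENTS.items() if return_code & bit]
```

 A return code of 18, for example, is 2 + 16: there was mail rated
 as regular and there was mail deleted as spam.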


 Documentation of the Text File /**************************************/

 Every time Deleatur doesn't find the file deleatur.spr it searches
 for the file deleatur.txt. It then asks the user which of the
 languages contained in this file is to be used.
 The structure of the text file is explained below. Lines
 beginning with a blank are comments and may be interspersed freely.
 The first line must contain the version number of Deleatur.

 After the version line there are as many lines as there are
 languages. These lines are presented to the user at the first
 program start for choosing the dialog language.

 Following are text blocks each starting with a line containing
 three number signs. The text in this line is irrelevant but should
 not be altered. After this line there are as many lines as there
 are languages, beginning with the language number followed by the
 text in the particular language.

 The order of the text blocks is relevant and must not be changed.
 If you want, for instance, to introduce French as another language
 the beginning of the text file may look like the following
 (disregarding the fact that this text block doesn't exist :-):

 3.2.3
 1 deutsch
 2 English
 3 francais
 ### satz_ich_nicht_franz
 1 Ich spreche nicht französisch!
 2 I don't speak french!
 3 Je ne parle pas francais!
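
 To check a modified text file before handing it to Deleatur, the
 structure described above can be read with a few lines of Python.
 This is only a sketch: the function name is hypothetical, and it
 assumes that the lines of the real file start in column 1 (the
 example above is indented only for display), that block lines
 start with the three number signs, and that empty lines can be
 skipped.

```python
def parse_text_file(text):
    """Return (version, languages, blocks) of a Deleatur text file.

    languages maps language number to language name; blocks is a list
    with one {language number: text} dictionary per text block.
    """
    # Lines beginning with a blank are comments.
    lines = [l for l in text.splitlines() if l and not l.startswith(" ")]
    version = lines[0].strip()
    languages, blocks = {}, []
    for line in lines[1:]:
        if line.startswith("###"):
            blocks.append({})      # a new text block begins
            continue
        number, _, rest = line.partition(" ")
        target = blocks[-1] if blocks else languages
        target[int(number)] = rest
    return version, languages, blocks
```

 A simple consistency check is then that every block contains
 exactly the same language numbers as the language list.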


 Example Scenarios /***************************************************/

 Scenario 1: one mail address /*--------------------------------------*/

 This is the normal case. The parameter file deleatur.prm will be
 generated automatically and needn't be altered.

 Scenario 2: several mail addresses /*--------------------------------*/

 If you have several mail addresses Deleatur may be invoked several
 times in a script, each time using a different parameter file:

 deleatur parm1
 deleatur parm2

 In the parameter files you can specify either the same word base
 every time or different ones. If different languages are expected
 different word bases lead to better recognition results. In the
 latter case you should use the "v="-parameter.
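
 If the script should also evaluate the event codes documented
 for scripts, it could be written in Python as sketched here. The
 names run_deleatur and run_all are hypothetical, and it is assumed
 that deleatur can be found via the search path.

```python
import subprocess


def run_deleatur(param_file, exe="deleatur"):
    """Run Deleatur once with the given parameter file; return its return code."""
    return subprocess.run([exe, param_file]).returncode


def run_all(param_files, runner=run_deleatur):
    """Run every mail address in turn and combine the return codes bit by bit."""
    combined = 0
    for name in param_files:
        combined |= runner(name)
    return combined
```

 A call like run_all(["parm1", "parm2"]) then tells, for instance
 by testing the bit with value 2, whether regular mail is waiting
 on any of the servers.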

 Scenario 3: Using a server /*----------------------------------------*/

 You can place the program file deleatur.exe and the language files
 deleatur.txt and deleatur.spr on a file server. Thus users can
 always access the most current version of Deleatur and only need
 to provide space on their computer for the personal files. The
 "v=" entry in the parameter file gives the path to the word base
 (which is, of course, not shared by all users!).

 Scenario 4: automatically during holidays /*-------------------------*/

 If you are anxious that your mailbox will overflow during your
 holidays you can run Deleatur permanently and let it check for mail
 by specifying for instance "e=86400" (that means "daily"). You
 should also specify "r=0" and "z=1000000" in the parameter file.
 The effect is that Deleatur will delete every mail whose spam
 probability is greater than the deletion limit (l=) and
 assign the status "not regarded" to all others - without
 including their ratings in the word base. After returning from
 the holidays you remove "r" and "e" and rate the remaining mail in
 the usual way.

 Scenario 5: AIX version for all operating systems /*-----------------*/

 If you have - as a university, for instance - a central AIX server
 you can run Deleatur there, in a telnet or ssh session. Thus
 Mac and Linux users also get the benefit of Deleatur (as long as
 there is no PL/I compiler for these operating systems :-).

 Scenario 6: Learning by hand /*--------------------------------------*/

 If you have gathered a folder full of mail from pre-Deleatur times
 you can tell Deleatur via the parameter "m=mail" to read mail
 not from a server but from the folder "mail". You have to reply
 to inquiries of the program as usual.

 Scenario 7: Learning automatically /*--------------------------------*/

 If you already have folders containing only spam or only
 regular mail you can specify folder "spam" and folder "ok" after
 the parameter "m=" in two program runs. Deleatur then will learn
 without inquiries.


 Notice /**************************************************************/

 This program has been written by me privately and has been tested
 in my institute. I want to thank my colleagues in the Center for
 Information Processing (ZIV) of the University of Münster for their
 help and especially my boss Dr. Held for encouraging me to
 publish Deleatur (after testing it himself, of course).

 This program can freely be used by everyone. If you want to give a
 donation to me out of gratitude you can transfer an amount of your
 choice to my account

 3 200 372 966 at Postbank Germany, BLZ 20 11 00 22.

 (I've heard people would voluntarily pay for good software. How
 much do you pay for a dinner? :-)

 If you want an account with IBAN and BIC, here is one:

 IBAN: DE43 2542 0800 9185 0673 18
 BIC: BHWKDE21

 Deleatur is available not only for Windows but also for OS/2
 (eComStation resp.) and AIX. All versions share the same source
 code. Deleatur is written in the programming language PL/I: "buffer
 overflows" are therefore not to be expected. :-)

 All rights belong to the author. Usage of Deleatur is at your own
 risk; in no case can I be held liable for anything!

 Please send suggestions and error reports to

   deleatur@eberhard-sturm.de

 Extensions of the language file are most welcome, too, and would be
 included in the next version of Deleatur.

      /*---------------------------------------------------------------*/
     /* Eberhard Sturm                          Tel: +49-251-83-31679 */
    /* ZIV (Universitaetsrechenzentrum)        Fax: +49-251-83-31555 */
   /* Roentgenstr. 9-13                                             */
  /* D-48149 Muenster                e-mail: sturm@uni-muenster.de */
 /* http://www.uni-muenster.de/ZIV/Mitarbeiter/EberhardSturm.html */
/*---------------------------------------------------------------*/