+----------+
| Deleatur |   Version 3.2.3 (November 2004)
+----------+

Deleatur© is a program that recognizes spam mail solely on the basis of counted word frequencies (a Bayes filter). The name comes from Latin and means "it is to be deleted". It is pronounced like the English construct "daylayartour", with the stress on "ar".

According to its users, Deleatur is the first spam filter that really eases a user's work: while other spam filters download spam mail completely and move it to a special folder, Deleatur deletes spam directly on the POP server after receiving only part of it.

Use Deleatur to check for mail, and start your own mail program only if regular mail is left. Deleatur runs in a command window. Ideally, even when 50 mails are to be processed, you only need to press the Enter key a few times. In automatic mode Deleatur checks for mail at set intervals and deletes spam automatically.

Deleatur learns from the mail it reads by remembering, for every word found, how often it occurs in regular mail and how often in spam. It can also be trained with mail files, if available. The word base supplied with the program already gives good results if you receive English spam on the one hand and German as well as English regular mail on the other.

Deleatur can carry on its dialog in a language of your choice, provided you have adapted a text file accordingly. German, English, Italian (thanks to Roberto Sozzani!) and Portuguese (thanks to Carlos Caramori!) are currently supplied.


Innovations /*********************************************************/

Deleatur 3.2.2:
- The parameter specifications c=iso, f=0, h=1, i=1, n=1, q=1, and w=0 have been added.
- Statistics on read and deleted mail are presented at the end.
- An error that prevented deletion of mail on a certain POP server has been eliminated.
- Quicksort is now used when reducing the word base.
- Following newer spammer tactics (sending many words), the default values for automatic rating have been increased.
- An error in the reduction of the word base has been corrected.

Deleatur 3.2.3:
- Deleatur can now run permanently, deleting spam automatically.
- If there are many mails, you can interrupt your rating and continue later.
- The algorithm has been enhanced so that ratings change faster.
- Minus and plus files have been introduced, similar to the bonus and malus files.
- You can now add a port number after the server name.
- Besides German and English, Italian and Portuguese dialogs have been added.
- The parameter specification g=1 helps in case of a faulty POP server.
- The parameter y allows further header lines to be included in the rating process.
(Please read the descriptions of parameters e, g, s, y, and z again.)

The Windows version has one known error: after about 55 mails, some empty areas of the window are painted in the wrong color for unknown reasons.


The Algorithm /*******************************************************/

There are two articles of mine in the journal of the "Zentrum für Informationsverarbeitung" (Center for Information Processing, i.e. the computing center) of the University of Münster:
- http://www.uni-muenster.de/ZIV/inforum/2003-1/a08.html
- http://www.uni-muenster.de/ZIV/inforum/2003-2/a02.html

The following articles inspired me to program Deleatur:
- http://www.paulgraham.com/spam.html
- http://radio.weblogs.com/1010454/stories/2002/09/16/spamDetection.html

For every word in the first 50 lines of each mail (plus a few header lines), the program remembers how often the word has been found in spam and how often in regular mail. As the word base grows, the program decides more and more often by itself whether a mail should be considered spam, and stops asking. Once the word base has reached 3 MB, all mail with a spam probability below 10 % is accepted without inquiry, and all mail above 80 % is discarded without inquiry.
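The counting and the two limits described above can be illustrated with a small Python sketch. It is not Deleatur's PL/I code: the readme does not give its formula, so this assumes a Paul Graham-style combination (the first article cited above), with unseen words scored 0.4, regular-mail counts doubled, and the 15 most decisive words combined. The tokenizer, the choice to count each word once per mail, and the names `WordBase`, `words_of`, and `decide` are all hypothetical; header lines are left out for brevity.

```python
import math
import re
from collections import Counter

# Assumption: a "word" is a run of 3+ letters, digits, $, ' or -.
# Deleatur's actual tokenizer is not documented in this readme.
WORD_RE = re.compile(r"[A-Za-z0-9$'-]{3,}")


def words_of(body_lines, max_lines=50):
    """Distinct lower-cased words in the first max_lines body lines."""
    found = set()
    for line in body_lines[:max_lines]:
        found.update(w.lower() for w in WORD_RE.findall(line))
    return found


class WordBase:
    """Counts, per word, how many spam and how many regular mails contained it."""

    def __init__(self):
        self.spam = Counter()
        self.good = Counter()
        self.n_spam = 0
        self.n_good = 0

    def learn(self, body_lines, is_spam):
        words = words_of(body_lines)  # each word counted once per mail (assumption)
        if is_spam:
            self.spam.update(words)
            self.n_spam += 1
        else:
            self.good.update(words)
            self.n_good += 1

    def word_prob(self, word):
        """Graham-style estimate that a mail containing `word` is spam."""
        s, g = self.spam[word], self.good[word]
        if s + g == 0:
            return 0.4  # Graham's value for unseen words
        s_rate = s / max(self.n_spam, 1)
        g_rate = 2 * g / max(self.n_good, 1)  # Graham weights regular mail double
        return min(0.99, max(0.01, s_rate / (s_rate + g_rate)))

    def spam_probability(self, body_lines, top=15):
        """Combine the `top` most decisive word probabilities (naive Bayes)."""
        probs = sorted((self.word_prob(w) for w in words_of(body_lines)),
                       key=lambda p: abs(p - 0.5), reverse=True)[:top]
        if not probs:
            return 0.5
        spam = math.prod(probs)
        ham = math.prod(1 - p for p in probs)
        return spam / (spam + ham)


def decide(probability, delete_limit=80, accept_limit=10):
    """Map a probability to delete / accept / ask; limits in percent (l=, a=)."""
    percent = 100 * probability
    if percent > delete_limit:
        return "delete"
    if percent < accept_limit:
        return "accept"
    return "ask"
```

Anything between the two limits is left to the user, which matches the readme's behavior of asking less often as the word base (and hence the confidence of the word probabilities) grows.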
It may be better not to use the word base file supplied in this package but to build your own within two weeks or so: spam is always similar, but everybody gets different regular mail.

You can find the words relevant for each decision in the protocol file. There is a protocol file YYYYMMDD.log for every day, where YYYY denotes the year, MM the month, and DD the day. Each time Deleatur is called, more lines are appended to the end of the file. By default the protocol file of the previous day is kept; older ones are deleted.

If your mail is marked by the rule-based spam filter SpamAssassin, you can influence Deleatur's decisions so that no mail is deleted or accepted automatically when SpamAssassin's opinion differs from Deleatur's. You will find, however, that this only gives you more work. :-)

I do think that somebody who sends me a mail containing words that otherwise occur only in spam should not be surprised if a spam filter deletes that mail. Nevertheless, Deleatur lets you specify in a bonus file and in a malus file how to prevent automatic deletion and acceptance of mail. Use this only if you couldn't sleep at night otherwise. Really important messages are sent on paper! :-)


Installation and Handling /*******************************************/

There is no real installation procedure. Unzip the file deleatur.zip (if you haven't done so already) into a folder of your choice, and you are done!

When you start deleatur.exe for the first time (for example by double-clicking it), you have to choose the dialog language: 1 selects German, 2 selects English. After that you are asked which POP server you want to use and for your user ID and password. The program stores this information in the parameter file deleatur.prm; the password is stored encrypted.

You can answer the program's inquiries as follows:
+: Mail should be accepted.
-: Mail should be considered spam.
=: Leave the word base untouched; don't rate the mail.
?: Show the first 50 lines of the mail (after the header lines).
Press the Enter key directly to accept the program's suggestion.

At the end, the program always asks whether deletion should really take place. Up to this point no mail has actually been deleted (this also holds if you press Ctrl+C). The concluding inquiry accepts the following responses:
Enter key: This is the normal way to end Deleatur. Ratings are stored and spam mail (if any) is deleted.
a: All ratings are saved to the word base file, but no mail is deleted. The program ends.
b: The word base remains unchanged and no mail is deleted. The program ends.
mail number(s): If you enter one or more numbers in any order (separated by one or more blanks), these mails are presented again for a decision.

In the files deleatur.bon and deleatur.mal you can specify which text in which header lines gives a mail a bonus or a malus, respectively. A bonus prevents a mail from being deleted automatically, and a malus prevents a mail from being accepted automatically; in such cases the program asks for a decision.

If you never want to throw away mail from a certain professor, you can specify in the bonus file:
from:higgins
Then the mail gets a bonus whenever "higgins" appears anywhere in its "from" header line. The same works for all other header lines, for example subject or recipient (to:):
subject:***spam***
to:my_newsgroup
Please avoid blanks before and after the colon. If a text from the bonus or malus file is found, a corresponding bonus or malus marker appears in the rating line.

In a similar way, you can specify in a file deleatur.min which mail has to be rated as spam, and in a file deleatur.plu which mail has to be rated as regular.
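How such rule files could be read and applied is sketched below in Python. The `header:text` format and the effects (minus/plus force a rating, bonus blocks automatic deletion, malus blocks automatic acceptance) come from the description above; everything else is assumed: matching is case-insensitive, lines without a colon are ignored, minus and plus are checked before bonus and malus, and `load_rules`, `matches`, and `automatic_action` are hypothetical names.

```python
def load_rules(text):
    """Parse rule lines such as 'from:higgins' into (header, text) pairs.

    Blanks are deliberately not stripped, which is why the readme asks
    you to avoid blanks around the colon. Lower-casing is an assumption.
    """
    rules = []
    for line in text.splitlines():
        header, colon, needle = line.partition(":")
        if colon and header and needle:
            rules.append((header.lower(), needle.lower()))
    return rules


def matches(rules, headers):
    """True if a rule's text occurs anywhere in the named header line.

    `headers` maps lower-case header names to their values.
    """
    return any(needle in headers.get(name, "").lower() for name, needle in rules)


def automatic_action(percent, headers, files, delete_limit=80, accept_limit=10):
    """Return "delete", "accept", or "ask" for one mail.

    `files` maps "min", "plu", "bon", "mal" to rule lists.
    Assumption: minus and plus files take precedence over bonus and malus.
    """
    if matches(files["min"], headers):
        return "delete"  # deleatur.min: rate as spam
    if matches(files["plu"], headers):
        return "accept"  # deleatur.plu: rate as regular
    if percent > delete_limit and not matches(files["bon"], headers):
        return "delete"  # a bonus would force an inquiry instead
    if percent < accept_limit and not matches(files["mal"], headers):
        return "accept"  # a malus would force an inquiry instead
    return "ask"
```

With `from:higgins` in the bonus file, a mail from Professor Higgins scored at 95 % is therefore not deleted silently but shown for a decision.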
Documentation of the Parameter File /*********************************/

The parameter file is created automatically by Deleatur and is named deleatur.prm. It can have any other name if you pass that name as a parameter when starting deleatur.exe. If you specify a parameter beginning with a minus sign, this readme file is displayed. (On AIX the file deleatur.readme is looked for.)

If somebody wants to tune something with an editor (which is not necessary in most cases), here is the documentation. All entries must start in column 1 (otherwise they are regarded as comments), and there must be no blanks around the equal sign:

s=
  This is the name of the POP3 server (e.g. pop.uni-muenster.de). You can add a port number after a colon.

k=
p=
  These are the user ID (k) and password (p) for the POP server; the password is stored encrypted.

u=
o=
  These are the lower (u) and upper (o) limits of the word base size (in KB). When the upper limit is reached, Deleatur asks for removal of words from the word base until the lower limit is reached. Words encountered only once are deleted first, and among them the oldest ones first. The defaults are o=5000 and u=3000. Don't mix up u and o! Additionally, all words that occurred in only one mail are deleted.

l=
a=
  These are the percentage values for the deletion limit (l) and the acceptance limit (a). If they are not specified, they are computed automatically from the size of the word base (see above), with limiting values l=80 and a=10.

v=
  This is the directory for personal files, namely:
  - deleatur.bas (word base for rating spam)
  - deleatur.alt (old word base from the previous run)
  - deleatur.srv (list of mail not deleted so far)
  - deleatur.txt (plain-text file of the multilingual texts)
  - deleatur.spr (coded text of the chosen language)
  as well as the protocol files YYYYMMDD.log and so on, and the index file deleatur.log. The default is the current directory.
x=SpamAssassin
  Here you can specify whether header lines added by the rule-based spam filter SpamAssassin should be taken into account. If SpamAssassin rates a mail as spam, Deleatur will not accept it without inquiry; if SpamAssassin rates it as regular, Deleatur will not delete it without inquiry. This setting gives more safety but usually leads to superfluous inquiries. When x is specified, two numbers added by SpamAssassin appear in the rating line: the first is SpamAssassin's spam rating value, the second its threshold value. If SpamAssassin delivered nothing, you will see [].

z=
  Here you specify the number of lines of the command window. After displaying that many lines, Deleatur pauses so that you can check the output. By default the program asks after 24 lines whether to continue. Daring users may set z to 1000 and a to 0; Deleatur can then sweep through, and it is guaranteed that no regular mail is rated as spam. "z=0" means that lines are not counted. If you enter the letter x before pressing the Enter key, Deleatur stops reading mail and proceeds to rating. This is helpful after returning from holidays.

d=0
  This is only for testing and guarantees that no mail is deleted.

r=0
  This prevents any inquiries. Mail with a spam probability above the deletion limit (l=) is deleted; all other mail is kept without being rated.

e=min
  Here you specify how long Deleatur should wait before checking for mail again. Deleatur then runs until you cancel it (for example by pressing Ctrl+C).

b=
  Here you specify the number of days the protocol file YYYYMMDD.log should be kept. The default value is 1.

m=mail, m=spam, m=ok
  This parameter puts Deleatur into learning mode: mail files are read from a folder, not from a mail server. If the folder is named "mail", the user has to rate the mail files in the usual way. If it is named "spam", all mail is automatically regarded as spam.
  If it is named "ok", all mail is rated as regular. Only one folder can be specified. Whereas Deleatur normally reads the first 50 lines after the header lines, in learning mode all lines are read.

f=0
  This specification may be of interest if you call Deleatur several times in a script: the concluding question "Close window?" is not asked.

h=1
  If you want to process the protocol file further, this writes all header lines of each mail to the protocol.

c=iso
  Normally the text in the command window is shown using codepage 850 (in both OS/2 and Windows). If you prefer an ISO codepage instead, this setting gives you readable text (especially if you speak a European language).

w=0
  This is another specification suited for scripts (besides f=0). It suppresses the question "Continue?" if no mail is found on the POP server.

i=1
  This specification is also intended for scripts (besides f and w). It makes Deleatur beep when mail is found.

n=1
  This specification can be used together with "r=0" (automatic deletion of mail above the deletion limit). All rating-relevant lines of each mail not deleted are then stored in the folder "mail", from where they can be rated separately (with "m=mail").

q=1
  This specification is for users who like to leave their mail on the server. It prevents the display of mail that has not been rated again.

y=
  You can enter several y= lines. After y= you specify a keyword of the mail header; the header lines identified in this way are included when rating words for spam.

g=1
  The somewhat strange effect of this parameter is intended for users of faulty POP servers: only the rightmost 8 characters are used as the UID.
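The file rules above (entries in column 1, no blanks around the equal sign, everything else a comment, several y= lines allowed, an optional port after the server name) can be checked with a small Python reader. This is a sketch under assumptions, not Deleatur's own parser: it treats an entry as a single lower-case letter followed directly by "=", lets a later entry override an earlier one, and falls back to 110 (the standard POP3 port) when s= has no port. `read_prm`, `server_and_port`, and `word_base_limits` are hypothetical names.

```python
import re

# Assumption: an entry is one lower-case letter in column 1 followed
# directly by "="; every other line is treated as a comment.
ENTRY_RE = re.compile(r"([a-z])=(.*)")


def read_prm(text):
    """Read deleatur.prm-style text into a dict; y= may occur several times."""
    params = {"y": []}
    for line in text.splitlines():
        m = ENTRY_RE.fullmatch(line.rstrip())
        if m is None:
            continue  # indented or malformed lines count as comments
        key, value = m.groups()
        if key == "y":
            params["y"].append(value)
        else:
            params[key] = value  # assumption: a later entry overrides an earlier one
    return params


def server_and_port(s_value, default_port=110):
    """Split 's=host:port'; 110 is the standard POP3 port (assumed default)."""
    host, colon, port = s_value.partition(":")
    return host, int(port) if colon else default_port


def word_base_limits(params):
    """Return (u, o) in KB, using the documented defaults u=3000 and o=5000."""
    lower = int(params.get("u", 3000))
    upper = int(params.get("o", 5000))
    if lower >= upper:
        raise ValueError("u= must be smaller than o= (did you mix them up?)")
    return lower, upper
```

The check in `word_base_limits` mirrors the warning "Don't mix up u and o!": with the limits swapped, pruning from o down to u would never make sense.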
If you invoke Deleatur from a script, you may want to use the following event codes:
   0 - there is no mail on the server
   1 - there was mail rated as spam
   2 - there was mail rated as regular
   4 - there was mail left unrated
   8 - there was mail, but it was not looked at
  16 - there was mail deleted as spam
  32 - (reserved)
  64 - the language file has a different version number
 128 - Deleatur is already running
 256 - connection to the POP server failed
 512 - authentication failed
1024 - internal error in Deleatur
After Deleatur has run, the sum of the codes of all conditions that occurred is available as the return code.


Documentation of the Text File /**************************************/

Whenever Deleatur doesn't find the file deleatur.spr, it looks for the file deleatur.txt and then asks the user which of the languages contained in it should be used. The structure of the text file is explained below.

Lines beginning with a blank are comments and may be interspersed freely. The first line must contain the version number of Deleatur. After the version line there is one line per language; these lines are presented to the user at the first program start for choosing the dialog language.

Then follow text blocks, each starting with a line containing three number signs. The text in this line is irrelevant but should not be altered. After this line there is one line per language, beginning with the language number followed by the text in that language. The order of the text blocks is relevant and must not be changed.

If, for instance, you want to introduce French as another language, the beginning of the text file could look like this (never mind that this text block doesn't actually exist :-):

3.2.2
1 deutsch
2 English
3 francais
### satz_ich_nicht_franz
1 Ich spreche nicht französisch!
2 I don't speak french!
3 Je ne parle pas francais!
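Before adding a language it can help to check that every text block still has one line per language. The Python sketch below reads a file laid out like the example above: comment lines start with a blank, the first line is the version, language lines follow up to the first "###" line, and each block is kept in order. Skipping empty lines and raising an error for incomplete blocks are assumptions; `parse_text_file` is a hypothetical helper, not part of Deleatur.

```python
def parse_text_file(text):
    """Parse deleatur.txt-style text into (version, languages, blocks).

    Lines starting with a blank are comments. Empty lines are skipped
    as well, which is an assumption.
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith(" ")]
    if not lines:
        raise ValueError("empty text file")
    version = lines[0].strip()
    languages = {}
    i = 1
    while i < len(lines) and not lines[i].startswith("###"):
        number, _, name = lines[i].partition(" ")
        languages[int(number)] = name
        i += 1
    blocks = []  # a list, because the order of the blocks matters
    while i < len(lines):
        label = lines[i][3:].strip()  # ignored by Deleatur; kept for messages
        texts = {}
        i += 1
        while i < len(lines) and not lines[i].startswith("###"):
            number, _, message = lines[i].partition(" ")
            texts[int(number)] = message
            i += 1
        missing = sorted(set(languages) - set(texts))
        if missing:
            raise ValueError(f"block {label!r} lacks languages {missing}")
        blocks.append((label, texts))
    return version, languages, blocks
```

Comparing the returned version with the program version is a quick way to anticipate event code 64 (language file has a different version number).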
Example Scenarios /***************************************************/

Scenario 1: one mail address /*--------------------------------------*/

This is the normal case. The parameter file deleatur.prm is generated automatically and needn't be altered.

Scenario 2: several mail addresses /*--------------------------------*/

If you have several mail addresses, Deleatur can be invoked several times in a script, each time with a different parameter file:
deleatur parm1
deleatur parm2
You can specify either the same word base in every parameter file or a different one each time. If different languages are expected, separate word bases lead to better recognition results; in that case use the "v=" parameter.

Scenario 3: Using a server /*----------------------------------------*/

You can place the program file deleatur.exe and the language files deleatur.txt and deleatur.spr on a file server. Users then always have access to the most current version of Deleatur and only need space on their own computer for the personal files. The "v=" entry in the parameter file gives the path to the word base (which, of course, is not shared by all users!).

Scenario 4: automatically during holidays /*-------------------------*/

If you are worried that your mailbox will overflow during your holidays, you can run Deleatur permanently and let it check for mail, for instance by specifying "e=86400" (meaning "daily"). You should also specify "r=0" and "z=1000000" in the parameter file. Deleatur will then delete every mail whose spam probability is greater than the deletion limit (l=) and assign the status "not regarded" to all others, without including their ratings in the word base. After returning from your holidays, remove "r" and "e" and rate the remaining mail in the usual way.
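For Scenario 2, a script can also inspect the return code of each run, which is the sum of the event codes listed under Documentation of the Parameter File. The Python sketch below assumes that `deleatur` is on the PATH, that the parameter files are named parm1 and parm2 as above, and that each contains f=0 and w=0 so no prompt blocks the script. `describe`, `run_all`, and the grouping into "problem" codes are hypothetical. On POSIX systems such as AIX a process exit status is limited to 0 to 255, so codes 256, 512 and 1024 may not arrive intact there.

```python
import subprocess

# Event codes from the readme; Deleatur's return code is their sum.
EVENTS = {
    1: "mail rated as spam",
    2: "mail rated as regular",
    4: "mail left unrated",
    8: "mail present but not looked at",
    16: "mail deleted as spam",
    32: "(reserved)",
    64: "language file has different version number",
    128: "Deleatur is already running",
    256: "connection to POP server failed",
    512: "authentication failed",
    1024: "internal error in Deleatur",
}
# Our own grouping: codes that signal a problem rather than normal mail handling.
PROBLEMS = 64 | 128 | 256 | 512 | 1024


def describe(code):
    """Split a Deleatur return code into its individual events."""
    if code == 0:
        return ["no mail on the server"]
    return [text for bit, text in EVENTS.items() if code & bit]


def run_all(param_files, program="deleatur"):
    """Run Deleatur once per parameter file and report what happened."""
    codes = {}
    for prm in param_files:
        code = subprocess.run([program, prm]).returncode
        codes[prm] = code
        for event in describe(code):
            print(f"{prm}: {event}")
        if code & PROBLEMS:
            print(f"{prm}: problem reported, see the protocol file")
    return codes
```

Calling `run_all(["parm1", "parm2"])` corresponds to the two command lines of Scenario 2; a return code of 17, for example, decodes to "mail rated as spam" and "mail deleted as spam".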
Scenario 5: AIX version for all operating systems /*-----------------*/

If you have a central AIX server (as a university might, for instance), you can run Deleatur there in a telnet or ssh session. This way Mac and Linux users also benefit from Deleatur (as long as there is no PL/I compiler for these operating systems :-).

Scenario 6: Learning by hand /*--------------------------------------*/

If you have collected a folder full of mail from pre-Deleatur times, you can use the parameter "m=mail" to make Deleatur read mail not from a server but from the folder "mail". Reply to the program's inquiries as usual.

Scenario 7: Learning automatically /*--------------------------------*/

If you already have folders containing only spam or only regular mail, you can specify the folder "spam" and the folder "ok" after the parameter "m=" in two program runs. Deleatur then learns without inquiries.


Notice /**************************************************************/

I wrote this program privately, and it has been tested at my institute. I want to thank my colleagues at the Center for Information Processing (ZIV) of the University of Münster for their help, and especially my boss, Dr. Held, for encouraging me to publish Deleatur (after testing it himself, of course).

This program can be used freely by everyone. If you want to make a donation out of gratitude, you can transfer an amount of your choice to my account 3 200 372 966 at Postbank Germany, BLZ 20 11 00 22. (I've heard people deliberately pay for good software. How much do you pay for a dinner? :-) If you need an account with IBAN and BIC, here is one:
IBAN: DE43 2542 0800 9185 0673 18
BIC: BHWKDE21

Deleatur is available not only for Windows but also for OS/2 (or eComStation) and AIX. All versions share the same source code. Deleatur is written in the programming language PL/I, so "buffer overflows" are not to be expected. :-)

All rights belong to the author.
Use of Deleatur is at your own risk; in no case can I be held liable for anything!

Please send suggestions and bug reports to deleatur@eberhard-sturm.de. Extensions of the language file are most welcome, too, and will be included in the next version of Deleatur.

/*---------------------------------------------------------------*/
/* Eberhard Sturm                       Tel: +49-251-83-31679    */
/* ZIV (Universitaetsrechenzentrum)     Fax: +49-251-83-31555    */
/* Roentgenstr. 9-13                                             */
/* D-48149 Muenster             e-mail: sturm@uni-muenster.de    */
/* http://www.uni-muenster.de/ZIV/Mitarbeiter/EberhardSturm.html */
/*---------------------------------------------------------------*/