PROXYC - A POP3 Proxy Server With Built-In Bayes Spam Filtering
===============================================================

Author
------
Mike Fry
email: mikefry@iafrica.com

About
-----
PROXYC is a simple POP3 server that logically sits between your mail
reader and your ISP. Thus, it 'watches' everything that your mail reader
sends to and receives from your ISP.

As each message is retrieved from your ISP, PROXYC feeds it to the Bayes
Spam Filtering package, bogofilter. Version 0.14.5.4, as ported by Yuri
Dario, is included in the ZIP file. bogofilter will add an extra RFC822
header to each message, giving an indication of whether it thinks the email
is spam or not. If it cannot decide, bogofilter will flag the message as
unsure.

PROXYC uses the BOGOD program, which effectively turns bogofilter into a
daemon, thus avoiding the overhead of repeatedly loading and unloading
bogofilter itself.

PROXYC will keep a copy of every message that it has processed, sorted into
the 3 categories assigned by bogofilter. This allows bogofilter to be
'trained' with known spam and ham messages. Training increases bogofilter's
accuracy for future messages. There are REXX scripts that will push the
message copies into bogofilter's database.
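
As a rough illustration of the sorting step, the Python sketch below reads
the X-Bogosity header and picks the matching .\Messages subdirectory. It is
not PROXYC's actual code. The function names are made up for this example,
and the text after the first comma in the sample header is only an assumed
layout; the sketch relies on nothing but the leading Yes/No/Unsure word.

```python
from email import message_from_string

# Verdict words as matched by the mail reader filters in this README,
# mapped to the directory names used under .\Messages.
CATEGORY_DIRS = {"Yes": "SPAM", "No": "NONSPAM", "Unsure": "UNSURE"}


def bogosity_verdict(raw_message):
    """Return 'Yes', 'No' or 'Unsure', or None if the header is missing
    or carries an unrecognised value."""
    msg = message_from_string(raw_message)
    header = msg.get("X-Bogosity")
    if header is None:
        return None
    # Only the word before the first comma is inspected.
    verdict = header.split(",", 1)[0].strip()
    return verdict if verdict in CATEGORY_DIRS else None


def category_dir(raw_message):
    """Return the .\\Messages subdirectory name for a message, or None."""
    return CATEGORY_DIRS.get(bogosity_verdict(raw_message))


if __name__ == "__main__":
    # Hypothetical sample; everything after "Yes," is an assumed layout.
    sample = "X-Bogosity: Yes, tests=bogofilter\nSubject: hello\n\nbody\n"
    print(category_dir(sample))
```

A message with no recognisable verdict returns None here, which a caller
would have to handle separately.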

Installation
------------

1) Unzip the downloaded file into ANY directory

2) NO modifications of CONFIG.SYS should be necessary in most cases

   However, the software WILL use one of the TMP, TEMP, TMPDIR or TEMPDIR
   environment variables, if set, to locate temporary files. If none is
   set, PROXYC will use its own working directory for temporary files.

3) Edit the STARTPROXY.CMD file and change the ISP name to your own. You
   can also change the port number that PROXYC listens on from 9110 to
   something else.

4) Point your mail reader at either localhost (127.0.0.1) or the lan0 IP
   address, and at port 9110 or whatever port you've set in STARTPROXY.CMD.

   Note: PROXYC and your mail reader shouldn't have to reside on the same
   machine!

5) You can create a Program Object in your Startup Folder to have PROXYC
   automatically started at boot time. Just make sure that you use the full
   pathname of the STARTPROXY.CMD file for the object, and the installation
   directory of PROXYC for the working directory.

      e.g. Assume PROXYC is installed in D:\TOOLS\PROXYC

      Program Properties
         Program tab
            Path and file name: D:\TOOLS\PROXYC\STARTPROXY.CMD
            Parameters
            Working directory:  D:\TOOLS\PROXYC

That's it!
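
For the curious, the temporary-directory rule from step 2 can be sketched in
Python as below. This is only an illustration, not PROXYC's own code. The
order in which the four variables are checked is an assumption (the README
only says "one of"), and pick_temp_dir is a name invented for this example.

```python
import os

# Variable names come from step 2; the checking order is an assumption.
TEMP_VARS = ("TMP", "TEMP", "TMPDIR", "TEMPDIR")


def pick_temp_dir(environ=None, working_dir=None):
    """Return the first non-empty temp variable, else the working directory."""
    env = os.environ if environ is None else environ
    for name in TEMP_VARS:
        value = env.get(name)
        if value:  # skips variables that are unset or set to an empty string
            return value
    # Nothing usable was set: fall back to the working directory.
    return working_dir if working_dir is not None else os.getcwd()


if __name__ == "__main__":
    print(pick_temp_dir())
```

Passing a plain dictionary as environ makes it easy to try out different
CONFIG.SYS settings without touching the real environment.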

Operating
---------
1) Make sure that PROXYC is running before using your mail reader to
   contact your ISP.

2) PROXYC runs in an OS/2 VIO window and can be terminated by pressing
   Ctrl-C or Ctrl-Break.

3) BOGOPROCESS.CMD
   is a REXX script that takes the message copies in .\Messages\SPAM and
   .\Messages\NONSPAM and passes them through the training scripts
   TRAIN-SPAM.CMD and TRAIN-NO-SPAM.CMD. These scripts are loosely based
   on the identically-named scripts that are found in all the bogofilter
   packages.

   DON'T REPLACE THESE SCRIPTS IF YOU UPGRADE THE BOGOFILTER PACKAGE!

   The BOGOPROCESS.CMD script will also create and maintain a message
   corpus in .\Corpus\*. These can be kept for archival purposes or some
   other reason. One reason for keeping them could be to recreate the
   wordlist database. Another could be to seed some other spam-filtering
   package, e.g. SpamAssassin. Me... I keep the message corpus in a pair
   of ZIP files. Don't know what I'm going to do with them, but they may
   come in useful :-)

   Note: because of a buglet in bogofilter (see Bugs below), it is not
   possible to run BOGOPROCESS.CMD whilst PROXYC is running. Shut PROXYC
   down before attempting to run BOGOPROCESS.CMD.

4) Manual Filtering
   Those messages that bogofilter is unsure about will be copied into the
   .\Messages\UNSURE directory. As time passes, and bogofilter learns
   more and more about your individual email patterns, the number of UNSURE
   messages should decrease. Whilst bogofilter is learning to crawl, you
   should examine the files in the .\Messages\UNSURE directory with a text
   viewer to decide for yourself whether the message is spam or ham. Then
   manually move spam messages into .\Messages\SPAM and nonspam into
   .\Messages\NONSPAM.

   ALTHOUGH NOT CRITICAL, THIS SHOULD BE DONE BEFORE RUNNING BOGOPROCESS.CMD
   TO ENSURE THAT BOGOFILTER LEARNS THE MAXIMUM EACH DAY.

5) Mail Reader Filtering
   Most mail readers have some kind of filtering mechanism to allow you
   to organise your email. I can only speak for PMMail/2 which allows me to
   examine most parts of a message and organise my mail based on the message
   contents. For example, I like to organise all mailing list messages into
   separate mailing list folders - one per mailing list. I also like to keep
   family stuff away from business-related stuff.

   To use PMMail/2 with PROXYC, I have set up these filters, in this order,
   at the very top of the list of filters:

   a) Description: No Bogosity
             Type: Complex
             ICSL: !(h = "X-Bogosity")
             When: Incoming
          Actions: Move Message to Unchecked
        Rationale: Any messages that haven't been checked by bogofilter
                   must be treated as suspicious

   b) Description: bogofilter - SPAM
             Type: Complex
             ICSL: h.X-Bogosity = "Yes,"
             When: Incoming
          Actions: Move Message to Trash
        Rationale: Any messages that bogofilter thinks are spam can be
                   trashed

   c) Description: bogofilter - UNSURE
             Type: Complex
             ICSL: h.X-Bogosity = "Unsure,"
             When: Incoming
          Actions: Move Message to SPAM?
        Rationale: Any messages that bogofilter is unsure about need to
                   be examined further. Since this is OS/2 and PMMail/2
                   isn't Outhouse Express and thus can't hurt me, I can
                   safely open these messages and look at the full RFC822
                   headers and un-munged text and decide for myself whether
                   they really need to go into the Trash folder. These are
                   also the messages that will be in .\Messages\UNSURE.

   d) Description: bogofilter - !NO
             Type: Complex
             ICSL: !(h.X-Bogosity = "No,")
             When: Incoming
          Actions: Move Message to bogofilter
        Rationale: Just a case of belt and braces. Any message that
                   bogofilter has not flagged with Yes, No or Unsure
                   probably indicates an error of some sort, so I like to
                   quarantine it. So far, nothing has appeared in this
                   folder!
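
The ordering of filters a) to d) matters, and the Python sketch below models
it as a chain in which the first matching rule decides the folder. This is
only a model of the idea, not PMMail/2's implementation. Two assumptions are
baked in: that a message stops at the first filter that moves it, and that
the ICSL "=" test can be approximated by a prefix match. The "Inbox" result
for messages that pass every filter is also a placeholder of my own.

```python
def route(headers):
    """Return the folder a message would land in.

    headers is a dict of header name -> value. Rules are tried in the
    same order as filters a) to d); the first match wins (assumption).
    """
    bogosity = headers.get("X-Bogosity")

    # a) No Bogosity: bogofilter never saw this message.
    if bogosity is None:
        return "Unchecked"
    # b) bogofilter - SPAM
    if bogosity.startswith("Yes,"):
        return "Trash"
    # c) bogofilter - UNSURE
    if bogosity.startswith("Unsure,"):
        return "SPAM?"
    # d) bogofilter - !NO: header present but not Yes/No/Unsure.
    if not bogosity.startswith("No,"):
        return "bogofilter"
    # Flagged "No,": falls through to the remaining filters.
    return "Inbox"  # hypothetical placeholder folder


if __name__ == "__main__":
    print(route({"X-Bogosity": "Unsure, tests=bogofilter"}))
```

Without the first-match assumption, filter d) would also catch everything
already handled by a), b) and c), which is why it has to come last.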

Restrictions
------------
None known! :-)

The program is multi-threaded and should be able to handle multiple
accounts at the same provider concurrently. I would like feedback
on how it behaves and performs with multiple accounts.

I use an unregistered version of PMMail/2 as my mail reader, and even
though I have several email accounts, I haven't been able to test PROXYC
with multiple concurrent accounts, because PMMail/2 won't let me! One of
these days, I'm going to 'fix' PMMail/2 :-)

Bugs
----
1) bogofilter keeps the wordlist database open and locked whilst it is
   running. This prevents the BOGOPROCESS.CMD script from running while
   PROXYC is running.

2) PROXYC likes to keep a clean house! In particular, it likes to delete
   temporary files when they're no longer needed. bogofilter has an annoying
   habit of keeping the last temporary message file open. This stops the
   file from being deleted. PROXYC does delete this file when it terminates.

Upgrades
--------
As and when new versions of bogofilter are released by Yuri Dario onto
Hobbes, it should just be a matter of replacing the bogo*.exe files in the
PROXYC directory with the newer versions.

Fixes to PROXYC and BOGOD will be made as and when I am notified about
problems. If you don't give me enough information to support your complaint,
I can't and won't fix your problem.

Constructive criticism and suggestions are always good!
