Spam URI Data Service

Contains Perl and tcsh scripts to get SpamCop's "spamvertised sites" URI
data from spams that have been reported. Should run in any reasonably
current Linux, BSD or UN*X environment.

To install:

1. Untar the files into a web directory.
2. Compile epoch_utc.c into epoch_seconds_utc .
3. Make sure perl5 and the perl lwp-request package are installed.
4. Change the full paths to system programs to match those on your system.
5. Change any references to /web/antispam to your directory path.
6. Run each program in the "go" file manually to see that they work.
7. Install the crontab entries into your crontab.

Packing list and descriptions:

go                   - called by crontab to run the every-minute programs
grab-spamcop         - gets SpamCop URI data via web and stores into flat database
fqdn-patterns.sed    - sed expressions to remove stuff around FQDNs in URIs
domains-illegal.sed  - sed expressions to remove non-syntactic domain characters
hyperlink-post.sed   - sed expressions to remove stuff after SpamCop hyperlinks
parse-domains        - appends flat database into directory tree of inverted domains
groom-domains        - merges new data in changed directory tree, expires old records
expire-records       - filter to expire old records; optional argument is age in seconds
process-stats        - create statistics of top spam sites
top-sites-to-html    - convert top sites into an html document
top-sites-to-domains - convert top sites into a list of domains
whitelist-domains    - list of domains to exclude from rbl
blacklist-domains    - list of domains to include in rbl, use with caution
two-level-tlds       - list of country code-style two-level tlds to exclude from rbl
domains-to-bind      - create a bind-style rbl zone file of top spam domains
domains-to-rbldnsd   - create an rbldnsd-style rbl zone file of top spam domains
surbl.nameservers    - list of authoritative name servers for rbl zone
prune-domains        - call find to remove anything old in the domains directory tree
epoch_utc.c          - program to give and convert time since epoch
crontab              - calls go script and prunes old parts of directory tree
search-uri.cgi       - web cgi to search for matching URIs, takes fixed strings
search-fqdn.cgi      - web cgi to search for matching FQDNs
count-uri.cgi        - web cgi to give a count of matching URIs
count-fqdn.cgi       - web cgi to give a count of matching FQDNs
percent-uri.cgi      - web cgi to give percentage of matching URIs
percent-fqdn.cgi     - web cgi to give percentage of matching FQDNs
index.html           - overview and discussion of project
README               - this file
robots.txt           - to prevent the domain tree being crawled by search engines

To install the RBL you will need to delegate a subdomain for it, update
the zone file creation script slightly to use your domain data, link the
surbl.bind file into your DNS directory structure, and set up a root cron
job to reload the name server fairly often.

(Generally speaking it should not be necessary for others to actually
set up the RBL. Simply use ours at sc.surbl.org .)

-- Jeff Chan
Version 1.10 4/5/04
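
Illustration: the packing list describes parse-domains as storing data in
a "directory tree of inverted domains". The sketch below shows one way
such an inverted path could be derived from an FQDN in plain sh. The
function name invert_domain and the exact layout (one directory level per
label, top-level domain first, lowercased) are assumptions made for this
example; they are not taken from the parse-domains script itself.

```shell
#!/bin/sh
# Hypothetical helper, not part of the distribution: turn an FQDN into an
# inverted directory path, e.g. www.example.com -> com/example/www.
# The layout is an assumption; check parse-domains for the real one.
invert_domain() {
    # Lowercase the name, split it on dots, and rebuild it last label first.
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | awk -F. '{
        path = $NF
        for (i = NF - 1; i >= 1; i--) path = path "/" $i
        print path
    }'
}

invert_domain "www.Example.com"
```

Inverting the labels keeps all hosts of one registered domain under a
common parent directory, which is what would let a tool like find prune
or count them together.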