
DSPAM

Posted on 08 March 2008

From: NuclearElephant.com

DSPAM is a scalable, open-source, content-based spam filter designed for multi-user enterprise systems. On a properly configured system, many users see accuracy between 99.5% and 99.95%, or one error for every 200 to 2000 messages. DSPAM supports many different MTAs and can also be deployed as a stand-alone SMTP appliance. For developers, the DSPAM core engine (libdspam) can be incorporated directly into applications for drop-in filtering (GPL applies; commercial licenses are also available).
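The relationship between an accuracy percentage and "one error for every N messages" is simple arithmetic. The short sketch below works it through; the function names are illustrative only and are not part of DSPAM.

```python
def messages_per_error(accuracy_percent: float) -> float:
    """Average number of messages processed per misclassification."""
    error_rate = (100.0 - accuracy_percent) / 100.0
    return 1.0 / error_rate


def accuracy_percent(errors: int, messages: int) -> float:
    """Accuracy as a percentage, given an error count and a message count."""
    return 100.0 * (1.0 - errors / messages)


if __name__ == "__main__":
    # The two ends of the range quoted above.
    for acc in (99.5, 99.95):
        per_error = messages_per_error(acc)
        print(f"{acc}% accuracy is about one error per {per_error:.0f} messages")
```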

DSPAM has been implemented on many large- and small-scale systems, the largest reported at about 350,000 mailboxes. It is presently being used or planned for use in multiple commercial solutions.

DSPAM is an adaptive filter, which means it learns from and adapts to each user's email. Instead of working from a list of "rules" to identify spam, DSPAM's probabilistic engine examines the content of each message and learns what type of content the user deems spam (or nonspam). This approach to machine learning provides much higher levels of accuracy than commercial "hodge-podge" solutions, and with minimal resources. DSPAM's best recorded levels of accuracy have included 99.991% by one avid user (2 errors in 22,786) and 99.987% by the author (1 error in 7000), which is ten times more accurate than a human being!
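To make the idea of learning from content rather than from rules concrete, here is a minimal sketch of a per-user token filter using plain naive Bayes with add-one smoothing. It is a toy for illustration, not DSPAM's actual engine; the class name, tokenizer, and smoothing choices are all assumptions.

```python
import math
import re
from collections import Counter


class TokenFilter:
    """Toy per-user naive Bayes filter (illustrative, not DSPAM's engine)."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "nonspam": Counter()}
        self.messages = {"spam": 0, "nonspam": 0}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Lowercased runs of letters, digits, "$" and "'" count as tokens.
        return re.findall(r"[a-z0-9$']+", text.lower())

    def train(self, text: str, label: str) -> None:
        """Learn one message the user has labelled 'spam' or 'nonspam'."""
        self.messages[label] += 1
        self.counts[label].update(self.tokenize(text))

    def spam_probability(self, text: str) -> float:
        vocab = len(set(self.counts["spam"]) | set(self.counts["nonspam"]))
        total_msgs = sum(self.messages.values())
        log_score = {}
        for label in ("spam", "nonspam"):
            # Smoothed class prior.
            log_p = math.log((self.messages[label] + 1) / (total_msgs + 2))
            total_tokens = sum(self.counts[label].values())
            for tok in self.tokenize(text):
                # Add-one smoothing; the extra +1 leaves room for unseen tokens.
                numerator = self.counts[label][tok] + 1
                log_p += math.log(numerator / (total_tokens + vocab + 1))
            log_score[label] = log_p
        # Turn the log-odds into a probability, clamped to avoid overflow.
        diff = log_score["nonspam"] - log_score["spam"]
        diff = max(min(diff, 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    f = TokenFilter()
    f.train("buy cheap pills now", "spam")
    f.train("meeting agenda for monday", "nonspam")
    print(f.spam_probability("cheap pills"))
```

Each user would get their own `TokenFilter` instance, so the same words can score differently for different mailboxes.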

DSPAM's Focus
The DSPAM project attempts to set itself apart from other filters by focusing on the following areas:

  • A strong drive for research. Many new algorithms and approaches to fighting spam have come out of the DSPAM project. Some of the approaches deployed in DSPAM include Concept Identification, Neural Networking, Message Inoculation, advanced de-obfuscation techniques, and a new noise reduction algorithm called Bayesian Noise Reduction. DSPAM also supports many different mathematical paradigms including Bayes, Chi-Square, Geometric, and Markovian Discrimination.

  • A strong focus on large-scale implementation support. The largest implementation of DSPAM we've heard about to date involves 350,000 users, with the next largest being around 125,000, then 100,000. DSPAM has been designed to run with a very short execution time (between 0.01 s and 0.03 s real time for classification and between 0.03 s and 0.10 s real time for training, on average hardware), and has been equipped with a storage driver API allowing several different storage mechanisms to be used.

  • Usability. DSPAM was designed with "grandma" in mind. Users can retrain either by forwarding any spam they receive to a spam address or by using the web UI to quickly mark spam and deliver false positives. DSPAM can also be integrated with IMAP solutions to provide a drag-and-drop spam folder for training. End users don't need to know any command-line utilities or deal with the other complexities that plague some similar tools. Functions such as whitelisting and keyword inventory are automatic (based on DSPAM's statistical functions) and therefore require no user intervention.
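The storage driver API mentioned above can be pictured as a small interface that the filter calls without knowing which backend sits behind it. The sketch below is a hypothetical illustration of that pattern using Python's standard `sqlite3` module; the class names, method names, and table layout are assumptions and do not reflect libdspam's real driver interface.

```python
import sqlite3
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Hypothetical driver interface: per-user token counters."""

    @abstractmethod
    def increment(self, username: str, token: str, is_spam: bool) -> None:
        """Record one more spam or nonspam sighting of a token."""

    @abstractmethod
    def counts(self, username: str, token: str) -> tuple[int, int]:
        """Return (spam_hits, nonspam_hits) for one token."""


class MemoryDriver(StorageDriver):
    """Keeps counters in a dict; useful for tests."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], list[int]] = {}

    def increment(self, username, token, is_spam):
        row = self._data.setdefault((username, token), [0, 0])
        row[0 if is_spam else 1] += 1

    def counts(self, username, token):
        spam, nonspam = self._data.get((username, token), [0, 0])
        return spam, nonspam


class SQLiteDriver(StorageDriver):
    """Keeps counters in an SQLite table (assumed schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS tokens ("
            "username TEXT, token TEXT, "
            "spam INTEGER DEFAULT 0, nonspam INTEGER DEFAULT 0, "
            "PRIMARY KEY (username, token))"
        )

    def increment(self, username, token, is_spam):
        column = "spam" if is_spam else "nonspam"  # fixed choice, never user input
        self._db.execute(
            "INSERT OR IGNORE INTO tokens (username, token) VALUES (?, ?)",
            (username, token),
        )
        self._db.execute(
            f"UPDATE tokens SET {column} = {column} + 1 "
            "WHERE username = ? AND token = ?",
            (username, token),
        )
        self._db.commit()

    def counts(self, username, token):
        row = self._db.execute(
            "SELECT spam, nonspam FROM tokens WHERE username = ? AND token = ?",
            (username, token),
        ).fetchone()
        return (row[0], row[1]) if row else (0, 0)
```

Because both classes satisfy the same interface, the calling code can switch backends by changing only the constructor it uses.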


Features
  • System-wide filtering that needs no ongoing administrative maintenance. The DSPAM agent can integrate into just about any network and can even be implemented as an SMTP gateway.

  • A simple-to-use learning mechanism. DSPAM allows users to simply forward their spam to their "spam email address" for learning, eliminating any learning curve necessary to make it usable by your customers. The information used in every calculation is temporarily stored on the server, enabling DSPAM to relearn the original message by looking for a small signature in the forwarded spam. As a result, users don't have to be trained to 'bounce' messages around, and administrators don't have to worry about incompatible mail clients.

  • Support for a variety of storage implementations. DSPAM's storage driver API allows the administrator to choose how they wish to store data. Currently supported drivers include SQLite, Berkeley DB, MySQL, PostgreSQL, Oracle, and a self-contained high-speed hash driver.

  • Written in C for speed, performance, and scalability. Unlike Python or Perl solutions, DSPAM is written in a low-level compiled language, meaning there is very little overhead. DSPAM runs fast and efficiently, and doesn't depend on any third-party language interpreters.

  • MTA support. DSPAM works great with Sendmail, Postfix, Qmail, Courier, and Exim, and should work well with many other MTAs. In the event you happen to run something like Exchange, DSPAM can be implemented on your network as an SMTP gateway. Just point your MX at it and configure it to relay to your mail server.
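The forward-to-retrain mechanism described in the feature list (store the data used for each message on the server, then find a small signature in the forwarded copy) can be sketched roughly as follows. This is an illustration of the idea only: the `!SIG:...!` marker format, the class, and its methods are hypothetical and are not DSPAM's real signature format or API.

```python
import re
import uuid
from collections import Counter

# Hypothetical marker format, chosen for this sketch only.
SIGNATURE_RE = re.compile(r"!SIG:([0-9a-f]{32})!")


class SignatureStore:
    """Remembers the tokens of each delivered message under a signature."""

    def __init__(self) -> None:
        self.pending: dict[str, list[str]] = {}
        self.spam_counts: Counter = Counter()
        self.nonspam_counts: Counter = Counter()

    def deliver(self, body: str) -> str:
        """Learn the message as nonspam, store its tokens, and tag the body."""
        tokens = body.lower().split()
        self.nonspam_counts.update(tokens)
        signature = uuid.uuid4().hex
        self.pending[signature] = tokens
        return f"{body}\n\n!SIG:{signature}!"

    def retrain_as_spam(self, forwarded: str) -> bool:
        """Relearn a forwarded message as spam using only its signature."""
        match = SIGNATURE_RE.search(forwarded)
        if match is None:
            return False
        tokens = self.pending.pop(match.group(1), None)
        if tokens is None:
            return False  # unknown or already-used signature
        # Undo the earlier nonspam learning and count the tokens as spam.
        self.nonspam_counts.subtract(tokens)
        self.spam_counts.update(tokens)
        return True


if __name__ == "__main__":
    store = SignatureStore()
    tagged = store.deliver("cheap pills online")
    # A mail client may quote or reformat the body; the marker still survives.
    forwarded = "Fwd: this is spam\n> " + tagged.replace("\n", "\n> ")
    print(store.retrain_as_spam(forwarded))
```

The point of the design is that retraining relies on the stored tokens rather than on the forwarded text, so whatever the mail client does to the body when forwarding does not matter as long as the signature survives.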

 
