Vipul's Razor v2 README
Vipul's Razor is a distributed, collaborative spam detection and filtering network. Through user contribution, Razor establishes a distributed and constantly updating catalogue of spam in propagation that is consulted by email clients to filter out known spam. Detection is done with statistical and randomized signatures that efficiently spot mutating spam content. User input is validated through reputation assignments based on consensus on report and revoke assertions, which in turn are used for computing confidence values associated with individual signatures.
Vipul's Razor v2 agent software is available from the project's homepage at http://razor.sf.net. Razor Agents are written in Perl and will work on most Unix operating systems and other OSes for which Perl is available. Installation and usage instructions can be found in the INSTALL document in the distribution.
Vipul's Razor v2 is almost a complete rewrite of Razor v1. The following is a list of the most significant new features:
1 New Protocol
The Razor v2 protocol has been completely redesigned. The new
protocol is based on the exchange of _Structured Information Strings_,
which are similar to URIs and can be parsed with URI decoding
libraries. The v2 protocol supports _Pipelining_, which means Razor
Agents can keep a connection open with the server to eliminate the
latency introduced by the TCP 3-way handshake and 4-way teardown for
every connection. The new protocol semantics allow seamless
introduction of new signature schemes.
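Because Structured Information Strings resemble URI query strings, a
generic URI library is enough to decode them. The following is a
minimal sketch in Python (not taken from the Razor agents, which are
Perl). The keys `a`, `e` and `s` and the helper names are made up for
illustration; the real key names and message framing are defined by
the protocol itself.

```python
from urllib.parse import parse_qsl, urlencode


def parse_sis(line: str) -> dict:
    """Decode one URI-style 'key=value&key=value' string into a dict."""
    # Strip the line terminator, then let the URI library split and unescape.
    pairs = parse_qsl(line.strip(), keep_blank_values=True, strict_parsing=True)
    return dict(pairs)


def build_sis(fields: dict) -> str:
    """Encode a dict as a URI-style string (values are percent-escaped)."""
    return urlencode(fields)


# Hypothetical round trip: the field names are illustrative only.
message = build_sis({"a": "c", "e": "2", "s": "abc 123"})
fields = parse_sis(message)
```

With pipelining, an agent would write several such strings over one
open connection instead of reconnecting for each.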
2 Ephemeral Signatures
Ephemeral Signatures are short-lived signatures based on
collaboratively computed random numbers. Ephemeral Signatures select a
section of text from the spam message based on a random number that
changes every so often. This makes the hashing scheme a moving target,
and spammers can't exploit it because they don't know which part of
the message will be hashed after the random number rollover.
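The idea can be sketched as follows, in Python. This is an
illustration under assumptions, not the Razor algorithm: the fixed
window size, the use of a seeded pseudo-random generator to pick an
offset, and SHA-1 as the hash are all choices made for the example.

```python
import hashlib
import random


def ephemeral_signature(body: bytes, seed: int, window: int = 64) -> str:
    """Hash a slice of the body whose position depends on the shared seed."""
    if len(body) <= window:
        section = body
    else:
        # Everyone holding the same seed derives the same offset, so
        # reporters and checkers hash the same section of the message.
        rng = random.Random(seed)
        start = rng.randrange(len(body) - window + 1)
        section = body[start:start + window]
    return hashlib.sha1(section).hexdigest()
```

When the seed rolls over, the selected section changes, so a mutation
placed to evade the previous selection no longer helps the spammer.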
3 Preprocessors
Razor v2 supports several preprocessors. Preprocessors alter the
text of a spam message before a hash is computed. This version includes
preprocessors to decode Base64 encoded messages, decode
quoted-printable (QP) encoded messages and convert HTML to plaintext.
Spammers employ several techniques that hide mutations in various
encodings. Preprocessors defeat such techniques by hashing the content
that a recipient actually sees in their mail user agent.
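A rough Python sketch of such a pipeline is shown below. The function
names, the whitespace normalisation and the choice of SHA-1 are
assumptions for the example; the Razor preprocessors themselves are
written in Perl and may normalise text differently.

```python
import base64
import hashlib
import quopri
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace runs so layout-only changes do not alter the hash.
    return " ".join("".join(parser.chunks).split())


def preprocess(payload: bytes, encoding: str = "", is_html: bool = False) -> str:
    """Undo the transfer encoding, then reduce HTML to visible text."""
    if encoding == "base64":
        payload = base64.b64decode(payload)
    elif encoding == "quoted-printable":
        payload = quopri.decodestring(payload)
    text = payload.decode("utf-8", errors="replace")
    return html_to_text(text) if is_html else text


def signature(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()
```

The same visible text then yields the same signature whether it
arrived as Base64, as QP, or wrapped in HTML markup.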
4 Multiple Filtration Engines
Razor v2 supports multiple engines. An engine is a logical unit that
encapsulates a particular type of filtration service. Razor v2
currently supports four engines: VR1, which is equivalent to Razor v1;
VR2, which is based on SHA1 signatures of body text; VR3, which is
based on Nilsimsa signatures; and VR4, which is based on Ephemeral
hashes. New engines can be seamlessly plugged into the service as and
when required.
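One way to picture pluggable engines is a registry keyed by engine
number. The Python sketch below is purely illustrative: the registry,
the decorator and `compute_signatures` are hypothetical names, and
only a simplified stand-in for VR2 (plain SHA-1 over the body bytes)
is registered.

```python
import hashlib
from typing import Callable, Dict, Iterable

# Hypothetical registry: engine number -> signature function.
ENGINES: Dict[int, Callable[[bytes], str]] = {}


def register_engine(number: int):
    """Decorator that plugs a signature function in under an engine number."""
    def decorator(func: Callable[[bytes], str]):
        ENGINES[number] = func
        return func
    return decorator


@register_engine(2)
def sha1_of_body(body: bytes) -> str:
    # Simplified stand-in for VR2: SHA-1 over the raw body bytes.
    return hashlib.sha1(body).hexdigest()


def compute_signatures(body: bytes, wanted: Iterable[int]) -> Dict[int, str]:
    """Compute signatures for every requested engine that is available."""
    return {n: ENGINES[n](body) for n in wanted if n in ENGINES}
```

Adding an engine then means registering one more function; callers
that ask for an engine they do not have simply get no entry for it.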
5 Complete Backward Compatibility with Razor v1
The VR1 engine is functionally equivalent to the Razor v1 service and
uses the same database. This means users who transition from v1 to v2
will still get the benefit of several million signatures known to the
v1 service.
6 Base64 signature encoding
Signatures are now encoded as base 64 numbers instead of base 16
(hex), reducing the signature data that goes over the wire by about 33%.
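The figure follows from the encodings: a hex character carries 4 bits
and a base 64 character carries 6, so the same data needs 4/6 of the
characters, a saving of about one third. The Python snippet below
checks this for a 20-byte SHA-1 digest using the standard Base64
alphabet with padding; whether Razor uses that exact alphabet or
padding is not stated here, so treat the variant as an assumption.

```python
import base64
import hashlib

digest = hashlib.sha1(b"example spam body").digest()  # 20 raw bytes

hex_sig = digest.hex()                                # 40 characters
b64_sig = base64.b64encode(digest).decode("ascii")    # 28 characters, one '=' pad

# Fraction of characters saved for this digest (padding included).
saving = 1 - len(b64_sig) / len(hex_sig)
```

For this digest the padded form saves 30%; without the padding
character it is 27 against 40 characters, or 32.5%, and the saving
approaches 33% for longer values.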
7 Truth Evaluation System (TeS)
Razor v2 has a transparent, back-end component known as TeS. TeS is a
combination of a reputation system and pattern recognition heuristics
that assigns trust to reporters and confidence values (from 0 to 100)
to every signature. Users can set an acceptable confidence level in
their Razor configuration. The server also publishes a recommended
confidence level. TeS has been designed to eliminate false positives
of legit bulk email that were occasionally generated by bad reports
in Razor v1.
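On the client side the decision reduces to a threshold comparison. The
Python sketch below uses hypothetical parameter names, and it assumes
that an unset user level falls back to the server's recommendation and
that a value equal to the threshold counts as spam; neither detail is
specified above.

```python
from typing import Optional


def is_spam(confidence: int,
            user_min_cf: Optional[int],
            server_recommended_cf: int) -> bool:
    """Compare a signature's confidence value against the active threshold."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    # Assumption: with no user setting, use the server's recommended level.
    threshold = server_recommended_cf if user_min_cf is None else user_min_cf
    return confidence >= threshold
```

A cautious user can raise the level to reduce false positives at the
cost of letting more spam through.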
8 Submission of entire spam messages
Razor v2 accepts the entire body text of spam messages not previously
known to the system. This lets Razor v2 compute new Ephemeral
Signatures every n hours as well as seed the database whenever a new
signature scheme and/or preprocessor is introduced. It should be noted
that Razor v2 _does not_ accept contents of legit email during a check
dialogue. Only signatures are sent when checking email.
9 Revocation
Razor v2 allows users to revoke messages that they don't consider to
be spam. Revocation input is fed into TeS, which adjusts the confidence
value of a signature or removes it from the database as necessary.
Revocation is done through a tool called razor-revoke, which is a part
of the new Razor distribution.
10 Reporter Registration
Razor v2 requires reporters to be registered. This lets reporters
build a reputation over time, so their reports and revocations are
weighed according to their reputation value. Reporting requires users
to authenticate, which is done using a CRAM-SHA1 authentication scheme.
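In a challenge-response (CRAM) scheme the password never crosses the
wire: the server sends a fresh challenge and the client returns a
keyed hash of it. The Python sketch below shows the general pattern
with HMAC-SHA1. It is a generic illustration; the function names are
hypothetical and Razor's actual CRAM-SHA1 computation and message
format may differ.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> str:
    """Server side: issue an unpredictable, single-use challenge."""
    return secrets.token_hex(16)


def respond(password: str, challenge: str) -> str:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode(), challenge.encode(), hashlib.sha1).hexdigest()


def verify(password: str, challenge: str, response: str) -> bool:
    """Server side: recompute the response and compare in constant time."""
    expected = respond(password, challenge)
    return hmac.compare_digest(expected, response)
```

Because each challenge is used once, a captured response cannot be
replayed for a later report.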
11 Content classes
Razor v2 introduces the concept of content classes. A content class is
a set of messages that represents variations on the same content. As
new reports come in, Nomination servers associate them with an existing
content class if a (close) match is found. Additionally, Razor v2
treats each MIME attachment as a separate content class, so spammers'
MIME attachments can be individually tracked (which is very useful in
the case of viruses).
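Tracking attachments individually amounts to computing a signature per
MIME part instead of one per message. A minimal Python sketch using
the standard `email` package is shown below. The function name is
hypothetical and plain SHA-1 over the decoded part is an assumption;
the matching of near-identical content into one class is not shown.

```python
import hashlib
from email import message_from_bytes


def part_signatures(raw: bytes) -> list:
    """Return (content type, SHA-1 hex digest) for each leaf MIME part."""
    msg = message_from_bytes(raw)
    sigs = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no content of their own
        # decode=True undoes Base64/QP so the hash covers the real content.
        payload = part.get_payload(decode=True) or b""
        sigs.append((part.get_content_type(), hashlib.sha1(payload).hexdigest()))
    return sigs
```

An attachment that is reused under different cover text then keeps the
same signature from one message to the next.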
$Id: README,v 1.4 2005/06/28 22:19:07 jpr5 Exp $