Mailman - An Extensible MLM using Python
	      =========================================

			       Abstract
			       ========

The explosive growth in the Internet community, and the core role that
email plays in it, demands an adaptable Mailing List Management (MLM)
system.  The extent to which MLMs are adaptable is the extent to which
they can accommodate, and even foster, effective new forms of Internet
community organization.  A new MLM, Mailman, is well suited to such
evolution, and one of the contributing factors is its implementation
in Python.

In this paper we look at various ways in which Mailman's versatility
enables its extension.  We consider how the system's design, and
features of its implementation language, Python, factor into that
extensibility.

			     Introduction
			     ============

			    What is Mailman?

Mailman is a Mailing List Management system, like Majordomo and
Smartmail, used to manage email redistribution lists.  Mailman gives
each mailing list a Web page, and allows users to subscribe,
unsubscribe, etc. over the Web.  List managers can administer their
lists entirely from the Web.  Mailman also integrates most things
people want to do with mailing lists, including archiving, mail <->
news gateways, and so on.

Mailman was originally developed by John Viega.  Ken Manheimer picked
up the ball to bring Mailman to 1.0.  Currently, Mailman development
is a group effort, led by John Viega, Ken Manheimer and Barry Warsaw.
Mailman has been designated by the Free Software Foundation as the GNU
Mailing List Manager.

See [LO "Mailman"] for more details on the system, and visit the [MD
"Mailman-developers"] mailing list if you're interested in joining the
Mailman development community.


			  Why Extensibility?

From the early days of the ARPAnet to today, email and Mailing List
Management systems have played a crucial role in the formation and
conduct of communities on the Internet.  With the profound dynamism of
the Internet, the infrastructures by which it organizes are
continually evolving.  Over time, the rapidly increasing scale and the
advent of new and improved strategies for organizing Internet
communities demand continuing development of the mechanisms supporting
them.  New and different approaches may take up some of the traffic,
as Usenet News has, but email, as a medium, has proven to be
particularly versatile and lasting.  A good MLM will help foster the
evolution of Internet communities by growing with them.

Another reason for extensibility's importance in this context has to
do with a core constituency of mailing list users - the mailing list
administrators.  These administrators are typically near enough to the
end-users to get clear impressions of their needs.  Also, they are
commonly savvy enough, technically, to implement improvements that
accommodate those needs - provided the system doesn't present too high
a threshold of comprehension.  Here is a prime opportunity for
exploiting the Bazaar style of open software development [ESR "The
Cathedral and The Bazaar"], letting the managers of the medium
themselves guide its development, and so producing results more
quickly and more closely tailored to the needs of the user community.

Finally, most aspects of an MLM do not require the kind of speed
optimizations that force change-impeding hardening of the system.
Performance-critical aspects, like mail delivery to large numbers of
users, are generally the purview of the underlying Mail Transport
Agents (MTAs), not the MLM.  Large capacities can impose some
specialized performance demands on the MLM, of course.  Because those
demands are specialized, however, the optimizations can be isolated to
select components, and Python's compiled-language extensibility
enables hardening those specific components as needed, confining the
rigidity to the particular subsystems that need it.  At this point we
don't foresee hardening any components in this fashion, but we don't
know what the future (or potential growth of Mailman's use) will bring.

			     Why Python?

Python is particularly well suited to implementing an extensive and
changing system.  Its combination of clean syntax and cogent semantics
aids the programmer, all the more in the process of changing existing
code.  It is dynamic in many respects, enabling interaction with and
programmatic handling of just about everything in the language.  By
satisfying the needs of prototyping and rapid development, as well as
those of general programming, it can be seen to foster "continuous
development", where a system continues to grow and evolve to
accommodate a changing world.

	       A Broad Overview of Mailman's Structure
	       =======================================

The core of the Mailman system is the MailList object, a class
instance which represents individual mailing lists at the site.  The
MailList class is composed by multiple inheritance from a number of
task-oriented component classes, as mixins.  The task-oriented
components contain the methods, variable declarations, and
initializations related to the functionality of a particular
subsystem; for example, that of the delivery mechanism or of the
emailed-commands handler.
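To make the composition concrete, here is a minimal sketch in modern
Python of a list class assembled from two task-oriented mixins.  The
class names echo Mailman's, but the init_vars() hook, the attributes,
and the method bodies are hypothetical simplifications, not Mailman's
actual source.

```python
# Hypothetical sketch: a list class composed from task-oriented mixins.
# init_vars(), outgoing, digest_pending etc. are illustrative names only.

class Deliverer:
    """Mixin: owns delivery-related state and behaviour."""

    def init_vars(self):
        self.outgoing = []              # (recipient, text) pairs queued

    def deliver(self, recipients, text):
        for addr in recipients:
            self.outgoing.append((addr, text))


class Digester:
    """Mixin: accumulates postings for periodic digests."""

    def init_vars(self):
        self.digest_pending = []

    def add_to_digest(self, text):
        self.digest_pending.append(text)


class MailList(Deliverer, Digester):
    """The list object proper: coordinates its mixins' initialization."""

    def __init__(self, name):
        self.name = name
        self.members = []
        # Call each mixin's own init_vars() explicitly; a plain
        # self.init_vars() would reach only the first one in the MRO.
        for base in MailList.__bases__:
            base.init_vars(self)

    def post(self, text):
        # Top-level posting logic; the mixins supply the underpinnings,
        # all reachable directly through self.
        self.deliver(self.members, text)
        self.add_to_digest(text)
```

Because every mixin's methods and data land on the same instance, post()
needs no delegate objects to reach delivery or digest state.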

The code directly in the MailList class is responsible for
coordinating the mixin classes' initialization, central identification
of the specific mailing list, creation of new mailing lists, and
management of the mailing lists' persistent data and locking.  The
MailList class's own code also handles the very top level of
subscriptions and message posting, but the task-oriented base classes
are responsible for the underpinnings of that and of all the other
functions of the mailing list object.  The following base classes
currently exist:

MailCommandHandler:
    This class implements the parsing and execution of Majordomo-style 
    commands embedded in email to -request addresses.  Although users
    more typically interact with mailing lists directly through the
    Web interface, for compatibility, user commands can be issued via
    email.  Where appropriate, the commands have the same syntax and
    semantics as the corresponding Majordomo commands.

HTMLFormatter:
    This class is used to generate list-specific HTML for presentation
    via the World Wide Web interface.  Primarily, it uses a widget
    library also included in Mailman.  Together, this class and library
    serve a purpose very similar to that of Robin Friedrich's HTMLgen
    [RF "HTMLgen"] and Digital Creations, L.C.'s DocumentTemplate
    [DC "Document Template"].

Deliverer:
    This class conducts delivery of all the email associated with a
    mailing list.  This includes delivery of postings to the
    membership, subscription acknowledgments, announcements to the list
    administrator about list creation and list business pending
    approval, subscriber notices regarding their passwords, and myriad
    other things.  Email is used for a lot of things by a mailing list
    system, even one with a comprehensive Web interface.

ListAdmin:
    This class manages the queuing and notification of mailing list
    submissions - postings and subscription requests - that require
    administrator decision (approval or rejection).  For example, a
    list may be set to require administrator approval for any
    postings, or a posting may be held due to triggering a filter
    intended to catch undesired commercial messages (can you say
    spam?).

Archiver:
    This class handles the archival of posted messages.  Mailman
    mailing lists can have public or private archives, and this class
    places the posted message in the appropriate location.  It also
    interfaces with external hypertext archivers such as Andrew
    Kuchling's Pipermail [AMK "Pipermail"], which is bundled with
    Mailman.

Digester:
    Mailing list members can receive postings immediately, or they can
    opt to have cumulative "digests" of the list traffic sent to them
    periodically.  This class manages accumulation of the digests,
    formulation of the plain and MIME formats (when there are
    subscribers to the respective types), and dispatching of the
    digests to the respective subscribers.

SecurityManager:
    This class primarily verifies authorization passwords for the site 
    administrator, list administrators, and users.  It also performs
    the task of sanitizing the Majordomo-style approval passwords from 
    the headers of administrator approvals submitted via email.

Bouncer:
    Mailman catches email delivery bounce notices, and accumulates
    tallies of bounce scores for the mailing list members.  For scores
    that exceed designated thresholds within designated timeout
    conditions, the bouncer triggers list-prescribed actions,
    including disabling of mail delivery or, if set by the list
    administrator, unsubscription of the member from the list.

GatewayManager:
    This class handles optional email-to-Usenet gateways for mailing
    lists.
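The Bouncer's threshold-and-timeout policy can be sketched as follows.
The BounceTracker class, its default threshold and window, and the
rule that a stale tally restarts from zero are all assumptions made for
illustration; Mailman's actual scoring rules may differ.

```python
# Hypothetical sketch of bounce-score bookkeeping, not Mailman's code.
import time


class BounceTracker:
    def __init__(self, threshold=3.0, window=5 * 24 * 3600,
                 action="disable"):
        self.threshold = threshold   # score that triggers the action
        self.window = window         # seconds before a tally goes stale
        self.action = action         # "disable" or "unsubscribe"
        self.scores = {}             # address -> (score, first_bounce_time)
        self.disabled = set()        # members whose delivery is disabled
        self.members = set()

    def register_bounce(self, addr, weight=1.0, now=None):
        now = time.time() if now is None else now
        score, first = self.scores.get(addr, (0.0, now))
        if now - first > self.window:
            # The earlier bounces have timed out; start a fresh tally.
            score, first = 0.0, now
        score += weight
        self.scores[addr] = (score, first)
        if score >= self.threshold:
            self._act(addr)
        return score

    def _act(self, addr):
        # Apply the list-prescribed action and clear the tally.
        del self.scores[addr]
        if self.action == "unsubscribe":
            self.members.discard(addr)
        else:
            self.disabled.add(addr)
```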

	      A Selective Tour of Mailman's Versatility
	      =========================================

	  Programming and Interacting With MailList Objects

Perhaps the most significant factor contributing to Mailman's
versatility is that the MailList class is designed for instantiation
by external programs, as well as interactively within the interpreter.
Almost all aspects of Mailman mailing list operation are articulated
via the MailList instance.  Thus, interaction with mailing lists can
be conducted programmatically, and also incrementally, using the
interactive Python shell.

Programmatic interaction enables us to extend access to any aspect of
MailList operation, anywhere we can write a script.  From this we
build Web, Email, and cron access.  We can also build scripts to
automate any routine procedures, such as conversion of subscriber
lists from established Majordomo mailing lists.
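As an example of such a routine script, the following sketch converts
a Majordomo-style subscriber file, assumed here to hold one address per
line with an optional real name, into list memberships.  StubList is a
hypothetical stand-in for a MailList instance, and add_member() is not
claimed to be Mailman's API.

```python
# Hypothetical conversion script; StubList stands in for a MailList.
from email.utils import parseaddr


class StubList:
    def __init__(self):
        self.members = {}                    # lowercased address -> name

    def add_member(self, addr, realname=""):
        self.members[addr.lower()] = realname


def convert_majordomo(lines, mlist):
    """Subscribe each address found in lines; return (count, rejects)."""
    added, skipped = 0, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        name, addr = parseaddr(line)
        if "@" not in addr:
            skipped.append(line)             # report it rather than guess
            continue
        mlist.add_member(addr, name)
        added += 1
    return added, skipped
```

Reporting unparseable lines, instead of silently dropping them, lets the
administrator finish the migration by hand.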

Interactive sessions with MailList instances provide an eminently
useful development and debugging tool.  With it, we are able to
exercise and test isolated subsystems and the behavior of the MailList
as a whole, engaging tools like the Python debugger and profiler along
the way.

We can also use interactive sessions to do mailing list "surgery" - to
make changes to list state not provided for in already created
scripts.  Using a utility function, Utils.map_maillists(), we can
apply arbitrary functions to all or to selected Mailman mailing lists
at the site.  This enables us to do wholesale conversions of the
MailLists to accommodate, for instance, changes in the address of the
site, or to search for particular members of any of the mailing lists
and then do some processing on their subscriptions.
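The pattern behind such wholesale surgery can be sketched with a
hypothetical map_lists() helper.  Its signature is an assumption and
need not match Utils.map_maillists(), and the lists are modeled as
plain dictionaries so that the example runs on its own.

```python
# Hypothetical stand-in for a map-over-all-lists utility.
def map_lists(func, lists, names=None):
    """Apply func to every list, or only to those named; collect results."""
    results = {}
    for name in sorted(lists):
        if names is not None and name not in names:
            continue
        results[name] = func(lists[name])
    return results


def change_host(old, new):
    """Build a function that rewrites a list's host name."""
    def fix(mlist):
        mlist["host_name"] = mlist["host_name"].replace(old, new)
        return mlist["host_name"]
    return fix


def find_member(addr):
    """Build a predicate reporting whether addr is subscribed."""
    return lambda mlist: addr in mlist["members"]
```

In an interactive session, one call such as
map_lists(change_host("old.example.org", "new.example.org"), lists)
updates every list at the site.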

	       MailList Object Composed via Inheritance
		    from Task-Oriented Components

Composing the MailList class using multiple inheritance makes it easy
to share the component class methods and data throughout the MailList
object.  It avoids the need to explicitly identify and pass around
delegate instances in order to use those components' data and methods.
Having all the methods and data inhabit the namespace of the primary
MailList instance can lead to inadvertent name collisions.  However,
we feel that the system would have to get much bigger before that
would become a practical concern - and at that point we could use
naming conventions to prevent the collisions, while still enjoying the
easy sharing.  Use of multiple inheritance provides this direct
sharing, along with organization of the system into distinct,
conceptually motivated modules, easing debugging and development.
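The collision risk, and the naming-convention remedy, can be seen in a
small example.  The attribute name reply_to and both of its meanings
are invented for illustration.

```python
# Illustrative only: two hypothetical mixins both define "reply_to".
class Deliverer:
    reply_to = "poster"          # assumed meaning: replies go to author


class GatewayManager:
    reply_to = "newsgroup"       # an unrelated use of the same name


class MailList(Deliverer, GatewayManager):
    pass


# Python resolves the name via the method resolution order (MRO):
# Deliverer precedes GatewayManager, so its value silently wins.
mro_names = [cls.__name__ for cls in MailList.__mro__]


# Remedy by convention: prefix component-specific names.
class SafeGatewayManager:
    gateway_reply_to = "newsgroup"


class SafeMailList(Deliverer, SafeGatewayManager):
    pass
```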

New major modules are still being added as task-specific mixin
classes, and the process is exceptionally simple.  For instance, as of
this writing one of the primary authors has just added bidirectional
mail/news gateway capability to Mailman.  This module required
knowledge of some boilerplate structure and only minor changes to
existing modules, providing major functionality with almost
plugin-style ease.

		  MailList Object State Persistence
	      Exploits Introspection And Simple Sharing

This direct sharing also simplifies the MailList object's persistence
mechanism.  By identifying its own data members via self.__dict__, the
MailList object saves and restores its state using the standard
marshal module.  (Members that should not be saved are distinguished
by a leading "_" underscore.)  This exploits Python's introspection
capabilities, as well as a standard, simple persistent storage
facility.  (The higher-level standard persistence mechanism, pickle,
would do more work than we want or need, so we were able to avoid its
overhead.)  As with the sharing itself, the arrangement is
uncomplicated, easing approach and acquaintance by newcomers.
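The persistence scheme reduces to a few lines.  This sketch assumes
every saved attribute is a type marshal can handle (strings, numbers,
lists, dictionaries and the like), and it returns bytes rather than
writing the per-list file a real installation would use; the method
names are hypothetical.

```python
# Sketch of __dict__-based persistence with marshal (names hypothetical).
import marshal


class Persistent:
    def dump_state(self):
        # Introspect the instance; skip "_"-prefixed transient members.
        state = {k: v for k, v in self.__dict__.items()
                 if not k.startswith("_")}
        return marshal.dumps(state)

    def load_state(self, data):
        # Restore saved members directly into the instance namespace.
        self.__dict__.update(marshal.loads(data))


class MailList(Persistent):
    def __init__(self, name):
        self.name = name
        self.members = {}            # address -> real name
        self._lock = None            # transient: never saved
```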

			  Logging Mechanism

Most of the common interactions with MailList objects are triggered
remotely - via the Web or email - or from periodically firing cron
jobs.  The lack of an operator or a console can make system failures
in these contexts hard to trace.  Of course, every program should be
perfect (:-), or at least fail gracefully.  However, when programming
in an environment where change is frequent, we need to provide some
defensive mechanisms which aid the capture of the errors that
inevitably slip by.  Mailman's logging mechanism provides that
coverage.

Reliable logging is also key for tracking the occurrence of common
events that otherwise take place "behind the scenes".  This can
include mailing list subscription activity, automated change of
subscriptions due to delivery failures, and so forth.  It also is
useful to be able to use "flag-printing" debugging, even when stdout
does not go anywhere useful - e.g., when running under CGI, or in
disconnected forked processes, or via email.

The crux of the Mailman logging scheme is a Logger class, whose job is
to reliably direct messages to log files.  Logger instances obey the
conventional Python file-like object protocol.  Thus, they can be used
explicitly by the programmer, like a standard file object, to write
messages.

Logger objects can also be substituted for standard output streams
like sys.stderr and sys.stdout, enabling, for instance, blanket
capture of error tracebacks from within the modules where they occur.
Time-stamped logger objects and multi-stream output variants are
commonly used within Mailman scripts that run disconnected from a
terminal, to capture errors.
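A minimal sketch of such file-like loggers follows.  The names Logger,
StampedLogger and MultiLogger are borrowed from the figure below, but
the constructors here take an open stream rather than a log name, the
timestamp format is an assumption, and run_with_stderr() is a
hypothetical helper.

```python
# Sketch of file-like loggers; only write() and flush() are needed for
# print(), the traceback module, and sys.stderr substitution.
import sys
import time


class Logger:
    def __init__(self, stream):
        self.stream = stream                 # any object with write()

    def write(self, text):
        self.stream.write(text)

    def flush(self):
        if hasattr(self.stream, "flush"):
            self.stream.flush()


class StampedLogger(Logger):
    """Prefixes each new line with a timestamp and a label."""

    def __init__(self, stream, label):
        super().__init__(stream)
        self.label = label
        self._at_line_start = True

    def write(self, text):
        for piece in text.splitlines(keepends=True):
            if self._at_line_start:
                stamp = time.strftime("%b %d %H:%M:%S")
                self.stream.write("%s %s: " % (stamp, self.label))
            self.stream.write(piece)
            self._at_line_start = piece.endswith("\n")


class MultiLogger:
    """Fans each write out to several file-like objects."""

    def __init__(self, *outputs):
        self.outputs = outputs

    def write(self, text):
        for out in self.outputs:
            out.write(text)

    def flush(self):
        for out in self.outputs:
            if hasattr(out, "flush"):
                out.flush()


def run_with_stderr(logger, func):
    """Run func with sys.stderr temporarily replaced by logger."""
    saved = sys.stderr
    sys.stderr = logger
    try:
        func()
    finally:
        sys.stderr = saved
```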

Loggers are applied in Mailman Web-associated components with another
useful refinement.  All Web CGI scripts are launched via a driver
script.  The driver script launches the intended, job-specific scripts
within the context of an unqualified try-except statement.  If any
exception escapes the job-specific script - including ones that simply
cannot be caught within a script, for instance, syntax errors - then
the driver catches the exception and handles it in a useful way.
The driver produces the traceback and a listing of all the HTTP
environment variable settings both to stdout (HTML formatted, for
rendition on the Web) and to the error log file.  This way, the Web
visitor is provided with informative feedback (including instructions
about contacting the site administrator, if they are inclined), and
the site has a detailed record of the error.  (See [Figure: "Excerpt
from CGI Driver Script Code"], showing the use of error loggers and
the comprehensive exception guard.)

(The driver script, itself, is small and carefully hardened, in order
to minimize the chance that it will introduce errors where they won't
be caught.)

[Figure: Excerpt from CGI Driver Script Code.

    <code>
	try:
	    logger = StampedLogger('error',
				   label='admin',
				   manual_reprime=1,
				   nofail=0)
	    multi = MultiLogger(sys.__stdout__, logger)
	    scriptname = sys.argv[1]
	    pkg = __import__('Mailman.Cgi', globals(), locals(), [scriptname])
	    module = getattr(pkg, scriptname)
	    main = getattr(module, 'main')
	    try:
		main()
	    except SystemExit:
		# this is a valid way for the function to exit
		pass
	except:
	    print_traceback(logger, multi)
	    print_environment(logger)
    </code>
]
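The figure depends on helpers defined elsewhere in Mailman.  The
self-contained sketch below reproduces the same guard structure under
simplifying assumptions: job scripts are plain functions looked up in a
dictionary instead of modules imported from Mailman.Cgi, and
print_traceback() and print_environment() are hypothetical
reimplementations, not Mailman's own.

```python
# Sketch of a CGI driver's comprehensive exception guard.
import html
import os
import sys
import traceback


def print_traceback(log, out):
    tb = traceback.format_exc()          # the exception being handled
    log.write(tb)                        # full record for the site
    out.write("<h3>Bug in Mailman</h3>\n<pre>%s</pre>\n" % html.escape(tb))


def print_environment(log, environ=None):
    environ = os.environ if environ is None else environ
    for key in sorted(environ):
        log.write("%s=%s\n" % (key, environ[key]))


def run_cgi(scripts, name, log, out, environ=None):
    """Run scripts[name]() under the guard; return True on success."""
    try:
        main = scripts[name]
        try:
            main()
        except SystemExit:
            pass                         # a valid way for a script to end
    except:                              # deliberately unqualified: this is
        print_traceback(log, out)        # the executive for the job script
        print_environment(log, environ)
        return False
    return True
```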

Structural integration of error logging within the Mailman framework
eliminates the need for every CGI or mail handling script to take care
of logging explicitly, and it improves the detection and pinpointing
of faults early in the development cycle.  This incorporation depends
on Python's high-level exception mechanism, polymorphism, and the
standard file-object protocol for a thorough, no-hassle
implementation.

			    Web Interface

Mailman provides an interface to MailList objects via CGI, extending
programmatic access to the World Wide Web.  The MailList base class,
HTMLFormatter, contains MailList-specific HTML widgets, built upon an
HTML widget library which is also part of Mailman.  The underlying
library provides a full range of modest HTML document presentation and
CGI form widgets, as well as cookie handling for authorization.
Together with complete access to Mailman mailing lists via the MailList
object, this general mechanism enables publishing access to any aspect
of MailList operation to the Web.
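A toy version of such a widget library might look like the following.
The class names and the format() method are assumptions for
illustration rather than the interface of Mailman's own library, and
cookie handling is omitted.

```python
# Toy HTML widget library: each widget renders itself via format().
from html import escape


class Widget:
    def format(self):
        raise NotImplementedError


class Text(Widget):
    def __init__(self, text):
        self.text = text

    def format(self):
        return escape(self.text)


class TextBox(Widget):
    def __init__(self, name, value="", size=30):
        self.name, self.value, self.size = name, value, size

    def format(self):
        return '<input type="text" name="%s" value="%s" size="%d">' % (
            escape(self.name), escape(str(self.value)), self.size)


class Form(Widget):
    def __init__(self, action, *items):
        self.action, self.items = action, list(items)

    def format(self):
        # Containers format their children polymorphically.
        body = "\n".join(item.format() for item in self.items)
        return '<form method="post" action="%s">\n%s\n</form>' % (
            escape(self.action), body)
```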

On this we build typical Web-related functionality, such as an
overview of the mailing lists on the site, and review and subscription
to particular lists, available via the Web.  (See [Figure: "Mailing
List Home Page"].)  We also make available, via the Web, administrative
customization of MailList operation (see the Configuration Options
section, below), administrative action on the disposition of
subscriptions and postings being held for approval, and subscriber
control of their subscription status, customization options, and
password, among other things.

The elaborateness of Web applications, and the typical lack of a local
operator and error console, can complicate development and debugging
of them.  The use of Mailman's logging utilities, as described above,
provides reporting of unexpected errors, and also provides convenient
means for debugging flag "printouts" when exercising Mailman's Web
interfaces via the Web.

[Figure: "Mailing List Home Page".

 [user-ui.jpg]

]

	    Configuration Options Mechanism Exploits Namespace Dynamism

One significant subsystem demonstrating the power of the interface
between MailList objects and the Web is the mailing list
customization-options mechanism.  (See [Figure: "Admin Options
page"].)

MailList configuration options are expressed as simple data structures
(tuples) specifying the name of the MailList's data member which
contains the underlying setting, the type and layout of the HTML user
interface element for the option, a brief description, and an optional
elaborate description.  These options are collected into lists
according to rough categories, e.g. list-privacy specific options, or
digest specific settings.  (The option lists also include string
entries which are used to annotate their presentation, typically at
least including a header describing the category of the set.)

These option descriptors dictate the contents of Web pages by which
the mailing list administrators customize the behaviors of their mailing lists
- coupling the CGI widgets on the pages with the underlying settings
in the MailList objects.  Python's dynamic namespaces and high-level
data structures, among other things, enable this simple mechanism to
couple the user interface with the underlying data members.
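The coupling can be sketched as below.  The descriptor layout (name,
kind, size, brief description, optional elaborate description), the
kind names, and the option names are assumptions modeled on the
description above; the essential move is that getattr() and setattr()
connect form fields to MailList data members by name.

```python
# Sketch: option descriptors drive both page rendering and updates.
from html import escape

PRIVACY_OPTIONS = [
    "Privacy options",                                 # annotation string
    ("advertised", "toggle", None,
     "Advertise this list when people ask what lists are on this machine?"),
    ("max_message_size", "number", 5,
     "Maximum length in KB of a message body.",
     "Postings larger than this are held for approval."),  # help slot
]


def render(mlist, options):
    rows = []
    for opt in options:
        if isinstance(opt, str):
            rows.append("<h2>%s</h2>" % escape(opt))
            continue
        name, kind, size, brief = opt[:4]
        help_link = " (details)" if len(opt) > 4 else ""
        value = getattr(mlist, name)                   # dynamic lookup
        if kind == "toggle":
            checked = " checked" if value else ""
            widget = '<input type="checkbox" name="%s"%s>' % (name, checked)
        else:
            widget = '<input type="text" name="%s" value="%s" size="%d">' % (
                name, escape(str(value)), size)
        rows.append("%s%s %s" % (escape(brief), help_link, widget))
    return "\n".join(rows)


def apply_form(mlist, options, form):
    for opt in options:
        if isinstance(opt, str):
            continue
        name, kind = opt[0], opt[1]
        if kind == "toggle":
            setattr(mlist, name, name in form)         # checkbox semantics
        elif name in form:
            setattr(mlist, name, int(form[name]))      # dynamic update
```

Adding an option then means adding one tuple; neither function changes.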


[Figure: "Admin Options Page".

 [admin-ui.jpg]

]


The elementary nature of the mechanism, in turn, simplifies the
process of adding new configuration variables or changing existing
ones - a common occurrence when new features are added or existing
ones are changed.

The early formal structuring of the options has provided another
benefit - it enables central enhancement of the options mechanism as a
whole.  One recent example is addition of a help mechanism, which
entailed adding the optional slot for elaborate descriptions and a
corresponding addition to the presentation mechanism to offer help for
those options that include an elaborate description.

These option description tables could and should be divided into
plugin directories, to further separate the introduction of new
options from the main body of the program, enabling two benefits:

 - Isolation of the program from disruption due to faults in the
   option descriptions (which tend to be changed more commonly than
   other parts of the program).

 - Reevaluation of the option descriptions while the program is
   running (which will be particularly useful when the program is able
   to run as a persistent daemon).
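The proposed plugin arrangement might be sketched as follows; it
describes a possible design, not existing Mailman code.  Each *.py
file in an options directory is assumed to define an OPTIONS list, a
file that fails to load is reported and skipped, and calling
load_options() again picks up edited files.

```python
# Sketch of loading option descriptions from a plugin directory.
import os
import runpy


def load_options(directory, errors):
    """Return {plugin_name: OPTIONS}; append load failures to errors."""
    options = {}
    for fname in sorted(os.listdir(directory)):
        if not fname.endswith(".py"):
            continue
        path = os.path.join(directory, fname)
        try:
            # run_path re-executes the file on every call, so edits
            # take effect without restarting the program.
            namespace = runpy.run_path(path)
            options[fname[:-3]] = list(namespace["OPTIONS"])
        except Exception as exc:         # isolate faults in one plugin
            errors.append("%s: %s" % (fname, exc))
    return options
```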

Languages lacking the ability to directly access and manipulate
runtime namespaces could not do any of this without significant and
cumbersome indirection, and correspondingly obscure code.

		Drawbacks, Lessons and Open Questions
		=====================================

We have discussed a small sample of key Mailman features exhibiting
the versatility of the design and implementation.  Below we discuss
some inherent drawbacks, and also some lessons learned and open
questions we're still pondering.

The MailList object use of mixins has the drawback that it gathers all
method and data member names in the same namespace.  This requires
defensive programming to avoid collisions.  In practice it is hardly a
problem, except...

The Mailman configuration options compound this danger by directly
populating the list object with numerous data members representing the
options values.  We should reduce this load by encapsulating the
options within a class object tailored to getting and setting the
options as attributes.  This would also afford additional
functionality on options, such as better defaulting relationships - so
that changes to the central defaults are propagated back to MailLists
even after their creation time.
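One way to realize this proposal is sketched below; the Options class
and its reset() method are hypothetical.  An option the list has not
set explicitly is looked up in the site defaults at access time, so
later changes to the defaults reach lists created earlier, and the
MailList namespace would hold only a single options attribute.

```python
# Sketch of encapsulated options with live fallback to site defaults.
class Options:
    def __init__(self, defaults):
        # Bypass our own __setattr__ for internal bookkeeping.
        object.__setattr__(self, "_defaults", defaults)
        object.__setattr__(self, "_values", {})

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for option names.
        values = object.__getattribute__(self, "_values")
        if name in values:
            return values[name]
        defaults = object.__getattribute__(self, "_defaults")
        try:
            return defaults[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        if name not in self._defaults:
            raise AttributeError("unknown option: %s" % name)
        self._values[name] = value

    def reset(self, name):
        # Forget the list-specific value; fall back to the site default.
        self._values.pop(name, None)
```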

Early versions of Mailman used broad, unqualified except clauses,
masking unintended exceptions and making it extremely difficult to
track down the origin of faults contained therein.  In practice,
unqualified except clauses should never be used unless the intention
is to catch and actually handle any contained failures.  (Code that
does general failure handling can be seen as an executive of the code
being handled.  For instance, the CGI driver script, which directs the
traceback and debugging info to the appropriate destinations, plays
this role w.r.t. the CGI scripts.)  In general, except clauses should
be as completely qualified as possible, and should be placed as close
as possible to the code that raises the exceptions they are meant to
catch.
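The lesson can be illustrated with a hypothetical helper that reads a
numeric setting from a submitted form.  The qualified version lets an
unexpected fault surface; the deliberately bad version shows how an
unqualified clause turns a typo into a silent default.

```python
# Qualified and close to the source: only int() is expected to fail,
# and only with ValueError.
def parse_size(form, name, default):
    raw = form.get(name, "")
    try:
        return int(raw)
    except ValueError:
        return default


# Anti-pattern: the bare except also swallows the AttributeError from
# the misspelled form.gett(), hiding a programming error.
def parse_size_badly(form, name, default):
    try:
        return int(form.gett(name, ""))
    except:
        return default
```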


One fundamental question involves the friction between rapid
prototyping and system hardening, which is particularly related to the
use of dynamically typed Python.  Dynamic typing is a wonderful tool
for prototyping and rapid changing of the system.  However, as parts
of the system evolve and stabilize, it might be more useful to employ
static typing so that the interfaces between system components, and
the components themselves, can be hardened.  There have been some
discussions on the Python newsgroup about adding optional static
typing to the language, and this would be a very interesting feature
to experiment with.  The questions are how useful would optional
static typing be, and would it be flexible enough to allow migration
of code from dynamic typing to static typing?

			      Conclusion
			      ==========

The desirability of continuing evolution in an MLM suggests a model of
the system as being perpetually unfinished - with at least some parts
at any one time being continuously developing prototypes.  Prototyping
and rapid development are among Python's clear strengths, and
invaluable in this regard.

Mailman exploits many of Python's features, including native object
orientation, multiple inheritance, polymorphism, high-level control
structures like exceptions, conventional protocols, dynamic access to
namespaces, cogent data structures, and a wealth of standard
libraries.  The power of the language, combined with its tendency to
readability, enables development of sophisticated systems with
approachable, untortured code.  This has already paid off, both in
easy integration of valuable new subsystems, and in accumulation of
contributed code from the user community, even this early in Mailman's
development.

======================================================================
References

AMK "Pipermail", Andrew Kuchling
http://starship.skyport.net/crew/amk/maintained/pipermail.html

DC "Document Template", Digital Creations, L.C.
http://www.digicool.com/releases/bobo/DocumentTemplate-rn.html

ESR "The Cathedral and The Bazaar", Eric Raymond
http://sagan.earthspace.net/esr/writings/cathedral-bazaar/cathedral-bazaar.html

LO "Mailman", www.list.org
http://www.list.org/

MD "Mailman-developers", mailman-developers@python.org

RF "HTMLgen", Robin Friedrich
http://www.python.org/sigs/web-sig/HTMLgen/html/main.html