Mailman - An Extensible MLM using Python
=========================================

Abstract
========

The explosive growth in the Internet community, and the core role that email plays in it, demands an adaptable Mailing List Management (MLM) system. The extent to which MLMs are adaptable is the extent to which they can accommodate, and even foster, effective new forms of Internet community organization. A new MLM, Mailman, is well suited to such evolution, and one of the contributing factors is its implementation in Python. In this paper we will look at various ways in which Mailman's versatility enables its extension. We will consider how the system's design, and features of its implementation language, Python, factor into that extensibility.

Introduction
============

What is Mailman?

Mailman is a Mailing List Management system, like Majordomo and Smartmail, used to manage email redistribution lists. Mailman gives each mailing list a Web page, and allows users to subscribe, unsubscribe, etc. over the Web. List managers can administer their lists entirely from the Web. Mailman also integrates most things people want to do with mailing lists, including archiving, mail <-> news gateways, and so on.

Mailman was originally developed by John Viega. Ken Manheimer picked up the ball to bring Mailman to 1.0. Currently, Mailman development is a group effort, led by John Viega, Ken Manheimer and Barry Warsaw. Mailman has been designated by the Free Software Foundation as the GNU Mailing List Manager. See [LO "Mailman"] for more details on the system, and visit the [MD "Mailman-developers"] mailing list if you're interested in joining the Mailman development community.

Why Extensibility?

From the early days of the ARPAnet to today, email and Mailing List Management systems have played a crucial role in the formation and conduct of communities on the Internet. With the profound dynamism of the Internet, the infrastructures by which it organizes are continually evolving.
Over time, the rapidly increasing scale and the advent of improved and new strategies for organization of Internet communities demand continuing development of the mechanisms supporting them. New and different approaches may take up some types of the traffic, as Usenet News has, but email, as a medium, has proven to be particularly versatile and lasting. A good MLM will help foster the evolution of the Internet communities by growing with them.

Another reason for extensibility's importance in this context has to do with a core constituency of mailing list users - the mailing list administrators. These administrators are typically near enough to the end-users to get clear impressions of their needs. Also, they commonly are savvy enough, technically, to be able to implement improvements to accommodate those needs - provided the system doesn't present too high a threshold of comprehension. Here is a prime opportunity for exploiting the Bazaar style of open-software development [ESR "The Cathedral and The Bazaar"], enabling the managers of the medium, themselves, to guide its development, and yielding results more quickly and more closely tailored to the needs of the user community.

Finally, most aspects of an MLM do not require the kind of speed optimizations which force change-impeding hardening of the system. Performance-critical aspects, like mail delivery to large numbers of users, are generally the purview of the underlying Mail Transport Agents (MTAs), not the MLM. Large capacities can impose some specialized performance demands on the MLM, of course. The specialized nature of those demands, however, enables isolating the optimizations to select components, and Python's compiled-language extensibility enables hardening those specific components as needed, isolating the rigidity to the particular subsystems that need it. At this point we don't foresee hardening any components in this fashion, but we don't know what the future (or potential growth of Mailman's use) will bring.

Why Python?

Python is particularly well suited to implementing an extensive and changing system. Its combination of clean syntax and cogent semantics aids the programmer, all the more in the process of changing existing code. It is dynamic in many respects, enabling interaction with and programmatic handling of just about everything in the language. By satisfying the needs of prototyping and rapid development, as well as those of general programming, it can be seen to foster "continuous development", where a system continues to grow and evolve to accommodate a changing world.

A Broad Overview of Mailman's Structure
=======================================

The core of the Mailman system is the MailList object, a class instance which represents an individual mailing list at the site. The MailList class is composed by multiple inheritance from a number of task-oriented component classes, as mixins. The task-oriented components contain the methods, variable declarations, and initializations related to the functionality of a particular subsystem; for example, that of the delivery mechanism or of the emailed-commands handler.

The code directly in the MailList class is responsible for coordination of the mixin classes' initialization, central identification of the specific mailing list, creation of new mailing lists, and management of the mailing list's persistent data and locking. The internal MailList object code also handles the very top level of subscriptions and message posting, but the task-oriented base classes are responsible for the underpinnings of that and all the other functions of the mailing list object.

The following base classes currently exist:

MailCommandHandler: This class implements the parsing and execution of Majordomo-style commands embedded in email to -request addresses. Although users more typically interact with mailing lists directly through the Web interface, for compatibility, user commands can be issued via email.
Where appropriate, the commands have the same syntax and semantics as the corresponding Majordomo commands.

HTMLFormatter: This class is used to generate list-specific HTML for presentation via the World Wide Web interface. Primarily, it uses a widget library also included in Mailman. Together this class and library serve a purpose very similar to that of Robin Friedrich's HTMLgen [RF "HTMLgen"] and Digital Creations, L.C.'s Document Template [DC "Document Template"].

Deliverer: This class conducts delivery of any of the email associated with a mailing list. This includes membership delivery of postings, subscription acknowledgments, announcements to the list administrator about list creation, list business pending approval, subscriber notices regarding their passwords, and myriad other things. Email is used for a lot of things by a mailing list system, even one with a comprehensive Web interface.

ListAdmin: This class manages the queuing and notification of mailing list submissions - postings and subscription requests - that require an administrator's decision (approval or rejection). For example, a list may be set to require administrator approval for any postings, or a posting may be held because it triggered a filter intended to catch undesired commercial messages (can you say spam?).

Archiver: This class handles the archival of posted messages. Mailman mailing lists can have public or private archives, and this class places the posted message in the appropriate location. It also interfaces with external Hypertext archivers such as Andrew Kuchling's Pipermail [AMK "Pipermail"], which is bundled with Mailman.

Digester: Mailing list members can receive postings immediately, or they can opt to have cumulative "digests" of the list traffic sent to them periodically. This class manages accumulation of the digests, formulation of the plain and MIME formats (when there are subscribers to the respective types), and dispatching of the digests to the respective subscribers.

SecurityManager: This class primarily verifies authorization passwords for the site administrator, list administrators, and users. It also performs the task of sanitizing the Majordomo-style approval passwords from the headers of administrator approvals submitted via email.

Bouncer: Mailman catches email delivery bounce notices, and accumulates tallies of bounce scores for the mailing list members. For scores that exceed designated thresholds within designated timeout conditions, the bouncer triggers list-prescribed actions, including disabling of mail delivery or, if set by the list administrator, unsubscription of the member from the list.

GatewayManager: This class handles optional email-to-Usenet gateways for mailing lists.

A Selective Tour of Mailman's Versatility
=========================================

Programming and Interacting With MailList Objects

Perhaps the most significant factor contributing to Mailman's versatility is the design of the MailList class for instantiation by external programs, or interactively within the interpreter.

Almost all aspects of Mailman mailing list operation are articulated via the MailList instance. Thus, interaction with mailing lists can be conducted programmatically, and also incrementally, using the interactive Python shell.

Programmatic interaction enables us to extend access to any aspect of MailList operation, anywhere we can write a script. From this we build Web, Email, and cron access. We can also build scripts to automate any routine procedures, such as conversion of subscriber lists from established Majordomo mailing lists.

Interactive sessions with MailList instances provide an eminently useful development and debugging tool. With them, we are able to exercise and test isolated subsystems and the behavior of the MailList as a whole, engaging tools like the Python debugger and profiler along the way.
We can also use interactive sessions to do mailing list "surgery" - to make changes to list state not provided for in already created scripts.

Using a utility function, Utils.map_maillists(), we can apply arbitrary functions to all or to selected Mailman mailing lists at the site. This enables us to do wholesale conversions of the MailLists to accommodate, for instance, changes in the address of the site, or to search for particular members of any of the mailing lists and then do some processing on their subscriptions.

MailList Object Composed via Inheritance from Task-Oriented Components

Composing the MailList class using multiple inheritance makes it easy to share the component class methods and data throughout the MailList object. It avoids the need to explicitly identify and pass around delegate instances in order to use those components' data and methods.

Having all the methods and data inhabit the namespace of the primary MailList instance can lead to inadvertent name collisions. However, we feel that the system would have to get much bigger before that would become a practical concern - and at that point we could use naming conventions to prevent the collisions, while still enjoying the easy sharing.

Use of multiple inheritance provides this direct sharing, along with organization of the system into distinct, conceptually motivated modules, easing debugging and development. New major modules are still being added as task-specific mixin classes, and the process is exceptionally simple. For instance, as of this writing, one of the primary authors has added bidirectional mail/news gateway capability to Mailman. This module required knowledge of some boilerplate structure, and only minor changes to existing modules, providing a major piece of functionality with almost plugin-style ease.

MailList Object State Persistence Exploits Introspection And Simple Sharing

This direct sharing also simplifies the MailList object's persistence mechanism.
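
As a rough illustration of the sharing that mixins provide, and of the persistence scheme this section describes, the following minimal sketch composes a toy list class from two mixins and saves its state by introspecting self.__dict__ and handing the result to the standard marshal module. The class bodies, the init_* method names, and the dump_state/load_state helpers are assumptions made for illustration only; they are not Mailman's actual code.

```python
import marshal


class Deliverer:
    """Hypothetical mixin holding delivery-related state and methods."""

    def init_deliverer(self):
        self.delivered = 0

    def deliver(self, message):
        # self.members is defined by the composed class, not by this mixin:
        # the shared instance namespace makes it directly reachable.
        self.delivered += len(self.members)
        return [(addr, message) for addr in self.members]


class Digester:
    """Hypothetical mixin holding digest-related state and methods."""

    def init_digester(self):
        self.digest_queue = []

    def queue_for_digest(self, message):
        self.digest_queue.append(message)


class MailList(Deliverer, Digester):
    """Toy list object composed from the mixins above."""

    def __init__(self, name):
        self.name = name
        self.members = []
        self._lock = object()  # leading underscore: excluded from saved state
        self.init_deliverer()
        self.init_digester()

    def dump_state(self):
        # Introspect our own namespace, skipping "_"-prefixed members.
        state = {key: value for key, value in self.__dict__.items()
                 if not key.startswith("_")}
        return marshal.dumps(state)

    def load_state(self, data):
        # Restore the saved members directly into the instance namespace.
        self.__dict__.update(marshal.loads(data))
```

Because every mixin stores its data on the same instance, one pass over self.__dict__ captures the state of all subsystems without any of them needing its own save and restore code.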
By identifying its own data members via self.__dict__, the MailList object's persistence mechanism saves and restores MailList state using the marshal module. (Members that should not be saved are distinguished with a leading "_" underscore.) This exploits Python's introspection capabilities, as well as a standard, simple persistent storage facility. (The higher-level standard persistent storage mechanism, pickle, would do more work than we want or need, so we were able to avoid its overhead.) As with sharing in the first place, the arrangement is uncomplicated, easing approach and acquaintance by newcomers.

Logging Mechanism

Most of the common interactions with MailList objects are triggered remotely - via the Web or email - or from periodically firing cron jobs. The lack of an operator or a console can make system failures in these contexts hard to trace.

Of course, every program should be perfect (:-), or at least fail gracefully. However, when programming in an environment where change is frequent, we need to provide some defensive mechanisms which aid the capture of the errors that inevitably slip by. Mailman's logging mechanism provides that coverage.

Reliable logging is also key for tracking the occurrence of common events that otherwise take place "behind the scenes". This can include mailing list subscription activity, automated change of subscriptions due to delivery failures, and so forth. It is also useful to be able to use "flag-printing" debugging, even when stdout does not go anywhere useful - e.g., when running under CGI, or in disconnected forked processes, or via email.

The crux of the Mailman logging scheme is a Logger class, whose job is to reliably direct messages to log files. Logger instances obey the conventional Python file-like object interface protocol. Thus, they can be explicitly used by the programmer like a standard file object to write messages.
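
A file-like logger in the spirit of the one just described can be sketched in a few lines. This is a minimal sketch, not Mailman's Logger: the class name StampedLog, its constructor arguments, and the timestamp format are all assumptions, and an in-memory io.StringIO stands in for a real log file.

```python
import io
import time


class StampedLog:
    """Hypothetical file-like logger that timestamps the start of each line."""

    def __init__(self, stream, label):
        self._stream = stream
        self._label = label
        self._at_line_start = True

    def write(self, text):
        # Implementing write() (and flush()) is enough for print() and for
        # most code that expects a writable text file.
        for chunk in text.splitlines(keepends=True):
            if self._at_line_start:
                stamp = time.strftime("%b %d %H:%M:%S %Y")
                self._stream.write("%s %s: " % (stamp, self._label))
            self._stream.write(chunk)
            self._at_line_start = chunk.endswith("\n")
        return len(text)

    def flush(self):
        self._stream.flush()


sink = io.StringIO()                 # stands in for an open log file
log = StampedLog(sink, "admin")
log.write("explicit message\n")      # used directly, like a file object
print("debugging flag", file=log)    # used wherever a file is accepted
```

Since the object only needs to honor the file protocol, code that writes to it does not have to know it is talking to a logger at all.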
Logger objects can also be substituted for standard output streams like sys.stderr and sys.stdout, enabling, for instance, blanket capture of error tracebacks from within the modules where they occur. Time-stamped logger objects and multi-stream output variants are commonly used within Mailman scripts that run disconnected from a terminal, to capture errors.

Loggers are applied in Mailman Web-associated components with another useful refinement. All Web CGI scripts are launched via a driver script. The driver script launches the intended, job-specific scripts within the context of an unqualified try-except statement. If any exception escapes the job-specific script - including ones that simply cannot be caught within a script, for instance, syntax errors - then the driver catches the exception and handles it in a useful way. The driver produces the traceback and a listing of all the HTTP environment variable settings both to stdout (HTML formatted, for rendition on the Web), and to the error log file. This way, the Web visitor is provided with informative feedback (including instructions about contacting the site administrator, if they are so inclined), and the site has a detailed record of the error. (See [Figure: "Excerpt from CGI Driver Script Code"], showing the use of error loggers and the comprehensive exception guard.)

(The driver script, itself, is small and carefully hardened, in order to minimize the chance that it will introduce errors where they won't be caught.)

[Figure: Excerpt from CGI Driver Script Code.

    try:
        logger = StampedLogger('error', label='admin',
                               manual_reprime=1, nofail=0)
        multi = MultiLogger(sys.__stdout__, logger)
        scriptname = sys.argv[1]
        pkg = __import__('Mailman.Cgi', globals(), locals(), [scriptname])
        module = getattr(pkg, scriptname)
        main = getattr(module, 'main')
        try:
            main()
        except SystemExit:
            # this is a valid way for the function to exit
            pass
    except:
        print_traceback(logger, multi)
        print_environment(logger)
]

Structural integration of error logging within the Mailman framework eliminates the need for every CGI or mail handling script to explicitly take care of logging, and it increases the detection and pinpointing of faults early in the development cycle. This incorporation depends on Python's high-level exception mechanism, polymorphism, and a standard file-object protocol for a thorough, no-hassle implementation.

Web Interface

Mailman provides an interface to MailList objects via CGI, extending programmatic access to the World Wide Web. The MailList base class, HTMLFormatter, contains MailList-specific HTML widgets, built upon an HTML widget library which is also part of Mailman. The underlying library provides a full range of modest HTML document presentation and CGI form widgets, as well as cookie handling for authorization. Together with complete access to Mailman mailing lists via the MailList object, this general mechanism enables publishing access to any aspect of MailList operation to the Web.

On this we build typical Web-related functionality, such as an overview of the mailing lists on the site, and review of and subscription to particular lists, available via the Web. (See [Figure: "Mailing List Home Page"].)
In addition, we extend to the Web administrative customization of MailList operation (see the Configuration Options section, below), administrative action on the disposition of subscriptions and postings being held for approval, and subscriber control of their subscription status, customization options, and password, among other things.

The elaborateness of Web applications, and the typical lack of a local operator and error console, can complicate their development and debugging. The use of Mailman's logging utilities, as described above, provides reporting of unexpected errors, and also provides a convenient means for debugging flag "printouts" when exercising Mailman's Web interfaces via the Web.

[Figure: "Mailing List Home Page".
[user-ui.jpg]
]

Configuration Options Mechanism Exploits Namespace Dynamism

One significant subsystem demonstrating the power of the interface between MailList objects and the Web is the mailing list customization-options mechanism. (See [Figure: "Admin Options Page"].)

MailList configuration options are expressed as simple data structures (tuples) specifying the name of the MailList's data member which contains the underlying setting, the type and layout of the HTML user interface element for the option, a brief description, and an optional elaborate description. These options are collected into lists according to rough categories, e.g. list-privacy specific options, or digest specific settings. (The option lists also include string entries which are used to annotate their presentation, typically at least including a header describing the category of the set.)

These option descriptors dictate the contents of the Web pages by which the mailing list administrators customize the behaviors of their mailing lists - coupling the CGI widgets on the pages with the underlying settings in the MailList objects.
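
The descriptor scheme described above might look roughly like the following minimal sketch. The option names, the widget type strings, and the render_options and apply_form helpers are hypothetical, chosen only to show how a tuple naming a data member can drive both presentation and update through getattr and setattr; plain text lines stand in for the HTML widgets.

```python
class ToyList:
    """Hypothetical stand-in for a MailList with two settings."""

    def __init__(self):
        self.description = ""
        self.max_message_size = 40


# Each descriptor: (data member name, widget type, brief description,
# optional elaborate description). Bare strings annotate the category.
GENERAL_OPTIONS = [
    "General list settings",
    ("description", "text", "A terse phrase identifying this list."),
    ("max_message_size", "number", "Maximum size of a posting, in KB.",
     "Postings larger than this are held for approval."),
]


def render_options(mlist, options):
    """Return one display line per entry, showing each current value."""
    lines = []
    for entry in options:
        if isinstance(entry, str):
            lines.append("== %s ==" % entry)
            continue
        name, kind, brief = entry[:3]
        help_note = " (help)" if len(entry) > 3 else ""
        value = getattr(mlist, name)  # look the member up by name
        lines.append("%s [%s] = %r%s" % (brief, kind, value, help_note))
    return lines


def apply_form(mlist, options, form):
    """Store submitted form values back on the list object."""
    converters = {"text": str, "number": int}
    for entry in options:
        if isinstance(entry, str) or entry[0] not in form:
            continue
        name, kind = entry[0], entry[1]
        setattr(mlist, name, converters[kind](form[name]))
```

Adding a new option in this arrangement means adding one tuple; neither the rendering nor the update code changes.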
Python's dynamic namespaces and high-level data structures, among other things, enable this simple mechanism to couple the user interface with the underlying data members.

[Figure: "Admin Options Page".
[admin-ui.jpg]
]

The elementary nature of the mechanism, in turn, simplifies the process of adding new configuration variables or changing existing ones - a common occurrence when new features are added or existing ones are changed.

The early formal structuring of the options has provided another benefit - it enables central enhancement of the options mechanism as a whole. One recent example is the addition of a help mechanism, which entailed adding the optional slot for elaborate descriptions and a corresponding addition to the presentation mechanism to offer help for those variables that have an elaborate description.

These option description tables could and should be divided into plugin directories, to further separate the introduction of new options from the main body of the program, enabling two benefits:

- Isolation of the program from disruption due to faults in the option descriptions (which tend to be changed more commonly than other parts of the program).

- Reevaluation of the option descriptions while the program is running (which will be particularly useful when the program is able to run as a persistent daemon).

Languages lacking the ability to directly access and affect runtime namespaces could not do any of this without significant and cumbersome indirection, and the necessary code would be correspondingly obscure.

Drawbacks, Lessons and Open Questions
=====================================

We discussed a small sample of some key Mailman features exhibiting the versatility of the design and implementation. Below we discuss some inherent drawbacks, and also some lessons learned and open questions we're still pondering.

The MailList object's use of mixins has the drawback that it gathers all method and data member names in the same namespace.
This requires defensive programming to avoid collisions. In practice it is hardly a problem, except...

The Mailman configuration options compound this danger by directly populating the list object with numerous data members representing the option values. We should reduce this load by encapsulating the options within a class object tailored to getting and setting the options as attributes. This would also afford additional functionality on options, such as better defaulting relationships - so that changes to the central defaults are propagated back to MailLists even after their creation time.

Early versions of Mailman used broad, unqualified except clauses, masking unintended exceptions and making it extremely difficult to track down the origin of faults contained therein. In practice, unqualified except clauses should never be used unless the intention is to catch and actually handle any contained failures. (Code that does general failure handling can be seen as an executive of the code being handled. For instance, the CGI driver script, which directs the traceback and debugging info to the appropriate destinations, plays this role w.r.t. the CGI scripts.) In general, except clauses should be as completely qualified as possible, and should be placed as close as is practical to the code that raises the exception they're meant to catch.

One fundamental question involves the friction between rapid prototyping and system hardening, which is particularly related to the use of dynamically typed Python. Dynamic typing is a wonderful tool for prototyping and rapid changing of the system. However, as parts of the system evolve and stabilize, it might be more useful to employ static typing so that the interfaces between system components, and the components themselves, can be hardened.
There have been some discussions on the Python newsgroup about adding optional static typing to the language, and this would be a very interesting feature to experiment with. The questions are: how useful would optional static typing be, and would it be flexible enough to allow migration of code from dynamic typing to static typing?

Conclusion
==========

The desirability of continuing evolution in an MLM suggests a model of the system as being perpetually unfinished - with at least some parts at any one time being continuously evolving prototypes. Prototyping and rapid development are among Python's clear strengths, and invaluable in this regard.

Mailman exploits many of Python's features, including native object orientation, multiple inheritance, polymorphism, high-level control structures like exceptions, conventional protocols, dynamic access to namespaces, cogent data structures, and a wealth of standard libraries. The power of the language, combined with its tendency to readability, enables development of sophisticated systems with approachable, untortured code. This has already paid off, both in easy integration of valuable new subsystems, and in accumulation of contributed code from the user community, even this early in Mailman's development.

======================================================================

References

AMK "Pipermail", Andrew Kuchling
    http://starship.skyport.net/crew/amk/maintained/pipermail.html

DC "Document Template", Digital Creations, L.C.
    http://www.digicool.com/releases/bobo/DocumentTemplate-rn.html

ESR "The Cathedral and The Bazaar", Eric Raymond
    http://sagan.earthspace.net/esr/writings/cathedral-bazaar/cathedral-bazaar.html

LO "Mailman", www.list.org
    http://www.list.org/

MD "Mailman-developers"
    mailman-developers@python.org

RF "HTMLgen", Robin Friedrich
    http://www.python.org/sigs/web-sig/HTMLgen/html/main.html