=============================
 Mailman 3 Core architecture
=============================

This is a brief overview of the internal architecture of the Mailman 3 core
delivery engine. You should start here if you want to understand how Mailman
works at the 1000-foot level. Another good source of architectural
information is the chapter written by Barry Warsaw for the
`Architecture of Open Source Applications`_.


User model
==========

Every major component of the system is defined by an interface. Look through
``src/mailman/interfaces`` for an understanding of the system components.
Mailman objects that are stored in the database are defined by *model*
classes. Objects such as *mailing lists*, *users*, *members*, and *addresses*
are the primary objects within the system.

The *mailing list* is the central object, which holds all the configuration
settings for a particular mailing list. A mailing list is associated with a
*domain*, and all mailing lists are managed (i.e. created, destroyed, looked
up) via the *mailing list manager*.

*Users* represent people, and have a *user id* and a *display name*. Users
are linked to *addresses*, each of which represents a single email address.
One user can be linked to many addresses, but an address is linked to at most
one user. Addresses can be *verified* or *not verified*. Mailman will deliver
email only to *verified* addresses.

Users and addresses are managed by the *user manager*.

A *member* is created by linking a *subscriber* to a mailing list.
Subscribers can be:

* A user, who becomes a member through their *preferred address*.
* An address, which may or may not be linked to a user, but must be verified.
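
The relationships between users, addresses, and subscribers described above
can be sketched in plain Python. This is a hypothetical, in-memory
illustration only: ``User``, ``Address``, and ``delivery_address()`` are names
invented for this example, not Mailman's real model classes, which are stored
in the database and described by the interfaces in ``src/mailman/interfaces``.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import List, Optional, Union


    # Hypothetical stand-ins for Mailman's model classes (illustration only).
    # eq=False makes instances compare by identity, like distinct database rows.
    @dataclass(eq=False)
    class Address:
        email: str
        verified: bool = False
        # An address is linked to at most one user; repr=False avoids a
        # recursive repr through the user's list of addresses.
        user: Optional["User"] = field(default=None, repr=False)


    @dataclass(eq=False)
    class User:
        user_id: int
        display_name: str
        addresses: List[Address] = field(default_factory=list)
        preferred_address: Optional[Address] = None

        def link(self, address: Address) -> None:
            """Link an address to this user; one user may own many addresses."""
            if address.user is not None and address.user is not self:
                raise ValueError(f"{address.email} is linked to another user")
            address.user = self
            if address not in self.addresses:
                self.addresses.append(address)


    def delivery_address(subscriber: Union[User, Address]) -> Optional[Address]:
        """Return the address a subscriber would receive mail at, if any.

        A user subscribes through its preferred address; a bare address
        subscribes directly. Either way, only verified addresses get mail.
        """
        if isinstance(subscriber, User):
            address = subscriber.preferred_address
        else:
            address = subscriber
        if address is not None and address.verified:
            return address
        return None

With this sketch, a user whose preferred address is verified would receive
mail at that address, while an unverified address subscribed on its own would
receive nothing. Linking an address that already belongs to another user
raises ``ValueError``.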

Members also have a *role*, representing regular members, digest members, list
owners, and list moderators. Members can even have the *non-member* role
(i.e. people not yet subscribed to the mailing list) for various moderation
purposes.


Process model
=============

Messages move around inside the Mailman system by way of *queue* directories
managed by the *switchboard*. For example, when a message is first received
by Mailman, it is moved to the *in* (for "incoming") queue. During the
processing of this message, it (or copies of it) may be moved to other queues
such as the *out* queue (for outgoing email), the *archive* queue (for sending
to the archivers), the *digest* queue (for composing digests), etc.

A message in a queue is represented by a single file, a ``.pck`` file. This
file contains two objects, serialized as `Python pickles`_. The first object
is the message being processed, already parsed into a `more efficient internal
representation`_. The second object is a metadata dictionary that records
additional information about the message as it is being processed.

``.pck`` files only exist for messages moving between different system queues.
There is no ``.pck`` file for messages while they are actively being
processed.

Each queue directory is associated with a *runner* process which wakes up
every so often. When the runner wakes up, it examines all the ``.pck`` files
in FIFO order, deserializing the message and metadata objects, and processing
them. If the message needs further processing in a different queue, it will
be re-serialized back into a ``.pck`` file. If not (e.g. because processing
of the message is complete), then no ``.pck`` file is written.

The Mailman system uses a few other runners which don't process messages in a
queue. You can think of these as fairly typical server processes; examples
include the LMTP server and the HTTP server for processing REST commands.
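
The queue-file and runner behavior described above can be illustrated with a
small, self-contained sketch. It is a simplification under stated
assumptions, not Mailman's switchboard: the ``Queue`` class, its file naming
scheme, ``run_once()``, and ``demo()`` are hypothetical, and the standard
library's ``email.message.Message`` stands in for Mailman's own message class.
It does show the two-pickle file layout, FIFO processing, and that a ``.pck``
file is written again only when further processing is needed.

.. code-block:: python

    import os
    import pickle
    import tempfile
    import time
    from email import message_from_string
    from email.message import Message
    from typing import Callable, Iterator, Tuple


    class Queue:
        """Hypothetical queue directory holding one .pck file per message."""

        def __init__(self, path: str) -> None:
            self.path = path
            self._counter = 0
            os.makedirs(path, exist_ok=True)

        def enqueue(self, msg: Message, msgdata: dict) -> str:
            # Zero-padded timestamp plus a counter, so that sorting the file
            # names normally yields arrival (FIFO) order.
            self._counter += 1
            name = f"{time.time_ns():020d}+{self._counter:06d}.pck"
            filename = os.path.join(self.path, name)
            with open(filename, "wb") as fp:
                pickle.dump(msg, fp)      # first object: the parsed message
                pickle.dump(msgdata, fp)  # second object: the metadata dict
            return filename

        def dequeue_all(self) -> Iterator[Tuple[Message, dict]]:
            """Yield (msg, msgdata) pairs oldest first, removing each file."""
            for name in sorted(os.listdir(self.path)):
                if not name.endswith(".pck"):
                    continue
                filename = os.path.join(self.path, name)
                with open(filename, "rb") as fp:
                    msg = pickle.load(fp)
                    msgdata = pickle.load(fp)
                os.remove(filename)  # no .pck file while actively processing
                yield msg, msgdata


    def run_once(inq: Queue, outq: Queue,
                 handle: Callable[[Message, dict], bool]) -> int:
        """One wake-up of a hypothetical runner: drain ``inq`` in FIFO order.

        ``handle(msg, msgdata)`` returns True when the message needs further
        processing in ``outq``; otherwise nothing is written back to disk.
        """
        processed = 0
        for msg, msgdata in inq.dequeue_all():
            if handle(msg, msgdata):
                outq.enqueue(msg, msgdata)
            processed += 1
        return processed


    def demo() -> Tuple[int, list]:
        """Queue three messages; forward all but the second to an out queue."""
        with tempfile.TemporaryDirectory() as root:
            inq = Queue(os.path.join(root, "in"))
            outq = Queue(os.path.join(root, "out"))
            for n in range(3):
                text = f"Subject: msg {n}\n\nHello\n"
                inq.enqueue(message_from_string(text), {"n": n})
            processed = run_once(inq, outq, lambda m, md: md["n"] != 1)
            forwarded = [m["Subject"] for m, _ in outq.dequeue_all()]
            return processed, forwarded

Calling ``demo()`` returns ``(3, ['msg 0', 'msg 2'])``: the runner processed
all three messages, but only the two that needed further processing were
written back out as ``.pck`` files.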

All of the runners are managed by a *master watcher* process. When you type
``mailman start``, you are actually starting the master. Based on
configuration options, the master will start the appropriate runners as
subprocesses, and it will watch for the clean exiting of these subprocesses
when ``mailman stop`` is called.


Rules and chains
================

When a message is first received for posting to a mailing list, Mailman
processes the message to determine whether the message is appropriate for the
mailing list. If so, it *accepts* the message and it gets posted. Mailman
can *discard* the message so that no further processing occurs. Mailman can
also *reject* the message, bouncing it back to the original sender, usually
with some indication of why the message was rejected. Or, Mailman can *hold*
the message for moderator approval.

*Moderation* is the phase of processing that determines which of the above
four dispositions will occur for the newly posted message. Moderation does
not generally change the message, but it may record information in the
metadata dictionary. Moderation is performed by the *in* queue runner.

Each step in the moderation phase applies a *rule* to the message and asks
whether the rule *hits* or *misses*. Each rule is linked to an *action* which
is taken if the rule hits (i.e. matches). If the rule misses (i.e. doesn't
match), then the next rule is tried. All of the rule/action links are strung
together sequentially into a *chain*, and every mailing list has a *start
chain* where rule processing begins.

Actually, every mailing list has *two* start chains: one for regular postings
to the mailing list, and another for postings to the owners of the mailing
list.
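
The hit-or-miss walk through a chain can be sketched as follows. This is a
hypothetical model, not Mailman's chain API: rules are plain functions taking
the message and its metadata dictionary, and ``Link``, ``Jump``, ``Chain``,
``process()``, and the example rules are names invented for this
illustration. Of the "other actions" Mailman supports, only jumping to
another chain is modelled here.

.. code-block:: python

    from dataclasses import dataclass
    from email.message import Message
    from typing import Callable, List, Union

    # Hypothetical: a rule inspects the message and metadata; True means "hit".
    Rule = Callable[[Message, dict], bool]


    @dataclass
    class Jump:
        """An action that continues processing in another chain."""
        chain: "Chain"


    @dataclass
    class Link:
        rule: Rule
        action: Union[str, Jump]  # "accept", "reject", "discard", "hold", or Jump


    @dataclass
    class Chain:
        name: str
        links: List[Link]


    def process(chain: Chain, msg: Message, msgdata: dict) -> str:
        """Return the disposition chosen by the first link whose rule hits."""
        for link in chain.links:
            if not link.rule(msg, msgdata):
                continue  # the rule missed, so try the next link
            # Moderation may record information in the metadata dictionary.
            msgdata.setdefault("rule_hits", []).append(link.rule.__name__)
            if isinstance(link.action, Jump):
                return process(link.action.chain, msg, msgdata)
            return link.action
        raise RuntimeError(f"chain {chain.name!r} ended without a disposition")


    # Example rules (hypothetical).
    def has_spam_flag(msg: Message, msgdata: dict) -> bool:
        return msg.get("X-Spam-Flag", "").upper() == "YES"


    def sender_is_moderated(msg: Message, msgdata: dict) -> bool:
        return msgdata.get("moderated", False)


    def always_matches(msg: Message, msgdata: dict) -> bool:
        return True


    moderation_chain = Chain("moderation", [
        Link(sender_is_moderated, "hold"),
        Link(always_matches, "accept"),
    ])
    start_chain = Chain("start", [
        Link(has_spam_flag, "discard"),
        Link(always_matches, Jump(moderation_chain)),
    ])

Because ``process()`` stops at the first hit, the order of links matters:
here the spam check runs before any moderation check. A mailing list's
second start chain, for postings to its owners, would simply be another
``Chain`` object.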

To recap: when a message comes into Mailman for posting to a mailing list, the
incoming runner finds the destination mailing list, determines whether the
message is for the entire list membership or for the list owners, and
retrieves the appropriate start chain. The message is then passed to the
chain, where each link in the chain first checks to see if its rule matches,
and if so, it executes the linked action. This action is usually one of
*accept*, *reject*, *discard*, or *hold*, but other actions are possible, such
as executing a function, deferring action, or jumping to another chain.

As you might imagine, you can write new rules, compose them into new chains,
and configure a mailing list to use your custom chain when processing the
message during the moderation phase.


Pipeline of handlers
====================

Once a message is accepted for posting to the mailing list, the message is
usually modified in a number of different ways. For example, some message
headers may be added or removed, some MIME parts might be scrubbed, added, or
rearranged, and various informative headers and footers may be added to the
message.

The process of preparing the message for the list membership (as well as the
digests, archivers, and NNTP) falls to the *pipeline of handlers* managed by
the *pipeline* queue runner.

The pipeline of handlers is similar to the processing chain, except that here
a handler can make any modifications to the message it wants, and there is no
rule decision or action. The message and metadata simply flow through a
sequence of handlers arranged in a named pipeline. Some of the handlers
modify the message in the ways described above, and others copy the message
to the outgoing, NNTP, archiver, or digester queues.

As with chains, each mailing list has two pipelines: one for posting to the
list membership, and the other for posting to the list's owners.
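
Unlike a chain, a pipeline makes no decisions; every handler runs in turn.
The sketch below is hypothetical: the handler functions, the
``make_copier()`` and ``build_pipelines()`` helpers, and the use of plain
Python lists as stand-ins for the outgoing and archiver queues are all
assumptions for illustration, not Mailman's actual handlers.

.. code-block:: python

    import copy
    from email.message import Message
    from typing import Callable, Dict, List

    # Hypothetical: a handler may modify the message and metadata in place.
    Handler = Callable[[Message, dict], None]


    def strip_internal_headers(msg: Message, msgdata: dict) -> None:
        del msg["X-Internal"]  # removes every occurrence; no error if absent


    def add_list_id(msg: Message, msgdata: dict) -> None:
        del msg["List-Id"]
        msg["List-Id"] = f"<{msgdata['list_id']}>"


    def make_copier(queue: list) -> Handler:
        """Build a handler that drops a copy of the message into ``queue``."""
        def copy_to_queue(msg: Message, msgdata: dict) -> None:
            queue.append((copy.deepcopy(msg), dict(msgdata)))
        return copy_to_queue


    def build_pipelines(outgoing: list, archive: list) -> Dict[str, List[Handler]]:
        """Two named pipelines: one for the membership, one for the owners."""
        return {
            "posting": [strip_internal_headers, add_list_id,
                        make_copier(archive), make_copier(outgoing)],
            "owner": [strip_internal_headers, make_copier(outgoing)],
        }


    def run_pipeline(handlers: List[Handler], msg: Message, msgdata: dict) -> None:
        """Pass the message through every handler in order; no rule decisions."""
        for handler in handlers:
            handler(msg, msgdata)

Because the copies are taken with ``copy.deepcopy()``, later handlers in the
same pipeline cannot alter a message that has already been handed to a queue.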

Of course, you can define new handlers, compose them into new pipelines, and
change a mailing list's pipelines.


Integration and control
=======================

Humans and external programs can interact with a running Core system in many
different ways. There's an extensive command line interface that provides
useful options to a system administrator. For external applications such as
the Postorius web user interface and the HyperKitty archiver, the
:ref:`administrative REST API <rest-api>` is the most common way to get
information into and out of the Core.

**Note**: The REST API is an administrative API and as such it must not be
exposed to the public internet. By default, the REST server only listens on
``localhost``.

Internally, the Python API is extensive and well-documented. Most objects in
the system are accessed through the `Zope Component Architecture`_ (ZCA). If
your Mailman installation is importable, you can write scripts directly
against the internal public Python API.


Other bits and pieces
=====================

There are lots of other pieces to the Mailman puzzle, such as the set of core
functionality (logging, initialization, event handling, etc.), mailing list
*styles*, and the API for integrating external archivers and mail servers.
The database layer is a critical piece, and Mailman has an extensive set of
command line commands and email commands.

Almost the entire system is documented in these pages, but it may be a bit
of a spelunking effort to find it. Improvements are welcome!


.. _`Architecture of Open Source Applications`: http://www.aosabook.org/en/mailman.html
.. _`Python pickles`: http://docs.python.org/2/library/pickle.html
.. _`more efficient internal representation`: https://docs.python.org/3/library/email.html
.. _`Zope Component Architecture`: https://pypi.python.org/pypi/zope.component
