======================
Mailman 3 architecture
======================

This is a brief overview of the internal architecture of the Mailman 3 core
delivery engine.  You should start here if you want to understand how Mailman
works at the 1000-foot level.  Another good source of architectural
information is available in the chapter written by Barry Warsaw for the
`Architecture of Open Source Applications`_.


User model
==========

Every major component of the system is defined by an interface.  Look through
``src/mailman/interfaces`` for an understanding of the system components.
Mailman objects that are stored in the database are defined by *model*
classes.  Objects such as *mailing lists*, *users*, *members*, and *addresses*
are primary objects within the system.

The *mailing list* is the central object which holds all the configuration
settings for a particular mailing list.  A mailing list is associated with a
*domain*, and all mailing lists are managed (i.e. created, destroyed, looked
up) via the *mailing list manager*.

*Users* represent people, and have a *user id* and a *display name*.  Users
are linked to *addresses*, each of which represents a single email address.
One user can be linked to many addresses, but an address is linked to at most
one user.
Addresses can be *verified* or *not verified*.  Mailman will deliver email
only to *verified* addresses.

Users and addresses are managed by the *user manager*.

A *member* is created by linking a *subscriber* to a mailing list.
Subscribers can be:

* A user, who becomes a member through their *preferred address*.
* An address, which may or may not be linked to a user, but must be verified.

Members also have a *role*, representing regular members, digest members, list
owners, and list moderators.  Members can even have the *non-member* role
(i.e. people not yet subscribed to the mailing list) for various moderation
purposes.
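
The relationships described above can be sketched with a few plain Python
classes.  This is only an illustration of the model, not Mailman's actual
model code: every class, attribute, and function name here (``Address``,
``User``, ``Member``, ``Role``, ``subscribe``) is hypothetical, the role names
simply follow the prose above, and the rule that a user's preferred address
must also be verified is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Union


class Role(Enum):
    # Illustrative role names taken from the prose, not Mailman's own enum.
    regular_member = 1
    digest_member = 2
    owner = 3
    moderator = 4
    nonmember = 5


@dataclass
class Address:
    email: str
    verified: bool = False
    # An address is linked to at most one user (excluded from repr/compare
    # to avoid recursing through the user <-> address cycle).
    user: Optional["User"] = field(default=None, repr=False, compare=False)


@dataclass
class User:
    user_id: int
    display_name: str
    addresses: list = field(default_factory=list)
    preferred_address: Optional[Address] = None

    def link(self, address):
        """Link an address to this user; one user may own many addresses."""
        if address.user is not None:
            raise ValueError("address is already linked to a user")
        address.user = self
        self.addresses.append(address)


@dataclass
class Member:
    list_id: str
    subscriber: Union[User, Address]
    role: Role

    @property
    def delivery_address(self):
        # A user subscribes through the preferred address; an address
        # subscribes as itself.
        if isinstance(self.subscriber, User):
            return self.subscriber.preferred_address
        return self.subscriber


def subscribe(list_id, subscriber, role=Role.regular_member):
    """Create a member by linking a subscriber to a mailing list."""
    member = Member(list_id, subscriber, role)
    address = member.delivery_address
    if address is None or not address.verified:
        raise ValueError("subscriber needs a verified address")
    return member
```

In this sketch, subscribing an unverified ``Address`` raises ``ValueError``,
while a ``User`` with a verified preferred address becomes a
``regular_member`` by default.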


Process model
=============

Messages move around inside the Mailman system by way of *queue* directories
managed by the *switchboard*.  For example, when a message is first received
by Mailman, it is moved to the *in* (for "incoming") queue.  During the
processing of this message, it -- or copies of it -- may be moved to other
queues such as the *out* queue (for outgoing email), the *archive* queue (for
sending to the archivers), the *digest* queue (for composing digests), etc.

A message in a queue is represented by a single file, a ``.pck`` file.  This
file contains two objects, serialized as `Python pickles`_.  The first object
is the message being processed, already parsed into a `more efficient internal
representation`_.  The second object is a metadata dictionary that records
additional information about the message as it is being processed.
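
As a rough illustration of the file layout just described, the sketch below
writes a parsed message and a metadata dictionary as two consecutive pickles
and reads them back in the same order.  The helper names ``write_pck`` and
``read_pck`` are hypothetical and are not Mailman's switchboard API; only the
standard library ``pickle`` and ``email`` modules are used.

```python
import pickle
from email import message_from_string


def write_pck(path, msg, metadata):
    """Serialize the message, then the metadata, into a single file."""
    with open(path, "wb") as fp:
        pickle.dump(msg, fp)
        pickle.dump(metadata, fp)


def read_pck(path):
    """Load the two objects back in the order they were written."""
    with open(path, "rb") as fp:
        msg = pickle.load(fp)
        metadata = pickle.load(fp)
    return msg, metadata


def parse(text):
    """Parse raw message text into the email package's message object."""
    return message_from_string(text)
```

Because the message is parsed before it is written, a reader of the file gets
back a message object directly and does not need to parse the raw text again.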

``.pck`` files only exist for messages moving between different system queues.
There is no ``.pck`` file for messages while they are actively being
processed.

Each queue directory is associated with a *runner* process which wakes up
periodically.  When the runner wakes up, it examines all the ``.pck`` files
in FIFO order, deserializes the message and metadata objects, and processes
them.  If a message needs further processing in a different queue, it is
re-serialized into a new ``.pck`` file.  If not (e.g. because processing
of the message is complete), then no ``.pck`` file is written.
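
One pass of such a runner might look like the sketch below.  It is a
simplified illustration, not Mailman's runner implementation: the function
``run_once`` and the ``process`` callback are hypothetical, and it assumes
that file names begin with a sortable timestamp so that a plain sort gives
FIFO order.

```python
import os
import pickle


def run_once(queue_dir, process):
    """Process every .pck file currently in queue_dir, oldest first.

    `process(msg, metadata)` is a hypothetical callback.  It returns the
    directory of the next queue, or None when processing is complete.
    """
    # Assumption: names sort by arrival time, so sorted() yields FIFO order.
    for name in sorted(os.listdir(queue_dir)):
        if not name.endswith(".pck"):
            continue
        path = os.path.join(queue_dir, name)
        with open(path, "rb") as fp:
            msg = pickle.load(fp)
            metadata = pickle.load(fp)
        # No .pck file exists while the message is actively being processed.
        os.remove(path)
        next_queue = process(msg, metadata)
        if next_queue is not None:
            # Further processing is needed: re-serialize into the next queue.
            with open(os.path.join(next_queue, name), "wb") as fp:
                pickle.dump(msg, fp)
                pickle.dump(metadata, fp)
```

A real runner would call something like this in a loop with a sleep between
passes.  It would also need to guard against losing a message if the process
dies mid-way, which this sketch does not attempt.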

The Mailman system uses a few other runners which don't process messages in a
queue.  You can think of these as fairly typical server processes.  Examples
include the LMTP server and the HTTP server for processing REST commands.

All of the runners are managed by a *master watcher* process.  When you type
``mailman start`` you are actually starting the master.  Based on
configuration options, the master starts the appropriate runners as
subprocesses, and it waits for these subprocesses to exit cleanly when
``mailman stop`` is called.


Rules and chains
================

When a message is first received for posting to a mailing list, Mailman
processes the message to determine whether the message is appropriate for the
mailing list.  If so, it *accepts* the message and it gets posted.  Mailman
can also *discard* the message so that no further processing occurs.  Mailman
can also *reject* the message, bouncing it back to the original sender,
usually with some indication of why the message was rejected.  Mailman can
also *hold* the message for moderator approval.

*Moderation* is the phase of processing that determines which of the above
four dispositions will occur for the newly posted message.  Moderation does
not generally change the message, but it may record information in the
metadata dictionary.  Moderation is performed by the *in* queue runner.

Each step in the moderation phase applies a *rule* to the message and asks
whether the rule *hits* or *misses*.  Each rule is linked to an *action* which
is taken if the rule hits (i.e. matches).  If the rule misses (i.e. doesn't
match), then the next rule is tried.  All of the rule/action links are strung
together sequentially into a *chain*, and every mailing list has a *start
chain* where rule processing begins.

Actually, every mailing list has *two* start chains, one for regular postings
to the mailing list, and another for posting to the owners of the mailing
list.

To recap: when a message comes into Mailman for posting to a mailing list, the
incoming runner finds the destination mailing list, determines whether the
message is for the entire list membership, or the list owners, and retrieves
the appropriate start chain.  The message is then passed to the chain, where
each link in the chain first checks to see if its rule matches, and if so, it
executes the linked action.  This action is usually one of *accept*, *reject*,
*discard*, or *hold*, but other actions are possible, such as executing a
function or jumping to another chain.
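
The flow in this recap can be sketched in a few lines of Python.  The code
below is a toy model rather than Mailman's chain API: the ``run_chain``
function, the rule functions, the chain names, and the representation of a
link as a ``(rule, action)`` tuple are all assumptions made for illustration.

```python
from email import message_from_string


def no_subject(msg, metadata):
    """Hit when the message has an empty or missing Subject header."""
    return not msg.get("subject", "").strip()


def too_big(msg, metadata):
    """Hit when the message exceeds a (hypothetical) size limit."""
    return len(msg.as_string()) > metadata.get("max_size", 40_000)


def truth(msg, metadata):
    """Always hit; useful as the last link of a chain."""
    return True


# Each chain is a sequence of (rule, action) links.  An action is either a
# disposition string or a ("jump", chain_name) tuple.
CHAINS = {
    "default-posting": [
        (too_big, "hold"),
        (no_subject, ("jump", "suspicious")),
        (truth, "accept"),
    ],
    "suspicious": [(truth, "hold")],
}


def run_chain(chains, start, msg, metadata):
    """Walk the links from the start chain until a disposition is reached."""
    links = list(chains[start])
    while links:
        rule, action = links.pop(0)
        if not rule(msg, metadata):
            continue  # miss: try the next link
        # Moderation does not change the message, but may record metadata.
        metadata.setdefault("rule_hits", []).append(rule.__name__)
        if isinstance(action, tuple) and action[0] == "jump":
            links = list(chains[action[1]])
            continue
        return action  # accept, reject, discard, or hold
    return None  # no rule hit; a real chain would end with a catch-all link


def parse(text):
    """Convenience wrapper for building a message to feed to a chain."""
    return message_from_string(text)
```

Under these assumptions, a message with a subject falls through to the final
``truth`` link and is accepted.  One without a subject jumps to the
``suspicious`` chain and is held.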

As you might imagine, you can write new rules, compose them into new chains,
and configure a mailing list to use your custom chain when processing
messages during the moderation phase.


Pipeline of handlers
====================

Once a message is accepted for posting to the mailing list, the message is
usually modified in a number of different ways.  For example, some message
headers may be added or removed, some MIME parts might be scrubbed, added, or
rearranged, and various informative headers and footers may be added to the
message.

The process of preparing the message for the list membership (as well as the
digests, archivers, and NNTP) falls to the *pipeline of handlers* managed by
the *pipeline* queue.

The pipeline of handlers is similar to the processing chain, except here, a
handler can make any modifications to the message it wants, and there is no
rule decision or action.  The message and metadata simply flow through a
sequence of handlers arranged in a named pipeline.  Some of the handlers
modify the message in ways described above, and others copy the message to the
outgoing, NNTP, archiver, or digester queues.
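
A named pipeline can be pictured as an ordered list of callables that each
receive the mailing list, the message, and the metadata.  The sketch below is
illustrative only: the handler names, the ``PIPELINES`` table, the use of a
plain dictionary for the mailing list, and the ``copy_to`` metadata key are
hypothetical and do not reflect Mailman's handler interface.

```python
from email import message_from_string


def add_list_headers(mlist, msg, metadata):
    """Add an informative header identifying the list."""
    msg["List-Id"] = "<{}>".format(mlist["list_id"])


def add_footer(mlist, msg, metadata):
    """Append the list footer to a simple single-part text message."""
    if not msg.is_multipart():
        msg.set_payload(msg.get_payload() + mlist["footer"])


def to_outgoing(mlist, msg, metadata):
    """Record that a copy should go to the outgoing queue."""
    metadata.setdefault("copy_to", []).append("out")


# A named pipeline is just an ordered sequence of handlers.
PIPELINES = {
    "default-posting": [add_list_headers, add_footer, to_outgoing],
}


def run_pipeline(name, mlist, msg, metadata):
    """Pass the message and metadata through each handler in order."""
    for handler in PIPELINES[name]:
        handler(mlist, msg, metadata)


def parse(text):
    """Convenience wrapper for building a message to feed to a pipeline."""
    return message_from_string(text)
```

Unlike a chain link, a handler here returns nothing and makes no hit-or-miss
decision.  Every handler in the pipeline runs, and each one is free to modify
the message or the metadata.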

As with chains, each mailing list has two pipelines, one for posting to the
list membership, and the other for posting to the list's owners.

Of course, you can define new handlers, compose them into new pipelines, and
change a mailing list's pipelines.


Other bits and pieces
=====================

There are lots of other pieces to the Mailman puzzle, such as the REST API,
the set of core functionality (logging, initialization, event handling, etc.),
mailing list *styles*, and the API for integrating external archivers and mail
servers.  The database layer is a critical piece, and Mailman has an
extensive set of command line commands and email commands.


.. _`Architecture of Open Source Applications`: http://www.aosabook.org/en/mailman.html
.. _`Python pickles`: http://docs.python.org/2/library/pickle.html
.. _`more efficient internal representation`: https://docs.python.org/3/library/email.html