path: root/Mailman/Handlers/Scrubber.py
Commit message [Author, Age, Files, Lines]
* Reorganize doctests, moving handler documentation into Mailman/handlers/docs. [Barry Warsaw, 2008-01-24, 1 file, -500/+0]
| Rename all handlers to be more PEP 8 friendly, i.e. lowercased.
* Update copyright years. Change a plugin name. [Barry Warsaw, 2008-01-13, 1 file, -1/+1]
|
* Add an interface IArchiver which is used to calculate urls and send messages to the archiver. [Barry Warsaw, 2008-01-13, 1 file, -5/+4]
| Also add a plugin architecture for easily overriding the archiver, and hook this into the setup.py script. Updated CookHeaders.py and Scrubber.py handlers to use the plugged archiver. Updated doctests as appropriate. Fix a typo in the setup.py file.
* Convert to the Storm Python ORM <storm.canonical.com>. [Barry Warsaw, 2007-11-18, 1 file, -4/+0]
| There were several reasons for this, but most importantly, the changes from SQLAlchemy/Elixir 0.3 to 0.4 were substantial and caused a lot of work. This work unfortunately did not result in a working branch due to very strange and inconsistent behavior with Unicode columns. Sometimes such columns would return Unicode, sometimes 8-bit strings, with no rhyme or reason. I gave up debugging this after many hours of head scratching.
|
| Oh yeah, no more flush!
|
| Storm enforces Unicode columns, which is very nice, though requires us to add lots of 'u's in places we didn't have them before. Ultimately, this is a good thing so that the core of Mailman will be Unicode consistent.
|
| One thing I still want to clean up after this, is the function-scoped imports in the model code. Part of the reason for the separate model classes was to avoid this, but for now, we'll live with it.
|
| Storm's architecture requires us to maintain a database-table-class registry for simple clearing after tests in Database._reset(). This is made fairly simple by Storm allowing us to use our own metaclass for model classes.
|
| Storm does require that we write our own SQL files, which is a downside, but I think our schema will be easy enough that this won't be a huge burden. Plus we have a head-start <wink>.
|
| Another cool thing about Storm is the explicit use of stores for objects. This should eventually allow me to flesh out my idea of storage pillars for 1) lists, 2) users, 3) messages.
|
| Some other changes:
| - pylint and pyflakes cleanups
| - SQLALCHEMY_ENGINE_URL -> DEFAULT_DATABASE_URL
| - Don't import-* from Version in Defaults.py
| - Add interface method to Mailman.Message.Message so that __getitem__() and get_all() always return Unicode headers, even when the underlying objects are strings. This should generally be safe as headers are required by RFC to be within the ASCII range.
| - Fix bin/arch.py to use proper initialization.
|\
| * Initial pylint/pyflakes cleanup [Barry Warsaw, 2007-11-17, 1 file, -4/+0]
| |
* | - Scrubber.py [Mark Sapiro, 2007-11-06, 1 file, -5/+6]
| |   Fixed an issue where an implicit text/plain part without any headers gets lost. Moved the cleansing of the filename extension to a place where it is guaranteed to be a string as opposed to an empty list.
|/
* Much progress, though not perfect, on migrating to SQLAlchemy 0.4 and Elixir 0.4. [Barry Warsaw, 2007-10-31, 1 file, -4/+3]
| Lots of things changed, which broke lots of our code. There are still a couple of failures in the test suite that I don't understand. It seems that for pending.txt and requests.txt, sometimes strings come back from the database as 8-bit strings and other times as unicodes. It's impossible to make these tests work both separately and together. users.txt is also failing intermittently. Lots of different behavior between running the full test suite all together and running individual tests. Sigh.
|
| Note also that actually, Elixir 0.4.0 doesn't work for us. There's a bug in that version that prevented zope.interfaces and Elixir working together. Get the latest 0.4.0 from source to fix this.
|
| Other changes include:
| - Remove Mailman/lockfile.py. While I haven't totally eliminated locking, I have released the lockfile as a separate Python package called locknix, which Mailman 3.0 now depends on.
| - Renamed Mailman/interfaces/messagestore.py and added an IMessage interface.
| - bin/testall turns on SQLALCHEMY_ECHO when the verbosity is above 3 (that's three -v's because the default verbosity is 1).
| - add_domain() in config files now allows url_host to be optional. If not given, it defaults to email_host.
| - Added a non-public interface IDatabase._reset() used by the test suite to zap the database between doctests. Added an implementation in the model which just runs through all rows in all entities, deleting them.
| - [I]Pending renamed to [I]Pended
| - Don't allow Pendings.add() to infloop.
| - In the model's User implementations, we don't need to append or remove the address when linking and unlinking. By setting the address.user attribute, SQLAlchemy appears to do the right thing, though I'm not 100% sure of that (see the above mentioned failures).
* General cleanups some of which is even tested <wink>. [Barry Warsaw, 2007-10-10, 1 file, -7/+4]
| Mailman.LockFile module is moved to Mailman.lockfile. Remove a few more MailList methods that aren't used any more, e.g. the lock related stuff, the Save() and CheckValues() methods, as well as ChangeMemberName(). Add a missing import to lifecycle.py. We no longer need withlist to unlock the mailing list. Also, expose config.db.flush() in the namespace of withlist directly, under 'flush'.
* OMGW00T: After over a decade, the MailList mixin class is gone! [Barry Warsaw, 2007-09-21, 1 file, -2/+4]
| Well, mostly. It's no longer needed by anything in the test suite, and therefore the list manager returns database MailingList objects directly. The wrapper cruft has been removed. To accomplish this, a couple of hacks were added to the Mailman.app package, which will get cleaned up over time. The MailList module itself (and its few remaining mixins) aren't yet removed from the tree because some of the code is still not tested, and I want to leave this code around until I've finished converting it.
* Move the pending database into the SQLAlchemy/Elixir layer. [Barry Warsaw, 2007-08-01, 1 file, -5/+5]
| The old Pending.py module is removed. Added an interface to this functionality such that any IPendable (essentially a key/value mapping) can be associated with a token, and that token can be confirmed and has a lifetime. Any keys and values can be stored, as long as both are unicodes. Added a doctest.
|
| Modified initialization of the database layer to support pluggability via setuptools. No longer is this layer initialized from a module, but now it's instantiated from a class that implements IDatabase. The StockDatabase class implements the SQLAlchemy/Elixir layer, but this can be overridden in a setup.py. Bye bye MANAGERS_INIT_FUNCTION, we hardly knew ye.
|
| Added a package Mailman.app which will contain certain application specific functionality. Right now, the only thing there is an IRegistrar implementation, which didn't seem to fit anywhere else.
|
| Speaking of which, the IRegistrar interface implements all the logic related to registration and verification of email addresses. Think the equivalent of MailList.AddMember() except generalized out of a mailing list context. This latter will eventually go away. The IRegistrar sends the confirmation email.
|
| Added an IDomain interface, though the only implementation of this so far lives in the registration.txt doctest. This defines the context necessary for domain-level things, like address confirmation.
|
| A bunch of other cleanups in modules that are necessary due to the refactoring of Pending, but don't affect anything that's actually tested yet, so I won't vouch for them (except that they don't throw errors on import!).
|
| Clean up Defaults.py; also turn the functions seconds(), minutes(), hours() and days() into their datetime.timedelta equivalents.
|
| Consolidated the bogus email address exceptions.
|
| In some places where appropriate, use email 4.0 module names instead of the older brand.
|
| Switch from Mailman.Utils.unique_message_id() to email.utils.make_msgid() everywhere. This is because we need to allow sending not in the context of a mailing list (i.e. domain-wide address confirmation message). So we can't use a Message-ID generator that requires a mailing list. OTOH, this breaks Message-ID collision detection in the mail->news gateway. I'll fix that eventually.
|
| Remove the 'verified' row on the Address table. Now verification is checked by Address.verified_on not being None.
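
As a rough illustration of the two stdlib replacements named in this commit (not code taken from the commit itself), the Python 3 sketch below builds a Message-ID with email.utils.make_msgid() without involving any mailing list, and expresses a lifetime as a datetime.timedelta. The name PENDING_LIFETIME, its value, and the example.com domain are assumptions made up for the example.

```python
from datetime import timedelta
from email.utils import make_msgid

# Hypothetical setting: a lifetime written as a timedelta rather than
# as a bare number of seconds produced by a days() helper.
PENDING_LIFETIME = timedelta(days=3)

# make_msgid() needs no mailing list object, so it also works for
# domain-wide messages. The domain argument is an assumed example value.
msgid = make_msgid(domain="example.com")

# The lifetime can still be turned back into seconds where needed.
expires_in_seconds = PENDING_LIFETIME.total_seconds()
```
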
* Convert the Scrubber test to a doctest, and fix Scrubber.py, but otherwise don't modernize the Scrubber handler. [Barry Warsaw, 2007-07-12, 1 file, -13/+14]
| This is the last of the handler test conversions until we figure out what to do with the Approve handler. In a unified user database the semantics of this are unclear.
* Scrubber.py - Malformed RFC 2047 encoded filename= parameter can have a null byte or other garbage in the extension. Cleaned this. [Mark Sapiro, 2007-06-22, 1 file, -12/+16]
| - Improved handling of None payloads.
| - Cleaned up a few charset coercions.
|
| OutgoingRunner.py - Made probe bounce processing and queuing of bounces conditional on having some permanent failure(s).
* Merge exp-elixir-branch to trunk. [bwarsaw, 2007-05-28, 1 file, -2/+22]
| There is enough working to make me feel confident the Elixir branch is ready to become mainline. Also, fewer branches makes for an easier migration to a dvcs.
|
| Don't expect much of the old test suite to work, or even for much of the old functionality to work. The changes here are disruptive enough to break higher level parts of Mailman. But that's okay because I am slowly building up a new and improved test suite, which will lead to a functional system again.
|
| For now, only the doctests in Mailman/docs (and their related test harnesses) will pass, but they all do pass. Note that Mailman/docs serve as system documentation first and unit tests second. You should be able to read the doctest files to understand the underlying data model.
|
| Other changes included in this merge:
| - Added the Mailman.ext extension package.
| - zope.interfaces used to describe major components
| - SQLAlchemy/Elixir used as the database model
| - Top level doinstall target renamed to justinstall
| - 3rd-party packages are now installed in pythonlib/lib/python to be more compliant with distutils standards. This allows us to use just --home instead of all the --install-* options.
| - No longer need to include the email package or pysqlite, as Python 2.5 is required (and comes with both packages).
| - munepy package is included, for Python enums
| - IRosterSets are added as a way to manage a collection of IRosters. Roster sets are named so that we can maintain the indirection between mailing lists and rosters, where the two are maintained in different storages.
| - IMailingListRosters: remove_*_roster() -> delete_*_roster()
| - Remove IMember interface.
| - Utils.list_names() -> config.list_manager.names
| - fqdn_listname() takes an optional hostname argument.
| - Added a bunch of new exceptions used throughout the new interfaces.
| - Make LockFile a context manager for use with the 'with' statement.
* passwords.py: Looks like we still need unicode checking. [tkikuchi, 2007-03-25, 1 file, -33/+41]
| Mark Sapiro's patch for 'format' parameter. (Decorate.py, Scrubber.py)
| Scrubber.py: More brush up of code ... 'Content-Transfer-Encoding' is not updated by msg.set_payload(). 'Url:' to 'URL:' normalization.
| test_handlers.py: Test codes for Decorate.py and Scrubber.py.
* Clean up file permissions and umask settings. [bwarsaw, 2007-01-05, 1 file, -26/+6]
| Now we set the umask to 007 during early initialization so that we're guaranteed to get the right value regardless of the shell umask used to invoke the command line script. While we're at it, we can remove almost all individual umask settings previously in the code, and make file permissions consistently -rw-rw---- (IOW, files are no longer other readable).
|
| The only subsystem that wasn't changed was the archiver, because it uses its own umask settings to ensure that private archives have the proper permissions. Eventually we'll mess with this, but if it ain't broken...
|
| Note that check_perms complains about directory permissions, but I think check_perms can be fixed (or perhaps, even removed?!). If we decide to use LMTPRunner and HTTPRunner exclusively then no outside process will be touching our files potentially with the incorrect permissions, umask, owner, or group. If we control all of our own touch points then I think we can lock out 'other'.
|
| Another open question is whether Utils.set_global_password() can have its umask setting removed. It locks permissions down so even the group can't write to the site password file, but the default umask of 007 might be good enough even for this file.
|
| Utils.makedirs() now takes an optional mode argument, which defaults to 02775 for backward compatibility. First, the default mode can probably be changed to 02770 (see above). Second, all code that was tweaking the umask in order to do a platform compatible os.mkdir() has now been refactored to use Utils.makedirs().
|
| Another tricky thing was getting SQLite via SQLAlchemy to create its data/mailman.db file with the proper permissions. From the comment in dbcontext.py:
|
|     # XXX By design of SQLite, database file creation does not honor
|     # umask. See their ticket #1193:
|     # http://www.sqlite.org/cvstrac/tktview?tn=1193,31
|
| More details in that file, but the work around is to essentially 'touch' the database file if 'sqlite' is the scheme of the SQLAlchemy URL. This little pre-touch sets the right umask honoring permission and won't hurt if the file already exists. SQLite will happily keep the existing permissions, and in fact that ticket referenced above recommends doing things this way.
|
| In the Mailman.database.initialize(), create a global lock that prevents more than one process from entering this init function at the same time. It's probably not strictly necessary given that I believe all the operations in dbcontext.connect() are multi-processing safe, but it also doesn't seem to hurt and prevents race conditions regardless of the database's own safeguards (or lack thereof).
|
| Make sure nightly_gzip.py calls initialize().
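
The pre-touch workaround described in this commit might look roughly like the Python 3 sketch below. It is not the code from dbcontext.py: the function name pretouch() and its two parameters are hypothetical, and the caller is assumed to have already split the database URL into a scheme and a file path.

```python
import os


def pretouch(path, scheme):
    """Create the database file ourselves so the process umask applies.

    Hypothetical helper: returns True if the file was touched, False if
    the scheme is not 'sqlite' and nothing was done.
    """
    if scheme != "sqlite":
        return False
    # os.open() with O_CREAT applies the current umask to the requested
    # mode. Without O_TRUNC, an existing file keeps both its contents
    # and its permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o666)
    os.close(fd)
    return True
```

With a umask of 007 the new file ends up as -rw-rw----, matching the permissions the commit message aims for.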
* Fix test_message.py by finishing the wind-through of the configuration object and fixing the invocation and shutdown of mailmanctl. [bwarsaw, 2006-07-08, 1 file, -6/+6]
| While the tests in this module work individually, they do not yet work as a group. -C added to testall.py, and mailmanctl now passes that flag on to qrunner.
|
| UserNotification sets reduced_list_header in the msgdata, but the behavior of this flag has changed. It used to suppress List-Help, List-Subscribe, and List-Unsubscribe as well as List-Post and List-Archive. However, List-Help, List-Subscribe and List-Unsubscribe should definitely be included in UserNotifications, and List-Post has a different variable controlling it now. Therefore, always add List-Help, List-Subscribe, and List-Unsubscribe.
|
| Some style updates to Message.py
* Remove most uses of the types module, in favor of isinstance checks against the builtin types. [bwarsaw, 2006-04-17, 1 file, -3/+2]
| Two still remain: a check against ClassType and a check against MethodType. Also, fix some hinky type comparisons to use isinstance() consistently.
* - Convert all logging to Python's standard logging module. [bwarsaw, 2006-04-17, 1 file, -4/+5]
|   Get rid of all traces of our crufty old Syslog. Most of this work was purely mechanical, except for:
|
|   1) Initializing the loggers. For this, there's a new module Mailman/loginit.py (yes all modules from now on will use PEP 8 names). We can't call this 'logging.py' because that will interfere with importing the stdlib module of the same name (can you say Python 2.5 and absolute imports?). If you want to write log messages both to the log file and to stderr, pass True to loginit.initialize(). This will turn on propagation of log messages to the parent 'mailman' logger, which is set up to print to stderr. This is how bin/qrunner works when not running as a subprocess of mailmanctl.
|
|   2) The driver script. I had to untwist the StampedLogger stuff and implement differently printing exceptions and such to log/error because standard logging objects don't have a write() method. So we write to a cStringIO and then pass that to the logger.
|
|   3) SMTPDirect.py because of the configurability of the log messages. This required changing SafeDict into a dict subclass (which is better than using UserDicts anyway -- yay Python 2.3!). It's probably still possible to flummox things up if you change the name of the loggers in the SMTP_LOG_* variables in mm_cfg.py. However, the worst you can do is cause output to go to stderr and not go to a log file.
|
|   Note too that all entry points into the Mailman system must call Mailman.loginit.initialize() or the log output will go to stderr (which may occasionally be what you want). Currently all CGIs and qrunners should be working properly.
|
|   I wish I could have tested all code paths that touch the logger, but that's infeasible. I have tested this, but it's possible that there were some mistakes in the translation.
|
| - Mailman.Bouncers.BounceAPI.Stop is a singleton, but not a class instance any more.
| - True/False code cleanup, PEP 8 import restructuring, whitespace normalization, and copyright year updates, as appropriate.
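
The propagation switch described in item 1 can be sketched with the standard logging module as below. This is only an illustration under stated assumptions, not the contents of Mailman/loginit.py: the sub-logger names 'error' and 'qrunner' and the names parameter are invented for the example, and the per-logger file handlers are left out.

```python
import logging
import sys


def initialize(propagate=False, names=("error", "qrunner")):
    """Set up a parent 'mailman' logger that prints to stderr.

    Sub-loggers forward their records to it only when propagate is True.
    The sub-logger names are hypothetical.
    """
    parent = logging.getLogger("mailman")
    if not parent.handlers:
        # Attach the stderr handler once, even if called repeatedly.
        parent.addHandler(logging.StreamHandler(sys.stderr))
    for name in names:
        # A fuller setup would also attach a file handler to each
        # sub-logger here so records reach the log file as well.
        logging.getLogger("mailman." + name).propagate = propagate
```

Calling initialize(True) corresponds to the "log file and stderr" mode the message describes; initialize(False) keeps sub-logger output away from stderr.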
* Now that Python 2.3 is the minimum requirement for Mailman 2.2: [bwarsaw, 2006-04-15, 1 file, -31/+7]
| - Remove True/False binding cruft
| - Remove __future__ statements for nested scopes
| - Remove ascii_letters import hack from Utils.py
| - Remove mimetypes.guess_all_extensions import hack from Scrubber.py
| - In Pending.py, set _missing to object() (better than using [])
|
| Also, update copyright years where appropriate, and re-order imports more to my PEP 8 tastes. Whitespace normalize.
* Preparing for email 3.0/4.0. get_type() -> get_content_type() etc. [tkikuchi, 2006-03-07, 1 file, -2/+2]
|
* Fixed bug 1430236 by catching TypeError when trying to get a decoded payload when payload is None. [msapiro, 2006-02-19, 1 file, -1/+6]
* variable name: it is not a floating number. (time tuple) [tkikuchi, 2006-02-03, 1 file, -2/+2]
|
* Back out Revision 2.30 patch for email.Message.set_payload() bug because it is overwrapped in Mailman.Message. [tkikuchi, 2006-01-29, 1 file, -10/+3]
* Port cleaning changes forward from 2.1-maint branch. [bwarsaw, 2005-12-30, 1 file, -5/+6]
|
* Fixes for email.set_payload() not distinguishing parsed or virgin payload. [tkikuchi, 2005-12-24, 1 file, -21/+24]
|
* Add an extra trailing space in scrubbed content URL. [tkikuchi, 2005-12-13, 1 file, -1/+3]
| This may save the users of MS Outlook and Apple Mail.
* Add OverflowError in the except list. [tkikuchi, 2005-09-19, 1 file, -1/+1]
| See the thread beginning with this post: http://mail.python.org/pipermail/mailman-users/2005-September/046460.html
* back porting from 2.1.6 [tkikuchi, 2005-08-28, 1 file, -33/+100]
|
* FSF office has moved. Checking in for MAIN branch. [tkikuchi, 2005-08-27, 1 file, -1/+1]
|
* process(): In the msg.is_multipart() clause, inside the clause that tries to convert t to something reasonable <wink>, we need to use errors='replace' when we encode from unicode to string. [bwarsaw, 2003-09-13, 1 file, -2/+5]
| This is because the preceding unicode('ascii', 'replace') could end up inserting U+FFFD, which can't be encoded to ascii.
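
The failure mode is easy to reproduce. The sketch below uses Python 3 spellings (bytes.decode() and str.encode()) instead of the Python 2 unicode() builtin the commit refers to, and the helper name to_ascii() is made up for the illustration.

```python
def to_ascii(raw, charset="ascii"):
    """Round-trip bytes through text, tolerating undecodable input."""
    # Decoding with 'replace' substitutes U+FFFD for each bad byte.
    text = raw.decode(charset, "replace")
    # U+FFFD is not ASCII, so a strict encode would raise
    # UnicodeEncodeError here. 'replace' turns it into b'?'.
    return text.encode("ascii", "replace")
```

For example, to_ascii(b"caf\xe9") yields b"caf?", whereas encoding the intermediate text strictly would raise.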
* makedirs(): Only twiddle the attachment directory permissions if we successfully called makedirs(). [bwarsaw, 2003-01-24, 1 file, -5/+5]
* A few minor refinements to the previous patch. [bwarsaw, 2003-01-20, 1 file, -6/+10]
| guess_all_extensions(): Python 2.1 doesn't have mimetypes.common_types.
|
| guess_extension(): all could be empty.
|
| process(): Need separate try/except clauses for the conversion to unicode, and the conversion to 8-bit strings. Also, use endswith() instead of t[-1] to be more robust against empty strings.
* Fixes for bug #669081. [bwarsaw, 2003-01-20, 1 file, -18/+56]
| Based on Tokio Kikuchi's patch, but extended to fix the other scrubber bugs, and use better Message API. Specifically,
|
| guess_extension(): Use mimetypes.guess_all_extensions() to try to find a match between the claimed extension and the claimed content-type. If they match, then just believe it, otherwise, use the first extension guessed. We can still get weird ones because mimetypes has no notion of a priority of mappings from extension to type.
|
| process(): Everywhere we set a part's payload to the "scrubbed" message text, first delete the Content-Type header, allowing set_payload() with a character set to set the header, along with the proper charset parameter.
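
The matching rule for guess_extension() can be sketched as follows. This is a Python 3 approximation written from the description above, not the handler's actual code; the parameter names and the choice of strict=False are assumptions.

```python
import mimetypes


def guess_extension(content_type, claimed_ext):
    """Believe the claimed extension only if mimetypes agrees with it."""
    candidates = mimetypes.guess_all_extensions(content_type, strict=False)
    if claimed_ext and claimed_ext.lower() in candidates:
        return claimed_ext
    # The candidate list can be empty for unknown types, so guard
    # before taking the first guess.
    return candidates[0] if candidates else None
```

Because the mimetypes tables carry no priority, the first candidate for a type is not guaranteed to be the most conventional extension, which is the "weird ones" caveat in the message.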
* save_attachment(): Use Message.get_content_type() instead of the deprecated .get_type() method. [bwarsaw, 2003-01-19, 1 file, -4/+5]
* Copyright years. [bwarsaw, 2003-01-10, 1 file, -4/+4]
|
* safe_strftime(): Watch out for TypeError coming back from strftime(). [bwarsaw, 2003-01-10, 1 file, -1/+1]
|
* process(): This is the part of Martin's patch # 655214 not related to the archiver. [bwarsaw, 2002-12-20, 1 file, -0/+2]
| Martin says:
| - Fixes a bug in the scrubber, where a content-transfer-encoding might have survived flattening of the message.
* calculate_attachments_dir(): Be defensive about bogus Date headers. ;/ [bwarsaw, 2002-11-22, 1 file, -6/+32]
|
* Martin v. Loewis's SF patch #634109 for better scrubbing of multipart messages. [bwarsaw, 2002-11-13, 1 file, -22/+52]
| Specifically,
|
| process(): Each part that we're scrubbing, we'll set the content type explicitly to text/plain (since that's what it is now). We also record the character set of each part and if that's shared by all the subparts, we'll just use that. But, if we don't know the charset of any of the parts then we'll use the list's preferred language's charset. Then we'll make sure all the text/plain parts have the same character set, using 'replace' if necessary. (We may eventually want to utf-8-ify or html-entity-ify them in this case).
|
| Patch slightly modified by Barry for i18n and style.
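
One way to read the charset rule above is sketched below in Python 3. It is an interpretation, not Martin's patch: the function unify() and its (bytes, charset) input shape are hypothetical, and it falls back to the list charset whenever the parts disagree or any charset is unknown.

```python
def unify(parts, list_charset):
    """Coerce every text part to one charset.

    parts is a list of (payload_bytes, charset_or_None) pairs.
    Returns (joined_bytes, chosen_charset).
    """
    known = {cs.lower() for _, cs in parts if cs}
    if len(known) == 1 and all(cs for _, cs in parts):
        # Every part declares the same charset, so keep it.
        target = known.pop()
    else:
        # Mixed or unknown charsets: use the list's charset.
        target = list_charset
    out = []
    for data, cs in parts:
        # 'replace' keeps going past bytes or characters that do not
        # fit, at the cost of substituting a placeholder.
        text = data.decode(cs or target, "replace")
        out.append(text.encode(target, "replace"))
    return b"".join(out), target
```

A charset name that Python does not recognise would raise LookupError in this sketch; real code would need to handle that case.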
* process(): Fix for SF bug #598844 reported by Patrick Finnerty where a particular base64 attachment caused a binascii.Error. [bwarsaw, 2002-08-22, 1 file, -1/+4]
| This makes sure that before we try to web-safe-ify the html attachment, we decode it first. But we need to update the Content-Transfer-Encoding: header too.
* save_attachment(): Strip leading dots off the filename. [bwarsaw, 2002-08-15, 1 file, -0/+4]
|
* Rework the directory layout for attachments. [bwarsaw, 2002-08-13, 1 file, -40/+63]
| There were two problems with the old approach, as pointed out by Michael Meltzer:
|
| - it was broken <wink>
| - it was still susceptible to inode overload
|
| The fixes are to put attachments one more directory level down, where the path is
|
|     archives/private/<listname>/attachments/<YYYYMMDD>/<msgidhash>
|
| where YYYYMMDD is the date of the message encoded as 8 digits, and the msgidhash is the first 2 and last 2 octets of the sha hash of the outer message's Message-ID. Any name collisions inside that directory (e.g. a message that contains two images with the same filename), are resolved by finding a unique filebase extension.
|
| Specific changes include:
|
| calculate_attachments_dir(): Factor out the calculation of the attachments directory so that e.g. Michael can use it in his MimeDel.py hacks.
|
| process(): Be sure to pass the appropriate arguments to save_attachment() based on its new signature.
|
| makedirs(): Factor out the creation and mod settings of the attachment dir, and all subdirs.
|
| save_attachment(): Pass the attachments directory in the arguments. Fix calculation of the url and the saving of the attachment data.
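
The layout described in this commit can be approximated with the Python 3 sketch below. It is not calculate_attachments_dir() itself: the function name and parameters are hypothetical, SHA-1 with hex encoding is an assumption about what "sha hash" and "octets" meant, and the "00000000" fallback for unparseable dates is invented for the example.

```python
import hashlib
import os
from email.utils import parsedate


def attachments_dir(archive_root, listname, date_header, message_id):
    """Build <root>/<listname>/attachments/<YYYYMMDD>/<msgidhash>."""
    parsed = parsedate(date_header)
    if parsed is None:
        # Assumed placeholder for missing or bogus Date headers.
        datedir = "00000000"
    else:
        year, month, day = parsed[:3]
        datedir = "%04d%02d%02d" % (year, month, day)
    digest = hashlib.sha1(message_id.encode("utf-8")).hexdigest()
    # First 2 and last 2 octets of the digest, i.e. 4 hex digits
    # taken from each end.
    msgidhash = digest[:4] + digest[-4:]
    return os.path.join(archive_root, listname, "attachments",
                        datedir, msgidhash)
```

Resolving filename collisions inside the resulting directory is a separate step and is not shown here.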
* process(): Calculate the message content type once, falling back to the default type if there is no Content-Type: header. [bwarsaw, 2002-07-12, 1 file, -4/+5]
| Also, recognize that message/rfc822 parts are ismultipart() containers.
* QuoteHyperChars() -> websafe() [bwarsaw, 2002-05-22, 1 file, -3/+2]
| Also, use Utils.websafe() consistently throughout, instead of the inconsistent calls to cgi.escape().
* save_attachment(): Ugly hack to make sure that the attachments directory has the proper mode bits under FreeBSD, which seems to ignore the setgid bit on the os.mkdir() call. [bwarsaw, 2002-04-19, 1 file, -1/+4]
| I really don't want to have to change every os.mkdir() call site, so I won't make this a habit.
|
| Closes SF bug #526519
* Update copyright years. [bwarsaw, 2002-03-16, 1 file, -1/+1]
|
* save_attachment(): Why not actually /use/ the baseurl local variable? [bwarsaw, 2001-12-19, 1 file, -1/+1]
| It's sanitized to make sure it ends in a slash.
* process(): Teach the scrubber about message/rfc822 types, which are not multipart, but don't contain a string payload. [bwarsaw, 2001-11-30, 1 file, -3/+41]
| They have a single Message instance payload so they need to be scrubbed. I don't know whether what we do is the best thing, but we strip out the contained message, store it as an attachment (with very little processing), and include a link/info block of text with the subject, sender, date, size, and url.
|
| save_attachment(): When calculating an extension for an unknown type, default to .txt for message/rfc822 types and .bin for everything else. Also, if we're saving a message/rfc822 type as an attachment, all we do is take the raw text of the message, cgi.escape() it for safety, and store it in the attachment file. We could probably do better. Finally, adjust the baseurl for private archives so we don't get a double slash.
* process(): Implement ARCHIVE_HTML_SANITIZER == 3, strip but don't escape text/html. [bwarsaw, 2001-10-27, 1 file, -3/+16]
* process(): Implement ARCHIVE_HTML_SANITIZER == 2, meaning "leave it inline but HTML-escape it". [bwarsaw, 2001-10-27, 1 file, -6/+31]
| Also, expand on the == 1 value (HTML-escape an attachment) a bit so the output looks a little nicer. Pipermail actually does a better job here, but we can't use it.
|
| save_attachment(): Grows a filter_html option which says whether to filter text/html parts or not. Default is 1, but if ARCHIVE_HTML_SANITIZER == 2 above, we don't want to filter it through the program.