path: root/Mailman/Archiver
Commit message | Author | Age | Files | Lines
...
* Numerous changes to support message bodies and headers that use a | jhylton | 2000-09-22 | 2 | -196/+319
  non-ascii charset.

  HyperArch:
    call write_index_entry from write_threadindex_entry
    more style issues; refactoring of several methods
    add ctype, charset, and decode attrs to Article; set based on the Content-Type header and any encoded-words in the header
    add extra blank lines to html templates so that they interact better with iso-2022-jp text
    handle msg <title> and <h1> tags separately to avoid non-ascii characters in title
    add encoding format to index and message headers
    add explicit template for index entry

  pipermail:
    handle messages with timezone gracefully
    catch EOFError when loading archive pickle
    fix a few bugs that were hidden by bogus overrides in HyperArch
* pipermail: | jhylton | 2000-09-21 | 2 | -114/+39
    factor out "write article as html" as write_article method
    fix a few bugs in previous checkin

  HyperArch:
    get rid of bogus add_article override; override write_article instead
* oops! time machine problem | jhylton | 2000-09-21 | 1 | -1/+1
* Fix Article class in HyperArch. It is a subclass of pipermail.Article, | jhylton | 2000-09-21 | 2 | -107/+57
  but it also had some methods that were overridden cut-n-paste style.

  pipermail: move Article date handling into separate method (should probably do this for the rest of __init__)
* massive reformatting and refactoring | jhylton | 2000-09-21 | 2 | -335/+401
  fix some of the hardest-to-read style problems:
    - no space around = in assignment
    - obscure comparisons (e.g. use if seq: instead of if len(seq):)
    - put body of if/else try/except on separate lines

  refactoring: separate some large and/or obscure chunks of code into methods
* Archiver.InitVars(): Ted Cabeen's SF patch #100554 to allow | bwarsaw | 2000-09-20 | 1 | -1/+1
  mlist.archive to be set from mm_cfg.py.
* Article.__init__(): Chris Snell's patch for obscuring email addresses | bwarsaw | 2000-09-19 | 1 | -0/+2
  in Pipermail archives. Wrap this in a configuration variable ARCHIVER_OBSCURES_EMAILADDRS because it breaks mailto: urls too.
* __archive_to_mbox(): Don't do the clobber_date logic here, because | bwarsaw | 2000-09-14 | 1 | -16/+8
  it'll only affect what gets put in the .mbox file. We also want clobber_date to affect what goes in Pipermail or the external archiver. Move the clobber_date logic to Handlers/ToArchive.py.
* GetAbsoluteScriptURL() => GetScriptURL(..., absolute=1) | bwarsaw | 2000-08-01 | 1 | -5/+5
  Also, remove the link to Pipermail on Starship. This is way out of date.
* Remove commented out syslog() lines. | bwarsaw | 2000-07-20 | 1 | -2/+0
* ArchiveMail(): Comment out the debugging prints to syslog. | bwarsaw | 2000-07-06 | 1 | -2/+2
* __init__(): time.mktime() can generate OverflowError. | bwarsaw | 2000-07-06 | 1 | -1/+1
* ArchiveMail(): Don't do this work in a forked subprocess; with many | bwarsaw | 2000-06-26 | 1 | -53/+42
  messages being delivered by qrunner it is quite easy to simply run out of process resources. Now each message to be archived is done in the parent process, with a bit of extra paranoia in case of archiver errors.

  This also gets rid of the individual archiver locks since it is required that the list itself be locked in order to get here.
* updateThreadedIndex(), update_archive(): Wrap the | bwarsaw | 2000-06-26 | 1 | -11/+26
  self.database.getArticle() calls in a try/except KeyError. These are stopgap measures to avoid exceptions percolating upwards when an archive is regenerated with bin/arch. But it doesn't really fix anything substantial (can you say "pipermail is broken"? ;)
* __init__(): When the mktime() call fails due to a "bogus year most | bwarsaw | 2000-06-26 | 1 | -1/+1
  likely", initialize the date variable to a float time in seconds, not a time tuple. This is because the following time.ctime() call expects a float.
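The float-versus-tuple point above can be sketched in modern Python. This is an illustration only, not the code from HyperArch.py; the function name `article_timestamp` and the sample tuple are assumptions made for the example.

```python
import time

def article_timestamp(time_tuple):
    """Return seconds since the epoch as a float, falling back to 'now'.

    Hypothetical helper: it mirrors the idea in the commit message, not
    the actual Article.__init__() code.
    """
    try:
        return time.mktime(time_tuple)
    except (OverflowError, ValueError):
        # Fall back to float seconds rather than a time tuple, because the
        # result is passed to time.ctime(), which expects seconds.
        return time.time()

# A year far outside what the C library can represent.
bogus = (10**12, 1, 1, 0, 0, 0, 0, 1, -1)
stamp = article_timestamp(bogus)
print(time.ctime(stamp))
```

Catching both exception types covers the OverflowError and ValueError cases that other entries in this log mention for `__init__()`.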
* ArchiveMail(): First, extended the signature to take a msgdata | bwarsaw | 2000-06-23 | 1 | -20/+25
  dictionary, which gets filled in with the child pid. This allows it to be reaped by functions higher up in the call chain.

  Second, and more importantly, be quite paranoid about the fork and child processes. Make absolutely sure that the child process exits. Wrap all the child code in try blocks so that no matter what, the lock will get released (by calling unlock() with the unconditionally flag set). If no exception occurs, the child will exit with status 0, otherwise the traceback will be printed to stderr, a message indicating which list has a corrupt archive is logged to logs/error, and the child process will exit with status 1.

  Extend the archive.lock to 1 hour.
* DumbBTree.__init__(): Rewrote the file opening and unmarshaling code | bwarsaw | 2000-06-23 | 1 | -5/+15
  to be more robust so that if either fails, we end up with an empty self.dict and self.sorted.

  Note that the archiver subprocess will still fail with an exception. Fixing this will require much more work on the archiver as a whole, and isn't worth it right now. But this fix averts the problem when regenerating the archive from scratch using bin/arch, so at least corrupt archives can be rebuilt.
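A minimal sketch of the "fall back to empty structures" idea described above, written in modern Python. The function name `load_index` and the assumption that the file holds a single marshaled dict are hypothetical; the real DumbBTree file layout is not shown in this log.

```python
import marshal

def load_index(path):
    """Return (mapping, sorted_keys), or empty values if loading fails.

    Hypothetical stand-in for the self.dict / self.sorted pair mentioned
    in the commit message.
    """
    try:
        with open(path, "rb") as fp:
            mapping = marshal.load(fp)
    except (OSError, EOFError, ValueError, TypeError):
        # Either the open or the unmarshal failed: start from scratch.
        return {}, []
    if not isinstance(mapping, dict):
        # Readable but unexpected content is treated like a corrupt file.
        return {}, []
    return mapping, sorted(mapping)
```

With this shape, a missing or truncated file yields `({}, [])`, so a caller that regenerates the archive can proceed as if no index existed.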
* Convert all uses of mlist.LogMsg() to the new syslog() interface. | bwarsaw | 2000-06-02 | 2 | -9/+9
* Thomas Wouters writes: | bwarsaw | 2000-05-31 | 1 | -1/+3
  There was a short discussion a month or so ago about the hyperarch 'mbox' archives having the wrong kind of date in the 'From ' lines... 'unixfrom' lines should have a very specific dateformat, namely that which 'time.ctime' returns. The following patch fixes Archiver/HyperArch.py.

  One minor change. To be consistent with gate_news, there are two spaces between the email address and the fromdate.
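For illustration, a 'From ' separator line of the kind described above can be built as follows. The helper name `unix_from_line` and the sample address are assumptions for the example, not code from the patch.

```python
import time

def unix_from_line(sender, when):
    """Build an mbox 'From ' separator line.

    `when` is seconds since the epoch. The date uses the time.ctime()
    format, and two spaces separate the address from the date, as the
    commit message describes.
    """
    return "From %s  %s\n" % (sender, time.ctime(when))

print(unix_from_line("user@example.com", time.time()), end="")
```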
* cosmetic changes to force Harald's changes out to the public CVS | bwarsaw | 2000-04-09 | 1 | -2/+9
  server.
* __archive_to_mbox(): Wrap the code after the munge of the Date: header | bwarsaw | 2000-04-09 | 1 | -13/+15
  in a try/finally so we guarantee we de-munge the header.

  ArchiveMail(): Undo the previous change (test for mlist.archive). This is redundant given the test in ToArchive.py.
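The try/finally pattern mentioned above, sketched in modern Python. A plain dict stands in for the message object, and the names `archive_with_temporary_date` and `write` are hypothetical.

```python
def archive_with_temporary_date(msg, temporary_date, write):
    """Temporarily replace the Date header while `write(msg)` runs.

    `msg` is any mapping with a "Date" key (a dict here, as a stand-in
    for a message object); `write` is whatever stores the message.
    """
    original = msg["Date"]
    msg["Date"] = temporary_date          # munge
    try:
        write(msg)
    finally:
        # Runs whether or not write() raised, so the header is always restored.
        msg["Date"] = original            # de-munge
```

The point of the `finally` clause is that an exception inside `write()` still propagates to the caller, but the message is never left with the temporary header.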
* dateToVolName(): | hmeland | 2000-04-09 | 1 | -10/+9
    * Now uses time.localtime() instead of time.gmtime().
    * The former strategy for generating names of weekly archives was dependent on time.strftime() doing the right thing if the day of the month was negative. At least Solaris strftime() did strange things (like having '%d' generate a string containing '/') when faced with this problem, so the strategy has now been changed.

  Besides, I have changed "self.maillist._internal_name" to "self.maillist.internal_name()" for purely religious reasons (non-related objects shouldn't access each other's "private" attributes).
* __archive_to_mbox(): Message objects do not have any "SetHeader" | hmeland | 2000-04-09 | 1 | -3/+3
  method, use their mapping type interface instead.

  ArchiveMail(): Don't do any archiving unless self.archive is true.
* __init__(): Catch possible ValueErrors when the date contains a year | bwarsaw | 2000-04-07 | 1 | -1/+5
  in the 100-1899 range. Coerce these to current GMT time.
* makelink(), breaklink(), Archiver.InitVars(), | bwarsaw | 2000-04-03 | 1 | -17/+11
  Archiver.__archive_to_mbox(): Since Python 1.5.2 is now required, we can get rid of the Utils.reraise() hack. Also, use import statements which are more consistent with the rest of Mailman.
* Update the copyright lines to include the years 1999 & 2000. | bwarsaw | 2000-03-21 | 5 | -4/+20
* ArchiveMail(): the message's body could be the empty string. Check | bwarsaw | 1999-12-25 | 1 | -1/+1
  for this before indexing msg.body[0]
* ExternalArchive(): Fixes proposed by Bernhard Reiter, specifically: | bwarsaw | 1999-12-11 | 1 | -2/+6
  use the lower cased list name (i.e. the internal name) as the %(listname)s expansion; log a message when the external archiver exits with a non-zero status.
* Use convenience StringIO module | bwarsaw | 1999-11-10 | 1 | -5/+1
* cosmetic | bwarsaw | 1999-11-10 | 1 | -1/+0
* cosmetic | bwarsaw | 1999-10-30 | 1 | -1/+2
* ArchiveMail(): When public archiving is turned on and | bwarsaw | 1999-09-04 | 1 | -7/+22
  PUBLIC_EXTERNAL_ARCHIVER is true, the variable contains a shell command string for os.popen() to invoke the external archiver. Similar for PRIVATE_EXTERNAL_ARCHIVER and private archiving.

  Patch submitted by Paul Hebble <hebble@ncsa.uiuc.edu>, modified by myself (I couldn't resist!)
* ArchiveMail(): it's still possible that the archiver takes a long time | bwarsaw | 1999-08-22 | 1 | -3/+7
  to run (thus breaking the lock). Two changes: crank the lock lifetime up to 5 minutes, and catch any possible NotLockedErrors that might occur.
* Make sure we use cPickle if it exists. This change may or may not | bwarsaw | 1999-08-21 | 1 | -3/+8
  have any effect.
* Extensive changes based on Jeremy Hylton's investigations. These | bwarsaw | 1999-08-21 | 1 | -41/+49
  should considerably help the performance of the archiver. Specifically:

  class DumbBTree: Don't sort the self.sorted list unless some client is actually traversing the data structure. This saves a lot of work when items are added. See also Jeremy's XXX comment for further optimization ideas.

  class HyperDatabase: Jeremy also has questions about the usefulness of the cache used here. Because the items are traversed in linear order, there isn't much locality of reference, so cache eviction doesn't buy you much (it's actually more expensive than just keeping everything in the cache, so that's what we do). That's a space for time trade-off that might need a re-evaluation.

  Clearly, more work could be done to improve the performance of the archiver, but this should improve matters significantly. Caveat: this has been only minimally tested in a production environment. I call this the Hylton Band-aid.
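The "sort only when someone traverses" idea can be sketched as below. This is a simplified illustration in modern Python, not the DumbBTree code; the class and attribute names are assumptions.

```python
class LazySortedKeys:
    """Keep keys unsorted on insert and sort only when they are traversed."""

    def __init__(self):
        self._keys = []
        self._dirty = False   # True when _keys may be out of order

    def add(self, key):
        # Cheap append; no sorting work is done while items are added.
        self._keys.append(key)
        self._dirty = True

    def in_order(self):
        # Pay for the sort once, and only if something changed since last time.
        if self._dirty:
            self._keys.sort()
            self._dirty = False
        return list(self._keys)


keys = LazySortedKeys()
for k in (3, 1, 2):
    keys.add(k)
print(keys.in_order())
```

Adding n items and traversing once costs one sort instead of n, which is the saving the commit message refers to.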
* Extensive changes based on Jeremy Hylton's investigations. These | bwarsaw | 1999-08-21 | 1 | -15/+14
  should considerably help the performance of the archiver. Specifically:

  update_dirty_archives(): Archived articles are appended to the .txt file, and a gzip'd copy used to be written automatically. However this turns out to be a huge performance hit (it's not very efficient to do the entire gzip in Python, and we can't use gzip's append feature because apparently Netscape doesn't know how to grok gzip append files). The gzip file only now gets created if 1) gzip can be imported, and 2) mm_cfg.GZIP_ARCHIVE_TXT_FILES is true.

  XXX: note that we should add a cronjob to gzip the file nightly.

  consolidate imports
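The two conditions named above (gzip importable, configuration flag true) can be sketched like this. The function `maybe_gzip` and its `enabled` parameter are hypothetical; `enabled` stands in for mm_cfg.GZIP_ARCHIVE_TXT_FILES, and the actual update_dirty_archives() code is not reproduced here.

```python
import shutil

try:
    import gzip
except ImportError:
    # Condition 1 fails: no gzip support in this Python build.
    gzip = None

def maybe_gzip(txt_path, enabled):
    """Write txt_path + '.gz' only if gzip is importable and `enabled` is true.

    Returns the path of the compressed copy, or None if nothing was written.
    """
    if gzip is None or not enabled:
        return None
    gz_path = txt_path + ".gz"
    # Recompress the whole file rather than appending to an existing .gz.
    with open(txt_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```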
* Extensive changes based on Jeremy Hylton's investigations. These | bwarsaw | 1999-08-21 | 1 | -6/+16
  should considerably help the performance of the archiver. Specifically:

  ArchiveMail(): Create a lock file (and lock it), just after the fork. Jeremy observes that there is a race condition when many posts show up in a short amount of time. By creating a lock file we make sure that the separate archiver processes won't clobber each other.

  Use the new LockFile module. Move the (c)StringIO import to the top of the file.
* DumbBTree.clear(): New method to short-circuit clearing the btree. | bwarsaw | 1999-07-01 | 1 | -3/+8
  This isn't part of the bsddb.btree interface assumed by Pipermail, but it's only used in one place and /dramatically/ improves Mailman's performance.

  HyperDatabase.clearIndex(): Use DumbBTree.clear().

  These changes may not fix all the performance problems with Mailman, but certainly nails the most serious problem I've been experiencing.
* Archiver.ArchiveMail(): As this method always is called with one | hmeland | 1999-07-01 | 1 | -1/+5
  single message, we must quote any lines beginning with "From " -- the message will be interpreted by the general HyperArch.HyperArchive.processUnixMailbox() as a (possibly multi-message) mbox file on its way into the archive.
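A minimal sketch of the "From " quoting described above, in modern Python. The function name `quote_from_lines` is an assumption, and the ">" prefix is the conventional mbox escape, used here for illustration; the exact quoting done in ArchiveMail() is not shown in this log.

```python
def quote_from_lines(body):
    """Prefix body lines starting with 'From ' with '>'.

    Without this, an mbox parser would treat such a line as the start
    of a new message and split the body in two.
    """
    quoted = []
    for line in body.splitlines(True):   # True keeps the line endings
        if line.startswith("From "):
            line = ">" + line
        quoted.append(line)
    return "".join(quoted)


print(quote_from_lines("Hello,\nFrom here on it gets tricky.\n"), end="")
```

Only lines that begin with the five characters "From " are changed; a line such as "Fromage" is left alone.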
* update_dirty_archives(): Set umask to 002 when creating gzipped text | hmeland | 1999-06-04 | 1 | -1/+5
  archives.
* GetBaseArchiveURL(): tack the new CGI extension onto the private | bwarsaw | 1999-02-28 | 1 | -1/+3
  script.
* VERSION is no longer set in configure because it's too hard to | bwarsaw | 1999-01-15 | 1 | -2/+0
  propagate version string changes to the public (you have to autoreconf, then reconfigure and reinstall). Now, VERSION is set directly in Defaults.py.in and the Release.py script updates that file directly. Now we just need to run ./config.status and do a re-install. I hope this will make things easier.

  I'm also bumping the version to 1.0b8, so I can do a release tomorrow.
* Use the default argument to the reraise() function | bwarsaw | 1998-12-29 | 1 | -4/+4
* .format_article(): Archiving would occasionally fail when an empty | klm | 1998-11-22 | 1 | -11/+2
  line occurred at the beginning of an article - because the space-preserving format style would try to prepend "<pre>" to the None object that's used as a place holder. Instead of doing string concatenation, i'm doing the simpler [].insert(0...) and [].append() to the list of lines.

  (It'd probably also be a good idea to fix the code to not use the None place holders, but it's hard to tell what depends on what there.)
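The failure mode and the fix described above can be illustrated as follows. This is a sketch in modern Python with assumed names (`wrap_preformatted`, the sample list); it is not the format_article() code itself.

```python
def wrap_preformatted(lines):
    """Add <pre> markers as separate list items instead of concatenating.

    Works even when the first or last entry is a None place holder,
    because no string concatenation with the entries takes place.
    """
    lines.insert(0, "<pre>")
    lines.append("</pre>")
    return lines


# None stands in for the place holder used for an empty first line.
lines = [None, "first real line"]

try:
    "<pre>" + lines[0]            # the old approach: fails on None
    concatenation_failed = False
except TypeError:
    concatenation_failed = True

print(concatenation_failed, wrap_preformatted(lines))
```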
* Here's two drastic policy-level changes to the archive mechanism. | klm | 1998-11-22 | 1 | -9/+10
  First of all, i'm setting the default message archive style to preserve horizontal whitespace, use <pre>, instead of putting a <br> at the beginning of every line.

  More deeply, i'm inhibiting all completely-uppercase variables from being included in the pickled settings for the list. As with the VERBOSE problem i addressed in my last checkin, changes to one of the class default settings at any time *after* the archive object state was resurrected would be reinstated thence forward - overriding the defaults, and any other settings. This is not a good interface for setting options - and i'm not really offering an alternative, other than making the defaults come back the next time the list is reinstated, rather than preserving the changes. We need to resolve this - i suppose with an interface that distinguishes permanent from temporary settings.

  In general i've changed the defaults so they preserve more of the structure of the postings (and included very brief comments presenting what i could glean of the settings from the code - this code here does not seem to have been written to be understood).
* .__getstate__(): VERBOSE was not being excluded, so the verbose state | klm | 1998-11-21 | 1 | -1/+2
  was being resurrected from the initial setting for the maillist - for sites that upgraded using the arch script, set to 1. Which meant for those sites that the error log would get a new entry for every item being archived...
* .sortarchives(): Reversed TOC so that it puts the most recent archives | klm | 1998-11-21 | 1 | -45/+81
  at the top - i'm assuming that the attention decreases the older an archive is, more or less.

  Wrapped lots of long lines.
* `make distclean' now removes stray .pyc files. | bwarsaw | 1998-11-17 | 1 | -0/+1
* changed old document template's #var archivedate to be %(archivedate)s. | cotton | 1998-11-09 | 1 | -1/+1
  scott
* changed to use %s/%s instead of os.path.join, | cotton | 1998-11-09 | 1 | -6/+2
  added trailing '/' to public archive url (it always points to a directory).
  scott