| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
non-ascii charset.
HyperArch:
call write_index_entry from write_threadindex_entry
more style issues; refactoring of several methods
add ctype, charset, and decode attrs to Article; set based on the
Content-Type header and any encoded-words in the headers
add extra blank lines to html templates so that they interact better
with iso-2022-jp text
handle msg <title> and <h1> tags separately to avoid non-ascii
characters in title
add encoding format to index and message headers
add explicit template for index entry
pipermail:
handle messages with timezone gracefully
catch EOFError when loading archive pickle
fix a few bugs that were hidden by bogus overrides in HyperArch
|
factor out "write article as html" as write_article method
fix a few bugs in previous checkin
HyperArch:
get rid of bogus add_article override; override write_article instead
|
but it also had some methods that were overridden cut-n-paste
style.
pipermail: move Article date handling into separate method (should
probably do this for the rest of __init__)
|
fix some of the hardest-to-read style problems:
- no space around = in assignment
- obscure comparisons (e.g. use if seq: instead of if len(seq):)
- put body of if/else try/except on separate lines
refactoring: separate some large and/or obscure chunks of code into methods
|
mlist.archive to be set from mm_cfg.py.
|
in Pipermail archives. Wrap this in a configuration variable
ARCHIVER_OBSCURES_EMAILADDRS because it breaks mailto: urls too.
|
it'll only affect what gets put in the .mbox file. We also want
clobber_date to affect what goes in Pipermail or the external
archiver. Move the clobber_date logic to Handlers/ToArchive.py.
|
Also, remove the link to Pipermail on Starship. This is way out of
date.
|
messages being delivered by qrunner it is quite easy to simply run out
of process resources. Now each message to be archived is done in the
parent process, with a bit of extra paranoia in case of archiver
errors.
This also gets rid of the individual archiver locks since it is
required that the list itself be locked in order to get here.
|
self.database.getArticle() calls in a try/except KeyError. These are
stopgap measures to avoid exceptions percolating upwards when an
archive is regenerated with bin/arch. But it doesn't really fix
anything substantial (can you say "pipermail is broken"? ;)
|
likely", initialize the date variable to a float time seconds, not a
time tuple. This is because the following time.ctime() call expects a
float.
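A minimal sketch of the distinction, using only the standard time module. The variable names are illustrative and are not taken from pipermail:

```python
import time

# time.ctime() takes seconds since the epoch (a float), not a time tuple.
date_seconds = time.time()                  # float: what ctime() expects
date_tuple = time.localtime(date_seconds)   # struct_time: ctime() rejects it

print(time.ctime(date_seconds))

# A time tuple has to be converted back to seconds before calling ctime().
print(time.ctime(time.mktime(date_tuple)))
```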
|
dictionary, which gets filled in with the child pid. This allows it
to be reaped by functions higher up in the call chain.
Second, and more importantly, be quite paranoid about the fork and
child processes. Make absolutely sure that the child process exits.
Wrap all the child code in try blocks so that no matter what, the lock
will get released (by calling unlock() with the unconditionally flag
set). If no exception occurs, the child will exit with status 0,
otherwise the traceback will be printed to stderr, a message
indicating which list has a corrupt archive is logged to logs/error,
and the child process will exit with status 1.
Extend the archive.lock to 1 hour.
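The fork handling described above can be sketched roughly as follows. This is an illustration under assumptions, not the Archiver code: run_in_child, work, and cleanup are hypothetical names, cleanup stands in for the unconditional unlock, and the parent here reaps the child immediately instead of handing the pid up the call chain.

```python
import os
import sys
import traceback


def run_in_child(work, cleanup):
    """Fork, run work() in the child, and return the child's exit status."""
    pid = os.fork()
    if pid:
        # Parent: reap the child and report how it exited.
        _, status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(status)
    # Child: must never fall back into the parent's code path.
    exit_code = 1
    try:
        try:
            work()
            exit_code = 0
        except BaseException:
            # Report the failure on stderr; the exit status stays 1.
            traceback.print_exc(file=sys.stderr)
            sys.stderr.flush()
        finally:
            # Stand-in for releasing the lock no matter what happened.
            cleanup()
    finally:
        # os._exit() skips normal interpreter shutdown, so the child
        # cannot run any of the parent's remaining code.
        os._exit(exit_code)
```

With this shape, a clean run yields status 0 and any exception in work() yields status 1, matching the behaviour the message describes.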
|
to be more robust so that if either fails, we end up with an empty
self.dict and self.sorted.
Note that the archiver subprocess will still fail with an exception.
Fixing this will require much more work on the archiver as a whole,
and isn't worth it right now. But this fix averts the problem when
regenerating the archive from scratch using bin/arch, so at least
corrupt archives can be rebuilt.
|
There was a short discussion a month or so ago about the hyperarch
'mbox' archives having the wrong kind of date in the 'From '
lines... 'unixfrom' lines should have a very specific dateformat,
namely that which 'time.ctime' returns. The following patch fixes
Archiver/HyperArch.py.
One minor change. To be consistent with gate_news, there are two
spaces between the email address and the fromdate.
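A small sketch of the 'From ' line format described above, assuming the date is available as seconds since the epoch. The helper name unixfrom_line is hypothetical:

```python
import time


def unixfrom_line(sender, when):
    """Build an mbox 'From ' line; 'when' is seconds since the epoch."""
    # Two spaces separate the address from the time.ctime() style date.
    return "From %s  %s" % (sender, time.ctime(when))


print(unixfrom_line("user@example.com", 0))
```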
|
server.
|
in a try/finally so we guarantee we de-munge the header.
ArchiveMail(): Undo the previous change (test for mlist.archive).
This is redundant given the test in ToArchive.py.
|
* Now uses time.localtime() instead of time.gmtime().
* The former strategy for generating names of weekly archives was
dependent on time.strftime() doing the right thing if the day of
the month was negative. At least Solaris strftime() did strange
things (like having '%d' generate a string containing '/') when
faced with this problem, so the strategy has now been changed.
Besides, I have changed "self.maillist._internal_name" to
"self.maillist.internal_name()" for purely religious reasons
(non-related objects shouldn't access each other's "private"
attributes).
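The message does not say what the new naming strategy is. One way to avoid handing strftime() a negative day of the month is to do the date arithmetic first and format afterwards. The sketch below takes that approach; the function name and the "Week-of-Mon-" name format are assumptions made for illustration:

```python
from datetime import date, timedelta


def weekly_archive_name(day):
    """Name a weekly archive after the Monday that starts the week."""
    # Subtract whole days so the result is always a valid calendar date,
    # even when the week began in the previous month.
    monday = day - timedelta(days=day.weekday())
    return monday.strftime("Week-of-Mon-%Y%m%d")


# Friday 2 March 2001 belongs to the week that began on 26 February.
print(weekly_archive_name(date(2001, 3, 2)))
```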
|
method, use their mapping type interface instead.
ArchiveMail(): Don't do any archiving unless self.archive is true.
|
in the 100-1899 range. Coerce these to current GMT time.
|
Archiver.__archive_to_mbox(): Since Python 1.5.2 is now required, we
can get rid of the Utils.reraise() hack.
Also, use import statements which are more consistent with the rest of
Mailman.
|
for this before indexing msg.body[0]
|
use the lower cased list name (i.e. the internal name) as the
%(listname)s expansion; log a message when the external archiver
exits with a non-zero status.
|
PUBLIC_EXTERNAL_ARCHIVER is true, the variable contains a shell
command string for os.popen() to invoke the external archiver.
Similar for PRIVATE_EXTERNAL_ARCHIVER and private archiving.
Patch submitted by Paul Hebble <hebble@ncsa.uiuc.edu>, modified by
myself (I couldn't resist!)
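A rough sketch of how such a command string might be used, assuming a Unix shell. The function name archive_externally and the command strings are placeholders; only the %(listname)s expansion and the use of os.popen() come from the log:

```python
import os


def archive_externally(command_template, listname, message_text):
    """Pipe one message to an external archiver command."""
    # Expand %(listname)s with the lower-cased (internal) list name.
    command = command_template % {"listname": listname.lower()}
    pipe = os.popen(command, "w")
    pipe.write(message_text)
    # close() returns None on success, otherwise the command's wait status.
    return pipe.close()


# Placeholder command; a real setting would name an actual archiver.
status = archive_externally("cat > /dev/null", "Test-List", "Subject: hi\n\nbody\n")
if status is not None:
    print("external archiver exited with non-zero status:", status)
```

Because the list name is interpolated into a shell command, a real configuration would want it restricted to safe characters.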
|
to run (thus breaking the lock). Two changes: crank the lock lifetime
up to 5 minutes, and catch any possible NotLockedErrors that might
occur.
|
have any effect.
|
should considerably help the performance of the archiver.
Specifically:
class DumbBTree: Don't sort the self.sorted list unless some client is
actually traversing the data structure. This saves a lot of work when
items are added. See also Jeremy's XXX comment for further
optimization ideas.
class HyperDatabase: Jeremy also has questions about the usefulness of
the cache used here. Because the items are traversed in linear order,
there isn't much locality of reference, so cache eviction doesn't buy
you much (it's actually more expensive than just keeping everything in
the cache, so that's what we do). That's a space for time trade-off
that might need a re-evaluation.
Clearly, more work could be done to improve the performance of the
archiver, but this should improve matters significantly. Caveat: this
has been only minimally tested in a production environment.
I call this the Hylton Band-aid.
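The deferred-sort idea for DumbBTree can be illustrated with a small stand-in class. This is a sketch, not the Mailman implementation: the class name, the _dirty flag, and keys_in_order() are invented here, while the dict and sorted attribute names follow the log.

```python
class LazySortedDict:
    """Mapping that sorts its key list only when someone traverses it."""

    def __init__(self):
        self.dict = {}
        self.sorted = []
        self._dirty = False  # True when self.sorted may be out of order

    def __setitem__(self, key, value):
        if key not in self.dict:
            # Appending is cheap; sorting is postponed until a traversal.
            self.sorted.append(key)
            self._dirty = True
        self.dict[key] = value

    def __getitem__(self, key):
        return self.dict[key]

    def keys_in_order(self):
        if self._dirty:
            self.sorted.sort()
            self._dirty = False
        return list(self.sorted)

    def clear(self):
        self.dict.clear()
        self.sorted = []
        self._dirty = False


tree = LazySortedDict()
for key in (3, 1, 2):
    tree[key] = str(key)
print(tree.keys_in_order())
```

Adding many items costs one sort at traversal time instead of one sort per insertion.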
|
should considerably help the performance of the archiver.
Specifically:
update_dirty_archives(): Archived articles are appended to the .txt
file, and a gzip'd copy used to be written automatically. However
this turns out to be a huge performance hit (it's not very efficient
to do the entire gzip in Python, and we can't use gzip's append
feature because apparently Netscape doesn't know how to grok gzip
append files). The gzip file now only gets created if 1) gzip can be
imported, and 2) mm_cfg.GZIP_ARCHIVE_TXT_FILES is true.
XXX: note that we should add a cronjob to gzip the file nightly.
consolidate imports
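The two conditions for writing the gzip copy can be sketched as below. The module-level flag stands in for mm_cfg.GZIP_ARCHIVE_TXT_FILES, and maybe_gzip is a hypothetical helper, not a function from the archiver:

```python
import shutil

try:
    import gzip
except ImportError:
    gzip = None  # condition 1 fails: gzip cannot be imported

# Stand-in for mm_cfg.GZIP_ARCHIVE_TXT_FILES (assumed to default to off).
GZIP_ARCHIVE_TXT_FILES = False


def maybe_gzip(txt_path, enabled=GZIP_ARCHIVE_TXT_FILES):
    """Write txt_path + '.gz' only if gzip is importable and enabled."""
    if gzip is None or not enabled:
        return None
    gz_path = txt_path + ".gz"
    with open(txt_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```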
|
should considerably help the performance of the archiver.
Specifically:
ArchiveMail(): Create a lock file (and lock it), just after the fork.
Jeremy observes that there is a race condition when many posts show up
in a short amount of time. By creating a lock file we make sure that
the separate archiver processes won't clobber each other.
Use the new LockFile module.
Move the (c)StringIO import to the top of the file.
|
This isn't part of the bsddb.btree interface assumed by Pipermail, but
it's only used in one place and /dramatically/ improves Mailman's
performance.
HyperDatabase.clearIndex(): Use DumbBTree.clear(). These changes may
not fix all the performance problems with Mailman, but certainly nail
the most serious problem I've been experiencing.
|
single message, we must quote any lines beginning with "From " --
the message will be interpreted by the general
HyperArch.HyperArchive.processUnixMailbox() as a (possibly
multi-message) mbox file on its way into the archive.
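A minimal sketch of "From " quoting for a message body, using the conventional ">" prefix. The helper name quote_from_lines is hypothetical:

```python
def quote_from_lines(body):
    """Prefix body lines that start with 'From ' so they are not read
    as mbox message separators."""
    quoted = []
    for line in body.split("\n"):
        if line.startswith("From "):
            line = ">" + line
        quoted.append(line)
    return "\n".join(quoted)


print(quote_from_lines("Hello,\nFrom here on it gets tricky.\nBye"))
```

The quoting applies to the body only; the real "From " separator line at the start of the message is left alone.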
|
archives.
|
script.
|
propagate version string changes to the public (you have to
autoreconf, then reconfigure and reinstall).
Now, VERSION is set directly in Defaults.py.in and the Release.py
script updates that file directly. Now we just need to run
./config.status and do a re-install. I hope this will make things
easier.
I'm also bumping the version to 1.0b8, so I can do a release tomorrow.
|
line occurred at the beginning of an article - because the
space-preserving format style would try to prepend "<pre>" to the None
object that's used as a place holder. Instead of doing string
concatenation, i'm doing the simpler [].insert(0...) and [].append()
to the list of lines. (It'd probably also be a good idea to fix the
code to not use the None place holders, but it's hard to tell what
depends on what there.)
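The difference between the two approaches, sketched with a hypothetical wrap_pre helper. Concatenating "<pre>" with a None placeholder raises TypeError, whereas inserting into the list leaves the placeholder untouched:

```python
def wrap_pre(lines):
    """Wrap a list of lines in <pre>...</pre> without touching the items."""
    wrapped = list(lines)
    wrapped.insert(0, "<pre>")
    wrapped.append("</pre>")
    return wrapped


# None stands in for the placeholder that may lead the article body.
print(wrap_pre([None, "first line", "second line"]))
```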
|
First of all, i'm setting the default message archive style to
preserve horizontal whitespace, use <pre>, instead of putting a <br>
at the beginning of every line.
More deeply, i'm inhibiting all completely-uppercase variables from
being included in the pickled settings for the list. As with the
VERBOSE problem i addressed in my last checkin, changes to one of the
class default settings at any time *after* the archive object state
was resurrected would be reinstated thence forward - overriding the
defaults, and any other settings. This is not a good interface for
setting options - and i'm not really offering an alternative, other
than making the defaults come back the next time the list is
reinstated, rather than preserving the changes. We need to resolve
this - i suppose with an interface that distinguishes permanent from
temporary settings.
In general i've changed the defaults so they preserve more of the
structure of the postings (and included very brief comments presenting
what i could glean of the settings from the code - this code here does
not seem to have been written to be understood).
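One way to keep completely-uppercase variables out of pickled state is a __getstate__ filter. This is a sketch of the idea only: the class name ArchiveSettings and its attributes are made up, and the actual HyperArch code may do it differently.

```python
class ArchiveSettings:
    VERBOSE = 0  # class-level default, not meant to be persisted

    def __init__(self, basedir):
        self.basedir = basedir

    def __getstate__(self):
        # pickle calls this to decide what to save; drop ALL-CAPS names
        # so class defaults apply again after the object is restored.
        return {
            name: value
            for name, value in self.__dict__.items()
            if not name.isupper()
        }


settings = ArchiveSettings("/tmp/archive")
settings.VERBOSE = 1  # temporary per-instance override
print(settings.__getstate__())
```

The override is visible on the live object but is absent from the saved state, so the class default returns the next time the object is loaded.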
|
was being resurrected from the initial setting for the maillist - for
sites that upgraded using the arch script, set to 1. Which meant for
those sites that the error log would get a new entry for every item
being archived...
|
at the top - i'm assuming that the attention decreases the older an
archive is, more or less.
Wrapped lots of long lines.
|
scott
|
added trailing '/' to public archive url (it always points to a directory).
scott
|