| Commit message | Author | Files | Lines |
|
to run (thus breaking the lock). Two changes: crank the lock lifetime
up to 5 minutes, and catch any possible NotLockedErrors that might
occur.
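A minimal sketch of the defensive unlock described here, assuming the LockFile API from the later commits in this log (the constructor signature, the lock()/unlock() method names, the Mailman import path, and the lock path are guesses; NotLockedError is named above):

    from Mailman import LockFile

    def do_work():
        pass                                # hypothetical archiver work

    lock = LockFile.LockFile('/var/lock/mailman/archiver.lock',
                             lifetime=300)  # 5 minutes, per this change
    lock.lock()
    try:
        do_work()
    finally:
        try:
            lock.unlock()
        except LockFile.NotLockedError:
            # Another claimant may have broken our lock after the
            # lifetime expired; losing it here should not be fatal.
            pass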
|
When processing the archive, lay claim to the archiver lock so that
any posts that come in while the archiver is running will get
blocked. This isn't optimal because we don't really know how long the
archiver will run (I give it 3 hours max by default).
HyperArchive.processUnixMailbox() should actually refresh() the lock
while it's working.
The MailList object is locked too, although with a shorter lifetime;
I wonder whether this is still necessary.
Use the getopt module to process options. Added a docstring, which
becomes the usage message.
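A sketch of the refresh-while-working idea above, under the same LockFile assumptions (the mailbox iterator and per-message helper are illustrative; refresh() is described later in this log):

    from Mailman import LockFile

    def add_to_archive(msg):
        pass                        # hypothetical per-message work

    unix_mailbox = []               # stand-in for the parsed mailbox

    lock = LockFile.LockFile('/var/lock/mailman/archiver.lock',
                             lifetime=3 * 60 * 60)  # the 3 hour maximum
    lock.lock()
    try:
        for msg in unix_mailbox:
            add_to_archive(msg)
            # Push the expected lifetime forward after each message so
            # other claimants keep seeing a live, honest lock.
            lock.refresh()
    finally:
        lock.unlock()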
|
have any effect.
|
should considerably help the performance of the archiver.
Specifically:
class DumbBTree: Don't sort the self.sorted list unless some client is
actually traversing the data structure. This saves a lot of work when
items are added. See also Jeremy's XXX comment for further
optimization ideas.
class HyperDatabase: Jeremy also has questions about the usefulness of
the cache used here. Because the items are traversed in linear order,
there isn't much locality of reference, so cache eviction doesn't buy
you much (it's actually more expensive than just keeping everything in
the cache, so that's what we do). That's a space-for-time trade-off
that may need re-evaluation.
Clearly, more work could be done to improve the performance of the
archiver, but this should improve matters significantly. Caveat: this
has been only minimally tested in a production environment.
I call this the Hylton Band-aid.
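A minimal sketch of the lazy-sort idea described for DumbBTree, assuming a dirty flag (only the names DumbBTree and self.sorted come from this log; the rest is illustrative):

    class DumbBTree:
        # Keeps keys in sorted order, but only sorts when a client
        # actually traverses the structure.
        def __init__(self):
            self.dict = {}
            self.sorted = []
            self.__dirty = False

        def __setitem__(self, key, value):
            if key not in self.dict:
                self.sorted.append(key)   # defer sorting to traversal
                self.__dirty = True
            self.dict[key] = value

        def __sort(self):
            if self.__dirty:
                self.sorted.sort()
                self.__dirty = False

        def keys(self):
            self.__sort()                 # sort only on traversal
            return list(self.sorted)

This trades one sort at traversal time for avoiding a re-sort on every insert, which is where the savings come from when many items are added in a row.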
|
should considerably help the performance of the archiver.
Specifically:
update_dirty_archives(): Archived articles are appended to the .txt
file, and a gzip'd copy used to be written automatically. However,
this turns out to be a huge performance hit (it's not very efficient
to do the entire gzip in Python, and we can't use gzip's append
feature because apparently Netscape doesn't know how to grok appended
gzip files). The gzip file now only gets created if 1) gzip can be
imported, and 2) mm_cfg.GZIP_ARCHIVE_TXT_FILES is true.
XXX: note that we should add a cronjob to gzip the file nightly.
consolidate imports
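A sketch of the conditional-gzip logic just described (mm_cfg.GZIP_ARCHIVE_TXT_FILES is named above; the Mailman import path and the helper name are assumptions):

    try:
        import gzip
    except ImportError:
        gzip = None

    from Mailman import mm_cfg

    def maybe_write_gzip(txtfile):
        # Only produce the .txt.gz copy when gzip is importable and
        # the site has opted in; otherwise leave it to a cronjob.
        if gzip is None or not mm_cfg.GZIP_ARCHIVE_TXT_FILES:
            return
        with open(txtfile, 'rb') as infp, \
                gzip.open(txtfile + '.gz', 'wb') as outfp:
            outfp.write(infp.read())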
|
should considerably help the performance of the archiver.
Specifically:
ArchiveMail(): Create a lock file (and lock it), just after the fork.
Jeremy observes that there is a race condition when many posts show up
in a short amount of time. By creating a lock file we make sure that
the separate archiver processes won't clobber each other.
Use the new LockFile module.
Move the (c)StringIO import to the top of the file.
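A sketch of the fork-then-lock arrangement described above (the function name ArchiveMail comes from this log; everything else is illustrative):

    import os
    from Mailman import LockFile

    def ArchiveMail(msg):
        pid = os.fork()
        if pid:
            return                  # parent: go back to delivering mail
        # Child: serialize against sibling archiver processes so that
        # near-simultaneous posts don't clobber each other.
        lock = LockFile.LockFile('/var/lock/mailman/archiver.lock')
        lock.lock()
        try:
            pass                    # hypothetical archiving work
        finally:
            lock.unlock()
        os._exit(0)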
|
archiver's .txt file is gzip'd on the fly. This turns out to be a
major performance hit, so it's disabled by default. This means that
to update the txt.gz file, you will need to run a cronjob.
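A minimal cron-able Python script for that job (the archive path is illustrative):

    import glob
    import gzip

    # Nightly: regenerate each archive's .txt.gz from its .txt.
    for txt in glob.glob('/var/lib/mailman/archives/*/*.txt'):
        with open(txt, 'rb') as infp, \
                gzip.open(txt + '.gz', 'wb') as outfp:
            outfp.write(infp.read())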
|
module. Specifically:
__del__(): Be sure to unlock the list when it gets GC'd
InitTempVars(): _tmp_lock => __createlock_p
_lock_file => __lock
__lock is now a LockFile object and all locking goes through this
object.
Load(): _tmp_lock => __createlock_p
Locked(), Lock(), Unlock(): Use the new LockFile object as the basis
for locking. Also, Unlock() should catch AlreadyLockedError for
backwards compatible semantics.
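A sketch of the unlock-on-GC and forgiving-Unlock() behavior above (LockError is the base exception class per the LockFile commit in this log; the lock path and the surrounding class body are illustrative):

    from Mailman import LockFile

    class MailList:
        def InitTempVars(self):
            # __lock is now a LockFile object; all locking goes
            # through it.
            self.__lock = LockFile.LockFile('/var/lock/mailman/mylist.lock')
            self.__createlock_p = False

        def Unlock(self):
            try:
                self.__lock.unlock()
            except LockFile.LockError:
                # Keep the old forgiving semantics (the log names
                # AlreadyLockedError specifically).
                pass

        def __del__(self):
            # Be sure the lock is released when the list is GC'd.
            self.Unlock()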
|
hung_timeout => lifetime
|
LockFile.AlreadyLockedError
|
Most important: a semantic change. When a lock is acquired, an
additional float value is written to the lock file. This is the
lock's expected lifetime, expressed as an instant at some point in the
future (i.e. time.time() + lifetime). This allows processes to give a
clue to other claimants as to how long the lock holder intends to keep
the lock. This is necessary because the same resource may need to be
locked by short-lived and long-lived processes (e.g. the archiver).
Without this, a short-lived process has no idea when the lock owner
should have given up the lock, and could steal it out from under it.
It is possible that a process could continually refresh a lock (see
below) in an infloop, thus causing all other claimants to back off
forever. This is (I believe) much less likely than a process crashing
and leaving a lock turd around.
Rename this file to LockFile.py (retain flock.py for CVS
archival purposes, but soon all references to this module will be
changed to use LockFile instead of flock). The main class in this
module is also named LockFile.
All the exceptions are now class-based, with LockError as the base
exception class. AlreadyCalledLockError is renamed to
AlreadyLockedError; NotLockedError and TimeOutError are unchanged.
New public methods set_lifetime() and refresh(). The former sets the
lifetime interval so that the same lock object can be reused with a
different lock lifetime. The latter refreshes the lock lifetime for
an already held lock. A process can use this if it suddenly realizes
it needs more time to complete its work.
"hung_timeout" is renamed to lifetime, because of the changed semantics
All private attributes have been __renamed
Docstrings everywhere!
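A self-contained sketch of the lifetime-stamp mechanism (the on-disk format and the function names here are illustrative, not the real LockFile implementation):

    import os
    import time

    def acquire(lockfile, lifetime):
        # O_EXCL makes creation atomic: exactly one claimant wins.
        fd = os.open(lockfile, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
        # Record the instant we expect to be done: time.time() + lifetime.
        os.write(fd, str(time.time() + lifetime).encode())
        os.close(fd)

    def refresh(lockfile, lifetime):
        # Push the expected-release instant further into the future,
        # for holders that realize they need more time.
        with open(lockfile, 'w') as fp:
            fp.write(str(time.time() + lifetime))

    def is_stale(lockfile):
        # Other claimants may break the lock only once the holder's
        # own declared lifetime has passed.
        with open(lockfile) as fp:
            return time.time() > float(fp.read())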
|
for an address matching a regular expression. Very handy when you
have a bouncing address subscribed to many lists!
Makefile.in: install find_member
|