path: root/Mailman/Archiver/HyperArch.py
Commit message (author, date; files changed, lines removed/added)
...
* Bump copyright years.
  (bwarsaw, 2002-03-05; 1 file, -1/+1)
* Patches to help localize where file system layout decisions are made. Don't use PUBLIC_ARCHIVE_FILE_DIR and PRIVATE_ARCHIVE_FILE_DIR directly; use the archive_dir() method or the Site module methods instead.
  (bwarsaw, 2002-03-05; 1 file, -3/+1)
* GetArchLock(), DropArchLock(): Get rid of posixfile locking, which is deprecated in Python 2.2, in favor of our good ol' standard LockFile.
  (bwarsaw, 2001-12-24; 1 file, -16/+7)
* write_index_entry(): Slight modification so that the subject and author lines are always HTML-escaped. This fixes index summaries for messages with HTML in their Subject: line, for instance.
  (bwarsaw, 2001-10-28; 1 file, -3/+2)
* Tokio Kikuchi says:
  "Some time ago, someone complained about Pipermail not representing the proper charset in the Content-Type header. Here is a patch for the latest CVS (2.1a)."
  With some changes by Barry, specifically to get the charset parameter out of the Content-Type: header using email.Message's interface instead of regexp searching. Please double check this for me!
  (bwarsaw, 2001-10-26; 1 file, -9/+8)
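Reading the charset from the parsed header rather than regexp-searching the raw text can be sketched with Python 3's email package. This is a minimal sketch, not the 2001 code: it uses today's email.message API instead of the email.Message module of that era, and message_charset() is a hypothetical helper.

```python
from email import message_from_string

def message_charset(raw, default='us-ascii'):
    """Return the charset parameter of Content-Type, lowercased, or `default`."""
    msg = message_from_string(raw)
    # get_content_charset() reads the charset parameter of the Content-Type
    # header, lowercases it, and returns `default` when it is absent.
    return msg.get_content_charset(default)
```

For example, a header of `Content-Type: text/plain; charset="ISO-2022-JP"` yields `iso-2022-jp`, and a message with no Content-Type falls back to `us-ascii`.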
* De-string-module-ification. Also, remove the pickle import since it doesn't seem to be used anywhere.
  (bwarsaw, 2001-10-15; 1 file, -38/+40)
* html_TOC(): Use archive_dir() instead of the archive_directory attribute.
  (bwarsaw, 2001-07-26; 1 file, -1/+1)
* Add missing import of 'syslog'.
  (twouters, 2001-07-10; 1 file, -0/+1)
* Use better syslog() calling convention.
  (bwarsaw, 2001-06-27; 1 file, -4/+4)
* Module global article_template removed. Instead, this HTML is moved into templates/en/article.html, and as_html() is modified to use the standard maketext() call to retrieve the template and interpolate a dictionary of values.
  (bwarsaw, 2001-06-07; 1 file, -56/+4)
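The template-plus-dictionary approach can be sketched as follows, assuming the template uses Python `%(key)s` placeholders. render_template() is a hypothetical stand-in; Mailman's actual maketext() is not reproduced here.

```python
import os

def render_template(template_dir, name, values):
    """Read a template file and fill its %(key)s placeholders from a dict.

    Hypothetical stand-in for maketext(); real template lookup rules
    (language directories, list overrides) are not modeled.
    """
    path = os.path.join(template_dir, name)
    with open(path, encoding='utf-8') as fp:
        template = fp.read()
    # %-interpolation with a mapping replaces each %(key)s with values[key].
    return template % values
```

Moving the HTML into a file means the markup can be edited or translated without touching the archiver code.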
* Remove an unnecessary import.
  (bwarsaw, 2001-05-18; 1 file, -3/+1)
* REpat: Recognize 'Re[2]:'-style reply subjects. Fixes SF bug #223554.
  (twouters, 2001-03-03; 1 file, -1/+1)
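A reply-prefix pattern that also accepts a bracketed count can be sketched like this. The regexp is an illustrative assumption, not the actual REpat from HyperArch.py, and strip_reply_prefix() is a hypothetical helper.

```python
import re

# Illustrative pattern (not the real REpat): a leading 'Re:', 'RE:', or
# 'Re[2]:' style prefix, case-insensitive, plus any following whitespace.
REPLY_PREFIX = re.compile(r'\s*re(\[\d+\])?:\s*', re.IGNORECASE)

def strip_reply_prefix(subject):
    """Remove repeated reply prefixes so a reply threads with its original."""
    while True:
        m = REPLY_PREFIX.match(subject)
        if not m:
            return subject
        # Each match consumes at least 're:', so the loop always terminates.
        subject = subject[m.end():]
```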
* processListArch(), write_TOC(), write_article(), update_archive(), update_article(): Utils.open_ex() is obsolete.
  (bwarsaw, 2001-02-15; 1 file, -10/+30)
* __init__(): Fix for the case where the Content-Transfer-Encoding header is missing. Submitted by Erik Forsberg (original patch author).
  (bwarsaw, 2000-11-13; 1 file, -2/+2)
* __init__(): The values of Content-Type and Content-Transfer-Encoding are case-insensitive according to RFC 1521. Closes patch #102268.
  (bwarsaw, 2000-11-10; 1 file, -3/+6)
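Normalizing these two header values, including the missing-header case from the neighboring fix, might look like the sketch below. normalize_mime_values() is hypothetical; the defaults (text/plain and 7bit) are the ones RFC 1521 specifies for absent headers.

```python
def normalize_mime_values(content_type=None, transfer_encoding=None):
    """Return (content_type, transfer_encoding) ready for comparison."""
    # RFC 1521: these values are case-insensitive, so compare them lowercased.
    # Absent headers fall back to the RFC 1521 defaults.
    ctype = (content_type or 'text/plain').split(';', 1)[0].strip().lower()
    cte = (transfer_encoding or '7bit').strip().lower()
    return ctype, cte
```

With this, `Quoted-Printable` and `quoted-printable` select the same decoding path.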
* get_archives(): Removed the line that specifically eliminates messages with a Subject: in ('subscribe', 'unsubscribe'). This kind of filtering happens at higher levels in Mailman; if such a message is in the .mbox file, it should be in the HTML archive. Closes bug #121811 for real now.
  (bwarsaw, 2000-11-10; 1 file, -1/+0)
* _rx_quote: Change the regexp so that it only matches legitimate (uppercase) hex digits. This fixes SF bug #117548 and replaces the suggested patch in patch #102097. (Patch approved by Jeremy.)
  (bwarsaw, 2000-11-09; 1 file, -1/+1)
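The effect of restricting the match to uppercase hex digits can be sketched with a quoted-printable-style `=XX` escape. This assumes _rx_quote matches such escapes; the pattern and the unquote() helper are illustrative reconstructions, not the actual code. RFC 2045 requires uppercase hex when encoding.

```python
import re

# Illustrative: only '=' followed by two *uppercase* hex digits is an escape,
# so text such as '=ab' or '=ZZ' is left untouched.
qp_escape = re.compile(r'=([0-9A-F]{2})')

def unquote(text):
    """Replace each =XX escape with the character whose code is 0xXX."""
    return qp_escape.sub(lambda m: chr(int(m.group(1), 16)), text)
```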
* HyperArchive.html_TOC(): Dan Mick recognized an indentation bug which broke the archiver.
  (bwarsaw, 2000-10-27; 1 file, -1/+1)
* article_text_template: Get rid of this; we're going to do things differently.
  class Article(): Get rid of text_tmpl for the same reason.
  as_text(): We need to retain In-Reply-To:, References:, and Message-ID: if the downloadable periodics are to be at all threadable. Suggested by Gerald Oskoboiny. Also, provide reasonable defaults for the Date: header if the original message is missing it (i.e. "None" isn't reasonable :).
  (bwarsaw, 2000-10-03; 1 file, -15/+18)
* Fixes to the Pipermail TOC page, and to the monthly (or whatever period) .txt files that are generated. Specifically:
  sizeof(): Factor out code to calculate the size of a file with the appropriate bytes/KB/MB suffix.
  article_text_template: For proper parsing by most Unix-mail-compatible tools, the From: header should be in the form "From: emailaddr (real name)".
  Article.as_text(): Make sure the plain-text headers have a valid (even if bogus) From_ separator for compatibility with Unix mail and similar tools. Craft a fromdate and email address if they aren't present in the original message.
  TOC_template, html_TOC(): Added a link to the full raw archive file, which was always available, but hidden. You still need to go through private.py if the archives are private, of course. Also, report the approximate size of the raw archive.
  (bwarsaw, 2000-10-02; 1 file, -18/+30)
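A size helper of the kind sizeof() describes can be sketched as below. The decimal thresholds (1000 bytes per KB) and the exact output strings are assumptions for illustration.

```python
import os

def sizeof(filename):
    """Return the size of `filename` as a short string with a unit suffix."""
    size = os.path.getsize(filename)
    # Assumed decimal thresholds; integer division keeps the figure approximate.
    if size < 1000:
        return '%d bytes' % size
    elif size < 1000000:
        return '%d KB' % (size // 1000)
    return '%d MB' % (size // 1000000)
```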
* Two changes to charset handling:
  Check for the charset in the mm_cfg.VERBATIM_ENCODING list, and do not call html_quote if it is found. The list should contain charsets that use multibyte encodings where 0x26 may not represent the & character. Add an option for the default charset (None == us-ascii).
  Fix a bug in format_article that added <pre> tags to the message body *before* writing the text version. The fix isn't very clean, but it is functional and quick. Generate the HTML body and store it as the html_body attribute; use this in preference to the body attribute when writing HTML output.
  (jhylton, 2000-10-02; 1 file, -15/+29)
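The verbatim-charset check can be sketched as follows. VERBATIM_ENCODING here is a hypothetical stand-in for the mm_cfg setting, and iso-2022-jp is used as the example because its double-byte characters are built from bytes in the 0x21-0x7E range, so 0x26 can occur inside a character. html.escape() stands in for html_quote.

```python
from html import escape

# Hypothetical stand-in for mm_cfg.VERBATIM_ENCODING.
VERBATIM_ENCODING = ['iso-2022-jp']

def quote_body(text, charset=None):
    """HTML-quote `text` unless its charset is listed as verbatim."""
    # None means us-ascii, mirroring the default-charset option.
    charset = (charset or 'us-ascii').lower()
    if charset in VERBATIM_ENCODING:
        # Quoting would corrupt multibyte sequences containing 0x26.
        return text
    return escape(text, quote=False)
```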
* Extensive cleanup and performance improvements. The most significant changes are:
  - add support for decoding subjects in links to the next and previous message if the encodings of the two messages are the same
  - replace re.sub('"', ...) with string.replace('"', ...)
  - remove the unused __processbody_CGIescape method
  - vast simplification and speedup of format_article (still more to do in the methods it calls)
  - change the logic of loadbody_fromHTML to avoid unnecessary tests
  - add a slightly optimized mailbox class
  (jhylton, 2000-09-22; 1 file, -75/+91)
* Apply patch #100867: add robot meta tags to get more intelligent search engine indexing of index and message pages.
  (jhylton, 2000-09-22; 1 file, -0/+3)
* Replace null bytes in the message body with spaces.
  (jhylton, 2000-09-22; 1 file, -2/+4)
* Article._get_body(): Python 1.5.2's int() takes only one argument.
  (bwarsaw, 2000-09-22; 1 file, -1/+1)
* Decode quoted-printable message bodies.
  Keep the _charsets dictionary in the pickled representation of the archive; this allows the charset for an index page to be set based on the total count of charsets in all messages.
  (jhylton, 2000-09-22; 1 file, -4/+38)
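Both halves of this change can be sketched with the standard library: quopri decodes the body, and a Counter plays the role of the _charsets dictionary so the index page can use the most common charset. decode_body() and index_charset() are hypothetical helpers, and picking the single most frequent charset is an assumption about how "the total count" is used.

```python
import quopri
from collections import Counter

def decode_body(raw_body, transfer_encoding):
    """Decode a quoted-printable body (bytes); other encodings pass through."""
    if transfer_encoding.lower() == 'quoted-printable':
        return quopri.decodestring(raw_body)
    return raw_body

def index_charset(charset_counts, default='us-ascii'):
    """Pick the charset used by the most archived messages."""
    if not charset_counts:
        return default
    return charset_counts.most_common(1)[0][0]
```

Because a Counter pickles like a dict, it can be saved with the rest of the archive state and updated as each message is added.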
* Default should not be verbose.
  (jhylton, 2000-09-22; 1 file, -1/+1)
* Several changes in support of David Champion's SF patch #101331. Specifically, HyperArchive.html_TOC_entry(): Don't calculate the path to archives/private here; it's already done for us in mm_cfg.
  (bwarsaw, 2000-09-22; 1 file, -2/+1)
* HyperArchive.__init__(): The charset attribute needs to be initialized to None; otherwise pure-ASCII archives fail to build. Other de-Python-2.0-ifications.
  (bwarsaw, 2000-09-22; 1 file, -2/+3)
* Numerous changes to support message bodies and headers that use a non-ASCII charset.
  HyperArch:
  - call write_index_entry from write_threadindex_entry
  - more style issues; refactoring of several methods
  - add ctype, charset, and decode attrs to Article; set them based on the Content-Type header and any encoded-words in headers
  - add extra blank lines to HTML templates so that they interact better with iso-2022-jp text
  - handle msg <title> and <h1> tags separately to avoid non-ASCII characters in the title
  - add encoding format to index and message headers
  - add an explicit template for index entries
  pipermail:
  - handle messages with a timezone gracefully
  - catch EOFError when loading the archive pickle
  - fix a few bugs that were hidden by bogus overrides in HyperArch
  (jhylton, 2000-09-22; 1 file, -185/+302)
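Detecting a charset from encoded-words in a header can be sketched with email.header.decode_header() from Python 3's standard library. decode_encoded_words() is a hypothetical helper; reporting only the first encoded-word's charset is a simplifying assumption.

```python
from email.header import decode_header

def decode_encoded_words(value):
    """Return (text, charset) for a header that may contain RFC 2047 encoded-words.

    `charset` is that of the first encoded-word, or None for a plain header.
    """
    parts = []
    charset = None
    for chunk, chunk_charset in decode_header(value):
        # decode_header() yields str for a plain header, bytes otherwise.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(chunk_charset or 'ascii', errors='replace')
        if chunk_charset and charset is None:
            charset = chunk_charset
        parts.append(chunk)
    return ''.join(parts), charset
```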
* pipermail: Factor out "write article as HTML" as the write_article method; fix a few bugs in the previous checkin.
  HyperArch: Get rid of the bogus add_article override; override write_article instead.
  (jhylton, 2000-09-21; 1 file, -103/+18)
* Oops! Time machine problem.
  (jhylton, 2000-09-21; 1 file, -1/+1)
* Fix the Article class in HyperArch. It is a subclass of pipermail.Article, but it also had some methods that were overridden cut-n-paste style.
  pipermail: Move Article date handling into a separate method (should probably do this for the rest of __init__).
  (jhylton, 2000-09-21; 1 file, -91/+33)
* Massive reformatting and refactoring.
  Fix some of the hardest-to-read style problems:
  - no space around = in assignment
  - obscure comparisons (e.g. use "if seq:" instead of "if len(seq):")
  - put the body of if/else and try/except on separate lines
  Refactoring: separate some large and/or obscure chunks of code into methods.
  (jhylton, 2000-09-21; 1 file, -19/+15)
* Article.__init__(): Chris Snell's patch for obscuring email addresses in Pipermail archives. Wrap this in a configuration variable, ARCHIVER_OBSCURES_EMAILADDRS, because it breaks mailto: URLs too.
  (bwarsaw, 2000-09-19; 1 file, -0/+2)
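Address obscuring behind a configuration flag can be sketched as below. The log does not say how addresses are rewritten; replacing "@" with " at " is an assumption, the regexp is a rough illustrative one, and the module-level flag is a hypothetical stand-in for ARCHIVER_OBSCURES_EMAILADDRS. The second test case shows why the option breaks mailto: URLs.

```python
import re

# Hypothetical stand-in for the ARCHIVER_OBSCURES_EMAILADDRS setting.
ARCHIVER_OBSCURES_EMAILADDRS = True

# Rough address pattern for illustration; not RFC 2822-complete.
_addr_re = re.compile(r'([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)')

def obscure(text):
    """Rewrite 'user@example.com' as 'user at example.com' when enabled."""
    if not ARCHIVER_OBSCURES_EMAILADDRS:
        return text
    # The '@' inside a mailto: URL is rewritten too, which breaks the link.
    return _addr_re.sub(r'\1 at \2', text)
```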
* GetAbsoluteScriptURL() => GetScriptURL(..., absolute=1). Also, remove the link to Pipermail on Starship, which is way out of date.
  (bwarsaw, 2000-08-01; 1 file, -5/+5)
* __init__(): time.mktime() can generate OverflowError.
  (bwarsaw, 2000-07-06; 1 file, -1/+1)
* __init__(): When the mktime() call fails due to a "bogus year, most likely", initialize the date variable to float seconds, not a time tuple, because the following time.ctime() call expects a float.
  (bwarsaw, 2000-06-26; 1 file, -1/+1)
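The __init__() date fixes in this log (OverflowError or ValueError from time.mktime() on a bogus year, and a float fallback for time.ctime()) can be sketched as one guarded conversion. safe_date() is a hypothetical helper, and falling back to the current time follows the "coerce to current time" wording of the 2000-04-07 entry.

```python
import time

def safe_date(timetuple):
    """Convert a 9-element local time tuple to float seconds since the epoch.

    A bogus year can make time.mktime() raise OverflowError or ValueError;
    fall back to the current time, still as a float, because time.ctime()
    expects seconds rather than a tuple.
    """
    try:
        return time.mktime(timetuple)
    except (OverflowError, ValueError):
        return time.time()
```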
* Convert all uses of mlist.LogMsg() to the new syslog() interface.
  (bwarsaw, 2000-06-02; 1 file, -3/+4)
* Thomas Wouters writes:
  "There was a short discussion a month or so ago about the HyperArch 'mbox' archives having the wrong kind of date in the 'From ' lines. 'unixfrom' lines should have a very specific date format, namely the one time.ctime() returns. The following patch fixes Archiver/HyperArch.py."
  One minor change: to be consistent with gate_news, there are two spaces between the email address and the fromdate.
  (bwarsaw, 2000-05-31; 1 file, -1/+3)
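An mbox separator line in the format described here can be sketched in a few lines. unixfrom() is a hypothetical helper; the ctime() date and the two spaces after the address come straight from the commit message.

```python
import time

def unixfrom(addr, when=None):
    """Build an mbox 'From ' separator line.

    The date uses the format time.ctime() returns (e.g. 'Thu Jan  1 00:00:00 1970'),
    and, to match gate_news, two spaces separate the address from the date.
    When `when` is None, time.ctime() uses the current time.
    """
    return 'From %s  %s' % (addr, time.ctime(when))
```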
* Cosmetic changes to force Harald's changes out to the public CVS server.
  (bwarsaw, 2000-04-09; 1 file, -2/+9)
* dateToVolName():
  - Now uses time.localtime() instead of time.gmtime().
  - The former strategy for generating names of weekly archives depended on time.strftime() doing the right thing if the day of the month was negative. At least Solaris strftime() did strange things (like having '%d' generate a string containing '/') when faced with this, so the strategy has now been changed.
  Besides, I have changed "self.maillist._internal_name" to "self.maillist.internal_name()" for purely religious reasons (unrelated objects shouldn't access each other's "private" attributes).
  (hmeland, 2000-04-09; 1 file, -10/+9)
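One way to name a weekly volume without ever handing strftime() a negative day of the month is to let date arithmetic find the start of the week. This is a sketch, not the actual dateToVolName(): the Monday week start and the 'Week-of-Mon-YYYYMMDD' name format are assumptions for illustration.

```python
import datetime

def weekly_volume_name(when):
    """Name a weekly archive volume after the Monday that starts its week.

    Subtracting a timedelta lets datetime handle month and year boundaries,
    so no negative day-of-month ever reaches strftime().
    """
    day = datetime.date.fromtimestamp(when)  # local time, like time.localtime()
    monday = day - datetime.timedelta(days=day.weekday())
    return monday.strftime('Week-of-Mon-%Y%m%d')
```

For a timestamp on Wednesday 2000-03-01, the week starts in the previous month, on 2000-02-28.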
* __init__(): Catch possible ValueErrors when the date contains a year in the 100-1899 range. Coerce these to the current GMT time.
  (bwarsaw, 2000-04-07; 1 file, -1/+5)
* Update the copyright lines to include the years 1999 and 2000.
  (bwarsaw, 2000-03-21; 1 file, -1/+1)
* Extensive changes based on Jeremy Hylton's investigations. These should considerably help the performance of the archiver. Specifically:
  update_dirty_archives(): Archived articles are appended to the .txt file, and a gzip'd copy used to be written automatically. However, this turns out to be a huge performance hit (it's not very efficient to do the entire gzip in Python, and we can't use gzip's append feature because apparently Netscape doesn't know how to grok gzip append files). The gzip file now only gets created if 1) gzip can be imported, and 2) mm_cfg.GZIP_ARCHIVE_TXT_FILES is true.
  XXX: Note that we should add a cronjob to gzip the file nightly.
  Consolidate imports.
  (bwarsaw, 1999-08-21; 1 file, -15/+14)
* update_dirty_archives(): Set umask to 002 when creating gzipped text archives.
  (hmeland, 1999-06-04; 1 file, -1/+5)
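The two update_dirty_archives() changes (write the .gz copy only when gzip is importable and the option is on, and create it under umask 002) can be sketched together. GZIP_ARCHIVE_TXT_FILES is a hypothetical stand-in for the mm_cfg setting, and gzip_txt_archive() is a hypothetical helper that rewrites the whole compressed file each time rather than appending.

```python
import os
import shutil

try:
    import gzip
except ImportError:  # gzip (zlib) may be unavailable on some builds
    gzip = None

# Hypothetical stand-in for mm_cfg.GZIP_ARCHIVE_TXT_FILES.
GZIP_ARCHIVE_TXT_FILES = True

def gzip_txt_archive(txtfile):
    """Write txtfile + '.gz' if enabled; return its path, or None if skipped."""
    if gzip is None or not GZIP_ARCHIVE_TXT_FILES:
        return None
    gzfile = txtfile + '.gz'
    # umask 002: new files get mode 0o666 & ~0o002, i.e. group-writable rw-rw-r--.
    old_mask = os.umask(0o002)
    try:
        with open(txtfile, 'rb') as src, gzip.open(gzfile, 'wb') as dst:
            shutil.copyfileobj(src, dst)
    finally:
        os.umask(old_mask)  # umask is process-wide, so always restore it
    return gzfile
```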
* .format_article(): Archiving would occasionally fail when an empty line occurred at the beginning of an article, because the space-preserving format style would try to prepend "<pre>" to the None object that's used as a placeholder. Instead of doing string concatenation, I'm doing the simpler [].insert(0, ...) and [].append() on the list of lines. (It'd probably also be a good idea to fix the code to not use the None placeholders, but it's hard to tell what depends on what there.)
  (klm, 1998-11-22; 1 file, -11/+2)
* Here are two drastic policy-level changes to the archive mechanism.
  First of all, I'm setting the default message archive style to preserve horizontal whitespace, using <pre> instead of putting a <br> at the beginning of every line.
  More deeply, I'm inhibiting all completely-uppercase variables from being included in the pickled settings for the list. As with the VERBOSE problem I addressed in my last checkin, a change to one of the class default settings made at any time *after* the archive object state was resurrected would be reinstated thenceforward, overriding the defaults and any other settings. This is not a good interface for setting options, and I'm not really offering an alternative, other than making the defaults come back the next time the list is reinstated, rather than preserving the changes. We need to resolve this, I suppose with an interface that distinguishes permanent from temporary settings.
  In general I've changed the defaults so they preserve more of the structure of the postings (and included very brief comments presenting what I could glean of the settings from the code; this code does not seem to have been written to be understood).
  (klm, 1998-11-22; 1 file, -9/+10)
* .__getstate__(): VERBOSE was not being excluded, so the verbose state was being resurrected from the initial setting for the maillist; for sites that upgraded using the arch script, that setting was 1. This meant that for those sites the error log would get a new entry for every item being archived.
  (klm, 1998-11-21; 1 file, -1/+2)
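Excluding all-uppercase settings from the pickled state, so that class defaults such as VERBOSE win after unpickling, can be sketched with a __getstate__() filter. The Archive class is a hypothetical minimal stand-in, not the real HyperArchive.

```python
import pickle

class Archive:
    # Class-level default; all-uppercase names are treated as settings
    # that should come from the class, not from a pickled instance.
    VERBOSE = 0

    def __init__(self):
        self.articles = []

    def __getstate__(self):
        # Drop every instance attribute whose name is entirely uppercase,
        # so an instance-level override such as VERBOSE = 1 is not resurrected.
        return {k: v for k, v in self.__dict__.items() if not k.isupper()}
```

With no __setstate__(), pickle restores the filtered dict into the new instance's __dict__, so the dropped attributes fall back to the class defaults.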
* .sortarchives(): Reversed the TOC so that it puts the most recent archives at the top; I'm assuming that attention decreases the older an archive is, more or less. Wrapped lots of long lines.
  (klm, 1998-11-21; 1 file, -45/+81)