summarydesk
Project home

(Locked)

This project is currently locked. Access to the project's tools is read-only. Until the project is unlocked, no project files or data can be modified.

Summary Mailing list summarization system.
Category process
License CollabNet/Tigris.org Apache-style license
Owner(s) kfogel, madanus

What is SummaryDesk?

SummaryDesk is a Web-based interface for writing mailing list summaries. It takes care of all the bookkeeping, and lets humans concentrate on the non-automatable part: actually writing the summaries.

SummaryDesk development is sponsored by: CollabNet

What are "summaries", and why do we need a special system to write them?

A summary is a condensed version of all important traffic that happens on a project mailing list. Summaries themselves cannot be automated: a human has to read the emails, decide what's going on, and write a shorter version for an audience that doesn't have time to follow the details. However, many of the most time-consuming aspects of producing summaries can be automated. Some examples:

  1. The summarizer should be able to include URLs to specific messages or threads with a single click, instead of cutting-and-pasting manually.

  2. The system should take care of publishing the summaries automatically. The human summarizer should only be required to write the summary texts, and flag them as ready for publication or not. The system should do the rest.

  3. The system should make it easy for multiple humans to collaborate on summarizing a busy list, by managing in-progress summaries centrally, in a way that is visible to all the summarizers.

  4. Quantitative / statistical data about the mailing list (such as who posted the most, what topics were most popular, etc.) can be tracked entirely by the system; humans should not have to spend time on that.

In other words, summarizers can benefit from good tools just like anyone who faces a complex, repeated task.
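As an illustration of the kind of statistic item 4 has in mind, here is a minimal sketch that counts posts per sender from raw message text. It is not SummaryDesk's or ktpub's code: the function name `top_posters` and the sample messages are made up for this example, and it relies only on Python's standard library.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def top_posters(raw_messages, n=3):
    """Count messages per sender address and return the n most frequent."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # parseaddr splits "Name <addr>" into (name, addr); keep the address.
        _, addr = parseaddr(msg.get("From", ""))
        if addr:
            counts[addr.lower()] += 1
    return counts.most_common(n)


# Hypothetical sample data, standing in for a real list archive.
SAMPLE = [
    "From: Alice <alice@example.org>\nSubject: Release plan\n\nBody one\n",
    "From: Bob <bob@example.org>\nSubject: Re: Release plan\n\nBody two\n",
    "From: alice@example.org\nSubject: Re: Release plan\n\nBody three\n",
]

print(top_posters(SAMPLE))
```

Counting by address rather than by display name keeps "Alice <alice@example.org>" and a bare "alice@example.org" together, which is usually what a "who posted the most" figure wants.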

Unfortunately, there don't seem to be any really good tools out there. Most summaries today use a system called ktpub, named after the mailing list summary for which it was invented: the Linux Kernel Traffic series produced by Zack Brown. Using ktpub is much better than trying to do summaries entirely by hand (in particular, ktpub does the statistical analyses mentioned above), but it still leaves vast room for improvement: the summarizer must still do many tasks manually that could be automated.

In ktpub, the summarizer produces a master XML file containing the week's summary, and then runs tools to convert that to HTML, text, or whatever consumable format is desired. The process of producing the master XML, however, is highly idiosyncratic: it involves lots of dedicated hacks and editor tricks to save time writing the XML (e.g., special scripts to grab URLs). The problem is that these tricks are local to Zack Brown, or whoever the summarizer is. If he has to hand off editorship to someone else, or get assistance, the new people will have to come up with their own tricks, even though everyone is dealing with the exact same set of problems! (See our conversation with Zack Brown about this; it turns out that he'd been wanting a system like SummaryDesk all along.)

SummaryDesk is intended to solve the summarization problem completely. We mean it to be the next-generation ktpub: a centralized, web-based, highly automated system for producing summaries. It will incorporate every efficiency that we can identify and find a way to implement, so that all users benefit from the best practices available.

Overview

You configure SummaryDesk to watch a set of mailing lists. For each mailing list, it keeps track of each thread that takes place on the list, and associates with each thread a summary, which starts out empty. From time to time, a human visits the SummaryDesk main page, and selects a list and thread(s) to work on. SummaryDesk presents the selected threads in a conveniently browsable form, and by each thread is a text box, where the summarizer can enter that thread's summary. As she updates the summary, she can save her work-in-progress at any time. At some point, she marks the thread's summary as "publishable", meaning that it will be included in the next scheduled auto-publication of the summary newsletter. Marking a summary as publishable doesn't mean she has to stop working on that summary; it just means that whenever the newsletter goes out, the current state of the summary will be used. The summarizer can also write a "header" and "footer" summary for the list for that week, to give an overview of what list activity has been like. Like the individual thread summaries, these overviews are not published until marked as publishable.
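The workflow above can be sketched as a small data model. This is only an illustration of the described behavior, not SummaryDesk's actual schema or code: the class names `ThreadSummary` and `ListWeek`, and their fields and methods, are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadSummary:
    thread_id: str
    text: str = ""             # every summary starts out empty
    publishable: bool = False  # nothing is published until marked


@dataclass
class ListWeek:
    """One week of one mailing list: header, per-thread summaries, footer."""
    header: ThreadSummary = field(default_factory=lambda: ThreadSummary("header"))
    footer: ThreadSummary = field(default_factory=lambda: ThreadSummary("footer"))
    threads: dict = field(default_factory=dict)

    def save(self, thread_id, text):
        """Save work in progress; the publishable flag is left untouched."""
        self.threads.setdefault(thread_id, ThreadSummary(thread_id)).text = text

    def publish(self):
        """Return the current text of every part marked publishable."""
        parts = [self.header, *self.threads.values(), self.footer]
        return [p.text for p in parts if p.publishable]


week = ListWeek()
week.save("t1", "Draft of thread one")
week.save("t2", "Draft of thread two")
week.threads["t1"].publishable = True
week.save("t1", "Thread one, revised")  # editing continues after marking
print(week.publish())
```

Because `publish` reads the text at the moment it runs, a summary marked publishable goes out in whatever state it is in when the newsletter is generated, and unmarked summaries (here `t2`, the header, and the footer) are left out.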

SummaryDesk stores all its data in a database. It is a self-updating system: that is, no manual update process is required when SummaryDesk comes back online after being offline for a while. SummaryDesk just looks at the mailing list archives and brings itself up-to-date whenever it is invoked. (Strictly speaking, it doesn't look directly at the archives; it looks at the ThreadFind reflection of them. See Dependencies for more on that.)
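One way to get this self-updating behavior is to make the update step idempotent: on every invocation, offer the whole archive to the database and let it keep only what it has not seen. The sketch below assumes this approach; it is not taken from SummaryDesk's source. It uses an in-memory SQLite database in place of MySQL, and the `messages` table and `catch_up` function are hypothetical.

```python
import sqlite3


def count_messages(db):
    return db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]


def catch_up(db, archive_messages):
    """Store any archive messages not yet recorded; return how many were new."""
    before = count_messages(db)
    # The primary key on message_id makes already-stored rows a no-op.
    db.executemany(
        "INSERT OR IGNORE INTO messages (message_id, thread_id) VALUES (?, ?)",
        archive_messages,
    )
    db.commit()
    return count_messages(db) - before


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE messages (message_id TEXT PRIMARY KEY, thread_id TEXT)")

# Hypothetical archive contents, as (message_id, thread_id) pairs.
archive = [("<1@example.org>", "t1"), ("<2@example.org>", "t1")]
first_run = catch_up(db, archive)

# A message arrives while the system is offline; the next run picks it up.
archive.append(("<3@example.org>", "t2"))
second_run = catch_up(db, archive)
print(first_run, second_run)
```

Running `catch_up` again with no new mail adds nothing, so it does not matter how long the system was offline or how often it is invoked.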

To-Do List

As you may have guessed by now, SummaryDesk is not a production-ready system yet. Remaining work, in no particular order:

  • Text and XML output formats for the publication system, and toggling of the publishable fields in the database.
  • Beautify the HTML pages.
  • List messages in thread order.
  • Devise a more efficient way of displaying threads in the summary-status page. (This problem becomes apparent as one starts doing summaries.)
  • More keybindings (similar to Emacs?) so that the user does not need the mouse to navigate between the message-list and summary-editor pages.

Dependencies

SummaryDesk uses ThreadFind to actually gather the messages. ThreadFind is an independent system whose purposes are beyond the scope of this document. However, having SummaryDesk watch a mailing list requires also having ThreadFind watch that list; this is easy to configure and will be covered in the documentation, which we're still writing.

How to get it working, from scratch

  1. Create the database users.

    Make sure the MySQL users 'summarydeskrw' and 'summarydeskro' exist, that the first has read/write access to the database named summarydesk, and that the second has read-only access to it:

      $ mysql -u root -p
      Password: *******
      mysql> grant all on summarydesk.* to summarydeskrw@localhost 
               identified by 'SECRET';
      mysql> grant select on summarydesk.* to summarydeskro@localhost 
               identified by 'SECRET';
      mysql> ^D
      $ 
    
  2. Create the database.

      $ echo "create database summarydesk;" \
             | mysql -u summarydeskrw --password=SECRET
      $ cat init-summarydesk.sql \
             | mysql -u summarydeskrw --password=SECRET summarydesk
    
  3. Configure an instance of ThreadFind (http://threadfind.tigris.org/).

  4. Configure your Web server for SummaryDesk:

      Alias /summarydesk /path/to/summarydesk/folder/ending/with/summarydesk
      <Directory /path/to/summarydesk/folder/ending/with/summarydesk>
           Options +Indexes +ExecCGI
           <FilesMatch "^summar">
               SetHandler cgi-script
           </FilesMatch>
           <FilesMatch "^message-list$">
               SetHandler cgi-script
           </FilesMatch>
           <FilesMatch "^publish$">
               SetHandler cgi-script
           </FilesMatch>
           <FilesMatch "^mailing-list-view$">
               SetHandler cgi-script
           </FilesMatch>
      </Directory>
    
  5. Run summarydesk-ctl -c config-file [-d DD-MM-YYYY] start

  6. Start summarizing at http://yourhosthere/summarydesk/ !
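If you script step 5, note that the optional -d argument is written day first (DD-MM-YYYY); what the date controls is not described on this page. The helper below is a hypothetical sketch that only builds the argument list shown in step 5. The function name `ctl_start_args` and the config file name are assumptions, and nothing here runs summarydesk-ctl.

```python
from datetime import date


def ctl_start_args(config_file, when=None):
    """Build the argument list for 'summarydesk-ctl ... start' from step 5."""
    args = ["summarydesk-ctl", "-c", config_file]
    if when is not None:
        # -d expects DD-MM-YYYY, e.g. 7 March 2004 becomes 07-03-2004.
        args += ["-d", when.strftime("%d-%m-%Y")]
    args.append("start")
    return args


# "summarydesk.conf" is a placeholder name, not a documented default.
print(ctl_start_args("summarydesk.conf", date(2004, 3, 7)))
```

The resulting list can be passed to `subprocess.run` as-is; building it as a list avoids shell quoting problems with the config file path.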