[projectaon] Re: First post! (presentation)

  • From: "Jonathan Blake" <blake.jon@xxxxxxxxx>
  • To: projectaon@xxxxxxxxxxxxx
  • Date: Wed, 1 Mar 2006 09:33:30 -0800

On 3/1/06, Javier Fernández-Sanguino Peña <jfs@xxxxxxxxxxxx> wrote:

> Sure. From what I see in the scripts the PDF conversion is done through LaTeX
> using the gbtolatex.pl script. The README file claims it uses pdflatex but,
> as far as I can see, it calls Apache's XSLT processor (which uses Java, which
> I don't like much). Am I right?

The README should really explain that the gbtolatex.pl script only
produces a LaTeX file. We then run pdflatex on that file manually.
The labor-intensive part is adjusting the pagination by hand, which
has meant an hour or two of tweaking the LaTeX file before it
produces a high-quality PDF.

We haven't really mucked around with the Java code for Xalan. We just
use its command-line interface.

> What are the (known) issues with PDF conversion? Does the XSL file for LaTeX
> need to be reviewed? Is the output OK?

The XSL file does indeed need to be reviewed. Since we haven't been
publishing the PDFs, it has fallen behind the current revision of
xhtml.xsl.

I had a question: how well does LaTeX handle character encodings other
than plain ASCII? I ask because 01hdlo.xml makes extensive use of
accented characters. Will we need to prefilter the XML files before
feeding them to pdflatex?
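
If a prefilter does turn out to be necessary, a minimal sketch of one
possible approach follows. It assumes the accented characters arrive as
Unicode text once the XML has been parsed, and that rewriting them as
classic LaTeX accent commands is acceptable. The function name
latex_escape_accents and the table of accents are hypothetical; nothing
here is taken from gbtolatex.pl or the existing XSL.

```python
import unicodedata

# Combining marks (as produced by NFD decomposition) mapped to the
# LaTeX accent command that follows the backslash. Illustrative subset.
COMBINING_TO_LATEX = {
    "\u0300": "`",   # grave
    "\u0301": "'",   # acute
    "\u0302": "^",   # circumflex
    "\u0303": "~",   # tilde
    "\u0308": '"',   # diaeresis
    "\u0327": "c",   # cedilla
}


def latex_escape_accents(text):
    """Rewrite accented letters as LaTeX accent commands.

    Characters this sketch does not recognise are passed through
    unchanged, so nothing is silently dropped.
    """
    # Compose first so that "a" + combining acute is seen as one character.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in COMBINING_TO_LATEX:
            base, mark = decomposed
            if base == "i":
                base = "\\i"  # dotless i, so the accent replaces the dot
            out.append("\\%s{%s}" % (COMBINING_TO_LATEX[mark], base))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(latex_escape_accents("Fern\u00e1ndez-Sanguino Pe\u00f1a"))
```

This only deals with accents. LaTeX special characters such as & and %
would still need their own escaping, and the filter may be unnecessary
if the LaTeX preamble can declare the input encoding instead.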

> One thing that seems to be missing is a wonderful 'Makefile' that will
> automate the whole process (check, generate, publish...)

There has been some development on the automated publishing front.
Right now, a cron script will automatically publish a book (and do
some minimal checks). This is still in the testing phase.

> It also looks like the "logic" structure of the book is not linted (with
> glint) when reviewing the XML. I added checks in my toolkit to determine when
> sections were not referenced by other sections [1]. Some other logic checks
> could be implemented (which would require the toolkit to be able to handle
> "conditional" maps).

Some of that is handled by validation checks using RXP, the XML
validator, but if you can improve or add to our checking mechanisms,
that would be great. Our checks are somewhat disorganized right now.
Several of us run our own checks on the files, but there isn't a
general, written procedure that we follow.
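
As an illustration of the kind of logic check being discussed, here is
a minimal sketch that lists sections no other part of the book refers
to. It assumes sections are <section> elements carrying an id
attribute and that references to them carry an idref attribute. Both
the element layout and the function name unreferenced_sections are
assumptions made for the example, not a description of our files or
of any existing toolkit.

```python
import xml.etree.ElementTree as ET


def unreferenced_sections(xml_text, start_id=None):
    """Return ids of <section> elements that no idref attribute points at.

    start_id names a section that is allowed to be unreferenced,
    such as the one the reader begins at.
    """
    root = ET.fromstring(xml_text)
    section_ids = [el.get("id") for el in root.iter("section") if el.get("id")]
    referenced = {el.get("idref") for el in root.iter() if el.get("idref")}
    return [
        sid for sid in section_ids
        if sid not in referenced and sid != start_id
    ]


# Tiny made-up book: sect3 links out but nothing links to it.
SAMPLE = """<gamebook>
  <section id="sect1"><choice idref="sect2">Turn to 2</choice></section>
  <section id="sect2"><choice idref="sect1">Turn to 1</choice></section>
  <section id="sect3"><choice idref="sect1">Turn to 1</choice></section>
</gamebook>"""

if __name__ == "__main__":
    print(unreferenced_sections(SAMPLE))
```

Files that rely on entities declared in a DTD may not parse with
ElementTree as-is, so a check like this would probably run after the
RXP validation step rather than replace it.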

> I understand that the XML files for PA are needed to use all these tools.
> Googling, I've found http://www.projectaon.org/sanctum/xml/ although it's not
> linked from the PA main site. Are these the latest available files? If not,
> could someone forward me a tar.gz with them?

I will forward them to you offlist.

> No problem. I will try to contribute stuff when I tinker with it. I would
> really love to see good quality PDFs happen. That might make eBay prices for
> these books go down a bit (I'm surprised to see people paying more than $50
> for a used, 20-year-old book!)

I'm glad I completed most of my collection before I got married. I
don't think I could convince my wife to let me spend that much! ;) I
actually completed my collection by buying two mostly complete
collections, merging them, and selling off the duplicates. That was
expensive!

--
Jon
