On Wed, Mar 01, 2006 at 09:33:30AM -0800, Jonathan Blake wrote:
> On 3/1/06, Javier Fernández-Sanguino Peña <jfs@xxxxxxxxxxxx> wrote:
> >
> > Sure. From what I see in the scripts the PDF conversion is done through
> > LaTeX using the gbtolatex.pl script. The README file claims it uses
> > pdflatex but, as far as I can see, it calls Apache's XSLT processor
> > (which uses Java, which I don't like much). Am I right?
>
> The README should really say something about how the gbtolatex.pl
> script produces a LaTeX file. We then manually run pdflatex on that
> file. The labor intensive part is that we adjust the pagination by
> hand. This has meant an hour or two tweaking the LaTeX file before it
> produces a high-quality PDF.

Well, there are some things that are not documented in the README file, like
the fact that you have to replace '%xhtml.characters;' with
'%latex.characters;' in the DOCTYPE definition so you get proper (indented)
characters.

Other caveats I'm running into:

- the ifthenelse code for the single-page / double-page layout does not seem
  to work, I had to scrap it

- the souvenir.sty used includes fonts I have not been able to compile for
  my system (I just removed the \usepackage{} call to proceed)

After doing that I've been able to compile the TeX file (I'll forward you a
copy off list). Obviously, images are not included. How could I go about
adding them too?

> We haven't really mucked around with the Java code for Xalan. We just
> use its command line interface.

Ok. I will use the C++ xalan code then. It seems to work for me.

> > What are the (known) issues with PDF conversion? Does the XSL file for LaTeX
> > needs to be reviewed, is the output ok?
>
> The XSL file does indeed need to be reviewed. Since we haven't been
> publishing the PDFs, it has gotten behind the current revision of
> xhtml.xsl.

I'll see what I can do. Can you point me (with a diff?) to the changes to the
XHTML XSL which have not been added to the LaTeX XSL?
Is it revision 1.3 vs. 1.2?

> I had a question: how well does LaTeX handle character encoding other
> than plain ASCII? I ask because 01hdlo.xml makes extensive use of
> accented characters. Will we need to prefilter the XML files before
> feeding them to pdfLaTeX?

You mean 01hh.xml (Freeway Warrior)?

It really depends. When writing LaTeX files for non-ASCII encodings you
typically use a package for your language, so you don't need to change
accented characters (á or &aacute;) into their LaTeX equivalents (\'a). I
guess we will need to generate conversions for all of them to their LaTeX
equivalents since, if there are far too many, we might not be able to use a
single package to cover them all (maybe babel would do, would need to check).

Regards

Javier
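
PS: A rough sketch of the kind of conversion mentioned above (á into \'a),
in case it helps the discussion. This is not part of gbtolatex.pl or the
existing XSL files. The function name to_latex and the ACCENTS table are
made up for illustration, and the table covers only a handful of accents.
Anything it does not know about is passed through unchanged.

```python
import unicodedata

# Hypothetical table: Unicode combining mark -> LaTeX accent command.
# Only a small illustrative subset, not a complete mapping.
ACCENTS = {
    "\u0301": "\\'",  # acute      (á)
    "\u0300": "\\`",  # grave      (à)
    "\u0302": "\\^",  # circumflex (â)
    "\u0308": '\\"',  # diaeresis  (ä)
    "\u0303": "\\~",  # tilde      (ñ)
}


def to_latex(text):
    """Replace accented letters with LaTeX accent commands where known."""
    out = []
    for ch in text:
        # NFD splits a precomposed letter into base letter + combining mark.
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in ACCENTS:
            base, mark = decomposed
            out.append("%s{%s}" % (ACCENTS[mark], base))
        else:
            # Plain ASCII and anything outside the table is left as-is.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_latex("Fernández-Sanguino Peña"))
    # prints: Fern\'{a}ndez-Sanguino Pe\~{n}a
```

Whether a prefilter like this is needed at all depends on whether a single
input-encoding package can cover every character used in the books, which
still needs checking.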