[projectaon] Conversion of encoding in spanish XML files and new latex.xsl file

  • From: Javier Fernández-Sanguino Peña <jfs@xxxxxxxxxxxx>
  • To: Project Aon <projectaon@xxxxxxxxxxxxx>
  • Date: Tue, 28 Nov 2006 00:38:09 +0100

Hi all,

I've found that the Spanish XML files are not encoded consistently: they
contain an assortment of raw non-ASCII characters, and one file even uses a
Unicode encoding. These characters were not being converted properly in the
LaTeX files and, consequently, in the PDFs I generated. To find and fix the
culprits I've devised a pair of scripts:

- convert-accent.pl: converts accented characters commonly used in Spanish
  to the corresponding character elements (e.g. á becomes <ch.aacute/>)
- find-nonascii.pl: a simple yet effective way to find non-ASCII characters
  in files

All the files can be easily converted using the first one (it reads a file
and writes the fixed version to standard output) and checked with the second
one. If you do this for all files they will get fixed, save for 03lcdk, which
has a Unicode character at line 1085 (in "M'lare") that needs to be fixed by
manually editing the file.
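
For chasing down stragglers like that one, a variant of the checker that also
reports the file name and line number may help. This is only a sketch, not one
of the attached scripts: the name nonascii_lines is made up for illustration,
and it tests raw bytes instead of using the [:ascii:] class, on the assumption
that the files may be in any 8-bit or UTF-8 encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the attached scripts).
# Returns [line_number, text] pairs for every line that contains a byte
# outside the 7-bit ASCII range. It works on raw bytes, so the encoding
# of the file (Latin-1, UTF-8, ...) does not matter.
sub nonascii_lines {
    my ($fh) = @_;
    my @hits;
    my $lineno = 0;
    while (my $line = <$fh>) {
        $lineno++;
        chomp $line;
        push @hits, [$lineno, $line] if $line =~ /[^\x00-\x7F]/;
    }
    return @hits;
}

# Check every file named on the command line; do nothing if none is given.
for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    printf "%s:%d: %s\n", $file, @$_ for nonascii_lines($fh);
    close $fh;
}
```

Run after the conversion, it prints one "file:line: text" entry per line that
still needs attention, and prints nothing for a clean file.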

Oh, and attached is yet another revision of the latex.xsl file with more
Unicode characters properly defined.

I've used this to generate PDF files for all four Spanish books at the
Project (I've sent these to the coordinator for review), but the scripts
could also be useful in the future for other internationalised editions of
the books (if some other group starts transcribing them into XML).

Regards

Javier
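
#!/usr/bin/perl -p
# convert-accent.pl: replace accented characters commonly used in Spanish
# with the corresponding <ch.*/> character elements. Reads the files named
# on the command line (or standard input) and writes to standard output.
# The literals below are matched as raw bytes, so this script should be
# saved in the same encoding as the files it processes.
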
s/á/\<ch.aacute\/\>/g;
s/é/\<ch.eacute\/\>/g;
s/í/\<ch.iacute\/\>/g;
s/ó/\<ch.oacute\/\>/g;
s/ú/\<ch.uacute\/\>/g;
s/ñ/\<ch.ntilde\/\>/g;
s/Á/\<ch.Aacute\/\>/g;
s/É/\<ch.Eacute\/\>/g;
s/Í/\<ch.Iacute\/\>/g;
s/Ó/\<ch.Oacute\/\>/g;
s/Ú/\<ch.Uacute\/\>/g;
s/ä/\<ch.auml\/\>/g;
s/ë/\<ch.euml\/\>/g;
s/ï/\<ch.iuml\/\>/g;
s/ö/\<ch.ouml\/\>/g;
s/ü/\<ch.uuml\/\>/g;
s/Ñ/\<ch.Ntilde\/\>/g;
s/´/\<ch.acute\/\>/g;
s/¡/\<ch.iexcl\/\>/g;
s/¿/\<ch.iquest\/\>/g;
s/«/\<ch.laquo\/\>/g;
s/»/\<ch.raquo\/\>/g;
#s/\&/\<ch.ampersand\/\>/g;

#!/usr/bin/perl -nw
# find-nonascii.pl: print every input line that contains a non-ASCII
# character.
print if /[^[:ascii:]]/;
