This document must contain an entry for each XSL file, justifying why it is needed. The more detail the better.
Documentation should include recommendations for how each XSL file could ultimately be removed by one of:
- improved bioRxiv XML
- better handling of XML in Encoda
- better support for Encoda output in EPP client
Please provide links to the GitHub issues that express these requirements.
In the entry for each XSL file, please link to the file on GitHub.
This is a candidate for removal. If titles are provided in all caps in the original, we should not be changing them: there are too many exceptions (words that should remain in upper case) to handle this appropriately.
This stylesheet transforms an XML document by adding a "label" element to any "title" element that has a preceding "label" element, and removing any "label" element that has a following "title" element. Work is required in Encoda to appropriately decode these labels, so that they can then be represented in the resultant JSON.
This XSL converts <ext-link> elements within references (<ref>) that have a URI (in the xlink:href attribute) which is a DOI string, so that they use <pub-id pub-id-type="doi"> tagging instead.
<pub-id pub-id-type="doi"> is the preferred capture, and this should be fed back to bioRxiv. In addition, Encoda needs to be able to appropriately decode <ext-link> elements in references, so that these can be represented in the JSON, because <ext-link> is otherwise perfectly acceptable capture in JATS (for example, when the link is not a DOI).
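The condition "a URI which is a DOI string" can be illustrated with a small shell sketch. This is not the matching rule used by the stylesheet itself, which is not reproduced here; the function name doi_from_uri and the set of accepted URI prefixes are assumptions made for illustration only.

```shell
# Hypothetical helper: print the bare DOI if the given href looks like a DOI,
# otherwise return a non-zero status.
# Assumption: a DOI is either a doi.org / dx.doi.org URL or a bare "10.xxxx/yyyy" string.
doi_from_uri() {
  case "$1" in
    https://doi.org/10.*|http://doi.org/10.*|https://dx.doi.org/10.*|http://dx.doi.org/10.*)
      # Strip everything up to and including the resolver host.
      printf '%s\n' "${1#*doi.org/}" ;;
    10.*/*)
      # Already a bare DOI string.
      printf '%s\n' "$1" ;;
    *)
      # Not a DOI: such a link would be left as <ext-link>.
      return 1 ;;
  esac
}
```

For example, doi_from_uri https://doi.org/10.1101/2022.10.21.513138 prints 10.1101/2022.10.21.513138, while a link to an ordinary web page returns a failure status.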
This stylesheet removes any "sec" element with a "sec-type" attribute value of "supplementary-material" and copies the remaining elements into a new "body" element in the output document.
TODO: We need to state whether we believe this is work for bioRxiv, Encoda or the EPP team.
Adjusts two of the affiliations where the department is being treated as part of the address rather than as part of the institution. This is an EPP client issue, as we cannot get at these values another way.
There may be another example in 10.1101/2022.10.21.513138:
<aff id="a1"><label>1</label><institution>Univ-Bordeaux, Centre de Recherche Cardio-thoracique de Bordeaux</institution>, U1045, Département de Pharmacologie, CIC1401, Pessac, <country>France</country></aff>
It can be fixed by treating affiliations as mixed content (pulling in the text content of aff as well as institution, country etc.), and the change has been implemented in Encoda v0.121.1 - see https://github.com/elifesciences/enhanced-preprints-issues/issues/343.
This XSL adds a missing affiliation for the first author. Affiliations are linked to using an <xref> element, which is a child of the author's <contrib contrib-type="author"> element. This was presumably a typesetting error that could be (or could have been) fixed on bioRxiv's end, but we haven't established how best to feed back this kind of problem. This is one of the 'examples' we launched with back in October, and it has now been published as an (old style) VOR, so I'm not sure how we want to handle it specifically.
Prerequisites:
- docker
- libxml2-utils (if running on host)
To apply an XSLT transform to all bioRxiv XML, place it in the ./src folder.
To apply an XSLT transform to a specific manuscript, place it in the ./src/[DOI-SUFFIX] folder.
To apply an XSLT transform to a specific version of a manuscript, place it in the ./src/[DOI-SUFFIX] folder and express in the XSLT the query that restricts the changes to that version number.
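The placement rules above imply a lookup that can be sketched in shell. This is an illustration only and not the contents of scripts/transform.sh; the function name list_stylesheets and the ordering (global stylesheets first, then manuscript-specific ones) are assumptions.

```shell
# Hypothetical helper: list the stylesheets that would apply to one manuscript.
# Assumption: global stylesheets are listed before manuscript-specific ones.
list_stylesheets() {
  local src_dir="$1" doi_suffix="${2:-}"
  local xsl

  # Stylesheets placed directly in the source folder apply to every manuscript.
  for xsl in "$src_dir"/*.xsl; do
    [ -e "$xsl" ] || continue   # skip the unexpanded glob when nothing matches
    printf '%s\n' "$xsl"
  done

  # Stylesheets in a DOI-suffix folder apply only to that manuscript.
  if [ -n "$doi_suffix" ] && [ -d "$src_dir/$doi_suffix" ]; then
    for xsl in "$src_dir/$doi_suffix"/*.xsl; do
      [ -e "$xsl" ] || continue
      printf '%s\n' "$xsl"
    done
  fi
  return 0
}
```

Calling list_stylesheets ./src 2022.05.30.22275761 would print the global stylesheets followed by those in ./src/2022.05.30.22275761. Version-specific behaviour is not covered by this sketch, because it lives inside the XSLT itself.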
Each XSL file must have at least one accompanying test. It is recommended that ./test/fixtures/kitchen-sink.xml contain at least one example that can be successfully targeted by the global XSL files in ./src/*.xsl.
For each XSL file, you must add an expected XML file to the test folder, in a folder whose name corresponds to the filename of the XSL.
For example, an expected result of the ./src/change-label-and-title-elements.xsl transform can be found in ./test/change-label-and-title-elements. The filenames of the expected XML are the same as in the ./test/fixtures folder.
Some examples:
- ./test/change-label-and-title-elements/kitchen-sink.xml contains the expected XML of ./test/fixtures/kitchen-sink.xml that has gone through the ./src/change-label-and-title-elements.xsl transform.
- ./test/all/kitchen-sink.xml contains the expected XML of ./test/fixtures/kitchen-sink.xml that has gone through all of the transforms directly in the ./src folder.
- ./test/2022.05.30.22275761/remove-supplementary-materials/2022.05.30.22275761.xml contains the expected XML of ./test/fixtures/2022.05.30.22275761/2022.05.30.22275761.xml that has gone through the ./src/2022.05.30.22275761/remove-supplementary-materials.xsl transform.
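The naming convention for single-stylesheet tests can be expressed as a small path mapping. The sketch below is only an illustration of the convention described above; the function name expected_path is hypothetical, it assumes paths are given relative to the repository root, and it does not cover the ./test/all case.

```shell
# Hypothetical helper: derive where the expected XML for a stylesheet/fixture pair lives.
# Assumes both paths are relative to the repository root, as in the examples above.
expected_path() {
  local xsl="$1" fixture="$2"
  local name rel_dir
  name=$(basename "$xsl" .xsl)      # stylesheet name without the .xsl extension
  rel_dir=$(dirname "${xsl#./}")    # "src" or "src/<DOI-SUFFIX>"
  rel_dir=${rel_dir#src}            # "" or "/<DOI-SUFFIX>"
  printf './test%s/%s/%s\n' "$rel_dir" "$name" "$(basename "$fixture")"
}
```

For instance, expected_path ./src/change-label-and-title-elements.xsl ./test/fixtures/kitchen-sink.xml prints ./test/change-label-and-title-elements/kitchen-sink.xml.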
docker buildx build -t epp-biorxiv-xslt .
cat test/fixtures/2022.05.30.22275761/2022.05.30.22275761.xml | docker run --rm -i epp-biorxiv-xslt /app/scripts/transform.sh --doi 2022.05.30.22275761
Output to a file:
cat test/fixtures/2022.05.30.22275761/2022.05.30.22275761.xml | docker run --rm -i epp-biorxiv-xslt /app/scripts/transform.sh --doi 2022.05.30.22275761 > output.xml
Introduce logging:
touch session.log
cat test/fixtures/2022.05.30.22275761/2022.05.30.22275761.xml | docker run --rm -i -v "./session.log:/session.log" epp-biorxiv-xslt /app/scripts/transform.sh --doi 2022.05.30.22275761 --log /session.log
Apply only a single xslt:
cat test/fixtures/kitchen-sink.xml | docker run --rm -i epp-biorxiv-xslt /app/scripts/transform.sh /app/src/change-label-and-title-elements.xsl
The structure of the xml within the source folder will be preserved in the destination folder.
./scripts/process-folder.sh /path/to/SOURCE_DIR /path/to/DEST_DIR
Run with logs:
./scripts/process-folder.sh /path/to/SOURCE_DIR /path/to/DEST_DIR --log ./process-folder.log
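How the folder structure can be preserved is sketched below. This is not the implementation of scripts/process-folder.sh; the function name mirror_xml is hypothetical, and cp stands in for the real transform step.

```shell
# Hypothetical sketch: mirror the XML files in SOURCE_DIR into DEST_DIR,
# keeping each file's path relative to SOURCE_DIR.
mirror_xml() {
  local source_dir="${1%/}" dest_dir="${2%/}"
  local file rel
  find "$source_dir" -type f -name '*.xml' | while IFS= read -r file; do
    rel=${file#"$source_dir"/}               # path relative to SOURCE_DIR
    mkdir -p "$dest_dir/$(dirname "$rel")"   # recreate the sub-folder in DEST_DIR
    cp "$file" "$dest_dir/$rel"              # placeholder for the transform step
  done
}
```

With this approach, a file at SOURCE_DIR/a/b/nested.xml ends up at DEST_DIR/a/b/nested.xml.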
./project_tests.sh
Run with logs:
./project_tests.sh --log ./project-tests.log
docker buildx build -t epp-biorxiv-xslt .
docker buildx build -t epp-biorxiv-xslt-test -f Dockerfile.test .
docker run --rm epp-biorxiv-xslt-test /app/project_tests.sh