-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check pofile string delimiters #1151
base: master
Are you sure you want to change the base?
Conversation
En `library/re.po` había una entrada que no estaba delineada correctamente con comillas dobles (si ven el diff entero es la última entrada en el diff, o pueden ver simplemente el primer commit de este PR). Esto hacía que `powrap --check` se saltara el archivo y no lo validara. Esto, a su vez, ocurría porque la utilidad `msgcat` de `gettext` identificaba el error de sintaxis, y fallaba al ser ejecutada. `powrap` no consideraba esos errores al momento de calcular el exit code del proceso, y por lo tanto el archivo no sólo seguía siendo inválido, sino que tampoco era verificado. De igual forma, el archivo no podía ser wrapeado correctamente usando `powrap library/re.po`. Ya abrí un PR contra `powrap` para cambiar este comportamiento en https://git.afpy.org/AFPy/powrap/pulls/4 (actualización: el PR ya fue mergeado, y una nuevs versión de powrap fue publicada, pornlo que también actualicé en este PR nuestra dependencia de powrap, además del pre-commit hook de powrap). Por otro lado, el resto de nuestras herramientas *no* consideraban este archivo como inválido, Esto es porque `polib` no hacía la validación correspondiente, e incorrectamente parseaba la entrada. También abrí un PR contra polib para esto en izimobil/polib#161. Actualización: en el intertanto también me di cuenta de que el paquete `babel` sufre del mismo problema, yo incorrectamente había asumido que babel dependía de polib; PR creada contra babel: python-babel/babel#1151. Después de corregir el error de sintaxis, ejecuté powrap de tal manera que ahora `library/re.po` está bien formateado. --------- Signed-off-by: Rodrigo Tobar <[email protected]>
Gentle ping, at least to kick off CI and check if there's any obvious mistakes to be fixed |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1151 +/- ##
==========================================
- Coverage 91.27% 91.26% -0.02%
==========================================
Files 27 27
Lines 4619 4634 +15
==========================================
+ Hits 4216 4229 +13
- Misses 403 405 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Sorry for the delay and thanks for the contribution! ❤️
Some comments within – very nice work on the very detailed commit messages; they were a joy to read!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @akx for the review, it's much appreciated.
I'm far from computers.at the moment, so I won't be pushing anything until next week, at least. In the meantime please see my replies. I'll also attend to the coverage report that was raised, I thought I had covered all bases with the tests.
Do you prefer force-pushes (I know I do) or further commits?
While the re module caches some of the latest compilations, it's better form to not rely on it doing so. Signed-off-by: Rodrigo Tobar <[email protected]>
The exact same check is performed a few lines above. Signed-off-by: Rodrigo Tobar <[email protected]>
Since Python 2.3 sorted() has been guaranteed to be stable. The comment was wrong, and thus it makes sense to do the full assertion as clearly intended. Signed-off-by: Rodrigo Tobar <[email protected]>
The _NormalizeString helper class mixes some responsibilities, not only acting as a container for potentially multiple lines of a single string message, but also doing and hiding some of the parsing of such strings. "Doing" because it performs a .strip() on all incoming strings in order to remove any whitespace before/after them, and "hiding" because when invoking the "denormalize" method, each line is slices to remove the first and last element, which are implicitly assumed to be the string delimiters (double quotes, in principle). These multiple roles have already led to confusion within the codebase as to how this class is supposed to be used. Its existing unit test doesn't provide strings with proper delimiters (and thus calling .denormalize() on these objects would return unexpected results -- empty strings in all cases). Similarly, missing msgstr instances also result in a call to _NormalizeString(""), which does work, but is conceptually incorrect, as the empty string is somethiing that _NormalizeString should never see coming in. This commit changes all the places where confusing usage of the _NormalizeString class happens. In particular, the existing unit test's strings are now always delimited by double quotes (so calling .denormalize on them would yield the expected value). A number of new unit tests have also been added exercising the denormalize() method, which includes unescaping escaped characters. Finally, the construction of an empty string message has been simplified to _NormalizeString(). Signed-off-by: Rodrigo Tobar <[email protected]>
Strings should be delimited on both ends by double quotes, but this is currently not being been detected, and content is simply being incorrectly trimmed. This commit adds a check for each string to verify it starts and ends with a double quote character, issuing a warning/error if that's not the case (and fixing it as appropriate). A few new test cases have been added to check that the lack of double quotes to delimit strings issues errors as expected. Signed-off-by: Rodrigo Tobar <[email protected]>
Now that all strings given as inputs to _NormalizeString have been verified (or corrected) to be correctly delimited with double quotes, there's no reason to continue doing an internal strip anymore. Moreover, we can express this internal constraint with an assertion to avoid issues in the future. Signed-off-by: Rodrigo Tobar <[email protected]>
1a9a142
to
ff5213b
Compare
@akx thanks for the patience .I finally found some time today to work on this. I opted for adding yet more checks (and tests) for the msgstr plural entries, hence ensuring their messages are as well-formed as the rest. I re-based on top of the latest |
This PR adds checks to the pofile parser code to validate that message strings are correctly delimited by double quotes. Keeping with the current design, an error is only raised if requested, otherwise a warning is printed, the faulty lines are corrected and parsing goes on.
I found this issue while processing a pofile used in the Spanish translation of the CPython documentation. One of our files was incorrectly written, and from all our tooling only the
msgcat
tool of GNU'sgettext
package complained, whilebabel
,polib
and others didn't. See python/python-docs-es#2873, izimobil/polib#161 and https://git.afpy.org/AFPy/powrap/pulls/4 for further reference.While implementing this change I found that the
_NormalizedString
class not only was used to contain message lines, but also participated in the parsing process (and hid some parsing as well). I thus broke down my changes into three separate commits:_NormalizedString
class across the codebase (see details in commit)._NormalizedString
behavesAlong the way I also implemented three small quality-of-life changes. They are included as the first three commits of this PR, happy to submit these separately if required: