Preserve html entities and multiple spaces #59
This allows multiple sequential `&nbsp;` entities to remain multiple spaces rather than being collapsed. Within `code` blocks, neither a literal space nor an `&nbsp;` entity works, so a Unicode nbsp character is used instead, which seems to work in many markdown renderers. This fixes the output of the Google Doc code section.
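To illustrate the collapsing problem described above, here is a minimal sketch using only the Python standard library. It is not the code from this patch: `collapse_spaces` is a hypothetical stand-in for whatever whitespace normalization a converter or renderer applies, and the sample input is invented for illustration.

```python
import html
import re

NBSP = "\u00a0"  # Unicode NO-BREAK SPACE, the character that &nbsp; decodes to


def collapse_spaces(text: str) -> str:
    """Hypothetical normalizer: squeeze runs of ASCII whitespace into one space."""
    return re.sub(r"[ \t\r\n]+", " ", text)


# Invented sample: a code line indented with four &nbsp; entities.
source = "if x:&nbsp;&nbsp;&nbsp;&nbsp;return y"

# html.unescape turns each &nbsp; into a U+00A0 character.
decoded = html.unescape(source)

# If the nbsp characters are turned into ordinary spaces, the run collapses.
as_plain_spaces = collapse_spaces(decoded.replace(NBSP, " "))

# If the nbsp characters are kept, the ASCII-only normalizer leaves them alone.
kept_nbsp = collapse_spaces(decoded)

print(repr(as_plain_spaces))
print(repr(kept_nbsp))
```

The character class in `collapse_spaces` is deliberately ASCII-only; Python's `\s` would also match U+00A0 in a `str` pattern and collapse the run anyway.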
Hey, just checking on this. Wondering if this is mergeable or if anything should be changed?
Sorry, somehow this got lost in the shuffle. I don't think most users of a program like html2text want HTML in their output, so I'm not comfortable merging a patch that will cause HTML to appear in the output by default. What's your motivation here?
In the first commit, HTML entities are used so that if your source HTML content is about HTML tags and entities, they will stay escaped and not "devolve" to actual tags and entities. For example The second commit doesn't add HTML to the markdown output. The third commit preserves My overall rationale for this is that we're importing a large amount of content into a markdown-based system, so we want to maintain accuracy to the original content. Specifically, we're using this within SourceForge as we upgrade projects from our legacy platform to our new platform. Much of the SourceForge forum and ticket content is technical, so there are literal HTML entities we need to preserve, as well as code snippets that have lines indented with many spaces (consecutive entities). Thanks
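As a rough illustration of what "devolving" means here, the following sketch uses only Python's standard `html` module. It is not the patch's implementation and does not call html2text; the sample sentence is invented.

```python
import html

# Invented sample: text from an HTML page that is *about* HTML,
# so the tag and the entity are themselves escaped in the source.
text_in_html = "Use &lt;b&gt; for bold and &amp;amp; for an ampersand"

# Fully decoding the entities yields a literal tag and a literal entity,
# which a markdown renderer may then treat as real HTML.
decoded = html.unescape(text_in_html)

# Re-escaping the decoded text keeps it as text about HTML.
preserved = html.escape(decoded, quote=False)

print(decoded)
print(preserved)
```

Here `decoded` contains a real `<b>` tag and a real `&amp;` entity, while `preserved` round-trips back to the escaped form found in the source.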
Support for image sizing using raw html Thanks @smblackburn
A few commits to address preservation of HTML entities and multiple spaces, and to fix general escaping that occurs with `backticks`. More details are in the commit messages.