Commit

(Obsidian) fix typo
GerHobbelt committed Nov 24, 2023
1 parent 4d3246f commit 8118192
Showing 1 changed file with 1 addition and 1 deletion.
@@ -12,7 +12,7 @@
* *possibly* **script** extraction / text-postprocessing to improve text extraction output quality, e.g. apply automatic / *suggested?* spell-checking --> this would be an advanced feature as we MAY want to keep both original and *edited="corrected"* text in the output.

> First idea was to markup or otherwise keep both original and edited text in the same file, but when we want to be able to *easily* use *external* professional-quality tools for comparisons & evaluation (such as Scooter Software's Beyond Compare), **the simplest way to get usable results is to output *two* text files**: one "raw" and one "post-processed", so that we can always see whether the "autocorrect" was actually correct or had just fubarred something rare/unknown to the spell-checker/corrector.
- > **This would then more easily blend in with any subsequent human-user vetted editorial edits to the extracted text**: an ability that is currently NOT available, but which I've been desiring for a *long time* as this allows us to Mechanical Turk any 99% OCR result and lift it up into 100% correct (*vetted*) content: not a requirement for all of us, but something I need as this makes straight citing / quoting from the actual content far easier and thus much more usable: I'm personally not invested or interested in the plagiarism scare at some academia; *value* (in my case) is increased when I can directly quote relevant chunks of original reference content so readers don't have to bother with reading / scanning through the references: that's increasing efficiency in a *business research setting*, where verification of references is only important when you don't trust the information collector / writer of the report that you got from me: *efficiency* requires both *trusting the bearer of the news (**me**)* (& thus citing references and having them available on request (*Qiqqa library*!) is beneficial at that secondary level) and *fast perusal*, i.e. NOT having to wade though tens to thousands of extra referenced papers' pages in order to check I'm not pulling the citations out o my arse. Thus *efficiency* in *business reading* actually *benefits* from quoting chunks of text, which in a student/academia setting would be automatically as "plagiarism".
+ > **This would then more easily blend in with any subsequent human-user vetted editorial edits to the extracted text**: an ability that is currently NOT available, but which I've been desiring for a *long time* as this allows us to Mechanical Turk any 99% OCR result and lift it up into 100% correct (*vetted*) content: not a requirement for all of us, but something I need as this makes straight citing / quoting from the actual content far easier and thus much more usable: I'm personally not invested or interested in the plagiarism scare at some academia; *value* (in my case) is increased when I can directly quote relevant chunks of original reference content so readers don't have to bother with reading / scanning through the references: that's increasing efficiency in a *business research setting*, where verification of references is only important when you don't trust the information collector / writer of the report that you got from me: *efficiency* requires both *trusting the bearer of the news (**me**)* (& thus citing references and having them available on request (*Qiqqa library*!) is beneficial at that secondary level) and *fast perusal*, i.e. NOT having to wade though tens to thousands of extra referenced papers' pages in order to check I'm not pulling the citations out of my arse. Thus *efficiency* in *business reading* actually *benefits* from quoting chunks of text, which in a student/academia setting would be automatically flagged as "plagiarism".
>
...
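
As a rough illustration of the two-file idea in the note above (one "raw" and one "post-processed" output, so external compare tools can be pointed at the pair), here is a minimal Python sketch. It is not the project's actual implementation: `postprocess`, the `CORRECTIONS` table, the function names and the `.raw.txt` / `.postprocessed.txt` suffixes are all hypothetical, and the tiny replacement table merely stands in for a real spell-checker.

```python
import difflib
import re
from pathlib import Path

# Hypothetical correction table standing in for a real spell-checker/corrector.
CORRECTIONS = {"teh": "the", "recieve": "receive"}


def postprocess(text: str) -> str:
    """Replace whole words found in CORRECTIONS; leave everything else as-is."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return CORRECTIONS.get(word, word)

    return re.sub(r"[A-Za-z]+", fix, text)


def write_raw_and_processed(extracted: str, out_dir: Path, stem: str) -> tuple[Path, Path]:
    """Write the untouched text and the auto-corrected text to two separate files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    raw_path = out_dir / f"{stem}.raw.txt"                # assumed naming scheme
    fixed_path = out_dir / f"{stem}.postprocessed.txt"    # assumed naming scheme
    raw_path.write_text(extracted, encoding="utf-8")
    fixed_path.write_text(postprocess(extracted), encoding="utf-8")
    return raw_path, fixed_path


def changed_lines(raw_path: Path, fixed_path: Path) -> list[str]:
    """Return a unified diff between the two files, for a quick sanity check."""
    raw = raw_path.read_text(encoding="utf-8").splitlines()
    fixed = fixed_path.read_text(encoding="utf-8").splitlines()
    return list(difflib.unified_diff(raw, fixed, "raw", "post-processed", lineterm=""))
```

Because the raw file is never modified, any later human-vetted edits can be made against the post-processed copy, while a compare tool of choice still shows exactly what the automatic correction changed.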
