v0.1.9
This Sycamore release improves the heuristics for partitioning documents. It also adds a new method for automatically inferring which entities to extract from unstructured documents, along with incremental improvements and bug fixes.
What's Changed
- Change the default merge size to 256. by @eric-anderson in #178
- Simplify running the http crawler. by @eric-anderson in #180
- Fix text chunking for html importing to improve result quality. by @eric-anderson in #185
- Remove docker_compose and opensearch files. They were moved to quickstart. by @eric-anderson in #183
- Change simple_ingest and s3_ingest to use GTE-small embedding model. by @alexaryn in #169
- Remove unneeded mapping in OpenSearch index settings. by @alexaryn in #186
- Add HTML ingest example. Fix order in S3 ingester. by @alexaryn in #188
- Add a simple transform to perform regex replacement on Elements. by @alexaryn in #187
- Update README.md by @jonfritz in #179
- Add entity extraction. by @mkyl in #161
- Merge or break elements based on heuristics, including bounding boxes (bbox). by @alexaryn in #171
- Update aiohttp and cryptography to address dependabot alerts. by @bsowell in #192
- Bump version to v0.1.9. by @bsowell in #191
Full Changelog: v0.1.8...v0.1.9