This Sycamore release contains several bug fixes and improvements.
What's Changed
- Add logging of the full exception in base_writer. by @eric-anderson in #1069
- Fix create_element to not crash on bad element types by @eric-anderson in #1070
- Add docset.take_stream() by @baitsguy in #1071
- Make temporary fix to `split_elements` to avoid exceeding recursion depth due to certain table elements by @MarkLindblad in #1073
- add TableMerger to merge elements docs by @HenryL27 in #1074
- Increase max recursion depth for `split_element`'s `split_one` by @MarkLindblad in #1075
- Merge-elements-LLM-filter by @dhruvkaliraman7 in #1076
- Add support for GPU to similarity. by @austintlee in #999
- Tolerate bad entity extraction. by @eric-anderson in #1078
- move deformable detr safe loading code by @HenryL27 in #1055
- Allow Doc reconstruct via function by @austintlee in #1072
- Add-tokenizer-and-reranking-to-LLM-ExtractEntity by @dhruvkaliraman7 in #1081
- Schema object + entity extraction support by @baitsguy in #1083
- Make ttviz.cpp compile again. by @alexaryn in #1082
- Keep newline in OpenAI Embedder by @dhruvkaliraman7 in #1086
- Changed the default embedding model to openai. by @akarshgupta7 in #1087
- Add Embed at Element Level by @dhruvkaliraman7 in #1084
- Get sycamore.query to work with Schema instead of only OpenSearchSchema by @baitsguy in #1088
- Add hybrid table extractor by @HenryL27 in #1089
- Add map reduce style summarize to handle large texts for summarization. by @austintlee in #1079
- fix max(nothing) bug by @HenryL27 in #1091
- Delay initializing openai client in embedder by @HenryL27 in #1092
- fix materialize on windows by @HenryL27 in #1093
- Add Retries for OpenSearch Writer by @karanataryn in #1085
- Property extraction type cast by @baitsguy in #1095
- Revert overzealous no-rootification by @HenryL27 in #1098
- Add support for Anthropic LLMs. by @bsowell in #1096
- Fix similarity assert condition for LLM Filter by @dhruvkaliraman7 in #1099
- Raise PartitionError with explicit status code. by @alexaryn in #1101
- Add `PartitionError` to `aryn_sdk.partition`'s `__init__.py` by @MarkLindblad in #1102
- Prompt update for property extraction by @baitsguy in #1103
- Add support for parallel read in OpenSearchReader by @austintlee in #1100
- Fix No Root Repetition in Test File by @karanataryn in #1097
- Bump version to 0.1.30. by @bsowell in #1109
Full Changelog: v0.1.29...v0.1.30