v1.1.0
Starting today, we will publish weekly releases here and on firecrawl.dev/changelog. This release summarizes all the improvements and fixes we have shipped since the v1 release. Thank you all for your contributions!
Changelog Highlights
Feature Enhancements
- New features:
  - Geolocation, mobile scraping, 4x faster parsing, and better webhooks.
  - Credit packs, auto-recharge, and batch scraping support.
  - Iframe support and an option to treat URLs with differing query parameters as different URLs.
  - Similar URL deduplication.
  - Enhanced map ranking and sitemap fetching.
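To make "similar URL deduplication" and "query parameter differentiation" concrete, here is a minimal Python sketch of the idea. It is not Firecrawl's implementation: the normalization rules (ignore the scheme, a leading "www.", a trailing slash, and the fragment) and the url_key, dedupe_urls, and ignore_query names are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit


def url_key(url: str, ignore_query: bool = True) -> str:
    """Build a comparison key so near-identical URLs collapse together.

    Hypothetical normalization rules (assumptions, not Firecrawl's code):
    the scheme is ignored, the host is lowercased with a leading "www."
    stripped, a trailing slash is dropped, and the fragment is discarded.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    key = host + path
    # When ignore_query is False, differing query strings mean different URLs.
    if not ignore_query and parts.query:
        key += "?" + parts.query
    return key


def dedupe_urls(urls: list[str], ignore_query: bool = True) -> list[str]:
    """Keep the first URL seen for each normalized key, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for url in urls:
        key = url_key(url, ignore_query)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


if __name__ == "__main__":
    urls = [
        "https://example.com/docs/",
        "http://www.example.com/docs",
        "https://example.com/docs?page=2",
        "https://example.com/docs#intro",
    ]
    print(dedupe_urls(urls))
    # ['https://example.com/docs/']
    print(dedupe_urls(urls, ignore_query=False))
    # ['https://example.com/docs/', 'https://example.com/docs?page=2']
```

Keying on a normalized string keeps deduplication to a single set lookup per discovered link; the ignore_query switch mirrors the idea of treating differing query parameters as different URLs only when asked to.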
Performance Improvements
- Faster crawl status filtering and improved map ranking algorithm.
- Optimized Kubernetes setup and simplified build processes.
- Improved sitemap discoverability and performance.
Bug Fixes
- Resolved issues:
  - Badly formatted JSON, scrolling actions, and encoding errors.
  - Crawl limits, relative URLs, and missing error handlers.
- Fixed self-hosted crawling inconsistencies and schema errors.
SDK Updates
- Added dynamic WebSocket imports with fallback support.
- Optional API keys for self-hosted instances.
- Improved error handling across SDKs.
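The optional API key for self-hosted instances can be pictured with the Python sketch below. It assumes the rule is "a key is required only when talking to the hosted API"; the resolve_api_key and build_headers helpers, the CLOUD_API_URL constant, the bearer-token header, and the localhost:3002 self-hosted address are all assumptions for illustration, not the SDKs' actual code.

```python
from typing import Optional

# Assumed hosted endpoint; self-hosted deployments use their own base URL.
CLOUD_API_URL = "https://api.firecrawl.dev"


def resolve_api_key(api_url: str, api_key: Optional[str] = None) -> Optional[str]:
    """Return the API key to send, or None when one is not needed.

    Hypothetical rule: the hosted API always requires a key, while a
    self-hosted instance may run without authentication.
    """
    is_cloud = api_url.rstrip("/") == CLOUD_API_URL
    if is_cloud and not api_key:
        raise ValueError("An API key is required for the hosted Firecrawl API.")
    return api_key or None


def build_headers(api_url: str, api_key: Optional[str] = None) -> dict[str, str]:
    """Attach an (assumed) bearer-token header only when a key is available."""
    headers = {"Content-Type": "application/json"}
    key = resolve_api_key(api_url, api_key)
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers


if __name__ == "__main__":
    # Self-hosted instance (assumed address) without a key: no Authorization header.
    print(build_headers("http://localhost:3002"))
    # Hosted API with a placeholder key: the key is sent as a bearer token.
    print(build_headers(CLOUD_API_URL, "fc-test"))
```

Failing fast with a clear error for the hosted API, while letting self-hosted callers omit the key entirely, keeps the common self-hosted setup free of dummy credentials.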
Documentation Updates
- Improved API docs and examples.
- Updated self-hosting URLs and added Kubernetes optimizations.
- Added articles: Mastering /scrape and Mastering /crawl.
Miscellaneous
- Added new Firecrawl examples (o1 web crawler, o1 job recommender, sales web crawler, and Hacker News scraper).
- Enhanced metadata handling for webhooks and improved sitemap fetching.
- Updated blocklist and streamlined error messages.
What's Changed
- Add docs to api spec example by @ericciarla in #637
- [Docs] upgraded the path of the self-hosted documentation URL to /v1 by @shige in #635
- Removal of generic classnames/ids from onlyMainContent cleaning by @nickscamara in #638
- Improved team credits check and billing notifications by @nickscamara in #640
- Fixed 500 errors when JSON is badly formatted by @nickscamara in #648
- Better engine for wait + other params by @nickscamara in #649
- fix(py-sdk): removed asyncio package by @rafaelsideguide in #654
- perf(js-sdk): move dotenv and uuid to devDependencies, fix zod import by @MonsterDeveloper in #614
- build(js-sdk): simplify build process by @MonsterDeveloper in #611
- fix(v0/crawl-status): don't crash on big crawls when requesting jobs from supa by @mogery in #653
- Manual Rate Limiter for select team ids by @nickscamara in #664
- O1 crawler example by @ericciarla in #676
- [Bug] Fixed screenshot typo and added test for fullpage screenshot by @rafaelsideguide in #677
- v1/map improvements + higher limits by @nickscamara in #674
- Remove print statement in map by @anjor in #612
- fix wrong link to self host documentation by @itasli in #623
- feat: kubernetes example optimization by @yekkhan in #639
- Rust SDK 1.0.0 by @mogery in #689
- feat: Actions by @mogery in #682
- Fix the error message when trying search in v0 by @nickscamara in #690
- remove space in the examples/o1_web_crawler folder name by @h4r5h4 in #679
- o1 job recommender example by @ericciarla in #707
- Move auth and check credits operations into an RPC by @mogery in #704
- bugfix: using onlyIncludeTags and removeTags together by @skeptrunedev in #685
- Concurrency limits by @mogery in #721
- Docs: Remove wait_until_done from python-sdk example by @bytrangle in #728
- Improves error handler in Node SDK to return the status code by @nickscamara in #727
- Fixes crawl failed and webhooks not working properly by @nickscamara in #731
- [BUG] Fixed URLs with params by @rafaelsideguide in #732
- Fixed the self host issues where methods don't work by @nickscamara in #733
- Make sure the entrypoint script has the correct line endings by @busaud in #753
- Rm cluster mode + rm fly deployments by @nickscamara in #754
- Fixed Issue #734 by @Harsh0707005 in #747
- bugfix: self-host crawling doesn't respect limit by @busaud in #755
- [BUG] Fixed missing error handling in JS-SDK by @rafaelsideguide in #759
- [SDK] Cancel Crawl by @rafaelsideguide in #760
- fixed developer.notion special case by @rafaelsideguide in #762
- Spelling Corrections in README by @fadkeabhi in #763
- [RPC] Improvements to credit_usage rpc by @nickscamara in #767
- [BUG] filters failed and unknown jobs now by @rafaelsideguide in #761
- [Doc] Better explained how includePaths and excludePaths work by @rafaelsideguide in #766
- Update README.md by @busaud in #757
- ADDED : Contributors and Back to top by @Ruhi14 in #768
- Retries for ACUC RPC + Price credits fallback by @nickscamara in #773
- [BUG] added check files on crawl by @rafaelsideguide in #779
- [Feat] Performance improvements crawl status filters by @rafaelsideguide in #780
- Admin alerts for high usage by @nickscamara in #783
- Geolocation support for Firecrawl by @nickscamara in #784
- Return all the website metadata by @nickscamara in #785
- Extractor options logging v1 fix by @nickscamara in #788
- Update requirements.txt by @rishi-raj-jain in #790
- Improved /map ranking algorithm for search queries by @nickscamara in #798
- Fix Typos and Grammar in SELF_HOST.md by @Mefisto04 in #799
- [Bug] encoding error for special token by @rafaelsideguide in #793
- [BUG-SDK] missing error in response by @rafaelsideguide in #796
- examples: sales web crawler by @rishi-raj-jain in #797
- feat: clear ACUC cache endpoint based on team ID by @mogery in #807
- feat: skipTlsVerification by @tomkosm in #808
- feat: Batch Scrape by @mogery in #789
- feat: Auto Recharge Credits + Credit Packs by @nickscamara in #809
- Remove ph logs for single_urls by @nickscamara in #829
- Bump to gemini-1.5-pro-002 website_qa_with_gemini_caching.ipynb and add flash example by @s-smits in #739
- Add SearchApi as a Web Search Tool by @SebastjanPrachovskij in #628
- RM wait before interacting by @nickscamara in #838
- chore(README.md): use satisfies instead of as for ts example by @twlite in #831
- Geo-location rename to location by @nickscamara in #830
- concurrency limit fix by @mogery in #824
- [feat] Iframe support by @tomkosm in #855
- Fix go parser by @tomkosm in #856
- Support for the 2 new actions by @nickscamara in #858
- Adds support for mobile web scraping + mobile screenshot by @nickscamara in #847
- [Feat] Added remove base64 images options (true by default) by @rafaelsideguide in #867
- [Fix] Prevent Python Firecrawl logger from interfering with loggers in client applications by @reasonmethis in #613
- [BUG] Added trycatch and removed redundancy by @rafaelsideguide in #869
- Update CONTRIBUTING.md by @swyxio in #849
- WebScraper refactor into scrapeURL by @mogery in #714
- Exec js - actions by @nickscamara in #872
- feat(crawl): Similar URL deduplication by @mogery in #878
- [SDK] Added next handler for python sdk (js is ok) by @rafaelsideguide in #880
- [BUG] fixes scroll action by @rafaelsideguide in #881
- feat(crawl): add parameter to treat differing query parameters as different URLs by @mogery in #892
- fix(crawler): relative URL handling on non-start pages by @mogery in #893
- Redlock for sending email notifications by @nickscamara in #895
- feat(v1/webhook): complex webhook object w/ headers by @mogery in #899
- fix(sitemap): scrape with tlsclient by @mogery in #891
- Allows /map to only return links present in the sitemap by @nickscamara in #901
- Node SDK : Add Mobile Scraping by @ad-angelo in #914
- Add notebook and markdown files for two articles: mastering /scrape and mastering /crawl by @BexTuychiev in #918
- Extract (beta) by @nickscamara in #915
- Add a new project to examples that shows how to scrape Hacker News website by @BexTuychiev in #928
- Fix/while next loop by @rafaelsideguide in #939
- Crawl not respecting the limit when ignoreSitemap is false by @nickscamara in #940
- Crawl fixes: fixed the n-1 bug by @nickscamara in #941
- fixed keyerror for data on sdk by @rafaelsideguide in #943
- Submitting two articles to the blog by @BexTuychiev in #944
- removed microsoft from blocklist by @rafaelsideguide in #958
- Add assets for the Automated Amazon Price Tracking article by @BexTuychiev in #946
- Fixed Prettier by @nickscamara in #965
- Fixes schema base model extract by @rafaelsideguide in #954
- Metadata for webhooks by @nickscamara in #970
- Remove Block List by @ericciarla in #971
- feat(runWebScraper): retry a scrape max 3 times in a crawl if the status code is failure (FIR-293) by @mogery in #975
- feat: add scrapeOptions.fastMode (FIR-288) by @mogery in #973
- fix: adjust Playwright service response to match API schema expectations by @AsyncArtisan in #977
- Timeout fixes on user defined timeouts by @nickscamara in #978
- Default to pdf2md, if under 500 chars (indicating failure) use LlamaParse by @ericciarla in #984
- Revert to pdf parse by @ericciarla in #987
- [BUG] fixed title extra info by @rafaelsideguide in #996
- feat-SDK/added crawl id to ws by @rafaelsideguide in #994
- [SDK] Added try catch to ws message handler by @rafaelsideguide in #998
- Credit usage endpoint by @nickscamara in #999
- Refactored Blocklist Error Messages by @nickscamara in #1001
- feat(js-sdk): Make API key optional for self-hosted instances by @RutamBhagat in #989
- docs(CONTRIBUTING.md): Add Docker Compose setup instructions by @RutamBhagat in #1005
- feat: Support environments without ws by dynamically importing WebSocket module with error handling by @tomkosm in #997
- [bug/JS-SDK] Added check for object and trycatch as workaround for 502s by @rafaelsideguide in #974
- docs(credit-usage-api): add new endpoint documentation for credit usage by @RutamBhagat in #1003
- Improves sitemap fetching by @nickscamara in #1015
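The crawl retry change in #975 (retry a scrape up to 3 times if the status code indicates failure) can be illustrated with the Python sketch below. It is a hypothetical model, not the runWebScraper code: the ScrapeResult type, the scrape callable, and treating any status of 400 or above as a failure are assumptions, and we read "max 3 times" as one initial attempt plus up to three retries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ScrapeResult:
    url: str
    status_code: int


def is_failure(status_code: int) -> bool:
    # Assumption: any 4xx/5xx status counts as a failed scrape.
    return status_code >= 400


def scrape_with_retries(
    url: str,
    scrape: Callable[[str], ScrapeResult],
    max_retries: int = 3,
) -> ScrapeResult:
    """Call scrape once, then retry up to max_retries times on failure.

    Returns the first successful result, or the last failed one if every
    attempt fails, so the crawl can still record the final status code.
    """
    result = scrape(url)
    retries = 0
    while is_failure(result.status_code) and retries < max_retries:
        retries += 1
        result = scrape(url)
    return result


if __name__ == "__main__":
    # A fake scraper that fails twice with 503 and then succeeds.
    responses = iter([503, 503, 200])
    calls = []

    def flaky_scrape(url: str) -> ScrapeResult:
        calls.append(url)
        return ScrapeResult(url, next(responses))

    result = scrape_with_retries("https://example.com", flaky_scrape)
    print(result.status_code, len(calls))  # 200 3
```

Returning the last failed result instead of raising lets a crawl keep going and report the page's final status, which matches the goal of not dropping pages on a single transient error.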
New Contributors
- @shige made their first contribution in #635
- @MonsterDeveloper made their first contribution in #614
- @anjor made their first contribution in #612
- @itasli made their first contribution in #623
- @yekkhan made their first contribution in #639
- @h4r5h4 made their first contribution in #679
- @skeptrunedev made their first contribution in #685
- @bytrangle made their first contribution in #728
- @busaud made their first contribution in #753
- @Harsh0707005 made their first contribution in #747
- @fadkeabhi made their first contribution in #763
- @Ruhi14 made their first contribution in #768
- @rishi-raj-jain made their first contribution in #790
- @Mefisto04 made their first contribution in #799
- @s-smits made their first contribution in #739
- @SebastjanPrachovskij made their first contribution in #628
- @twlite made their first contribution in #831
- @reasonmethis made their first contribution in #613
- @swyxio made their first contribution in #849
- @ad-angelo made their first contribution in #914
- @BexTuychiev made their first contribution in #918
- @AsyncArtisan made their first contribution in #977
- @RutamBhagat made their first contribution in #989
Full Changelog: v1.0.0...v1.1.0
Going forward, we'll publish a new release every week, both here and on firecrawl.dev/changelog!