This repository has been archived by the owner on Nov 12, 2024. It is now read-only.
Summary
At 16:01 UTC on 15 May 2024, plutus-chain-index stopped syncing with mainnet, which prevented the publication of the Orcfax ADA-USD feed. Despite extensive investigation, the Orcfax team was unable to revive the software or the mainnet feed.
Status
Resolved
Assessment
plutus-chain-index is part of Intersect's plutus-apps repository: https://github.com/IntersectMBO/plutus-apps. The component was deeply embedded in the v0 Orcfax architecture as part of COOP. Shortly after COOP's release, the chain-index component was abandoned by IOG. The Orcfax team was aware of this and identified the component as one of the system's biggest risks at the mainnet release of v0. In response, the team began work on a v1 architecture that removed the component, and was within weeks of transitioning to v1 when the component failed.
Orcfax tried to triage the component during the first 48 hours after the failure but was unable to revive it. During that time the team decided to contact known integrators and ask them to move to their fall-back oracle services until Orcfax could make more progress.
The team continued to work on the component for the following week in an effort to better understand the failure, but reached no conclusive findings. While the chain-index is still syncing from COOP's genesis point in September 2024 at the time of writing, the team decided to drop support for Orcfax's v0 oracle and continue its efforts to bring a v1 oracle to the Cardano chain.
Additional Notes
Specific errors
When started with the --verbose parameter, plutus-chain-index executes a few queries and then hangs at this one:
[chain-index:Debug:40] [2024-05-25 07:56:33.62 UTC] {"contents":"UPDATE \"unspent_outputs\" SET \"output_row_tip__row_slot\"=? WHERE (\"output_row_tip__row_slot\")<(?);\n-- With values: [SQLInteger 124179884,SQLInteger 124179884]","tag":"BeamLogItem"}
Running the same query manually returns an error:
sqlite> UPDATE unspent_outputs SET output_row_tip__row_slot = 124179884 WHERE output_row_tip__row_slot < 124179884;
Error: stepping, UNIQUE constraint failed: unspent_outputs.output_row_tip__row_slot, unspent_outputs.output_row_out_ref (19)
sqlite>
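plutus-chain-index appears to hang because it retries the same failing query indefinitely. The error message indicates that moving every row with a lower slot to slot 124179884 would produce at least two rows with the same (slot, out_ref) pair, which suggests that the table holds the same output reference at more than one slot. The Orcfax team does not have the skills internally to analyse plutus-chain-index in more detail.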
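The failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual plutus-chain-index schema: the table and column names are taken from the error message above, but the column types, the sample rows, and the helper names `find_conflicting_out_refs` and `demo` are assumptions made for illustration. It shows how a duplicate output reference at two different slots makes the UPDATE fail, and one possible query for locating such duplicates.

```python
import sqlite3


def find_conflicting_out_refs(conn, target_slot):
    """Return (out_ref, row_count) pairs that would collide if every row
    below target_slot were moved to target_slot."""
    return conn.execute(
        """
        SELECT output_row_out_ref, COUNT(*)
        FROM unspent_outputs
        WHERE output_row_tip__row_slot <= ?
        GROUP BY output_row_out_ref
        HAVING COUNT(*) > 1
        ORDER BY output_row_out_ref
        """,
        (target_slot,),
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    # Assumed schema: only the composite UNIQUE constraint is known
    # from the error message; the column types are guesses.
    conn.execute(
        """
        CREATE TABLE unspent_outputs (
            output_row_tip__row_slot INTEGER NOT NULL,
            output_row_out_ref TEXT NOT NULL,
            UNIQUE (output_row_tip__row_slot, output_row_out_ref)
        )
        """
    )
    # Hypothetical data: the same out_ref recorded at two slots.
    conn.executemany(
        "INSERT INTO unspent_outputs VALUES (?, ?)",
        [(100, "tx_a#0"), (200, "tx_a#0"), (150, "tx_b#1")],
    )
    conflicts = find_conflicting_out_refs(conn, 300)
    try:
        # Same shape as the query that plutus-chain-index hangs on.
        conn.execute(
            "UPDATE unspent_outputs SET output_row_tip__row_slot = ? "
            "WHERE output_row_tip__row_slot < ?",
            (300, 300),
        )
        error = None
    except sqlite3.IntegrityError as exc:
        error = str(exc)
    conn.close()
    return conflicts, error


if __name__ == "__main__":
    print(demo())
```

Because the constraint violation is deterministic, retrying the same statement cannot succeed; a query like the one in `find_conflicting_out_refs` would at least identify which rows need attention before the UPDATE is attempted again.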
Specific issues
We are unaware of any Cardano ecosystem events which may have precipitated the failure of this component.
Rebuilding the database was not an option, as it had initially required two weeks or more to rebuild from 55% sync.
Orcfax Mistakes
plutus-chain-index was the kernel of COOP v0 and sat at such a low level that it was well outside the team's expertise. This created a significant risk, which was realized; going forward, Orcfax will be more careful about the technologies used within the solution architecture.
While the team believed it had a backup prepared, Orcfax services were not taking snapshots of the database mount, which rendered the backup unusable.
The delays experienced in the MLabs development impacted the rollout of COOP v1 and meant that we weren't ready to go live with a new version in line with our own schedules. A more iterative approach to development might have seen the chain-index component released with a node version newer than 1.34.5 to stabilize the current protocol before building additional features.
Impact mitigation
The impact of this event has been mitigated by two key factors:
Since late 2023, prospective integrators have been informed of the forthcoming v1 architecture and encouraged to wait for its release, as a necessary optimization of the datum schema will result in breaking changes for smart contracts.
Monitoring of Orcfax UTxO usage was deployed before this issue occurred, which allowed the team to assess the impact of the outage on the community; those who were using the Orcfax feed were notified promptly and backup solutions were activated.
Technical improvements
The Orcfax v0 solution, which used this component, has been retired. Work on the v1 architecture continues, and the team remains in dialogue with integrators about when they will begin integrating the new datum.
We are investigating:
In the v1 solution we are using off-the-shelf components with proven support in the Cardano community such as Kupo and Ogmios.
Backup procedures will be investigated to ensure that any index of significant size has a secondary copy available to us.
Recovery procedures will be investigated. Time to launch will also be reduced by a change of approach: the Oracle dApp will mostly look at the tip of the Cardano chain rather than its entire history, as was the case in the COOP solution.
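As an illustration of the backup point above, the following is a minimal sketch of taking and verifying a snapshot of a SQLite index, assuming the index is a single SQLite database file. It is not Orcfax's actual procedure; the function name `snapshot_sqlite` and the file paths are hypothetical. It uses the online backup API exposed by Python's standard `sqlite3` module, followed by an integrity check on the copy, so that an unusable backup is detected when it is made rather than when it is needed.

```python
import sqlite3
from pathlib import Path


def snapshot_sqlite(source_path, dest_path):
    """Copy a SQLite database to dest_path and verify the copy.

    Raises RuntimeError if the copy fails SQLite's integrity check.
    """
    src = sqlite3.connect(str(source_path))
    dst = sqlite3.connect(str(dest_path))
    try:
        # Connection.backup copies the database page by page and
        # produces a consistent snapshot of the source.
        src.backup(dst)
        result = dst.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        dst.close()
        src.close()
    if result != "ok":
        raise RuntimeError(f"snapshot failed integrity check: {result}")
    return Path(dest_path)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    snapshot_sqlite("chain-index.db", "chain-index.snapshot.db")
```

A snapshot taken this way would still need to be stored away from the original volume and restored periodically as a test; the sketch only covers creating and checking the copy.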
Documentation improvements
Historical policy data will be maintained by Orcfax, and instructions will be provided on how to access historical archival packages on Arweave.
With the deprecation of the v0 protocol, this repository will be closed and the lessons learned will be gathered and fed into the v1 project. A new incidents repository will be opened with more transparent access to issues via the Orcfax Explorer and documentation pages.
INCIDENT 035 | Failure of Plutus Chain Indexer
Trigger
Date
2024-05-15