Commit
Fix parsing dbt ls outputs that contain JSONs that are not dbt nodes (#1296)

This change makes Cosmos more resilient, allowing it to be used even when the `dbt ls` output contains JSON lines that do not represent dbt nodes.

**Context**

An Astronomer customer [raised a P1 incident](https://astronomer.zendesk.com/agent/tickets/67681), mentioning they could no longer run their Cosmos-powered DAGs. They were using Cosmos 1.5.0, and the issue was observed whenever DAGs were deployed using `astro deploy --dags`, even if the only difference was whitespace. The DAGs could no longer be parsed, raising an exception similar to:

```
  File "/usr/local/lib/python3.11/site-packages/cosmos/dbt/graph.py", line 135, in parse_dbt_ls_output
    unique_id=node_dict["unique_id"]
KeyError: 'unique_id'
```

**Explanation**

The customer recently changed their dbt project, adding print debug statements to one of their dbt macros. This caused the `dbt ls` output to contain lines that were valid JSON but were not valid dbt nodes, as observed in:

```
11:20:43 Running with dbt=1.7.6
11:20:45 Registered adapter: bigquery=1.7.2
11:20:45 Unable to do partial parsing because saved manifest not found. Starting full parse.
/***************************/
Values returned by mac_get_values:
{}
/***************************/
{"name": "some_model", "resource_type": "model", "package_name": "some_package", "original_file_path": "models/some_model.sql", "unique_id": "model.some_package.some_model", "alias": "some_model_some_package_1_8_0", "config": {"enabled": true, "alias": "some_model_some_package-1.8.0", "schema": "some_schema", "database": null, "tags": [], "meta": {}, "group": null, "materialized": "view", "incremental_strategy": null, "persist_docs": {}, "post-hook": [], "pre-hook": [], "quoting": {}, "column_types": {}, "full_refresh": null, "unique_key": null, "on_schema_change": "ignore", "on_configuration_change": "apply", "grants": {}, "packages": [], "docs": {"show": true, "node_color": null}, "contract": {"enforced": false, "alias_types": true}, "access": "protected"}, "tags": [], "depends_on": {"macros": [], "nodes": ["source.some_source"]}}
```

Here the macro printed `{}` on its own line, which parses as valid JSON but has no `unique_id` key. Cosmos did not account for this case: it assumed that any line that parsed as JSON was a dbt node:

https://github.com/astronomer/astronomer-cosmos/blob/42a397fb40ff537c74bb6f596b4936815b14abbb/cosmos/dbt/graph.py#L161-L185

**Workaround**

If customers updated the macro to print the information on a single line, they would no longer observe the issue, because the line as a whole is no longer valid JSON:

```
Values returned by mac_get_values: {}
```

We also released [1.5.0rc2](https://github.com/astronomer/astronomer-cosmos/releases/tag/astronomer-cosmos-v1.5.0rc2) with the change #1295, similar to the one introduced by this PR.

**Fix**

This change makes Cosmos more resilient to scenarios where `dbt ls` may output JSON lines that are not valid dbt nodes. It also logs those lines to help with troubleshooting. We added a unit test to make sure we continue supporting this use case.
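
The behaviour described in the fix can be illustrated with a minimal sketch. This is not the actual Cosmos implementation: the helper name `parse_ls_lines` and the log messages are hypothetical, and the only assumption taken from the traceback above is that a dbt node is a JSON object carrying a `unique_id` key.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_ls_lines(ls_stdout: str) -> dict[str, dict]:
    """Hypothetical helper: collect dbt nodes from `dbt ls` JSON output.

    Lines that are not JSON, or that are JSON but lack a `unique_id` key,
    are logged and skipped instead of raising KeyError.
    """
    nodes: dict[str, dict] = {}
    for line in ls_stdout.split("\n"):
        try:
            node_dict = json.loads(line.strip())
        except json.JSONDecodeError:
            # Plain log lines such as "11:20:43 Running with dbt=1.7.6"
            logger.debug("Skipped non-JSON dbt ls line: %s", line)
            continue
        # Valid JSON that is not a dbt node, e.g. a bare "{}" printed by a macro
        if not isinstance(node_dict, dict) or "unique_id" not in node_dict:
            logger.info("Skipped JSON line that is not a dbt node: %s", line)
            continue
        nodes[node_dict["unique_id"]] = node_dict
    return nodes
```

With this approach, the `{}` line from the customer's output is logged and ignored, while the `some_model` line is still parsed as a node.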