[Bug] Partial parsing error at path ['raw_code']: None is not of type 'string' when modifying source if snapshot is snapping the source and snapshot is declared with yaml #11164

Open
2 tasks done
jeremyyeo opened this issue Dec 18, 2024 · 2 comments
Labels: bug, partial_parsing, snapshots

Comments

@jeremyyeo
Contributor

jeremyyeo commented Dec 18, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

If a snapshot is declared with the new YAML syntax and it snapshots a source, then partial parsing errors when that source is modified.

Expected Behavior

No error.

Steps To Reproduce

Project setup.

# dbt_project.yml
name: my_dbt_project
profile: all
version: "1.0.0"

# models/sources.yml
sources:
  - name: raw
    tables:
      - name: customers
        description: The customers table.

# snapshots/some_snapshot.yml
snapshots:
  - name: some_snapshot
    relation: source('raw', 'customers')
    config:
      strategy: timestamp
      unique_key: id
      updated_at: updated_at

Do an initial parse:

$ rm -rf target
$ dbt --debug parse

02:20:54  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106c167d0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10914b450>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1091c59d0>]}
02:20:54  Running with dbt=1.9.0-rc2
02:20:54  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'fail_fast': 'False', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'profiles_dir': '/Users/jeremy/.dbt', 'debug': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'empty': 'None', 'quiet': 'False', 'no_print': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt --debug parse', 'introspect': 'True', 'log_format': 'default', 'target_path': 'None', 'static_parser': 'True', 'send_anonymous_usage_stats': 'True'}
02:20:55  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'b4194e5c-bac2-44e1-a39a-8a6e4ff8715a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109168150>]}
02:20:55  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'b4194e5c-bac2-44e1-a39a-8a6e4ff8715a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x106c21610>]}
02:20:55  Registered adapter: postgres=1.9.0-rc1
02:20:55  checksum: cfb72516634404a3c61854d6d9543be0625a07f5ebfc68fc9626e92b77373532, vars: {}, profile: , target: , version: 1.9.0rc2
02:20:55  Unable to do partial parsing because saved manifest not found. Starting full parse.
02:20:55  Sending event: {'category': 'dbt', 'action': 'partial_parser', 'label': 'b4194e5c-bac2-44e1-a39a-8a6e4ff8715a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10916b510>]}
02:20:55  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'b4194e5c-bac2-44e1-a39a-8a6e4ff8715a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109fe1610>]}
02:20:55  Performance info: /Users/jeremy/git/dbt-basic/target/perf_info.json
02:20:55  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
02:20:55  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
02:20:55  Resource report: {"command_name": "parse", "command_success": true, "command_wall_clock_time": 0.62943304, "process_in_blocks": "0", "process_kernel_time": 0.131493, "process_mem_max_rss": "119373824", "process_out_blocks": "0", "process_user_time": 1.067013}
02:20:55  Command `dbt parse` succeeded at 15:20:55.524272 after 0.63 seconds
02:20:55  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1091a8fd0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1091a9410>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104f28b50>]}
02:20:55  Flushing usage events
02:20:56  An error was encountered while trying to flush usage events

Edit the description of our source:

# models/sources.yml
sources:
  - name: raw
    tables:
      - name: customers
        description: No customers to be found.

Do a subsequent parse, which this time uses partial parsing:

$ dbt --debug parse

02:21:21  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1092b1510>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1092cccd0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1092cd010>]}
02:21:21  Running with dbt=1.9.0-rc2
02:21:21  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/Users/jeremy/.dbt', 'version_check': 'True', 'warn_error': 'None', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'debug': 'True', 'fail_fast': 'False', 'use_colors': 'True', 'use_experimental_parser': 'False', 'empty': 'None', 'quiet': 'False', 'no_print': 'None', 'log_format': 'default', 'introspect': 'True', 'invocation_command': 'dbt --debug parse', 'static_parser': 'True', 'target_path': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'send_anonymous_usage_stats': 'True'}
02:21:21  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '20b74a2d-16b3-4d1c-93af-b9ba1b31fc00', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1092f8390>]}
02:21:21  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '20b74a2d-16b3-4d1c-93af-b9ba1b31fc00', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109895050>]}
02:21:21  Registered adapter: postgres=1.9.0-rc1
02:21:21  checksum: cfb72516634404a3c61854d6d9543be0625a07f5ebfc68fc9626e92b77373532, vars: {}, profile: , target: , version: 1.9.0rc2
02:21:21  Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
02:21:21  Unable to do partial parsing because an error occurred. Switching to full reparse.
02:21:21  Partial parsing exception processing file my_dbt_project://models/sources.yml
02:21:21  PP exception info: {'code': 'if not source_file.nodes:', 'parse_file_type': 'schema', 'exception': "AttributeError: 'SchemaSourceFile' object has no attribute 'nodes'", 'location': 'line 396 in remove_mssat_file', 'traceback': 'Traceback (most recent call last):\n  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.9.pre/lib/python3.11/site-packages/dbt/parser/manifest.py", line 533, in safe_update_project_parser_files_partially\n    project_parser_files = self.partial_parser.get_parsing_files()\n                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.9.pre/lib/python3.11/site-packages/dbt/parser/partial.py", line 181, in get_parsing_files\n    self.change_schema_file(file_id)\n  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.9.pre/lib/python3.11/site-packages/dbt/parser/partial.py", line 626, in change_schema_file\n    self.handle_schema_file_changes(saved_schema_file, saved_yaml_dict, new_yaml_dict)\n  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.9.pre/lib/python3.11/site-packages/dbt/parser/partial.py", line 692, in handle_schema_file_changes\n    self.delete_schema_source(schema_file, source)\n  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.9.pre/lib/python3.11/site-packages/dbt/parser/partial.py", line 905, in delete_schema_source\n    self.schedule_referencing_nodes_for_parsing(unique_id)\n  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.9.pre/lib/python3.11/site-packages/dbt/parser/partial.py", line 407, in schedule_referencing_nodes_for_parsing\n    self.schedule_nodes_for_parsing(self.saved_manifest.child_map[unique_id])\n  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.9.pre/lib/python3.11/site-packages/dbt/parser/partial.py", line 419, in schedule_nodes_for_parsing\n    self.remove_mssat_file(source_file)\n  File "/Users/jeremy/git/dbt-basic/venv_dbt_1.9.pre/lib/python3.11/site-packages/dbt/parser/partial.py", line 396, in remove_mssat_file\n    if not source_file.nodes:\n           
^^^^^^^^^^^^^^^^^\nAttributeError: \'SchemaSourceFile\' object has no attribute \'nodes\'\n'}
02:21:21  Sending event: {'category': 'dbt', 'action': 'partial_parser', 'label': '20b74a2d-16b3-4d1c-93af-b9ba1b31fc00', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109f47b50>]}
02:21:21  Encountered an error:
Parsing Error
  at path ['raw_code']: None is not of type 'string'
02:21:21  Resource report: {"command_name": "parse", "command_success": false, "command_wall_clock_time": 0.575309, "process_in_blocks": "0", "process_kernel_time": 0.134157, "process_mem_max_rss": "116473856", "process_out_blocks": "0", "process_user_time": 1.024762}
02:21:21  Command `dbt parse` failed at 15:21:21.572985 after 0.58 seconds
02:21:21  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x109258750>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1092cf510>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1092cd850>]}
02:21:21  Flushing usage events
02:21:23  An error was encountered while trying to flush usage events
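
The `PP exception info` entry in the log above points at `if not source_file.nodes:` in `remove_mssat_file`, failing because the object it received was a `SchemaSourceFile`, which has no `nodes` attribute. Below is a minimal sketch of that failure mode in isolation. The two classes and both helper functions are hypothetical stand-ins written for illustration, not dbt-core code, and the `getattr` guard is only one possible way to avoid the `AttributeError`, not a confirmed fix.

```python
class StubSchemaSourceFile:
    """Hypothetical stand-in for a YAML schema file record: it has no `nodes`."""

    def __init__(self, file_id):
        self.file_id = file_id


class StubSourceFile:
    """Hypothetical stand-in for a SQL file record, which does carry `nodes`."""

    def __init__(self, file_id, nodes):
        self.file_id = file_id
        self.nodes = nodes


def has_no_nodes_unguarded(source_file):
    # Same shape as the failing line in the traceback: assumes `.nodes` exists.
    return not source_file.nodes


def has_no_nodes_guarded(source_file):
    # Assumption: treating a missing `nodes` attribute as "no nodes" is acceptable.
    return not getattr(source_file, "nodes", [])


schema_file = StubSchemaSourceFile("my_dbt_project://models/sources.yml")
try:
    has_no_nodes_unguarded(schema_file)
except AttributeError as exc:
    print(f"unguarded check failed: {exc}")

print("guarded check:", has_no_nodes_guarded(schema_file))
```

A snapshot declared in a SQL file would presumably be tracked by a file record that does have `nodes`, which would be consistent with the SQL variant parsing cleanly.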

Workaround: Force a full parse whenever you change the source file, either by deleting the target folder or by adding the --no-partial-parse flag. This is cumbersome in the dbt Cloud IDE, though, because each save of a file triggers a partial parse behind the scenes.
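
For local runs, both forms of the workaround can be wrapped in a small script. This is a rough sketch: the function names are made up for illustration, and it assumes `dbt` is on the `PATH` and that the project uses the default `target` folder.

```python
import shutil
import subprocess
from pathlib import Path


def full_parse_command():
    """Build a `dbt parse` invocation with partial parsing disabled."""
    return ["dbt", "--no-partial-parse", "parse"]


def clear_target(project_dir="."):
    """Delete the target folder so no saved manifest is found on the next parse.

    Assumes the default target path. Returns True if the folder existed.
    """
    target = Path(project_dir) / "target"
    existed = target.exists()
    shutil.rmtree(target, ignore_errors=True)
    return existed


def full_parse(project_dir="."):
    """Run a full parse in project_dir and return dbt's exit code."""
    return subprocess.run(full_parse_command(), cwd=project_dir).returncode
```

Calling `full_parse(".")` after editing a sources file avoids the partial parsing path entirely, and `clear_target(".")` followed by a plain `dbt parse` has the same effect.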

Relevant log output

No response

Environment

- OS: macOS
- Python: Python 3.11.9
- dbt: dbt-core==1.9.0-rc2 / dbt-postgres==1.9.0rc1

Which database adapter are you using with dbt?

postgres

Additional Context

This isn't an issue if we stick to the older way of declaring snapshots in a SQL file:

# dbt_project.yml
name: my_dbt_project
profile: all
version: "1.0.0"

# models/sources.yml
sources:
  - name: raw
    tables:
      - name: customers
        description: The customers table.

-- snapshots/some_snapshot.sql
{% snapshot some_snapshot %}

    {{
        config(
          target_schema='snapshots',
          strategy='timestamp',
          unique_key='id',
          updated_at='updated_at',
        )
    }}

    select * from {{ source('raw', 'customers') }}

{% endsnapshot %}

Initial parse:

$ rm -rf target
$ dbt --debug parse

02:24:00  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10685e590>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1068aab90>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10684b450>]}
02:24:00  Running with dbt=1.9.0-rc2
02:24:00  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/Users/jeremy/.dbt', 'fail_fast': 'False', 'warn_error': 'None', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'debug': 'True', 'version_check': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'empty': 'None', 'quiet': 'False', 'no_print': 'None', 'log_format': 'default', 'static_parser': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'introspect': 'True', 'target_path': 'None', 'invocation_command': 'dbt --debug parse', 'send_anonymous_usage_stats': 'True'}
02:24:00  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '6ecb6c42-f6c6-4f82-b30b-944f1cc6a629', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1068c7450>]}
02:24:00  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '6ecb6c42-f6c6-4f82-b30b-944f1cc6a629', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10397d8d0>]}
02:24:00  Registered adapter: postgres=1.9.0-rc1
02:24:00  checksum: cfb72516634404a3c61854d6d9543be0625a07f5ebfc68fc9626e92b77373532, vars: {}, profile: , target: , version: 1.9.0rc2
02:24:00  Unable to do partial parsing because saved manifest not found. Starting full parse.
02:24:00  Sending event: {'category': 'dbt', 'action': 'partial_parser', 'label': '6ecb6c42-f6c6-4f82-b30b-944f1cc6a629', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1068fc650>]}
02:24:00  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '6ecb6c42-f6c6-4f82-b30b-944f1cc6a629', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10744f310>]}
02:24:00  Performance info: /Users/jeremy/git/dbt-basic/target/perf_info.json
02:24:00  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
02:24:00  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
02:24:00  Resource report: {"command_name": "parse", "command_success": true, "command_wall_clock_time": 0.6398744, "process_in_blocks": "0", "process_kernel_time": 0.156078, "process_mem_max_rss": "117719040", "process_out_blocks": "0", "process_user_time": 1.066741}
02:24:00  Command `dbt parse` succeeded at 15:24:00.837051 after 0.64 seconds
02:24:00  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1011f8b50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x110042750>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x101131f10>]}
02:24:00  Flushing usage events
02:24:02  An error was encountered while trying to flush usage events

Modify the source description as above, then reparse:

$ dbt --debug parse

02:24:30  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11125e210>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1112aa690>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1112bdd90>]}
02:24:30  Running with dbt=1.9.0-rc2
02:24:30  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/Users/jeremy/.dbt', 'version_check': 'True', 'warn_error': 'None', 'log_path': '/Users/jeremy/git/dbt-basic/logs', 'fail_fast': 'False', 'debug': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'empty': 'None', 'quiet': 'False', 'no_print': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'introspect': 'True', 'static_parser': 'True', 'log_format': 'default', 'target_path': 'None', 'invocation_command': 'dbt --debug parse', 'send_anonymous_usage_stats': 'True'}
02:24:30  Sending event: {'category': 'dbt', 'action': 'project_id', 'label': 'b4085125-d418-460e-a213-7c3c75862cf6', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1112e21d0>]}
02:24:30  Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': 'b4085125-d418-460e-a213-7c3c75862cf6', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10617d5d0>]}
02:24:30  Registered adapter: postgres=1.9.0-rc1
02:24:30  checksum: cfb72516634404a3c61854d6d9543be0625a07f5ebfc68fc9626e92b77373532, vars: {}, profile: , target: , version: 1.9.0rc2
02:24:30  Partial parsing enabled: 0 files deleted, 0 files added, 1 files changed.
02:24:30  Partial parsing: updated file: my_dbt_project://models/sources.yml
02:24:30  Sending event: {'category': 'dbt', 'action': 'load_project', 'label': 'b4085125-d418-460e-a213-7c3c75862cf6', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1116ba850>]}
02:24:30  Performance info: /Users/jeremy/git/dbt-basic/target/perf_info.json
02:24:30  Wrote artifact WritableManifest to /Users/jeremy/git/dbt-basic/target/manifest.json
02:24:30  Wrote artifact SemanticManifest to /Users/jeremy/git/dbt-basic/target/semantic_manifest.json
02:24:30  Resource report: {"command_name": "parse", "command_success": true, "command_wall_clock_time": 0.40844685, "process_in_blocks": "0", "process_kernel_time": 0.158333, "process_mem_max_rss": "119029760", "process_out_blocks": "0", "process_user_time": 0.845766}
02:24:30  Command `dbt parse` succeeded at 15:24:30.819544 after 0.41 seconds
02:24:30  Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1112be090>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1041fead0>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x104380bd0>]}
02:24:30  Flushing usage events
02:24:31  An error was encountered while trying to flush usage events

jeremyyeo added the bug, triage, and partial_parsing labels Dec 18, 2024

@H-Max

H-Max commented Dec 20, 2024

Hello, confirmed the problem here. Disabling partial parsing fixed the issue.

dbeatty10 added the snapshots label Jan 10, 2025

@dbeatty10
Contributor

I was also able to reproduce this using the following files.

Note: I only reproduced this parsing error when changing the description of a source -- doing the same thing when snapshotting a model didn't trigger any error.
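
To see which snapshots in a larger project match the pattern in the note above (declared in YAML and depending on a source), something like the following sketch could scan `target/manifest.json` after a successful full parse. The function names are hypothetical, and it assumes that a YAML-declared snapshot reports a `.yml` or `.yaml` `original_file_path` in the manifest, which I have not verified against this release.

```python
import json
from pathlib import Path


def yaml_snapshots_on_sources(manifest):
    """Return unique_ids of snapshots that are declared in YAML and depend on a source.

    Assumption: YAML-declared snapshots have an original_file_path ending in
    .yml or .yaml, and source unique_ids start with "source.".
    """
    hits = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "snapshot":
            continue
        if not node.get("original_file_path", "").endswith((".yml", ".yaml")):
            continue
        deps = node.get("depends_on", {}).get("nodes", [])
        if any(dep.startswith("source.") for dep in deps):
            hits.append(unique_id)
    return sorted(hits)


def load_manifest(path="target/manifest.json"):
    """Read a manifest written by a previous dbt command."""
    return json.loads(Path(path).read_text())
```

Usage would be `yaml_snapshots_on_sources(load_manifest())` from the project root. Snapshots that select from a model via `ref(...)` are skipped, matching the observation that they did not trigger the error.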

Reprex

models/customers.sql

select 1 as id, {{ dbt.current_timestamp() }} as updated_at

models/_sources.yml

sources:
  - name: raw
    database: "{{ target.database }}"
    schema: "{{ target.schema }}"
    tables:
      - name: customers
        description: Some description.
        # description: Try to trigger parsing error.

snapshots/_snapshots.yml

snapshots:
  - name: source_snapshot
    relation: source('raw', 'customers')
    config:
      strategy: timestamp
      unique_key: id
      updated_at: updated_at

Run these commands:

  1. rm -rf target
  2. dbt parse
  3. Switch the description within models/_sources.yml
  4. dbt parse
  5. Get the parsing error: at path ['raw_code']: None is not of type 'string'
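
The steps above can be scripted for repeated testing. This is a sketch only: `switch_description` and `reproduce` are names invented here, and it assumes `dbt` is on the `PATH`, the default `target` folder is used, and `models/_sources.yml` contains one active and one commented-out `description:` line as in the reprex.

```python
import shutil
import subprocess
from pathlib import Path


def switch_description(sources_yml):
    """Swap the active and the commented-out description lines (step 3)."""
    out = []
    for line in sources_yml.splitlines():
        stripped = line.lstrip()
        indent = line[: len(line) - len(stripped)]
        if stripped.startswith("# description:"):
            out.append(indent + stripped[2:])      # uncomment this line
        elif stripped.startswith("description:"):
            out.append(indent + "# " + stripped)   # comment this line out
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def reproduce(project_dir="."):
    """Run steps 1-4 and return the exit code of the second `dbt parse`."""
    project = Path(project_dir)
    sources = project / "models" / "_sources.yml"
    shutil.rmtree(project / "target", ignore_errors=True)         # step 1
    subprocess.run(["dbt", "parse"], cwd=project, check=True)     # step 2
    sources.write_text(switch_description(sources.read_text()))   # step 3
    return subprocess.run(["dbt", "parse"], cwd=project).returncode  # step 4
```

A non-zero return value from `reproduce()` would indicate that the second parse failed, as in step 5.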

dbeatty10 removed the triage label Jan 10, 2025