Setting multiple datasets with sample weights during training #126

Open
derby-ding opened this issue Dec 25, 2024 · 1 comment

Comments

@derby-ding

Hi guys.
How can I set up multiple datasets with sampling weights during training? The official AlphaFold3 uses the PDB and a distilled protein dataset, with the sampling weights set to 0.5, 0.495 ... In this project I found target_dir in structure.yaml, but it does not seem to support a list of paths. Need some help...

@gcorso
Collaborator

gcorso commented Dec 27, 2024

Here is an example of how to use the config to set multiple datasets:

    - _target_: foldeverything.task.train.data.DatasetConfig
      target_dir: pdb_path
      msa_dir: msa_pdb_path
      prob: 0.5
      sampler:
        _target_: boltz.data.sample.cluster.ClusterSampler
      cropper:
        _target_: boltz.data.crop.boltz.BoltzCropper
        min_neighborhood: 0
        max_neighborhood: 40
      split: ./scripts/train/assets/validation_ids.txt
    - _target_: foldeverything.task.train.data.DatasetConfig
      target_dir: distillation_path
      msa_dir: msa_distillation_path
      prob: 0.5
      sampler:
        _target_: boltz.data.sample.cluster.ClusterSampler
      cropper:
        _target_: boltz.data.crop.boltz.BoltzCropper
        min_neighborhood: 0
        max_neighborhood: 40
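
To make the role of `prob` concrete, here is a minimal Python sketch of weighted sampling across several datasets. It is an illustration only, not the actual Boltz data loader: `DatasetEntry` and `sample_records` are hypothetical names, and the uniform pick inside each dataset stands in for whatever the configured sampler (e.g. `ClusterSampler`) really does.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """Hypothetical stand-in for one entry of the datasets list in the config."""
    name: str
    records: list
    prob: float


def sample_records(datasets, n, seed=0):
    """Draw n records: choose a dataset by its prob, then a record uniformly."""
    rng = random.Random(seed)
    weights = [d.prob for d in datasets]
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("prob values must be non-negative and not all zero")
    drawn = []
    for _ in range(n):
        # random.choices normalizes the weights, so they need not sum to 1.
        dataset = rng.choices(datasets, weights=weights, k=1)[0]
        # Assumption: uniform pick within the dataset (the real sampler may differ).
        drawn.append((dataset.name, rng.choice(dataset.records)))
    return drawn


datasets = [
    DatasetEntry("pdb", [f"pdb_{i}" for i in range(100)], prob=0.5),
    DatasetEntry("distillation", [f"dist_{i}" for i in range(1000)], prob=0.5),
]
samples = sample_records(datasets, n=10_000)
counts = Counter(name for name, _ in samples)
print(counts)  # roughly half of the draws come from each dataset
```

Under this reading, each dataset contributes in proportion to its `prob` regardless of how many records it holds, so a small dataset with a high weight is revisited more often per record than a large one.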
