Data guidance on training data for AI/ML #19

Open
RPaseka opened this issue May 8, 2023 · 1 comment
Comments

@RPaseka
Collaborator

RPaseka commented May 8, 2023

Develop guidance on sharing training data, including:

  • Build on the existing SPD-41a FAQ on this topic.
  • Training data are in scope of SPD-41a, especially when they are needed to validate a scientific finding (e.g., the training data would be needed to reproduce findings resulting from AI/ML models).
  • In general, all training data should be provided. However, there are considerations (list examples) under which sharing the complete training data would not be appropriate. In that case, what can be shared? Work with Manil on this.
  • Recommendations on how and where to share training data (repository selection), and considerations based on the size of the training dataset.
  • Commercial data used for training, and the implications for sharing.
  • Examples of how training data are being shared openly (ESDS ACCESS projects). Work with Cerese and Manil on this.
@RPaseka
Collaborator Author

RPaseka commented May 30, 2023

Demitri to take first pass, working with Manil.

@nasacrawford nasacrawford added this to the v3 milestone Jun 13, 2023
@RPaseka RPaseka assigned RPaseka and unassigned RPaseka Jul 6, 2023