Remove system prompt from data generation #96

Open
oindrillac opened this issue Jul 8, 2024 · 4 comments
Labels
refactor Same results, different method

Comments

@oindrillac
Contributor

Remove the system prompt from data generation; it will be re-introduced in the mixing phase.

_SYS_PROMPT = "You are an AI language model developed by IBM Research. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior."

@shivchander
Member

+1, it would be good to introduce the system role during the data mixing phase, which prepares the dataset for training. This makes it a bit cleaner to understand, since the system role is only applicable to training.

@nathan-weinberg nathan-weinberg added the refactor Same results, different method label Aug 20, 2024

This issue has been automatically marked as stale because it has not had activity within 90 days. It will be automatically closed if no further activity occurs within 30 days.

@github-actions github-actions bot added the stale label Nov 23, 2024
@bbrowning
Contributor

Unmarking as stale because this is still relevant. As we more cleanly separate data generation from mixing, that would be the time to ensure we aren't passing around system prompts during generation.

@bbrowning bbrowning removed the stale label Dec 5, 2024
jwm4 pushed a commit to jwm4/sdg that referenced this issue Dec 13, 2024
…actions/rojopolis/spellcheck-github-actions-0.38.0

Bump rojopolis/spellcheck-github-actions from 0.37.0 to 0.38.0
@bbrowning
Contributor

We're closer to removing this, but the system prompt is still used for the legacy test_*.jsonl file we generate, and it is still added to all samples during the data mixing phase.
