Just noticed this while documenting dataset formats in #236
In `_gen_train_data()` we are taking a dataset with `question` and `response` columns, and generating a training dataset in two different formats. (OK, in the case of the simple pipeline, we actually parse `question` and `response` from `output`, but that's not super important here.)
If the dataset also contains a `context` column, we append `context` to the `question`.
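To make that behaviour concrete, here is a minimal sketch of the idea, not the actual `_gen_train_data()` code. The helper name `build_user_text` and the newline separator are assumptions for illustration only.

```python
def build_user_text(sample: dict) -> str:
    """Return the question, with the context appended when one is present.

    Hypothetical helper: the name and the separator are assumptions,
    not taken from the real implementation.
    """
    question = sample["question"]
    context = sample.get("context")
    if context:
        # Assumption: question and context are joined by a single newline.
        return f"{question}\n{context}"
    # No context column (or an empty one): the question is used as-is.
    return question
```

With this shape, a sample that lacks a `context` column silently produces a question with no grounding text, which is the situation described below for the simple pipeline.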
In the full pipeline for grounded skills, we do generate this `context` column based on the `seed_context` column.
In the simple pipeline for grounded skills, we are not including a `context` column at all. I suspect the intent was to include the original seed context in each sample? If so, we'd need to add a `DuplicateColumnsBlock` that would copy `seed_context` to `context`?
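A rough sketch of what such a column copy would do, written against plain lists of dicts rather than the real block API. The function `duplicate_columns` and its `columns_map` parameter are hypothetical stand-ins, so the actual `DuplicateColumnsBlock` configuration may look different.

```python
def duplicate_columns(rows: list[dict], columns_map: dict[str, str]) -> list[dict]:
    """Copy each source column to a new destination column in every row.

    Hypothetical stand-in for the proposed pipeline step; it does not
    use or mirror the real DuplicateColumnsBlock interface.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the input rows stay untouched
        for src, dst in columns_map.items():
            new_row[dst] = row[src]
        result.append(new_row)
    return result


# Example: make the seed context available under the name the
# training-data step looks for.
samples = [{"seed_context": "Some grounding text.", "output": "..."}]
samples_with_context = duplicate_columns(samples, {"seed_context": "context"})
```

If the intent really is to carry the seed context through, a step like this early in the simple grounded-skills pipeline would give each sample a `context` column.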