fix outdated link and shape typo in main demo (#251)
* fix dead link

* fix shape typo
daspartho authored Apr 21, 2023
1 parent 6eca22f commit 3b30122
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions demos/Main_Demo.ipynb
@@ -537,7 +537,7 @@
"The IOI task is the task of identifying that a sentence like \"After John and Mary went to the store, Mary gave a bottle of milk to\" continues with \" John\" rather than \" Mary\" (ie, finding the indirect object), and Redwood Research have [an excellent paper studying the underlying circuit in GPT-2 Small](https://arxiv.org/abs/2211.00593).\n",
"\n",
"**[Activation patching](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=qeWBvs-R-taFfcCq-S_hgMqx)** is a technique from [Kevin Meng and David Bau's excellent ROME paper](https://rome.baulab.info/). The goal is to identify which model activations are important for completing a task. We do this by setting up a **clean prompt** and a **corrupted prompt** and a **metric** for performance on the task. We then pick a specific model activation, run the model on the corrupted prompt, but then *intervene* on that activation and patch in its value when run on the clean prompt. We then apply the metric, and see how much this patch has recovered the clean performance. \n",
"(See [a more detailed demonstration of activation patching here](https://colab.research.google.com/github/neelnanda-io/Easy-Transformer/blob/main/Exploratory_Analysis_Demo.ipynb#scrollTo=5nUXG6zqmd0f))"
"(See [a more detailed demonstration of activation patching here](https://colab.research.google.com/github.com/neelnanda-io/TransformerLens/blob/main/demos/Exploratory_Analysis_Demo.ipynb))"
]
},
{
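As a rough companion to the activation patching recipe described in the cell above, here is a minimal sketch using the `transformer_lens` API (`HookedTransformer.from_pretrained`, `run_with_cache`, `run_with_hooks`, `utils.get_act_name`). The corrupted prompt and the logit-difference metric are illustrative choices, not part of this commit.

```python
import torch
from transformer_lens import HookedTransformer, utils

torch.set_grad_enabled(False)
model = HookedTransformer.from_pretrained("gpt2")

# Clean prompt (answer " John") and an illustrative corrupted prompt (answer " Mary")
clean_prompt = "After John and Mary went to the store, Mary gave a bottle of milk to"
corrupted_prompt = "After John and Mary went to the store, John gave a bottle of milk to"
clean_tokens = model.to_tokens(clean_prompt)
corrupted_tokens = model.to_tokens(corrupted_prompt)

john = model.to_single_token(" John")
mary = model.to_single_token(" Mary")

def logit_diff(logits):
    # Metric: logit of " John" minus logit of " Mary" at the final position
    return (logits[0, -1, john] - logits[0, -1, mary]).item()

# Cache every activation from the clean run
clean_logits, clean_cache = model.run_with_cache(clean_tokens)
print("clean logit diff:", logit_diff(clean_logits))

def patch_resid_pre(resid_pre, hook):
    # Overwrite this activation with its value from the clean run
    return clean_cache[hook.name]

# Run on the corrupted prompt, patching the residual stream at the start of each
# layer in turn, and see how much of the clean logit difference is recovered
for layer in range(model.cfg.n_layers):
    patched_logits = model.run_with_hooks(
        corrupted_tokens,
        fwd_hooks=[(utils.get_act_name("resid_pre", layer), patch_resid_pre)],
    )
    print(f"layer {layer}: patched logit diff = {logit_diff(patched_logits):.3f}")
```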
@@ -1167,7 +1167,7 @@
"* The weights (W_K, W_Q, W_V) mapping the residual stream to queries, keys and values are 3 separate matrices, rather than big concatenated one.\n",
"* The weight matrices (W_K, W_Q, W_V, W_O) and activations (keys, queries, values, z (values mixed by attention pattern)) have separate head_index and d_head axes, rather than flattening them into one big axis.\n",
" * The activations all have shape `[batch, position, head_index, d_head]`\n",
" * W_K, W_Q, W_V have shape `[head_index, d_head, d_model]` and W_O has shape `[head_index, d_model, d_head]`\n",
" * W_K, W_Q, W_V have shape `[head_index, d_model, d_head]` and W_O has shape `[head_index, d_head, d_model]`\n",
"\n",
"The actual code is a bit of a mess, as there's a variety of Boolean flags to make it consistent with the various different model families in TransformerLens - to understand it and the internal structure, I instead recommend reading the code in [CleanTransformerDemo](https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/clean-transformer-demo/Clean_Transformer_Demo.ipynb)"
]
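For reference, a small check of the corrected shapes against a loaded model. This is an illustration, not part of the commit; it assumes the per-block weight attributes (`model.blocks[l].attn.W_Q` etc.) exposed by `transformer_lens`.

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
cfg = model.cfg
attn = model.blocks[0].attn

# W_Q, W_K, W_V map the residual stream to queries/keys/values: [head_index, d_model, d_head]
assert attn.W_Q.shape == (cfg.n_heads, cfg.d_model, cfg.d_head)
assert attn.W_K.shape == (cfg.n_heads, cfg.d_model, cfg.d_head)
assert attn.W_V.shape == (cfg.n_heads, cfg.d_model, cfg.d_head)

# W_O maps per-head outputs back to the residual stream: [head_index, d_head, d_model]
assert attn.W_O.shape == (cfg.n_heads, cfg.d_head, cfg.d_model)

# Attention activations keep separate head_index and d_head axes:
# [batch, position, head_index, d_head]
_, cache = model.run_with_cache(model.to_tokens("Hello world"))
print(cache["blocks.0.attn.hook_z"].shape)
```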
