From 3b301225e2446d5777dc4962e59355e0cae12007 Mon Sep 17 00:00:00 2001
From: Partho
Date: Fri, 21 Apr 2023 05:41:14 +0530
Subject: [PATCH] fix outdated link and shape typo in main demo (#251)

* fix dead link

* fix shape type

---
 demos/Main_Demo.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/demos/Main_Demo.ipynb b/demos/Main_Demo.ipynb
index c438cd9c6..1fccd83c4 100644
--- a/demos/Main_Demo.ipynb
+++ b/demos/Main_Demo.ipynb
@@ -537,7 +537,7 @@
     "The IOI task is the task of identifying that a sentence like \"After John and Mary went to the store, Mary gave a bottle of milk to\" continues with \" John\" rather than \" Mary\" (ie, finding the indirect object), and Redwood Research have [an excellent paper studying the underlying circuit in GPT-2 Small](https://arxiv.org/abs/2211.00593).\n",
     "\n",
     "**[Activation patching](https://dynalist.io/d/n2ZWtnoYHrU1s4vnFSAQ519J#z=qeWBvs-R-taFfcCq-S_hgMqx)** is a technique from [Kevin Meng and David Bau's excellent ROME paper](https://rome.baulab.info/). The goal is to identify which model activations are important for completing a task. We do this by setting up a **clean prompt** and a **corrupted prompt** and a **metric** for performance on the task. We then pick a specific model activation, run the model on the corrupted prompt, but then *intervene* on that activation and patch in its value when run on the clean prompt. We then apply the metric, and see how much this patch has recovered the clean performance. \n",
-    "(See [a more detailed demonstration of activation patching here](https://colab.research.google.com/github/neelnanda-io/Easy-Transformer/blob/main/Exploratory_Analysis_Demo.ipynb#scrollTo=5nUXG6zqmd0f))"
+    "(See [a more detailed demonstration of activation patching here](https://colab.research.google.com/github.com/neelnanda-io/TransformerLens/blob/main/demos/Exploratory_Analysis_Demo.ipynb))"
    ]
   },
   {
@@ -1167,7 +1167,7 @@
     "* The weights (W_K, W_Q, W_V) mapping the residual stream to queries, keys and values are 3 separate matrices, rather than big concatenated one.\n",
     "* The weight matrices (W_K, W_Q, W_V, W_O) and activations (keys, queries, values, z (values mixed by attention pattern)) have separate head_index and d_head axes, rather than flattening them into one big axis.\n",
     "    * The activations all have shape `[batch, position, head_index, d_head]`\n",
-    "    * W_K, W_Q, W_V have shape `[head_index, d_head, d_model]` and W_O has shape `[head_index, d_model, d_head]`\n",
+    "    * W_K, W_Q, W_V have shape `[head_index, d_model, d_head]` and W_O has shape `[head_index, d_head, d_model]`\n",
     "\n",
     "The actual code is a bit of a mess, as there's a variety of Boolean flags to make it consistent with the various different model families in TransformerLens - to understand it and the internal structure, I instead recommend reading the code in [CleanTransformerDemo](https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/clean-transformer-demo/Clean_Transformer_Demo.ipynb)"
    ]
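
For reference, the activation patching procedure described in the first patched cell can be sketched in a few lines against the public TransformerLens API (HookedTransformer, run_with_cache, run_with_hooks, utils.get_act_name). This is a minimal illustration under those assumptions, not the demo's own implementation; the patched layer and position are arbitrary choices made here for the example.

import torch
from transformer_lens import HookedTransformer, utils

model = HookedTransformer.from_pretrained("gpt2")

clean_prompt = "After John and Mary went to the store, Mary gave a bottle of milk to"
corrupted_prompt = "After John and Mary went to the store, John gave a bottle of milk to"
clean_tokens = model.to_tokens(clean_prompt)
corrupted_tokens = model.to_tokens(corrupted_prompt)

john, mary = model.to_single_token(" John"), model.to_single_token(" Mary")

def logit_diff(logits: torch.Tensor) -> float:
    # Metric: how strongly the model prefers the correct answer " John"
    # over " Mary" at the final position.
    return (logits[0, -1, john] - logits[0, -1, mary]).item()

# Run the clean prompt once and cache every activation.
clean_logits, clean_cache = model.run_with_cache(clean_tokens)

# Pick one activation to patch: the residual stream entering a mid layer, at the
# final token position (illustrative choice; in practice you sweep layers and positions).
layer, position = 6, -1
hook_name = utils.get_act_name("resid_pre", layer)

def patch_hook(resid_pre, hook):
    # Overwrite the corrupted-run activation with its value from the clean run.
    resid_pre[:, position, :] = clean_cache[hook_name][:, position, :]
    return resid_pre

corrupted_logits = model(corrupted_tokens)
patched_logits = model.run_with_hooks(corrupted_tokens, fwd_hooks=[(hook_name, patch_hook)])

print("clean     :", logit_diff(clean_logits))
print("corrupted :", logit_diff(corrupted_logits))
print("patched   :", logit_diff(patched_logits))

Comparing the patched metric against the clean and corrupted baselines shows how much of the clean performance that single patch recovers.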
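
The shape convention corrected in the second hunk can be checked directly against a loaded model. A small sanity-check sketch, assuming GPT-2 small (12 heads, d_model=768, d_head=64) and that the per-layer weights are exposed as model.blocks[layer].attn.W_Q and so on, as in TransformerLens:

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
attn = model.blocks[0].attn

print(attn.W_Q.shape)  # [head_index, d_model, d_head] -> torch.Size([12, 768, 64])
print(attn.W_K.shape)  # [head_index, d_model, d_head] -> torch.Size([12, 768, 64])
print(attn.W_V.shape)  # [head_index, d_model, d_head] -> torch.Size([12, 768, 64])
print(attn.W_O.shape)  # [head_index, d_head, d_model] -> torch.Size([12, 64, 768])

# The corresponding activations keep separate head_index and d_head axes:
_, cache = model.run_with_cache(model.to_tokens("Hello world"))
print(cache["blocks.0.attn.hook_z"].shape)  # [batch, position, head_index, d_head]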