+/- sign in backward phase of the dice layer #32
The answers are:
1. Yes. (It's just a trick to speed things up a little: it pushes in the right direction according to both background and foreground.)
2. Yes. (We are minimising -Dice.)
On 21 Apr 2017, at 19:27, ntajbakhsh wrote:
Hi Fausto,
I wonder why you have a "+=" operator for feature map 0 (background), but a "-=" operator for feature map 1 (foreground):
bottom[btm].diff[i, 0, :] += 2.0 * ( (self.gt[i, :] * self.union[i]) / ((self.union[i]) ** 2) - 2.0*prob[i,1,:]*(self.intersection[i]) / ((self.union[i]) ** 2))
bottom[btm].diff[i, 1, :] -= 2.0 * ( (self.gt[i, :] * self.union[i]) / ((self.union[i]) ** 2) - 2.0*prob[i,1,:]*(self.intersection[i]) / ((self.union[i]) ** 2))
Also, Dice is supposed to be maximized during training, but it appears to be treated as a loss in your code (the code seems to attempt to minimize it).
Could you please help me with the above two questions?
Nima
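For reference, the soft Dice coefficient and its derivative as given in the paper, which the quoted lines appear to implement (assuming self.union and self.intersection hold \sum_k p_k^2 + \sum_k g_k^2 and \sum_k p_k g_k respectively, with p the predicted foreground probabilities and g the binary ground truth):

D = \frac{2\sum_k p_k g_k}{\sum_k p_k^2 + \sum_k g_k^2},
\qquad
\frac{\partial D}{\partial p_j} = 2\,\frac{g_j\bigl(\sum_k p_k^2 + \sum_k g_k^2\bigr) - 2\,p_j \sum_k p_k g_k}{\bigl(\sum_k p_k^2 + \sum_k g_k^2\bigr)^2}.

Handing a minimiser -\partial D/\partial p_j (the "-=") is minimising -D, i.e. maximising Dice.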
Thanks for the prompt reply! Theoretically, both operators should be "+=", right? I mean, the Dice metric is class agnostic, so I assume the operator should be the same for both foreground and background, or am I missing something? Could you point out where in the code you tell it to minimize (-Dice)? In the forward pass you compute +Dice, and I don't see it in the backward pass either.
In Caffe one must write one's own derivatives, so if you take the derivative of Dice and put a minus in front of it, you get the derivative of -Dice. Actually, I recall that Caffe always optimises towards a minimum, so in this case we should hand it the derivative of -Dice, which we do (that is the "-=").
The fact that I'm displaying the Dice score in the forward pass does not matter. I could display something completely arbitrary in the forward pass and still optimise correctly, as long as my derivatives are correct in the backward pass.
Of course, this holds only if subsequent layers don't depend on the forward pass of this layer :P, which is not an issue here (it's the last layer).
The code is honestly not that well written. If this were done in TensorFlow, for example, auto-differentiation would force you to have everything consistent/correct.
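A minimal sketch of that autodiff point, assuming TensorFlow 2 (which was not available at the time of this thread): the backward pass is derived automatically from the forward pass, so the two cannot disagree.

import tensorflow as tf

def soft_dice(p, g):
    # p: predicted foreground probabilities, g: binary ground truth (flat float tensors)
    inter = tf.reduce_sum(p * g)
    union = tf.reduce_sum(p * p) + tf.reduce_sum(g * g)
    return 2.0 * inter / union

p = tf.Variable([0.9, 0.2, 0.7])
g = tf.constant([1.0, 0.0, 1.0])
with tf.GradientTape() as tape:
    loss = -soft_dice(p, g)        # minimise -Dice == maximise Dice
grad = tape.gradient(loss, p)      # consistent with the forward pass by construction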
Thanks. So the "-=" operator is acting as multiplying the gradient by -1, which should have been done for both bottom[btm].diff[i, 0, :] and bottom[btm].diff[i, 1, :], but you intentionally did not do it for the background as a trick to speed up training. Is my understanding correct?
Actually, for the background I'm just trying to make it do the opposite of what the foreground is doing, intuitively.
An even better thing to do would be to write the softmax for only one channel (foreground) and take the derivative of that expression together with the softmax (all in one layer). I think that would be better from all points of view.
Fausto Milletarì
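One way to read that suggestion (my sketch, not code from this repository): with two-channel logits z_j^{bg}, z_j^{fg}, the foreground softmax reduces to a sigmoid of the logit difference,

p_j = \mathrm{softmax}(z_j)_{fg} = \frac{1}{1 + e^{-(z_j^{fg} - z_j^{bg})}},

so differentiating Dice and softmax together gives one fused backward pass with automatically opposite-signed updates for the two channels:

\frac{\partial D}{\partial z_j^{fg}} = \frac{\partial D}{\partial p_j}\, p_j (1 - p_j),
\qquad
\frac{\partial D}{\partial z_j^{bg}} = -\frac{\partial D}{\partial p_j}\, p_j (1 - p_j).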
All your explanations started to make sense after I re-read the paper and did the math myself. Thanks!
Hi Fausto,
Firstly, thanks for open-sourcing the entire implementation of the Dice loss; it was very helpful. However, I am unable to understand why you added the minus sign ("-=") for feature map 1 (foreground) in the backward pass of the Dice loss layer in Python.
Ideally, we are supposed to maximize the Dice score, i.e. minimize -(Dice score). Whenever we compute the derivative of a function with respect to an input, it points in the direction of increase of the function's value, so multiplying the derivative by -1 points in the direction of decrease. The gradient of the Dice score can be calculated by the following formula:

\frac{\partial D}{\partial p_j} = 2\,\frac{g_j\bigl(\sum_k p_k^2 + \sum_k g_k^2\bigr) - 2\,p_j \sum_k p_k g_k}{\bigl(\sum_k p_k^2 + \sum_k g_k^2\bigr)^2}

That is the direction of increasing Dice score, so when we apply "-=" in the code lines above, the derivative is doing the opposite of maximizing the Dice score: we are forcing the network to minimize it. Please help me understand where I am going wrong.
Thanks,
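A tiny numerical check of the sign question (my sketch, not from the repository): step along the closed-form gradient and confirm it increases Dice, so a minimiser must be handed its negation.

import numpy as np

p = np.array([0.9, 0.2, 0.7])    # predicted foreground probabilities
g = np.array([1.0, 0.0, 1.0])    # binary ground truth

def dice(p, g):
    return 2.0 * np.sum(p * g) / (np.sum(p**2) + np.sum(g**2))

U = np.sum(p**2) + np.sum(g**2)
I = np.sum(p * g)
grad = 2.0 * (g * U - 2.0 * p * I) / U**2   # dD/dp, the closed form above

eps = 1e-3
print(dice(p + eps * grad, g) > dice(p, g))  # True: +grad is the ascent direction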
@faustomilletari @ntajbakhsh According to this line, it is clear how to do this for binary segmentation with a blob with 2 feature maps, but what about the generalized Dice loss? I have 4 feature maps for foreground classes; how can I calculate the diff for the background here? Your expert opinion is really appreciated.
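One common way around the special-cased background (my sketch, not an answer from the maintainers): give the background its own channel in a one-hot ground truth, so every class, background included, uses the same closed-form diff as the binary case.

import numpy as np

def multiclass_soft_dice(prob, gt):
    # prob, gt: shape (C, N); gt is one-hot, channel 0 = background
    inter = np.sum(prob * gt, axis=1, keepdims=True)
    union = np.sum(prob**2, axis=1, keepdims=True) + np.sum(gt**2, axis=1, keepdims=True)
    return np.mean(2.0 * inter / union)

def multiclass_soft_dice_diff(prob, gt):
    # d(mean Dice)/d(prob): the binary formula applied per class, background included
    C = prob.shape[0]
    inter = np.sum(prob * gt, axis=1, keepdims=True)
    union = np.sum(prob**2, axis=1, keepdims=True) + np.sum(gt**2, axis=1, keepdims=True)
    return 2.0 * (gt * union - 2.0 * prob * inter) / union**2 / C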
Hi everyone, I really need help with this. I have gone through different implementations, including the one the user mentioned here, but I am not sure whether my implementation is correct. The loss values are between 0 and 1, and at some points the loss becomes 1. I implemented the loss explained in this reference, the "Tversky loss, a generalized form of dice loss, which is identical to dice loss when alpha=beta=0.5". I created the one-hot encoding of the ground truth and saved the gradients into two tensors.
My questions are:
Your help is really appreciated.
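For reference, a minimal NumPy sketch of that definition (my code, not the commenter's implementation), showing the Tversky index collapsing to the linear-denominator soft Dice at alpha = beta = 0.5:

import numpy as np

def tversky(p, g, alpha=0.5, beta=0.5):
    tp = np.sum(p * g)            # soft true positives
    fp = np.sum(p * (1.0 - g))    # soft false positives
    fn = np.sum((1.0 - p) * g)    # soft false negatives
    return tp / (tp + alpha * fp + beta * fn)

p = np.array([0.9, 0.2, 0.7])
g = np.array([1.0, 0.0, 1.0])
dice = 2.0 * np.sum(p * g) / (np.sum(p) + np.sum(g))
assert np.isclose(tversky(p, g, 0.5, 0.5), dice)   # identical at alpha = beta = 0.5
loss = 1.0 - tversky(p, g, alpha=0.3, beta=0.7)    # a Tversky loss in [0, 1]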