[General Question on NN] Problem with Loss function and its influence on backpropagation #86
Comments
@Memnarch, your questions are very good.

In Keras, this is how the gradient is calculated (you won't find layer-specific derivative calculations as in CAI and convnetjs):

In CAI, you'll find explicit code for derivatives. I coded CAI with one screen on Wikipedia (https://en.wikipedia.org/wiki/Backpropagation) and another on Lazarus.

In convnetjs, you'll find the softmax forward and backward code at:

I haven't checked for a while, but the implementation in CAI should be equivalent to the convnetjs one.
Sometimes, in the literature, the error is called DELTA. In CAI, when you find a variable named OutputErrorDeriv, it means the error multiplied by the derivative of the output. In other APIs, you may find that the variable used for the ERROR starts with the character "D", standing for "Delta". In short, the SOFTMAX error is just the difference between the current output and the desired output. It's easy to spend hours of meditation on the idea that what drives the learning is an error. I hope it helps and may the source be with you.
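To make that concrete, here is a minimal, self-contained Pascal sketch (hypothetical code, not copied from CAI or convnetjs) of the softmax forward pass and of the backward step when softmax is combined with cross-entropy loss; in that combination the error fed back into the output layer reduces to (output - desired):

```pascal
program SoftmaxErrorSketch;
// Hypothetical sketch: softmax forward pass and the error used for its
// backward pass when paired with cross-entropy loss. Not CAI code.

uses
  SysUtils;

const
  N = 3;

var
  Logits, Output, Desired, OutputError: array[0..N - 1] of Double;
  SumExp: Double;
  I: Integer;

begin
  Logits[0] := 2.0; Logits[1] := 1.0; Logits[2] := 0.1;
  Desired[0] := 1.0; Desired[1] := 0.0; Desired[2] := 0.0; // one-hot target

  // forward: softmax turns logits into probabilities that sum to 1
  SumExp := 0;
  for I := 0 to N - 1 do
    SumExp := SumExp + Exp(Logits[I]);
  for I := 0 to N - 1 do
    Output[I] := Exp(Logits[I]) / SumExp;

  // backward: with cross-entropy, the gradient w.r.t. each logit simplifies
  // to the difference between the current output and the desired output
  for I := 0 to N - 1 do
    OutputError[I] := Output[I] - Desired[I];

  for I := 0 to N - 1 do
    WriteLn(Format('output=%.4f  error=%.4f', [Output[I], OutputError[I]]));
end.
```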
Wow, thanks for the response. I'll read through the docs once I've had some sleep (1:40 am 😅)
I can see that you classified a cat with 97% probability. Please feel free to ask. If it's within the scope of the CAI API or neural networks in general, I can try to reply.
I read the doc regarding the gradient tape. Really interesting, but I'd say not in scope for implementing in Delphi, sadly (for now). One topic I read about the other day was the different methods of optimizing weights. SGD is what I do right now; Adam, however, seems to be a popular one. Looking into it, it said that it keeps track of past values to influence future calculations. Something that did not seem to be explained explicitly was the scope at which these values are stored. My current guess is that within a Dense layer it is per "neuron" and within a convolution layer it's per kernel (not per filter). Do you have any clue whether I am right? While weights and biases have different names, it seems that, when it comes to value changes, they run through the pipeline at the same time, which means Adam would take all four into its calculation for later use.
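For reference, here is a small hedged Pascal sketch (hypothetical code, not CAI) of a single Adam update step following the original Adam formulation: the optimizer keeps two running moment estimates of the gradients per individual trainable parameter, so every weight and every bias gets its own pair of accumulators rather than one per neuron or per kernel.

```pascal
program AdamSketch;
// Hypothetical sketch (not CAI code): one Adam update step over a flat array
// of parameters; weights and biases are treated the same way. M and V hold
// running estimates of the mean and of the mean square of the gradients,
// one pair per individual parameter.

uses
  SysUtils, Math;

procedure AdamStep(var Params, M, V: array of Double;
  const Grads: array of Double; Step: Integer;
  LearningRate, Beta1, Beta2, Epsilon: Double);
var
  I: Integer;
  MHat, VHat: Double;
begin
  for I := 0 to High(Params) do
  begin
    // exponentially decayed average of the gradients (first moment)
    M[I] := Beta1 * M[I] + (1 - Beta1) * Grads[I];
    // exponentially decayed average of the squared gradients (second moment)
    V[I] := Beta2 * V[I] + (1 - Beta2) * Sqr(Grads[I]);
    // bias correction for the zero-initialized moment estimates
    MHat := M[I] / (1 - Power(Beta1, Step));
    VHat := V[I] / (1 - Power(Beta2, Step));
    // parameter update
    Params[I] := Params[I] - LearningRate * MHat / (Sqrt(VHat) + Epsilon);
  end;
end;

var
  Params, M, V, Grads: array[0..1] of Double;
  Step: Integer;
begin
  Params[0] := 0.5;  Params[1] := -0.3;   // two example parameters
  M[0] := 0; M[1] := 0; V[0] := 0; V[1] := 0;
  for Step := 1 to 3 do
  begin
    Grads[0] := 0.2; Grads[1] := -0.1;    // pretend gradients from backprop
    AdamStep(Params, M, V, Grads, Step, 0.001, 0.9, 0.999, 1e-8);
  end;
  WriteLn(Format('params after 3 steps: %.6f, %.6f', [Params[0], Params[1]]));
end.
```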
Hi,
This ticket isn't about your project directly. However, I started looking into writing a (C)NN and got a bit stuck on something.
I started on this topic with this article collection (for context): https://victorzhou.com/series/neural-networks-from-scratch/
For backpropagation, someone needs to calculate the initial gradient that is fed into the output node and then proceed from there back to the first layer. If I understood this correctly, the initial gradient for backpropagation is calculated by taking the output of a feedforward pass and running it through the derivative of the loss function.
As I have a bit of a hard time deriving the loss functions myself, like binary cross-entropy (I am really rusty on this topic), I thought I should peek around at how other frameworks solve this. And right now I am confused: frameworks like Keras allow you to write custom loss functions, but have no interface for delivering the derivative?
In your repository, I found the LossFN property on your network, but again, no sign of the derivative.
So either I got some fundamentals wrong, or I am blind.
I hope you have some time for this. Sorry in advance for opening a ticket here and adding to the noise.
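In case it helps future readers of this thread, here is a small Pascal sketch (hypothetical helper functions, not taken from CAI or Keras) of binary cross-entropy and its derivative with respect to the prediction; that derivative, evaluated on the feedforward output, is the initial gradient fed into the output node.

```pascal
program BceDerivSketch;
// Hypothetical sketch (not CAI code): binary cross-entropy loss and its
// derivative with respect to the prediction. Requires the Math unit for
// Max/Min on doubles.

uses
  SysUtils, Math;

const
  Eps = 1e-12; // clamp margin so Ln never receives 0

// L = -(t*ln(p) + (1-t)*ln(1-p))
function BinaryCrossEntropy(Target, Prediction: Double): Double;
begin
  Prediction := Max(Eps, Min(1 - Eps, Prediction));
  Result := -(Target * Ln(Prediction) + (1 - Target) * Ln(1 - Prediction));
end;

// dL/dp = (p - t) / (p * (1 - p)); this is the gradient handed to backprop
function BinaryCrossEntropyDeriv(Target, Prediction: Double): Double;
begin
  Prediction := Max(Eps, Min(1 - Eps, Prediction));
  Result := (Prediction - Target) / (Prediction * (1 - Prediction));
end;

begin
  // example: target 1.0, network output 0.8
  WriteLn(Format('loss=%.4f  dloss/dpred=%.4f',
    [BinaryCrossEntropy(1.0, 0.8), BinaryCrossEntropyDeriv(1.0, 0.8)]));
end.
```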