discussion about "pretraining" #130
Hello @mikerabat,
Regarding "A colleague of mine made the comment I should pretrain the models to yield more robust models and [...]" and "He hinted that one should first train the dataset differently... basically use a few convolutional layers to [...]": after training the autoencoder, the "decoder" is removed and the NN is then trained for classification.
Although I'm skeptical about a single solution that works well for all problems, if you google for transfer learning and autoencoders, you'll find such solutions. From experience, speculating that "this will improve" is easy; whether it really improves after days of work and retraining for a particular application is a completely different matter. I think it comes down to what accuracy you need and how much effort you intend to spend improving the neural models.
If your colleague is certain about what he is saying, you could ask for the actual scientific papers (blog posts/web pages are not sufficient scientific evidence) and why he believes these solutions apply to your use case.
I experimented myself, comparing the output of some convolutional layers with a PCA output, and I found high similarity between convolution and PCA, except for the activation function. If you like, I can code an example using an autoencoder for image classification.
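To make the "pretrain, drop the decoder, train for classification" workflow concrete, here is a deliberately tiny sketch in plain Python. It is not any library's API: it assumes made-up 2D data, a one-unit linear autoencoder with tied weights, and a logistic-regression head; the names `pretrain_autoencoder`, `train_head` and the data are hypothetical. For such a linear autoencoder, the learned weight vector lines up with the first principal component, which connects to the PCA comparison above.

```python
import math

# Hypothetical toy data: 10 centred 2D points spread along the direction (2, 1),
# plus a small alternating jitter along the orthogonal direction (1, -2).
TS = [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
DATA = [(2 * t + 0.2 * j, t - 0.4 * j) for t, j in zip(TS, [1, -1] * 5)]
LABELS = [1 if t > 0 else 0 for t in TS]  # only used in the fine-tuning step


def pretrain_autoencoder(data, steps=500):
    """Tied-weight linear autoencoder with a single hidden unit.

    Encoder: h = w . x, decoder: x_hat = h * w.
    Gradient descent on the reconstruction error sum ||x - x_hat||^2 (no labels).
    """
    lr = 0.05 / sum(a * a + b * b for a, b in data)  # small step for stability
    w = [0.1, -0.3]  # arbitrary non-zero start
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x in data:
            h = w[0] * x[0] + w[1] * x[1]            # encode
            r = [x[0] - h * w[0], x[1] - h * w[1]]   # residual x - decode(h)
            rw = r[0] * w[0] + r[1] * w[1]
            for i in range(2):
                grad[i] -= 2.0 * (h * r[i] + rw * x[i])  # d||r||^2 / dw_i
        w = [w[i] - lr * grad[i] for i in range(2)]
    return w


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_head(features, labels, lr=0.01, steps=1000):
    """Logistic-regression head trained on the frozen encoder output."""
    u, b = 0.0, 0.0
    for _ in range(steps):
        gu = gb = 0.0
        for h, y in zip(features, labels):
            p = sigmoid(u * h + b)
            gu += (p - y) * h
            gb += p - y
        u -= lr * gu
        b -= lr * gb
    return u, b


w = pretrain_autoencoder(DATA)          # 1) unsupervised pretraining


def encode(x):                          # 2) keep the encoder, drop the decoder
    return w[0] * x[0] + w[1] * x[1]


features = [encode(x) for x in DATA]
u, b = train_head(features, LABELS)     # 3) supervised training of a new head
preds = [1 if sigmoid(u * h + b) > 0.5 else 0 for h in features]
accuracy = sum(p == y for p, y in zip(preds, LABELS)) / len(LABELS)
print("encoder weights:", w, "accuracy:", accuracy)
```

In a real convolutional setup, the same three steps apply, but "encoder" is a stack of layers and the head is usually trained first with the encoder frozen, then optionally fine-tuned end to end.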
Dear Joaopaulo! Thanks for the valuable input! The data here is currently organized as [...] So the idea was to first do some pretraining, then split off the encoder part and use it as the "first stage" in [...] So here is where I am currently (stripped version): [...]
Dear @mikerabat! Feel free to use 512 input channels (instead of 3 for RGB), or more if you need them. I have already used more than 512 channels with hyperspectral images. In the following code: [...]
I would decrease the filter size to 3 (as per https://medium.com/@siddheshb008/vgg-net-architecture-explained-71179310050f).
Using [...] While decoding, use the same filter sizes and padding as in the encoding. I would first code a standard image classifier and use it as a baseline. Once you have a reasonably good architecture, you can use it as a benchmark to measure any improvement brought by the encoder/decoder architecture. I'll code an example showing how to load just the encoder of a trained neural network and post it here over the weekend.
Doubling the number of filters at each maxpool or stride is a good idea.
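The effect of doubling the number of filters at each downsampling step can be traced with simple shape arithmetic. This is a sketch under assumptions: a hypothetical 1D encoder (`encoder_shapes` is a made-up helper) where each stage is a 3-wide convolution with "same" padding followed by a 2x max-pool, starting from an arbitrary 16 filters and the 1024 x 1 x 3 ECG window discussed in this thread.

```python
def encoder_shapes(length, channels, first_filters, stages):
    """Shapes (X, Y, channels) for a hypothetical 1D encoder in which each stage is
    a 3-wide convolution with 'same' padding followed by a 2x max-pool, and the
    number of filters doubles from one stage to the next."""
    shapes = [(length, 1, channels)]
    filters = first_filters
    for _ in range(stages):
        shapes.append((length, 1, filters))  # conv: length unchanged, depth = filters
        length //= 2
        shapes.append((length, 1, filters))  # max-pool: length halved
        filters *= 2
    return shapes


for shape in encoder_shapes(1024, 3, 16, 3):
    print(shape)
```

One thing the arithmetic shows: with 1D data, halving the length while doubling the filters keeps every convolution output at 16,384 values here, so for an autoencoder the bottleneck has to come from using fewer filters in the last stages.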
I would use fully connected layers with maxpool only for the actual image classification (not for the autoencoder).
Dear Joaopaulo! MANY thanks for the valuable input! First, I need to deal with ECG, so... 1D signals with at most 12 channels (we mostly deal with 3, hence the RGB equivalent ;) )
Dear @mikerabat, the reply was given by Claude (not me, although Claude signed as me). In my opinion, my first attempt would be to start with 3x3 kernels and let the NN learn the features in deeper layers. Each time you apply a stride, you also roughly double the receptive field of the neurons in deeper layers. I would expect only the last layers of the NN to learn humanly meaningful concepts.
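The receptive-field remark can be checked with the usual recurrence: each layer widens the field by (kernel - 1) times the product of all earlier strides. A minimal sketch, assuming plain 1D layers without dilation; `receptive_field` and the two layer stacks are hypothetical examples.

```python
def receptive_field(layers):
    """Receptive field (in input samples) of one output of a stack of 1D layers.

    layers: list of (kernel_size, stride) pairs, from input to output.
    Each layer widens the field by (kernel_size - 1) * jump, where jump is the
    product of all earlier strides (the spacing of the layer's inputs, in samples).
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1)] * 4))  # four 3-wide convs, stride 1 -> 9
print(receptive_field([(3, 2)] * 4))  # four 3-wide convs, stride 2 each -> 31
```

With stride 2 at every layer, the field goes 3, 7, 15, 31, i.e. it roughly doubles per layer, whereas without strides it only grows by 2 samples per layer.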
OK, this makes sense... Thank you very much! I have now played around a bit to start the autoencoder training process: [...]
NumFeatures = 1024 (around 4 seconds of ECG), y is 1 since I have 1D data, and there are 3 ECG channels (numChanPerRow). Forgive my stupid question here, but [...] Maybe my problem is my understanding of the upsample class... As far as I can see, the upsample class assumes that the x and y dimensions of the input space are the same, and it doubles the y direction as well...
I would change this: [...] to this: [...]
I would multiply the number of input channels by 4 before each upsampling layer.
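Why the channel count has to be multiplied by 4 can be seen from the shape arithmetic of a depth-to-space upsample that doubles both X and Y and divides the channels by 4, as the upsample layer is described in this thread. This is plain Python shape bookkeeping, not the library's code; `upsample_shape` and the 192-channel bottleneck are assumptions for illustration.

```python
def upsample_shape(shape):
    """Output shape (X, Y, channels) of a depth-to-space upsample that doubles
    both X and Y and divides the number of channels by 4."""
    x, y, c = shape
    if c % 4:
        raise ValueError(f"channel count {c} is not divisible by 4")
    return (2 * x, 2 * y, c // 4)


shape = (128, 1, 3 * 4 ** 3)  # hypothetical bottleneck: 192 = 3 channels x 4^3
for _ in range(3):
    shape = upsample_shape(shape)
    print(shape)
```

Multiplying the channels by 4 before each upsample keeps the channel count consistent (192, 48, 12, 3), but Y still goes 1, 2, 4, 8, which is exactly the problem for 1D ECG data.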
Unfortunately not... My problem is that the input dimensions are 1024x1x3 (i.e., 4 s of ECG x 1D input x 3 ECG channels), but the output of the network actually is 1024 x 6 x 8 instead of the anticipated 1024 x 1 x 3.... I found a resize layer, but I think it only works if the product of all dimensions stays the same, like [...], and 1024 x 6 x 8 cannot be resized to 1024 x 1 x 3 since that would lose some data points, right? I also tried a fully connected linear layer, but that would result in quite a large parameter space....
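Both observations follow from element counts: a reshape/resize can only rearrange values, so input and output must hold the same number of elements, and a dense layer between two tensors needs one weight per input-output pair. A small sketch of that arithmetic, assuming a standard dense layer with one bias per output value.

```python
from math import prod

decoder_output = (1024, 6, 8)  # shape observed above
target = (1024, 1, 3)          # ECG input shape to reconstruct

n_out, n_target = prod(decoder_output), prod(target)
print(n_out, n_target)         # a reshape needs these two counts to be equal

# A dense (fully connected) layer mapping every output value to every target value:
fc_params = n_out * n_target + n_target  # weights + one bias per target value
print(f"{fc_params:,}")
```

The counts are 49,152 versus 3,072, so a plain reshape is impossible, and the dense layer would need about 151 million parameters.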
AH! Now I understand what you mean. I see: the padding is making the second dimension grow. On the encoder side, what you can do is: [...]
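How padding can make a 1-wide axis grow follows from the standard convolution output-size formula, floor((n + 2 * padding - kernel) / stride) + 1, applied per axis. A sketch assuming the layer pads X and Y by the same amount (not verified against the library); `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(n, kernel, padding, stride=1):
    """Standard output length of a convolution along one axis:
    floor((n + 2 * padding - kernel) / stride) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


# Along X (1024 samples), a 3-wide kernel with padding 1 keeps the length:
print(conv_output_size(1024, kernel=3, padding=1))  # 1024
# Along Y (size 1), padding 1 only preserves Y if the kernel is 3 tall:
print(conv_output_size(1, kernel=3, padding=1))     # 1
# ...but with a 1-tall kernel (or more padding), Y grows:
print(conv_output_size(1, kernel=1, padding=1))     # 3
print(conv_output_size(1, kernel=3, padding=2))     # 3
```

So with 1D signals, the safe choices are no padding along Y (with a 1-tall kernel) or padding that exactly matches (kernel - 1) / 2.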
For the decoder side, I don't have a 1D upsampler. BUT you could do something like this: [...]
Do you think it could work?
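For reference, the idea of a 1D upsampler can be illustrated with a 1D analogue of depth-to-space (sometimes called a 1D pixel shuffle): channels are traded for length along X only, so Y never grows. This is a hypothetical plain-Python helper (`upsample_1d`), not an existing layer in any library mentioned here.

```python
def upsample_1d(signal, factor=2):
    """Hypothetical 1D depth-to-space: trades channels for length along X only.

    signal: list of L frames, each a list of C channel values (C divisible by factor).
    Returns L * factor frames of C // factor channels; channel group g of input
    frame i becomes output frame i * factor + g. Y never changes.
    """
    channels = len(signal[0])
    if channels % factor:
        raise ValueError("channel count must be divisible by factor")
    out_c = channels // factor
    out = []
    for frame in signal:
        for g in range(factor):
            out.append(frame[g * out_c:(g + 1) * out_c])
    return out


print(upsample_1d([[1, 2, 3, 4], [5, 6, 7, 8]]))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
```

Applied to a 512-frame, 6-channel activation, it yields 1024 frames of 3 channels, matching the 1024 x 1 x 3 ECG input without touching Y.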
In this message, I'm not suggesting anything; I'm just bringing existing layers to your attention. By transposing outputs and running pointwise convolutions, you can transform a 2D output into a cube. Assume that your input (1024, 1, 3) could be transposed into (1024, 3, 1) via [...]
Another super crazy idea would be calling [...] (still in beta / not fully tested).
Another idea: if you get "overflows" while training, you can call [...]
This is just a brainstorm. I have never experimented with an autoencoder using transposes... But resizing the 1024,1,3 into 32,32,3 could work...
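The two reshapings mentioned above can be illustrated on nested lists: transposing (1024, 1, 3) into (1024, 3, 1), and folding 1024 consecutive samples into a 32 x 32 grid while keeping the 3 channels. These are hypothetical plain-Python helpers (`transpose_yc`, `fold_to_square`) with dummy data, not the library's transpose or reshape layers.

```python
def transpose_yc(t):
    """(X, Y, C) -> (X, C, Y): swap the Y axis with the channel axis."""
    return [[[t[x][y][c] for y in range(len(t[x]))]
             for c in range(len(t[x][0]))]
            for x in range(len(t))]


def fold_to_square(t, side):
    """(side * side, 1, C) -> (side, side, C): row r holds samples
    r * side ... r * side + side - 1, so no sample is lost or duplicated."""
    if len(t) != side * side:
        raise ValueError("length must equal side * side")
    return [[t[r * side + k][0] for k in range(side)] for r in range(side)]


# Hypothetical 4 s ECG window: 1024 samples x 1 x 3 channels (values are dummies).
ecg = [[[10 * i + c for c in range(3)]] for i in range(1024)]

transposed = transpose_yc(ecg)      # shape (1024, 3, 1)
square = fold_to_square(ecg, 32)    # shape (32, 32, 3)
print(len(transposed), len(transposed[0]), len(transposed[0][0]))  # 1024 3 1
print(len(square), len(square[0]), len(square[0][0]))              # 32 32 3
```

Note that after folding, vertically adjacent "pixels" are 32 samples apart in time, so a 3x3 kernel sees both neighbouring samples and samples one row away.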
OMG... thank you for your valuable input :)
A colleague of mine made the comment that I should pretrain the models to yield more robust models and
better accuracy. Now... how can I do that, or... what are possible avenues here?
My models are all based on ECG (1D, up to 3 channels, which I basically encode as "RGB").
He hinted that one should first train on the dataset differently... basically use a few convolutional layers to
"compress" the signal and then "expand" it again. The pretraining goal is to "reconstruct" the input signal (am I right here???),
so input is output. (I know that this can be done with an old-fashioned 3-layer NN, which yields PCA; is it the same here?)
After pretraining, cut off some of the layers (how many is good?), add new ones for the real classification task, and train again on that task.
Is this reasonable? And is there perhaps an example out there that shows an efficient way to do that?
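On "how many layers to cut off": removing the decoder means cutting at the bottleneck, i.e. keeping every layer up to the smallest activation; cutting earlier is also a valid experiment. A sketch on a hypothetical layer list (names, shapes, the 1D "upsample" that doubles X only, and `NUM_CLASSES` are all made up for illustration):

```python
from math import prod

# Hypothetical 1D ECG autoencoder, listed as (layer, output shape X x Y x channels).
# "upsample" here means a 1D upsample that doubles X only (channels unchanged).
AUTOENCODER = [
    ("input",    (1024, 1, 3)),
    ("conv3",    (1024, 1, 16)),
    ("maxpool",  (512, 1, 16)),
    ("conv3",    (512, 1, 8)),
    ("maxpool",  (256, 1, 8)),    # bottleneck: 2048 values vs 3072 at the input
    ("upsample", (512, 1, 8)),
    ("conv3",    (512, 1, 16)),
    ("upsample", (1024, 1, 16)),
    ("conv3",    (1024, 1, 3)),   # reconstruction: same shape as the input
]
NUM_CLASSES = 5  # hypothetical number of ECG classes


def split_at_bottleneck(layers):
    """Return (encoder, decoder): the encoder ends at the smallest activation."""
    sizes = [prod(shape) for _, shape in layers]
    cut = sizes.index(min(sizes)) + 1
    return layers[:cut], layers[cut:]


encoder, decoder = split_at_bottleneck(AUTOENCODER)
# Reuse the pretrained encoder and put a new classification head on top of it.
classifier = encoder + [("dense", (64,)), ("softmax", (NUM_CLASSES,))]
for name, shape in classifier:
    print(name, shape)
```

In practice, the encoder weights would be copied from the trained autoencoder and, at least initially, frozen while the new head is trained.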