Fixing various errors in classification_with_grn_and_vsn #2010
Dataset preparation errors
The example file classification_with_grn_and_vsn.py from structured_data appears to be using the wrong dataset: the data_url https://archive.ics.uci.edu/static/public/20/census+income.zip downloads an incorrect dataset. I believe the correct data_url is https://archive.ics.uci.edu/static/public/117/census+income+kdd.zip, and a fix has been added.
A fix has also been added to extract the downloaded .tar.gz file created during the call to keras.utils.get_file.
Another fix cleans up the directory that the files were extracted to during download, so that the script can be run again without errors.
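A minimal sketch of the extraction and cleanup steps described above, using only the standard library (tarfile, shutil). It assumes the archive is a local .tar.gz file; the helper names extract_tar_gz and clean_up, and the file names in the demo, are hypothetical and are not the code added in this PR.

```python
import os
import shutil
import tarfile
import tempfile


def extract_tar_gz(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir; return paths of the extracted files."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Record the regular-file members, then extract everything.
        names = [m.name for m in tar.getmembers() if m.isfile()]
        tar.extractall(path=dest_dir)
    return [os.path.join(dest_dir, name) for name in names]


def clean_up(extract_dir):
    """Delete a previous extraction directory so the script can be re-run cleanly."""
    if os.path.isdir(extract_dir):
        shutil.rmtree(extract_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny stand-in archive; the member name is hypothetical.
        src = os.path.join(tmp, "census-income.data")
        with open(src, "w") as f:
            f.write("73, Not in universe\n")
        archive = os.path.join(tmp, "census.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(src, arcname="census-income.data")

        out_dir = os.path.join(tmp, "extracted")
        clean_up(out_dir)  # removes leftovers from an earlier run, if any
        print(extract_tar_gz(archive, out_dir))
```

Calling clean_up before extracting means a re-run starts from an empty directory rather than from whatever the previous run left behind.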
Additionally, the snippet in the original script that builds train_data_path and test_data_path doesn't account for the directory created by keras.utils.get_file during extraction (census+income+kdd.zip). This leads to an incorrect path for both train_data_path and test_data_path, and a fix has been added.
Additional training errors
After fixing the dataset preparation steps above, the script still raises an error during model training. An attempted solution is described below.
Attempted solution:
I believe I have traced the error precisely using pdb.
The function _convert_inputs_to_tensors creates a zip iterator that pairs flat_inputs with self._inputs. According to the pdb output, the first pair (age) has float32 dtype on both sides. The second pair (capital_gains), however, has float32 dtype on one side and string dtype on the other; this mismatch causes the error.
I think the main issue is that the CSV file used to create the dataset has its columns in a different order from the model inputs. The model inputs follow a fixed order, with numerical features first and categorical features after them. I've tried to rearrange the columns of the dataset DataFrame to match this order before writing the train and test CSV files, but pandas seems to merely rename the columns without actually moving their data.
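The symptom can be reproduced with a small pandas sketch. It assumes (a guess, not taken from the original attempt) that the columns were reassigned through DataFrame.columns, which relabels columns in place without moving their data; selecting the columns by name, or calling DataFrame.reindex(columns=...), moves each column's data together with its label. The column names, values, and feature_order are illustrative only.

```python
import pandas as pd

# Illustrative stand-in for the census DataFrame (values are made up).
df = pd.DataFrame({
    "age": [39.0, 50.0],
    "workclass": ["Private", "Self-emp"],
    "capital_gains": [0.0, 2174.0],
})

# Assumed model input order: numerical features first, then categorical ones.
feature_order = ["age", "capital_gains", "workclass"]

# zip pairs by position, not by name, so a mismatched order puts a
# string column opposite a numeric input.
print(list(zip(df.columns, feature_order)))
# [('age', 'age'), ('workclass', 'capital_gains'), ('capital_gains', 'workclass')]

# Pitfall: assigning to .columns only renames; the data stays where it was.
relabelled = df.copy()
relabelled.columns = feature_order
print(relabelled["capital_gains"].tolist())  # ['Private', 'Self-emp']

# Fix: select by name, which moves each column's data along with its label.
reordered = df[feature_order]  # same result as df.reindex(columns=feature_order)
print(reordered["capital_gains"].tolist())  # [0.0, 2174.0]
```

Writing reordered (rather than relabelled) to the train and test CSV files would keep each header aligned with its data, so positional pairing against the model inputs lines up.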
For more information related to the original script
The original script also raised an error that I didn't address, since the script fails with other issues before it reaches training.
Environment