What it does
Given a pre-built Keras deep learning model and a labeled training dataset, this tool works in one of two modes.
- Train and Validate: the input dataset is split into training and validation portions. The model is fitted on the training portion, and its performance is evaluated on the validation portion repeatedly as training progresses. The tool outputs the fitted model and its validation performance scores.
- Train, Validate and Evaluate: the input dataset is split into three portions: training, validation and testing. The Train and Validate procedure described above is performed on the training and validation portions. The testing portion is used exclusively for the final evaluation. The tool outputs the fitted model and its test performance scores.
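The three-way split used by the second mode can be sketched in plain Python. This is only an illustration under stated assumptions, not the tool's actual splitting code: the function name `three_way_split`, its parameters and the default fractions are hypothetical.

```python
import random


def three_way_split(n_samples, validation_fraction=0.1, test_fraction=0.2, seed=0):
    """Partition range(n_samples) into (train, validation, test) index lists.

    Hypothetical helper for illustration; the fractions are assumed defaults.
    """
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)

    n_test = int(round(n_samples * test_fraction))
    n_val = int(round(n_samples * validation_fraction))

    test = indices[:n_test]                       # held out for the final evaluation only
    validation = indices[n_test:n_test + n_val]   # scored repeatedly during training
    train = indices[n_test + n_val:]              # used to fit the model
    return train, validation, test


train_idx, val_idx, test_idx = three_way_split(100)
```

Passing `test_fraction=0` in this sketch gives the two-portion split of the Train and Validate mode.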
In both modes, the tool also outputs the true labels and the predicted values alongside the performance scores. These can be used to generate plots in other tools, such as machine learning visualization extensions.
Note that because all training and model parameters can be viewed and changed in the Hyperparameter Swapping section, the training and evaluation processes are flexible and transparent.
Two sets of metrics are available for deep learning training and evaluation: one from the Keras model builder and one from scikit-learn. Keras metrics, if selected, are always evaluated, while the scikit-learn metrics are ignored when default is selected. Please be aware that not every scikit-learn metric currently works with deep learning models. Feel free to file a ticket if you find an issue; contributions via PRs are always welcome.
Input
- tabular
- sparse
- sequences in a FASTA file, for working with DNA, RNA and proteins through the corresponding FASTA data generator
- reference genome and intervals, which work exclusively with GenomicIntervalBatchGenerator
Output
- performance scores from evaluation
- fitted estimator
- true labels or values and predicted values from the evaluation
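As a rough illustration of how the true labels and predicted values might be used downstream, the sketch below reads them from a tab-separated table and computes a simple accuracy. The file layout is an assumption: the column names `true` and `predicted`, the inline sample rows and the 0.5 threshold are all hypothetical, so adjust them to match the actual output of the tool.

```python
import csv
import io

# Hypothetical stand-in for the tool's output table (assumed layout, made-up rows).
example_output = "true\tpredicted\n1\t0.91\n0\t0.20\n1\t0.45\n0\t0.08\n"


def read_predictions(handle):
    """Read true labels and predicted values from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    y_true, y_pred = [], []
    for row in reader:
        y_true.append(int(row["true"]))
        y_pred.append(float(row["predicted"]))
    return y_true, y_pred


def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the true label."""
    hits = sum(int(p >= threshold) == t for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


y_true, y_pred = read_predictions(io.StringIO(example_output))
acc = accuracy(y_true, y_pred)
```

The same two lists could be passed to a plotting library or to scikit-learn metric functions instead of the hand-written `accuracy` helper.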