What does it mean to have an imbalanced dataset?
The number of samples is smaller or greater than is required.
The feature columns contain many missing values.
There are many more training examples that correspond to some outputs (categories) than others.
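As a rough illustration of label imbalance, the sketch below counts how often each category appears in a set of labels. The label names and counts are invented for the example and do not come from any real dataset.

```python
from collections import Counter

# Hypothetical labels for illustration only: 95 samples of one
# category and 5 of another.
labels = ["hiker"] * 95 + ["tree"] * 5

# Count the examples per category and convert the counts to proportions.
counts = Counter(labels)
total = sum(counts.values())
proportions = {label: n / total for label, n in counts.items()}

print(counts)
print(proportions)
```

A large gap between the per-category proportions is the usual sign that some outputs are heavily over-represented in the training data.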
What information can we extract from a single confusion matrix?
Log loss and/or mean squared error.
Whether the model overfitted the training set.
What kind of mistakes the model is making.
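For reference, a binary confusion matrix can be tallied by hand as in the minimal sketch below. The function name `confusion_counts` and the sample labels are assumptions made for this example, not part of any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four cells of a binary confusion matrix.

    Hypothetical helper written for this example.
    """
    cells = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            # Predicted positive: correct only if the actual label is positive.
            cells["TP" if actual == positive else "FP"] += 1
        else:
            # Predicted negative: correct only if the actual label is negative.
            cells["TN" if actual != positive else "FN"] += 1
    return cells


# Made-up labels: three actual positives and five actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

cells = confusion_counts(y_true, y_pred)
print(cells)
```

The four cells separate the two ways of being right (true positives, true negatives) from the two ways of being wrong (false positives, false negatives).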
Why are measures like True Positives or Accuracy not used to train our models directly?
There are mathematical barriers that prevent these measures from being used for some training regimens.
Subtle model improvements often don't affect these metrics.
Both of these answers.
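The sketch below compares a thresholded metric (accuracy) with a continuous one (log loss) on two sets of predicted probabilities. All values are invented for the example, and the 0.5 threshold is an assumption. Because accuracy depends only on which side of the threshold each prediction falls, it is piecewise constant in the model's outputs, so its gradient is zero almost everywhere.

```python
import math


def accuracy(y_true, probs, threshold=0.5):
    # Convert probabilities to hard 0/1 predictions, then count matches.
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(t == p) for t, p in zip(y_true, preds)) / len(y_true)


def log_loss(y_true, probs):
    # Mean binary cross-entropy; assumes every probability is strictly
    # between 0 and 1 so the logarithms are defined.
    total = sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, probs)
    )
    return -total / len(y_true)


# Made-up labels and probabilities: "after" is slightly more confident
# in the correct direction than "before" for every sample.
y_true = [1, 0, 1, 0]
before = [0.60, 0.40, 0.55, 0.45]
after = [0.70, 0.30, 0.65, 0.35]

print(accuracy(y_true, before), accuracy(y_true, after))
print(log_loss(y_true, before), log_loss(y_true, after))
```

In this example the accuracy is identical for both sets of predictions, while the log loss is lower for the second set.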