What does Underfitting Mean?

Underfitting, the counterpart of overfitting, happens when a machine learning model is not complex enough to accurately capture the relationships between a dataset’s features and a target variable. An underfitted model produces inaccurate predictions on new data (data it wasn’t trained on) and often performs poorly even on the training data.

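Here is a graphical representation of underfitting:

[Figure: a straight line fitted to training data that follows a curved relationship between x and y]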


A simple straight line is a decent representation of the training data, but it doesn’t fully capture the underlying curved relationship between the variables x and y. Therefore, the model’s predictions will not be accurate when you apply it to new data, especially when the x values in the new data are much larger or smaller than those in the training data.
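
The effect described above can be reproduced with a small sketch. It is only an illustration under stated assumptions: the data is synthetic and noise-free, the curved relationship is assumed to be y = x^2, and the helper names (fit_line, mse, true_curve) are hypothetical rather than part of any library. A straight line is fitted by ordinary least squares, and its error is measured on the training range and on x values well outside it.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Mean squared error of the line's predictions against ys."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_curve(x):
    # Assumed curved relationship, chosen only for illustration.
    return x ** 2


# Training data covers x from 0.0 to 5.0 in steps of 0.5.
train_x = [i / 2 for i in range(11)]
train_y = [true_curve(x) for x in train_x]

# New data with x values much larger than anything seen in training.
new_x = [8.0, 9.0, 10.0]
new_y = [true_curve(x) for x in new_x]

slope, intercept = fit_line(train_x, train_y)
train_error = mse(train_x, train_y, slope, intercept)
new_error = mse(new_x, new_y, slope, intercept)

print(f"line: y = {slope:.2f} * x + {intercept:.2f}")
print(f"error on training data: {train_error:.2f}")
print(f"error on new data:      {new_error:.2f}")
```

Even though the data contains no noise, the line cannot reach zero error on the training data, and its error grows sharply on the out-of-range x values. That is the signature of a model that is too simple for the relationship it is asked to learn.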

Why is Underfitting Important?

Using underfitted models for decision-making can be costly for businesses. For example, an underfitted model may suggest that you can always increase sales by spending more on marketing, when in fact it fails to capture a saturation effect: at some point, sales flatten out no matter how much more you spend on marketing. If your business relies on that model to set its marketing budget, you will overspend on marketing.

DataRobot + Underfitting

One of the most effective ways to avoid underfitting is to ensure that your models are sufficiently complex, which you can accomplish by adding features or changing the data preprocessing steps. The DataRobot AI platform automatically performs advanced feature engineering, implements best practices for data preprocessing, and builds dozens of complex machine learning models that are the most appropriate for your dataset and target feature. By incorporating the expertise of top-ranked data scientists into our platform, DataRobot automates the process of ensuring your model is appropriately fitted so that you can focus on picking the most relevant model without questioning its practical accuracy.
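
As a generic illustration of how adding a feature can reduce underfitting, here is a minimal sketch in plain Python. It does not show how the DataRobot platform works internally. The synthetic target, the choice of a squared feature, and the helper names (solve_linear_system, fit_least_squares, mse) are all assumptions made for the example. The same linear least-squares fit is run twice: once with only x as a feature, and once with x and x^2.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are not modified.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def fit_least_squares(rows, ys):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def mse(rows, ys, w):
    """Mean squared error of the linear model with weights w."""
    preds = [sum(wi * ri for wi, ri in zip(w, r)) for r in rows]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


# Synthetic, noise-free data with an assumed curved target.
xs = [i / 2 for i in range(11)]
ys = [3.0 + 2.0 * x - 0.5 * x ** 2 for x in xs]

simple_rows = [[1.0, x] for x in xs]          # intercept and x only
richer_rows = [[1.0, x, x ** 2] for x in xs]  # intercept, x, and x^2

w_simple = fit_least_squares(simple_rows, ys)
w_richer = fit_least_squares(richer_rows, ys)
simple_error = mse(simple_rows, ys, w_simple)
richer_error = mse(richer_rows, ys, w_richer)

print(f"training error with x only:    {simple_error:.4f}")
print(f"training error with x and x^2: {richer_error:.4f}")
```

The fitting procedure is identical in both runs. Only the feature set changes, which is why feature engineering is a common first remedy when a model underfits. Solving the normal equations directly keeps the sketch dependency-free, but it is less numerically robust than the solvers found in numerical libraries.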