[FIXED] sklearn GridSearchCV uses default parameters instead of the param grid

Issue

I’m using a custom regressor with GridSearchCV, but it is behaving strangely. It runs the grid search with the default parameters instead of the given parameter grid, and only at the end runs once with the parameters from the grid. I made a dummy example with Fashion-MNIST (I know, it's not regression, but it shows the problem) to demonstrate the issue (see code and output below).

As you can see in the output, the first two models use the default parameters (one layer, no dropout), even though the CV line ([CV 1/2]…) prints the correct parameters. And if I print self.drop_rate in the fit method, it shows the correct drop_rate, while the model clearly doesn't use it.

Code:

import tensorflow as tf
print("tf version: ", tf.__version__)
from sklearn.model_selection import GridSearchCV
from sklearn.base import BaseEstimator, RegressorMixin

fashion_mnist = tf.keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

def createNNModel(unit1, unit2, drop_rate, lr):
    # Build the layer list conditionally instead of duplicating four model variants.
    layers = [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(unit1, activation='relu'),
    ]
    if drop_rate != 0:
        layers.append(tf.keras.layers.Dropout(drop_rate))
    if unit2 != 0:
        layers.append(tf.keras.layers.Dense(unit2, activation='relu'))
        if drop_rate != 0:
            layers.append(tf.keras.layers.Dropout(drop_rate))
    layers.append(tf.keras.layers.Dense(10))

    model = tf.keras.Sequential(layers)
    model.compile(loss=tf.losses.MeanSquaredError(),
                  optimizer=tf.optimizers.Adam(learning_rate=lr),
                  metrics=[tf.metrics.MeanAbsoluteError()])

    return model


class MyRegressor(BaseEstimator, RegressorMixin):  

    def __init__(self, unit1=32, unit2=0, drop_rate=0, lr=0.001):
        """
        Called when initializing the regressor
        """
        self.unit1=unit1
        self.unit2=unit2
        self.drop_rate=drop_rate
        self.lr=lr
        
        print("INIT DR:", self.drop_rate)
        
        self.model_=createNNModel(unit1, unit2, drop_rate,lr)

    def fit(self, X, y, max_epochs=100):
        """
        Fit the regressor. All the "work" happens here: the model is
        trained on X and y for up to max_epochs epochs.
        """
        print("FIT DR: ", self.drop_rate)

        self.history_ = self.model_.fit(X, y, epochs=max_epochs, verbose=1)
        self.model_.summary()

        return self

    def predict(self, X, y=None):
        return self.model_.predict(X)

    def score(self, X, y=None):
        performance = self.model_.evaluate(X)  # [loss, mae]
        return 1 - performance[1]  # the bigger the better


## TUNING
units1=[64]
units2=[64]
drop_outs=[0.8]
lrs=[0.01]
param_grid={'unit1': units1, 'unit2': units2, 'drop_rate': drop_outs, 'lr': lrs}
gs= GridSearchCV(MyRegressor(), param_grid, cv=2, verbose=3)
gs.fit(X=train_images, y=train_labels, max_epochs=2)

Output:

tf version:  2.9.0
INIT DR: 0
INIT DR: 0
Fitting 2 folds for each of 1 candidates, totalling 2 fits
INIT DR: 0
FIT DR:  0.8
Epoch 1/2
938/938 [==============================] - 1s 1ms/step - loss: 95.2783 - mean_absolute_error: 5.1844   
Epoch 2/2
938/938 [==============================] - 1s 1ms/step - loss: 21.4664 - mean_absolute_error: 3.7982
Model: "sequential_46"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_8 (Flatten)         (None, 784)               0         
                                                                 
 dense_55 (Dense)            (None, 32)                25120     
                                                                 
 dense_56 (Dense)            (None, 10)                330       
                                                                 
=================================================================
Total params: 25,450
Trainable params: 25,450
Non-trainable params: 0
_________________________________________________________________
938/938 [==============================] - 1s 673us/step - loss: 0.0000e+00 - mean_absolute_error: 0.0000e+00
[CV 1/2] END drop_rate=0.8, lr=0.01, unit1=64, unit2=64;, score=1.000 total time=   3.1s
INIT DR: 0
FIT DR:  0.8
Epoch 1/2
938/938 [==============================] - 2s 1ms/step - loss: 60.8985 - mean_absolute_error: 4.7083  
Epoch 2/2
938/938 [==============================] - 1s 1ms/step - loss: 20.7136 - mean_absolute_error: 3.7330
Model: "sequential_47"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_9 (Flatten)         (None, 784)               0         
                                                                 
 dense_57 (Dense)            (None, 32)                25120     
                                                                 
 dense_58 (Dense)            (None, 10)                330       
                                                                 
=================================================================
Total params: 25,450
Trainable params: 25,450
Non-trainable params: 0
_________________________________________________________________
938/938 [==============================] - 1s 679us/step - loss: 0.0000e+00 - mean_absolute_error: 0.0000e+00
[CV 2/2] END drop_rate=0.8, lr=0.01, unit1=64, unit2=64;, score=1.000 total time=   3.4s
INIT DR: 0
INIT DR: 0.8
FIT DR:  0.8
Epoch 1/2
1875/1875 [==============================] - 3s 2ms/step - loss: 731.5312 - mean_absolute_error: 3.8732    
Epoch 2/2
1875/1875 [==============================] - 3s 2ms/step - loss: 8.3729 - mean_absolute_error: 2.5103
Model: "sequential_49"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_11 (Flatten)        (None, 784)               0         
                                                                 
 dense_61 (Dense)            (None, 64)                50240     
                                                                 
 dropout_8 (Dropout)         (None, 64)                0         
                                                                 
 dense_62 (Dense)            (None, 64)                4160      
                                                                 
 dropout_9 (Dropout)         (None, 64)                0         
                                                                 
 dense_63 (Dense)            (None, 10)                650       
                                                                 
=================================================================
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________

Solution

See this section of the sklearn developer’s guide: you shouldn’t set self.model_ in the __init__ method; putting that line into the fit method probably works for what you want.
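Under that fix, __init__ only stores hyperparameters and the model is built inside fit, after clone and set_params have run. A minimal sketch of the pattern, using a hypothetical build_model stand-in for createNNModel so it runs without TensorFlow:

```python
from sklearn.base import BaseEstimator, RegressorMixin, clone

def build_model(drop_rate):
    # hypothetical stand-in for createNNModel: records which
    # drop_rate the "model" was actually built with
    return {"drop_rate": drop_rate}

class FixedRegressor(BaseEstimator, RegressorMixin):
    def __init__(self, drop_rate=0):
        # __init__ only stores hyperparameters -- no model is built yet
        self.drop_rate = drop_rate

    def fit(self, X, y=None):
        # the model is built here, after clone/set_params have run,
        # so it sees the grid-search candidate parameters
        self.model_ = build_model(self.drop_rate)
        return self

# mimic what GridSearchCV does with each candidate
est = clone(FixedRegressor())   # clone of the default estimator
est.set_params(drop_rate=0.8)   # apply the candidate parameters
est.fit(None)
print(est.model_["drop_rate"])  # 0.8 -- the grid value, not the default
```

The same move in the original class is simply deleting the `self.model_ = createNNModel(...)` line from __init__ and putting `self.model_ = createNNModel(self.unit1, self.unit2, self.drop_rate, self.lr)` at the top of fit.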

The problem is that the grid search clones its estimator, and cloning works by creating a new instance of the same class (without specifying any __init__ parameters!) and then setting its parameters with set_params. So by defining model_ in __init__, your clones all build their model with the default parameters; set_params then updates the attributes of your custom class, but those updates never make it through to the already-built model_ object.
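This mechanism can be seen without TensorFlow at all. The sketch below uses a hypothetical frozen_at_init attribute as a stand-in for model_, showing that any state derived from the parameters inside __init__ goes stale after clone + set_params:

```python
from sklearn.base import BaseEstimator, clone

class Demo(BaseEstimator):
    def __init__(self, drop_rate=0):
        self.drop_rate = drop_rate
        # stands in for model_: derived state fixed at construction time
        self.frozen_at_init = drop_rate

base = Demo()                  # defaults, like MyRegressor() passed to GridSearchCV
c = clone(base)                # new instance built from base's (default) params
c.set_params(drop_rate=0.8)    # grid search then applies the candidate parameters

print(c.drop_rate)             # 0.8 -- the parameter itself is updated
print(c.frozen_at_init)        # 0   -- the derived state still has the default
```

This is exactly the pattern in the output above: "INIT DR: 0" (clone built with defaults) followed by "FIT DR: 0.8" (set_params has updated the attribute, but the model was already built).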

Answered By – Ben Reiniger
