[FIXED] Should I retrain the entire model with additional data?

Issue

I trained a model with the default training data. However, when testing on examples from production, the testing accuracy is very low. Do I have to add new data from production? Is it necessary to train the entire model again? Should I use all new data or only some of it? I have tried using all the new data. This is what my code looks like:

    # In[]: Retrain the model with the new data: Strategy 1: All old + new data
    import numpy as np
    from tensorflow.keras.models import load_model  # or keras.models, depending on the install

    model = load_model('model.h5')
    (x_train_old, y_train_old) = dataset.load_old_data()
    (x_train_new, y_train_new) = dataset.load_new_data()
    # Concatenate along the sample axis; `+` on NumPy arrays would add element-wise
    x_train = np.concatenate([x_train_old, x_train_new])
    y_train = np.concatenate([y_train_old, y_train_new])

    model.fit(x_train, y_train, epochs=nb_epoch, batch_size=128)

    # In[]: Retrain the model with the new data: Strategy 2: Only new data
    model = load_model('model.h5')
    (x_train_new, y_train_new) = dataset.load_new_data()

    model.fit(x_train_new, y_train_new, epochs=nb_epoch, batch_size=128)
    

Here, model.h5 is the already trained model.

Solution

Yes, you should retrain the model with the new data. This is part of MLOps: you closely monitor the model's accuracy after it has gone to production, and if the accuracy decreases as new data comes in, you retrain the model. So do retrain the model on the new data.
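The monitoring step described above can be sketched as a simple check. This is only an illustration, not something from the original answer: the helper names `accuracy` and `should_retrain`, the `max_drop` threshold of 0.05, and all the numbers are assumptions chosen for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one labelled example")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def should_retrain(baseline_acc, production_acc, max_drop=0.05):
    """True when production accuracy fell more than max_drop below the baseline."""
    return (baseline_acc - production_acc) > max_drop


# Hypothetical numbers, for illustration only.
baseline_acc = 0.92  # accuracy measured on the original test set
prod_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]  # labelled production samples
prod_preds = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]   # what the deployed model predicted
production_acc = accuracy(prod_labels, prod_preds)

if should_retrain(baseline_acc, production_acc):
    # This is the point at which you would call model.fit again,
    # for example on the old and new data combined.
    print("Accuracy dropped below the allowed margin; retrain the model.")
```

In practice the threshold depends on how much accuracy loss your application can tolerate, and the production labels have to come from samples you have labelled after deployment.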

Answered By – Naman

Answer Checked By – Marilyn (Easybugfix Volunteer)
