Train and deploy a SOTA model

Let's see great-ai in action by going over the lifecycle of a simple service.


  1. You will see how great_ai.utilities fits into your data science workflow.
  2. You will use great_ai.large_file to version and store your trained model.
  3. You will use GreatAI to prepare your model for a robust and responsible deployment.


You will train a field-of-study (domain) classifier for scientific sentences. The task comes from the SciBERT paper, in which SciBERT achieved an F1-score of 0.6571. We are going to outperform it using a trivial text classification model: a Linear SVM.
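To make the comparison target concrete, here is a minimal sketch of how an F1-score over several classes can be computed. It assumes macro-averaging (the unweighted mean of per-class F1 scores), which is an assumption rather than something stated on this page. The function name `macro_f1` and the toy labels are illustrative only.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaging assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, prediction in zip(y_true, y_pred):
        if truth == prediction:
            tp[truth] += 1
        else:
            fp[prediction] += 1  # predicted class gains a false positive
            fn[truth] += 1       # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denominator = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denominator if denominator else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Toy example with two domains (made-up labels, not the tutorial's dataset)
y_true = ["physics", "physics", "biology", "biology"]
y_pred = ["physics", "biology", "biology", "biology"]
score = macro_f1(y_true, y_pred)
```

In practice, a library routine such as scikit-learn's `f1_score` would normally be used instead of a hand-written function.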

We use the same synthetic dataset derived from the Microsoft Academic Graph. The dataset is available here.


You are ready to start the tutorial. Feel free to return to the summary section once you're finished.


Training notebook

We load and preprocess the dataset, relying on great_ai.utilities.clean to do the heavy lifting. The preprocessing is parallelised using great_ai.utilities.simple_parallel_map.
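The clean-then-map pattern described above can be sketched with the standard library alone. This is not the great_ai implementation: `basic_clean` is a hypothetical, much-simplified stand-in for great_ai.utilities.clean, and `parallel_map` is a thread-based stand-in for great_ai.utilities.simple_parallel_map, used here only to show the shape of the workflow.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def basic_clean(text):
    """Hypothetical stand-in for a text cleaner: strips tags and extra whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)  # drop HTML-like tags
    text = re.sub(r"\s+", " ", text)      # collapse runs of whitespace
    return text.strip()


def parallel_map(func, values, max_workers=4):
    """Apply func to every value concurrently, preserving the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, values))


# Made-up sentences, not taken from the tutorial's dataset
sentences = [
    "  Graphene   conducts <b>heat</b> well. ",
    "Cells\tdivide\nby mitosis.",
]
cleaned = parallel_map(basic_clean, sentences)
```

Threads keep the sketch simple, but they do not speed up CPU-bound cleaning in CPython; a process-based pool would be the usual choice for that kind of workload.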

After training and evaluating a model, it is exported using great_ai.save_model.

Remote storage

To store your model remotely, you must set your credentials before calling save_model.

For example, to use AWS S3:

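from great_ai.large_file import LargeFileS3

# Configure your S3 credentials here, before calling save_model
# (see the configuration how-to page for the available options).

from great_ai import save_model

# `model` is the trained classifier from the training step.
save_model(model, key='my-domain-predictor')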

For more info, check out the configuration how-to page.

Deployment notebook

We create an inference function that can be hardened by wrapping it in a GreatAI instance.

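from great_ai import GreatAI, use_model
from great_ai.utilities import clean

@GreatAI.create                     # wrap the function in a GreatAI instance
@use_model('my-domain-predictor')   #(1)
def predict_domain(sentence, model):
    # Apply the same cleaning as in training, then classify the single sentence.
    inputs = [clean(sentence)]
    return str(model.predict(inputs)[0])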

  1. @use_model loads and injects your model into the predict_domain function's model argument. You can freely reference it, knowing that the function is always provided with it.
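
To illustrate the idea of injecting a model into a function argument, here is a minimal standard-library sketch of a decorator that does something similar. It is not how great_ai implements @use_model: `inject_model`, `register_model`, the in-memory registry, and the toy `KeywordModel` are all hypothetical names introduced for this example.

```python
import functools

# Hypothetical in-memory registry; a real setup would load versioned models from storage.
_MODEL_REGISTRY = {}


def register_model(key, model):
    """Store a model under a key so that decorated functions can receive it."""
    _MODEL_REGISTRY[key] = model


def inject_model(key, argument_name="model"):
    """Hypothetical decorator that passes the registered model as a keyword argument."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if key not in _MODEL_REGISTRY:
                raise KeyError(f"No model registered under {key!r}")
            kwargs[argument_name] = _MODEL_REGISTRY[key]
            return func(*args, **kwargs)
        return wrapper
    return decorator


class KeywordModel:
    """Toy model exposing a scikit-learn-like predict method."""
    def predict(self, inputs):
        return ["biology" if "cell" in text.lower() else "physics" for text in inputs]


register_model("toy-domain-predictor", KeywordModel())


@inject_model("toy-domain-predictor")
def predict_domain(sentence, model):
    return str(model.predict([sentence])[0])
```

Callers pass only the sentence, for example `predict_domain("Cells divide by mitosis.")`, and the decorator supplies the model on every call.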

Finally, we test the model's inference function through the GreatAI dashboard. The only thing left is to deploy the hardened service properly.

Last update: August 20, 2022