Inference

Inference means running a trained model after training is complete. During training the model makes a prediction and is updated based on feedback comparing the predicted value to the correct one; during inference the model's parameters stay fixed and it is simply applied to new inputs.
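The predict-compare-update loop, and the difference between training and inference, can be sketched with a deliberately tiny model. This is an illustrative assumption, not how real language models are trained: a one-parameter linear model y = w * x fitted by gradient descent, then applied to an unseen input.

```python
# Toy training loop: predict, compare to the correct value, update.
# The model is a single weight w; the true relationship is y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, correct value)
w = 0.0
learning_rate = 0.05

for _ in range(200):  # training: the model is updated from feedback
    for x, y_true in data:
        y_pred = w * x               # the model's prediction
        error = y_pred - y_true      # feedback: predicted vs. correct
        w -= learning_rate * error * x  # update based on that feedback

# Inference: training is complete, w is fixed, and we just run the model.
print(w * 4.0)  # prediction for an input never seen during training
```

The same split applies to language models: training adjusts parameters from feedback, while inference only runs the frozen model forward.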

For a language model (LM), the training task is to predict a unit of text (a token) from its context, which is either the preceding text or the surrounding text. Through inference, the LM can generate new text from a supplied input.
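Generation by repeatedly predicting the next unit of text can be sketched with a stand-in model. This is a minimal sketch under a strong simplifying assumption: the "model" is just a table of next-character frequencies counted from a tiny corpus, rather than a trained neural network, but the inference loop (supply an input, sample a continuation one unit at a time) has the same shape.

```python
import random
from collections import Counter, defaultdict

# Hypothetical stand-in "language model": next-character frequencies
# counted from a toy corpus. Real LMs predict tokens with learned
# parameters; the counting here is only for illustration.
corpus = "the cat sat on the mat "
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def generate(prompt, length, seed=0):
    """Inference loop: extend the prompt one character at a time,
    sampling each next character from the model's distribution."""
    rng = random.Random(seed)
    out = prompt
    for _ in range(length):
        options = counts[out[-1]]               # distribution given context
        chars, weights = zip(*options.items())
        out += rng.choices(chars, weights=weights)[0]
    return out

print(generate("the", 20))
```

The supplied input ("the") plays the role of the prompt; each generated character becomes part of the context for the next prediction.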