https://i.imgur.com/kYL058E.png

Scikit-learn inspired model finetuning for natural language processing.

finetune ships with a pre-trained language model from “Improving Language Understanding by Generative Pre-Training” and builds off the OpenAI/finetune-language-model repository.

Source code for finetune is available on github.

Finetune Quickstart Guide

Finetuning the base language model is as easy as calling Classifier.fit():

from finetune import Classifier

model = Classifier()               # Load base model
model.fit(trainX, trainY)          # Finetune base model on custom data
predictions = model.predict(testX) # [{'class_1': 0.23, 'class_2': 0.54, ..}, ..]
model.save(path)                   # Serialize the model to disk

Reload saved models from disk by using Classifier.load():

model = Classifier.load(path)
predictions = model.predict(testX)

Installation

Finetune can be installed directly from PyPI by using pip

pip install finetune

or installed directly from source:

git clone https://github.com/IndicoDataSolutions/finetune
cd finetune
python3 setup.py develop
python3 -m spacy download en

You can optionally run the provided test suite to ensure installation completed successfully.

pip3 install pytest
pytest

Docker

If you’d prefer, you can also run finetune in a Docker container. The provided bash scripts assume you have working installations of docker and nvidia-docker.

./docker/build_docker.sh      # builds a docker image
./docker/start_docker.sh      # starts a docker container in the background
docker exec -it finetune bash # starts a bash session in the docker container

Dataset Loading

Finetune supports providing input data as a list or as a data generator. When a generator is provided as input, finetune takes advantage of the tf.data module for data pipelining.

Providing text and targets in list format:

X = ['german shepherd', 'maine coon', 'persian', 'beagle']
Y = ['dog', 'cat', 'cat', 'dog']
model = Classifier()
model.fit(X, Y)

Providing data as a generator:

import pandas as pd
from finetune import Classifier

df = pd.read_csv('pets.csv')

# Even if raw data is greedily loaded,
# using a generator allows us to defer data preprocessing
def text_generator():
    for row in df.Text.values:
        yield row

# dataset_size must be specified if input is provided as a generator
model = Classifier(dataset_size=len(df))
model.fit(text_generator)

Code Examples

For example usage of provided models, see the finetune/datasets directory.

Finetune API Reference

class finetune.Classifier(config=None, **kwargs)[source]

Classifies a single document into 1 of N categories.

Parameters:
  • config – A finetune.config.Settings object or None (for default config).
  • **kwargs – key-value pairs of config items to override.
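
Individual config options from finetune.config.Settings (documented at the end of this page) can be overridden directly through keyword arguments; a minimal sketch:

from finetune import Classifier

# Override a few documented Settings values via keyword arguments.
# Anything not specified falls back to the library defaults.
model = Classifier(
    n_epochs=2,      # fewer passes over the training data
    batch_size=4,    # examples per batch (per GPU when N_GPUS > 1)
    max_length=256,  # truncate examples longer than 256 subtokens
)
model.fit(
    ['the movie was great', 'the movie was terrible'],
    ['positive', 'negative'],
)
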
featurize(X)[source]

Embeds inputs in learned feature space. Can be called before or after calling finetune().

Parameters:X – list or array of text to embed.
Returns:np.array of features of shape (n_examples, embedding_size).
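
A common pattern is to use featurize() as a fixed feature extractor and fit a lightweight estimator on top of the embeddings; a minimal sketch (the scikit-learn usage below is illustrative and not part of finetune):

from sklearn.linear_model import LogisticRegression
from finetune import Classifier

X = ['german shepherd', 'maine coon', 'persian', 'beagle']
Y = ['dog', 'cat', 'cat', 'dog']

model = Classifier()
features = model.featurize(X)  # np.array, shape (n_examples, embedding_size)

# Fit any downstream estimator on the fixed features
clf = LogisticRegression().fit(features, Y)
print(clf.predict(model.featurize(['siamese'])))
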
finetune(X, Y=None, batch_size=None)[source]
Parameters:
  • X – list or array of text.
  • Y – integer or string-valued class labels.
  • batch_size – integer number of examples per batch. When N_GPUS > 1, this number corresponds to the number of training examples provided to each GPU.

classmethod finetune_grid_search(Xs, Y, *, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 inputs (prediction, truth) and returns a float, with a max value being desired.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

classmethod finetune_grid_search_cv(Xs, Y, *, n_splits, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs cross validated grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Note that the CV splits are not guaranteed to be unique, but each split is used with every set of hyperparameters.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • n_splits – Number of CV splits to do.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 batches of outputs and returns a float, with a max value being desired. An arithmetic mean must make sense for this metric.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]
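
As an illustration of how this might be called, here is a sketch that uses scikit-learn's accuracy_score as the evaluation metric (per the parameter description above, eval_fn receives (prediction, truth)):

from sklearn.metrics import accuracy_score
from finetune import Classifier

X = ['german shepherd', 'maine coon', 'persian', 'beagle'] * 25
Y = ['dog', 'cat', 'cat', 'dog'] * 25

# eval_fn takes (prediction, truth) and returns a float to be maximized
best_config = Classifier.finetune_grid_search_cv(
    X, Y,
    n_splits=2,
    test_size=0.2,
    eval_fn=lambda pred, truth: accuracy_score(truth, pred),
)
model = Classifier(config=best_config)
model.fit(X, Y)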

fit(*args, **kwargs)

An alias for finetune.

generate_text(seed_text='', max_length=None, use_extra_toks=True)

Performs a prediction on the Language modeling objective given some seed text. It uses a noisy greedy decoding. Temperature parameter for decoding is set in the config.

Parameters:
  • seed_text – Defaults to the empty string. This will form the starting point to begin modelling.
  • max_length – The maximum length to decode to.
Returns:A string containing the generated text.
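
For instance (a minimal sketch; since the base language model is pre-trained, text can typically be generated even before task finetuning):

from finetune import Classifier

model = Classifier(lm_temp=0.5)  # decoding temperature is a config option
text = model.generate_text(seed_text='The weather today', max_length=50)
print(text)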

classmethod load(path)

Load a saved fine-tuned model from disk. Path provided should be a folder which contains .pkl and tf.Saver() files

Parameters:path – string path name to load model from. Same value as previously provided to save(). Must be a folder.
predict(X)[source]

Produces a list of most likely class labels as determined by the fine-tuned model.

Parameters:X – list or array of text to embed.
Returns:list of class labels.
predict_proba(X)[source]

Produces a probability distribution over classes for each example in X.

Parameters:X – list or array of text to embed.
Returns:list of dictionaries. Each dictionary maps from a class label to its assigned class probability.
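
Since each prediction is a label-to-probability dictionary, thresholding or ranking the outputs is straightforward; a small sketch:

from finetune import Classifier

model = Classifier()
model.fit(['spam spam spam', 'hello old friend'], ['spam', 'ham'])

for probs in model.predict_proba(['free money now', 'see you tomorrow']):
    # probs is e.g. {'spam': 0.83, 'ham': 0.17}
    label = max(probs, key=probs.get)
    print(label, probs[label])
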
save(path)

Saves the state of the model to disk to the folder specified by path. If path does not exist, it will be auto-created.

Save is performed in two steps:
  • Serialize tf graph to disk using tf.Saver
  • Serialize python model using pickle
Note:
Does not serialize state of Adam optimizer. Should not be used to save / restore a training model.
transform(*args, **kwargs)

An alias for featurize.

class finetune.Regressor(config=None, **kwargs)[source]

Regresses one or more floating point values given a single document.

For a full list of configuration options, see finetune.config.

Parameters:
  • config – A config object generated by finetune.config.get_config or None (for default config).
  • **kwargs – key-value pairs of config items to override.
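
Usage mirrors the Classifier, but with floating point targets; a minimal sketch:

from finetune import Regressor

reviews = ['terrible product', 'it was okay', 'absolutely fantastic']
star_ratings = [1.0, 3.0, 5.0]

model = Regressor()
model.fit(reviews, star_ratings)
predictions = model.predict(['pretty good overall'])  # floating point outputs
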
featurize(X)[source]

Embeds inputs in learned feature space. Can be called before or after calling finetune().

Parameters:X – list or array of text to embed.
Returns:np.array of features of shape (n_examples, embedding_size).
finetune(X, Y=None, batch_size=None)[source]
Parameters:
  • X – list or array of text.
  • Y – floating point targets
  • batch_size – integer number of examples per batch. When N_GPUS > 1, this number corresponds to the number of training examples provided to each GPU.

classmethod finetune_grid_search(Xs, Y, *, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 inputs (prediction, truth) and returns a float, with a max value being desired.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

classmethod finetune_grid_search_cv(Xs, Y, *, n_splits, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs cross validated grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Note that the CV splits are not guaranteed to be unique, but each split is used with every set of hyperparameters.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • n_splits – Number of CV splits to do.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 batches of outputs and returns a float, with a max value being desired. An arithmetic mean must make sense for this metric.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

fit(*args, **kwargs)

An alias for finetune.

generate_text(seed_text='', max_length=None, use_extra_toks=True)

Performs a prediction on the Language modeling objective given some seed text. It uses a noisy greedy decoding. Temperature parameter for decoding is set in the config.

Parameters:
  • seed_text – Defaults to the empty string. This will form the starting point to begin modelling.
  • max_length – The maximum length to decode to.
Returns:A string containing the generated text.

classmethod load(path)

Load a saved fine-tuned model from disk. Path provided should be a folder which contains .pkl and tf.Saver() files

Parameters:path – string path name to load model from. Same value as previously provided to save(). Must be a folder.
predict(X)[source]

Produces a list of most likely class labels as determined by the fine-tuned model.

Parameters:X – list or array of text to embed.
Returns:list of class labels.
predict_proba(X)[source]

Produces a probability distribution over classes for each example in X.

Parameters:X – list or array of text to embed.
Returns:list of dictionaries. Each dictionary maps from a class label to its assigned class probability.
save(path)

Saves the state of the model to disk to the folder specified by path. If path does not exist, it will be auto-created.

Save is performed in two steps:
  • Serialize tf graph to disk using tf.Saver
  • Serialize python model using pickle
Note:
Does not serialize state of Adam optimizer. Should not be used to save / restore a training model.
transform(*args, **kwargs)

An alias for featurize.

class finetune.MultifieldClassifier(config=None, **kwargs)[source]

Classifies a set of documents into 1 of N classes.

Parameters:
  • config – A finetune.config.Settings object or None (for default config).
  • **kwargs – key-value pairs of config items to override.
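
Each training example consists of several text fields (shape [batch, n_fields], per the parameter descriptions below). A minimal sketch assuming each example is passed as a list of its fields; depending on the finetune version, fields may instead need to be supplied as separate positional arguments:

from finetune import MultifieldClassifier

# Two text fields per example: [title, body]
Xs = [
    ['Adopting a puppy', 'Our new german shepherd loves long walks.'],
    ['Cat care basics', 'Maine coons need regular grooming.'],
]
Y = ['dog', 'cat']

model = MultifieldClassifier()
model.fit(Xs, Y)
predictions = model.predict([['Vet visit', 'The beagle got his shots today.']])
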
featurize(Xs)[source]

Embeds inputs in learned feature space. Can be called before or after calling finetune().

Parameters:*Xs – lists of text inputs, shape [batch, n_fields]
Returns:np.array of features of shape (n_examples, embedding_size).
finetune(Xs, Y=None, batch_size=None)[source]
Parameters:
  • *Xs – lists of text inputs, shape [batch, n_fields]
  • Y – integer or string-valued class labels. It is necessary for the items of Y to be sortable.
  • batch_size – integer number of examples per batch. When N_GPUS > 1, this number corresponds to the number of training examples provided to each GPU.

classmethod finetune_grid_search(Xs, Y, *, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 inputs (prediction, truth) and returns a float, with a max value being desired.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

classmethod finetune_grid_search_cv(Xs, Y, *, n_splits, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs cross validated grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Note that the CV splits are not guaranteed to be unique, but each split is used with every set of hyperparameters.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • n_splits – Number of CV splits to do.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 batches of outputs and returns a float, with a max value being desired. An arithmetic mean must make sense for this metric.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

fit(*args, **kwargs)

An alias for finetune.

generate_text(seed_text='', max_length=None, use_extra_toks=True)

Performs a prediction on the Language modeling objective given some seed text. It uses a noisy greedy decoding. Temperature parameter for decoding is set in the config.

Parameters:
  • seed_text – Defaults to the empty string. This will form the starting point to begin modelling.
  • max_length – The maximum length to decode to.
Returns:A string containing the generated text.

classmethod load(path)

Load a saved fine-tuned model from disk. Path provided should be a folder which contains .pkl and tf.Saver() files

Parameters:path – string path name to load model from. Same value as previously provided to save(). Must be a folder.
predict(Xs)[source]

Produces list of most likely class labels as determined by the fine-tuned model.

Parameters:*Xs – lists of text inputs, shape [batch, n_fields]
Returns:list of class labels.
predict_proba(Xs)[source]

Produces probability distribution over classes for each example in X.

Parameters:*Xs – lists of text inputs, shape [batch, n_fields]
Returns:list of dictionaries. Each dictionary maps from a class label to its assigned class probability.
save(path)

Saves the state of the model to disk to the folder specified by path. If path does not exist, it will be auto-created.

Save is performed in two steps:
  • Serialize tf graph to disk using tf.Saver
  • Serialize python model using pickle
Note:
Does not serialize state of Adam optimizer. Should not be used to save / restore a training model.
transform(*args, **kwargs)

An alias for featurize.

class finetune.MultifieldRegressor(config=None, **kwargs)[source]

Regresses one or more floating point values given a set of documents per example.

Parameters:
  • config – A finetune.config.Settings object or None (for default config).
  • **kwargs – key-value pairs of config items to override.
featurize(Xs)[source]

Embeds inputs in learned feature space. Can be called before or after calling finetune().

Parameters:*Xs – lists of text inputs, shape [batch, n_fields]
Returns:np.array of features of shape (n_examples, embedding_size).
finetune(Xs, Y=None, batch_size=None)[source]
Parameters:
  • *Xs – lists of text inputs, shape [batch, n_fields]
  • Y – floating point targets
  • batch_size – integer number of examples per batch. When N_GPUS > 1, this number corresponds to the number of training examples provided to each GPU.

classmethod finetune_grid_search(Xs, Y, *, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 inputs (prediction, truth) and returns a float, with a max value being desired.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

classmethod finetune_grid_search_cv(Xs, Y, *, n_splits, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs cross validated grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Note that the CV splits are not guaranteed to be unique, but each split is used with every set of hyperparameters.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • n_splits – Number of CV splits to do.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 batches of outputs and returns a float, with a max value being desired. An arithmetic mean must make sense for this metric.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

fit(*args, **kwargs)

An alias for finetune.

generate_text(seed_text='', max_length=None, use_extra_toks=True)

Performs a prediction on the Language modeling objective given some seed text. It uses a noisy greedy decoding. Temperature parameter for decoding is set in the config.

Parameters:
  • seed_text – Defaults to the empty string. This will form the starting point to begin modelling.
  • max_length – The maximum length to decode to.
Returns:A string containing the generated text.

classmethod load(path)

Load a saved fine-tuned model from disk. Path provided should be a folder which contains .pkl and tf.Saver() files

Parameters:path – string path name to load model from. Same value as previously provided to save(). Must be a folder.
predict(Xs)[source]

Produces list of most likely class labels as determined by the fine-tuned model.

Parameters:*Xs – lists of text inputs, shape [batch, n_fields]
Returns:list of class labels.
predict_proba(Xs)[source]

Produces probability distribution over classes for each example in X.

Parameters:*Xs – lists of text inputs, shape [batch, n_fields]
Returns:list of dictionaries. Each dictionary maps from a class label to its assigned class probability.
save(path)

Saves the state of the model to disk to the folder specified by path. If path does not exist, it will be auto-created.

Save is performed in two steps:
  • Serialize tf graph to disk using tf.Saver
  • Serialize python model using pickle
Note:
Does not serialize state of Adam optimizer. Should not be used to save / restore a training model.
transform(*args, **kwargs)

An alias for featurize.

class finetune.MultiLabelClassifier(*args, **kwargs)[source]

Classifies a single document into up to N of N categories.

Parameters:
  • config – A finetune.config.Settings object or None (for default config).
  • **kwargs – key-value pairs of config items to override.
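
Targets are supplied as one list of applicable labels per example (see the Y parameter of finetune() below); a minimal sketch:

from finetune import MultiLabelClassifier

X = [
    'A hilarious, heartwarming comedy.',
    'A tense thriller with a few good laughs.',
]
# One list of applicable labels per example
Y = [['comedy'], ['thriller', 'comedy']]

model = MultiLabelClassifier()
model.fit(X, Y)
predictions = model.predict(X)  # e.g. [['comedy'], ['comedy', 'thriller']]
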
featurize(X)[source]

Embeds inputs in learned feature space. Can be called before or after calling finetune().

Parameters:X – list or array of text to embed.
Returns:np.array of features of shape (n_examples, embedding_size).
finetune(X, Y=None, batch_size=None)[source]
Parameters:
  • X – list or array of text.
  • Y – A list of lists containing labels for the corresponding X
  • batch_size – integer number of examples per batch. When N_GPUS > 1, this number corresponds to the number of training examples provided to each GPU.

classmethod finetune_grid_search(Xs, Y, *, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 inputs (prediction, truth) and returns a float, with a max value being desired.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

classmethod finetune_grid_search_cv(Xs, Y, *, n_splits, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs cross validated grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Note that the CV splits are not guaranteed to be unique, but each split is used with every set of hyperparameters.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • n_splits – Number of CV splits to do.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 batches of outputs and returns a float, with a max value being desired. An arithmetic mean must make sense for this metric.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

fit(*args, **kwargs)

An alias for finetune.

generate_text(seed_text='', max_length=None, use_extra_toks=True)

Performs a prediction on the Language modeling objective given some seed text. It uses a noisy greedy decoding. Temperature parameter for decoding is set in the config.

Parameters:
  • seed_text – Defaults to the empty string. This will form the starting point to begin modelling.
  • max_length – The maximum length to decode to.
Returns:A string containing the generated text.

classmethod load(path)

Load a saved fine-tuned model from disk. Path provided should be a folder which contains .pkl and tf.Saver() files

Parameters:path – string path name to load model from. Same value as previously provided to save(). Must be a folder.
predict(X, threshold=None)[source]

Produces a list of most likely class labels as determined by the fine-tuned model.

Parameters:X – list or array of text to embed.
Returns:list of class labels.
predict_proba(X)[source]

Produces a probability distribution over classes for each example in X.

Parameters:X – list or array of text to embed.
Returns:list of dictionaries. Each dictionary maps from a class label to its assigned class probability.
save(path)

Saves the state of the model to disk to the folder specified by path. If path does not exist, it will be auto-created.

Save is performed in two steps:
  • Serialize tf graph to disk using tf.Saver
  • Serialize python model using pickle
Note:
Does not serialize state of Adam optimizer. Should not be used to save / restore a training model.
transform(*args, **kwargs)

An alias for featurize.

class finetune.SequenceLabeler(config=None, **kwargs)[source]

Labels each token in a sequence as belonging to 1 of N token classes.

Parameters:
  • config – A finetune.config.Settings object or None (for default config).
  • **kwargs – key-value pairs of config items to override.
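
The chunk_long_sequences option from the Settings reference below applies to this model family; a construction-only sketch (the label format expected by fit() is not documented in this section and is omitted here):

from finetune import SequenceLabeler

# Slide a window over documents longer than max_length instead of truncating them
model = SequenceLabeler(max_length=512, chunk_long_sequences=True)
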
featurize(X)[source]

Embeds inputs in learned feature space. Can be called before or after calling finetune().

Parameters:Xs – An iterable of lists or array of text, shape [batch, n_inputs, tokens]
Returns:np.array of features of shape (n_examples, embedding_size).

classmethod finetune_grid_search(Xs, Y, *, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 inputs (prediction, truth) and returns a float, with a max value being desired.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

classmethod finetune_grid_search_cv(Xs, Y, *, n_splits, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs cross validated grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Note that the CV splits are not guaranteed to be unique, but each split is used with every set of hyperparameters.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • n_splits – Number of CV splits to do.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 batches of outputs and returns a float, with a max value being desired. An arithmetic mean must make sense for this metric.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

fit(*args, **kwargs)

An alias for finetune.

generate_text(seed_text='', max_length=None, use_extra_toks=True)

Performs a prediction on the Language modeling objective given some seed text. It uses a noisy greedy decoding. Temperature parameter for decoding is set in the config.

Parameters:
  • seed_text – Defaults to the empty string. This will form the starting point to begin modelling.
  • max_length – The maximum length to decode to.
Returns:A string containing the generated text.

classmethod load(path)

Load a saved fine-tuned model from disk. Path provided should be a folder which contains .pkl and tf.Saver() files

Parameters:path – string path name to load model from. Same value as previously provided to save(). Must be a folder.
predict(X)[source]

Produces a list of most likely class labels as determined by the fine-tuned model.

Parameters:X – A list / array of text, shape [batch]
Returns:list of class labels.
predict_proba(X)[source]

Produces class label predictions with associated probability estimates for each token, as determined by the fine-tuned model.

Parameters:X – A list / array of text, shape [batch]
Returns:list of class labels with associated probabilities.
save(path)

Saves the state of the model to disk to the folder specified by path. If path does not exist, it will be auto-created.

Save is performed in two steps:
  • Serialize tf graph to disk using tf.Saver
  • Serialize python model using pickle
Note:
Does not serialize state of Adam optimizer. Should not be used to save / restore a training model.
transform(*args, **kwargs)

An alias for featurize.

class finetune.Comparison(*args, **kwargs)[source]

Compares two documents to solve a classification task.

Parameters:
  • config – A finetune.config.Settings object or None (for default config).
  • **kwargs – key-value pairs of config items to override.
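
Each example is a pair of documents (shape [batch, 2], per the parameter descriptions below); a minimal sketch for a duplicate-question style task:

from finetune import Comparison

pairs = [
    ['How do I reset my password?', 'What is the process for changing my password?'],
    ['How do I reset my password?', 'What is your refund policy?'],
]
labels = ['duplicate', 'not_duplicate']

model = Comparison()
model.fit(pairs, labels)
predictions = model.predict([['Where is my order?', 'How can I track my package?']])
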
featurize(pairs)[source]

Embeds inputs in learned feature space. Can be called before or after calling finetune().

Parameters:pairs – Array of text, shape [batch, 2]
Returns:np.array of features of shape (n_examples, embedding_size).
finetune(X, Y=None, batch_size=None)
Parameters:
  • X – list or array of text.
  • Y – integer or string-valued class labels.
  • batch_size – integer number of examples per batch. When N_GPUS > 1, this number corresponds to the number of training examples provided to each GPU.

classmethod finetune_grid_search(Xs, Y, *, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 inputs (prediction, truth) and returns a float, with a max value being desired.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

classmethod finetune_grid_search_cv(Xs, Y, *, n_splits, test_size, config=None, eval_fn=None, probs=False, return_all=False)

Performs cross validated grid search over config items defined using “GridSearchable” objects and returns either full results or the config object that relates to the best results. The default config contains grid searchable objects for the most important parameters to search over.

Note that the CV splits are not guaranteed to be unique, but each split is used with every set of hyperparameters.

Parameters:
  • Xs – Input text. Either [num_samples] or [sequence, num_samples] for single or multi input models respectively.
  • Y – Targets, A list of targets, [num_samples] that correspond to each sample in Xs.
  • n_splits – Number of CV splits to do.
  • test_size – Int or float. If an int is given this number of samples is used to validate, if a float is given then that fraction of samples is used.
  • config – A config object, or None to use the default config.
  • eval_fn – An eval function that takes 2 batches of outputs and returns a float, with a max value being desired. An arithmetic mean must make sense for this metric.
  • probs – If true, eval_fn is passed probability outputs from predict_proba, otherwise the output of predict is used.
  • return_all – If True, all results are returned, if False, only the best config is returned.
Returns:

default is to return the best config object. If return_all is true, it returns a list of tuples of the form [(config, eval_fn output), … ]

fit(*args, **kwargs)

An alias for finetune.

generate_text(seed_text='', max_length=None, use_extra_toks=True)

Performs a prediction on the Language modeling objective given some seed text. It uses a noisy greedy decoding. Temperature parameter for decoding is set in the config.

Parameters:
  • seed_text – Defaults to the empty string. This will form the starting point to begin modelling.
  • max_length – The maximum length to decode to.
Returns:A string containing the generated text.

classmethod load(path)

Load a saved fine-tuned model from disk. Path provided should be a folder which contains .pkl and tf.Saver() files

Parameters:path – string path name to load model from. Same value as previously provided to save(). Must be a folder.
predict(pairs)[source]

Produces a list of most likely class labels as determined by the fine-tuned model.

Parameters:pairs – Array of text, shape [batch, 2]
Returns:list of class labels.
predict_proba(pairs)[source]

Produces a probability distribution over classes for each example in X.

Parameters:pairs – Array of text, shape [batch, 2]
Returns:list of dictionaries. Each dictionary maps from a class label to its assigned class probability.
save(path)

Saves the state of the model to disk to the folder specified by path. If path does not exist, it will be auto-created.

Save is performed in two steps:
  • Serialize tf graph to disk using tf.Saver
  • Serialize python model using pickle
Note:
Does not serialize state of Adam optimizer. Should not be used to save / restore a training model.
transform(*args, **kwargs)

An alias for featurize.

Finetune Model Configuration Options

class finetune.config.Settings(**kwargs)[source]

Model configuration options

Parameters:
  • batch_size – Number of examples per batch, defaults to 2.
  • visible_gpus – List of integer GPU ids to spread out computation across, defaults to all available GPUs.
  • n_epochs – Number of iterations through training data, defaults to 3.
  • random_seed – Random seed to use for repeatability purposes, defaults to 42.
  • max_length – Maximum number of subtokens per sequence. Examples longer than this number will be truncated (unless chunk_long_sequences=True for SequenceLabeler models). Defaults to 512.
  • weight_stddev – Standard deviation of initial weights. Defaults to 0.02.
  • chunk_long_sequences – When True, use a sliding window approach to predict on examples that are longer than max length. Defaults to False.
  • low_memory_mode – When True, only store partial gradients on forward pass and recompute remaining gradients incrementally in order to save memory. Defaults to False.
  • interpolate_pos_embed – Interpolate positional embeddings when max_length differs from its original value of 512. Defaults to False.
  • embed_p_drop – Embedding dropout probability. Defaults to 0.1.
  • attn_p_drop – Attention dropout probability. Defaults to 0.1.
  • resid_p_drop – Residual layer fully connected network dropout probability. Defaults to 0.1.
  • clf_p_drop – Classifier dropout probability. Defaults to 0.1.
  • l2_reg – L2 regularization coefficient. Defaults to 0.01.
  • b1 – Adam b1 parameter. Defaults to 0.9.
  • b2 – Adam b2 parameter. Defaults to 0.999.
  • epsilon – Adam epsilon parameter: Defaults to 1e-8.
  • lr_schedule – Learning rate schedule – see finetune/optimizers.py for more options.
  • lr – Learning rate. Defaults to 6.25e-5.
  • lr_warmup – Learning rate warmup (percentage of all batches to warmup for). Defaults to 0.002.
  • max_grad_norm – Clip gradients larger than this norm. Defaults to 1.0.
  • lm_loss_coef – Language modeling loss coefficient – a value between 0.0 - 1.0 that indicates how to trade off between language modeling loss and target model loss. Usually not beneficial to turn on unless dataset size exceeds a few thousand examples. Defaults to 0.0.
  • summarize_grads – Include gradient summary information in tensorboard. Defaults to False.
  • verbose – Print TQDM logs? Defaults to True.
  • val_size – Validation set size as a percentage of all training data. Validation will not be run by default if n_examples < 50. If n_examples > 50, defaults to max(5, min(100, 0.05 * n_examples))
  • val_interval – Evaluate on validation set after val_interval batches. Defaults to 4 * val_size / batch_size to ensure that too much time is not spent on validation.
  • lm_temp – Language model temperature – a value of 0.0 corresponds to greedy maximum likelihood predictions while a value of 1.0 corresponds to random predictions. Defaults to 0.2.
  • seq_num_heads – Number of attention heads of final attention layer. Defaults to 16.
  • subtoken_predictions – Return predictions at subtoken granularity or token granularity? Defaults to False.
  • multi_label_sequences – Use a multi-labeling approach to sequence labeling to allow overlapping labels.
  • multi_label_threshold – Threshold of sigmoid unit in multi label classifier. Can be increased or lowered to trade off precision / recall. Defaults to 0.5.
  • autosave_path – Save current best model (as measured by validation loss) to this location. Defaults to None.
  • tensorboard_folder – Directory for tensorboard logs. Tensorboard logs will not be written unless tensorboard_folder is explicitly provided. Defaults to None.
  • log_device_placement – Log which device each operation is placed on for debugging purposes. Defaults to False.
  • allow_soft_placement – Allow tf to allocate an operation to a different device if a device is unavailable. Defaults to True.
  • save_adam_vars – Save adam parameters when calling model.save(). Defaults to True.
  • num_layers_trained – How many layers to finetune. Specifying a value less than 12 will train layers starting from model output. Defaults to 12.
  • train_embeddings – Should embedding layer be finetuned? Defaults to True.
  • class_weights – One of ‘log’, ‘linear’, or ‘sqrt’. Auto-scales gradient updates based on class frequency. Can also be a dictionary that maps from true class name to loss coefficient. Defaults to None.
  • oversample – Should rare classes be oversampled? Defaults to False.
  • params_device – Which device should gradient updates be aggregated on? If you are using a single GPU and have more than 4Gb of GPU memory you should set this to GPU PCI number (0, 1, 2, etc.). Defaults to “cpu”.
  • eval_acc – If True, calculates accuracy and writes it to the tensorboard summary files for validation runs.
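
Config objects can also be constructed explicitly and passed to any model; a minimal sketch assuming finetune.config.get_config (referenced in the Regressor parameters above) accepts the same option names as keyword arguments:

from finetune import Classifier
from finetune.config import get_config

# Build a Settings object with a few overrides, then hand it to the model
config = get_config(
    n_epochs=2,
    lr=1e-5,
    tensorboard_folder='./tb_logs',
)
model = Classifier(config=config)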