
Scikit-learn-inspired model finetuning for natural language processing.

Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide variety of downstream tasks.

Finetune currently supports TensorFlow implementations of the following models (a base-model selection sketch follows the list):

  1. BERT, from BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
  2. RoBERTa, from RoBERTa: A Robustly Optimized BERT Pretraining Approach.
  3. GPT, from Improving Language Understanding by Generative Pre-Training.
  4. GPT2, from Language Models are Unsupervised Multitask Learners.
  5. TextCNN, from Convolutional Neural Networks for Sentence Classification.
  6. Temporal Convolution Network, from An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling.
  7. DistilBERT, from Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT.
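
Each of these architectures plugs into the same task interface. A minimal sketch, assuming the base_model keyword argument and the finetune.base_models import path used in recent finetune releases:

from finetune import Classifier
from finetune.base_models import BERT, GPT2  # assumed import path; see the finetune docs

# Choose a different pretrained architecture via the base_model argument;
# the task API is the same regardless of which base model is selected.
model = Classifier(base_model=BERT)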

Source code for finetune is available on GitHub.

Finetune Quickstart Guide

Finetuning the base language model is as easy as calling Classifier.fit():

from finetune import Classifier

model = Classifier()               # Load base model
model.fit(trainX, trainY)          # Finetune base model on custom data
predictions = model.predict(testX) # [{'class_1': 0.23, 'class_2': 0.54, ..}, ..]
model.save(path)                   # Serialize the model to disk
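
Here trainX is a list of raw text strings and trainY a parallel list of target labels. A toy end-to-end sketch (the texts and label names are illustrative only):

from finetune import Classifier

trainX = ["I loved this film", "Worst purchase I have ever made"]  # raw text inputs
trainY = ["positive", "negative"]                                  # one label per example

model = Classifier()
model.fit(trainX, trainY)
print(model.predict(["An absolute delight"]))  # e.g. ['positive']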

Reload saved models from disk by using Classifier.load():

model = Classifier.load(path)
predictions = model.predict(testX)

Installation

Finetune can be installed directly from PyPI using pip:

pip install finetune

or installed from source:

git clone https://github.com/IndicoDataSolutions/finetune
cd finetune
python3 setup.py develop
python3 -m spacy download en
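
Note: spaCy v3 removed the en shortcut link, so on recent spaCy versions the equivalent download command is:

python3 -m spacy download en_core_web_sm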

Code Examples

For example usage of provided models, see the finetune/datasets directory.