Scikit-learn-inspired model finetuning for natural language processing.
Finetune is a library that allows users to leverage state-of-the-art pretrained NLP models for a wide variety of downstream tasks.
Finetune currently supports TensorFlow implementations of the following models:
- BERT, from BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
- RoBERTa, from RoBERTa: A Robustly Optimized BERT Pretraining Approach.
- GPT, from Improving Language Understanding by Generative Pre-Training.
- GPT2, from Language Models are Unsupervised Multitask Learners.
- TextCNN, from Convolutional Neural Networks for Sentence Classification.
- Temporal Convolution Network, from An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling.
- DistilBERT, from Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT.
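Any of the models above can be chosen as the base model for finetuning. Below is a minimal sketch of selecting a non-default base model; it assumes the base_model keyword argument and the finetune.base_models import path, so check the library's documentation for the exact names:
from finetune import Classifier
from finetune.base_models import BERT  # assumed import path for base model classes

# Finetune BERT instead of the default base model
model = Classifier(base_model=BERT)
model.fit(trainX, trainY)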
Source code for finetune is available on GitHub.
Finetune Quickstart Guide
Finetuning the base language model is as easy as calling Classifier.fit():
from finetune import Classifier

model = Classifier()               # Load base model
model.fit(trainX, trainY)          # Finetune base model on custom data
predictions = model.predict(testX) # [{'class_1': 0.23, 'class_2': 0.54, ..}, ..]
model.save(path)                   # Serialize the model to disk
Reload saved models from disk by using Classifier.load():
model = Classifier.load(path)
predictions = model.predict(testX)
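The same fit/predict/save pattern applies to the library's other task-specific models. As an illustrative sketch using Regressor (the training data here is hypothetical):
from finetune import Regressor

# Hypothetical data: texts paired with continuous targets
trainX = ["Great product, works perfectly.", "Stopped working after two days."]
trainY = [4.5, 1.0]

model = Regressor()        # Same interface, with a regression target instead of classes
model.fit(trainX, trainY)
predictions = model.predict(["Decent value for the price."])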
Installation
Finetune can be installed directly from PyPI using pip:
pip install finetune
or installed directly from source:
git clone https://github.com/IndicoDataSolutions/finetune
cd finetune
python3 setup.py develop
python3 -m spacy download en
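A quick way to confirm the install succeeded is to import the package from the command line; any missing-dependency errors will surface here:
python3 -c "import finetune"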
Code Examples
For example usage of provided models, see the finetune/datasets directory.