Dataset Loading

Finetune accepts input data either as a list or as a data generator. When a generator is provided, finetune uses the tf.data module for data pipelining.

Providing text and targets in list format:

from finetune import Classifier

X = ['german shepherd', 'maine coon', 'persian', 'beagle']
Y = ['dog', 'cat', 'cat', 'dog']
model = Classifier()
model.fit(X, Y)

Providing data as a generator:

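import pandas as pd
from finetune import Classifier

df = pd.read_csv('pets.csv')

# Even if the raw data is loaded eagerly,
# using a generator lets us defer data preprocessing
def text_generator():
    # df.Text.values holds the raw strings, so each item is yielded directly
    for text in df.Text.values:
        yield text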

# dataset_size must be specified if input is provided as generators
model = Classifier(dataset_size=len(df))
model.fit(text_generator)
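
The deferred preprocessing mentioned in the comments above can be seen in a small plain-Python sketch. It makes no finetune calls; the `records` list and the `preprocess` helper are hypothetical stand-ins for illustration. It also shows that a generator has no length, which is presumably why the row count has to be passed separately as dataset_size.

```python
# Illustrative only: plain Python, no finetune calls.
records = ["German Shepherd ", " Maine Coon", "Persian", " Beagle "]
calls = []  # tracks which rows have been preprocessed so far

def preprocess(text):
    # Hypothetical preprocessing step: strip whitespace and lowercase.
    calls.append(text)
    return text.strip().lower()

def text_generator():
    # Preprocessing runs only when a consumer asks for the next item.
    for text in records:
        yield preprocess(text)

gen = text_generator()
assert calls == []        # creating the generator preprocesses nothing

first = next(gen)         # pulls exactly one row through preprocess()
assert len(calls) == 1

# A generator object exposes no __len__, so the number of rows
# has to be supplied from elsewhere (here, the underlying list).
has_len = hasattr(gen, "__len__")
dataset_size = len(records)
```

With a list, every row would be preprocessed before training begins; with a generator, the work is spread across the iteration.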