Model Configuration Options¶

class finetune.config.Settings(**kwargs)[source]¶

Model configuration options

Parameters:

batch_size – Number of examples per batch, defaults to 2.
visible_gpus – List of integer GPU ids to spread out computation across, defaults to all available GPUs.
n_epochs – Number of iterations through training data, defaults to 3.
random_seed – Random seed to use for repeatability purposes, defaults to 42.
max_length – Maximum number of subtokens per sequence. Examples longer than this number will be truncated (unless chunk_long_sequences=True for SequenceLabeler models). Defaults to 512.
weight_stddev – Standard deviation of initial weights. Defaults to 0.02.
chunk_long_sequences – When True, use a sliding window approach to predict on examples that are longer than max length. Defaults to False.
low_memory_mode – When True, only store partial gradients on forward pass and recompute remaining gradients incrementally in order to save memory. Defaults to False.
interpolate_pos_embed – Interpolate positional embeddings when max_length differs from it’s original value of 512. Defaults to False.
embed_p_drop – Embedding dropout probability. Defaults to 0.1.
attn_p_drop – Attention dropout probability. Defaults to 0.1.
resid_p_drop – Residual layer fully connected network dropout probability. Defaults to 0.1.
clf_p_drop – Classifier dropout probability. Defaults to 0.1.
l2_reg – L2 regularization coefficient. Defaults to 0.01.
vector_l2 – Whether to apply weight decay regularization to vectors (biases, normalization etc..). Defaults to False.
optimizer – Optimizer to use, current options include AdamW or AdamaxW.
b1 – Adam b1 parameter. Defaults to 0.9.
b2 – Adam b2 parameter. Defaults to 0.999.
epsilon – Adam epsilon parameter: Defaults to 1e-8.
lr_schedule – Learning rate schedule – see finetune/optimizers.py for more options.
lr – Learning rate. Defaults to 6.25e-5.
lr_warmup – Learning rate warmup (percentage of all batches to warmup for). Defaults to 0.002.
max_grad_norm – Clip gradients larger than this norm. Defaults to 1.0.
accum_steps – Number of updates to accumulate before applying. This is used to simulate a higher batch size.
lm_loss_coef – Language modeling loss coefficient – a value between 0.0 - 1.0 that indicates how to trade off between language modeling loss and target model loss. Usually not beneficial to turn on unless dataset size exceeds a few thousand examples. Defaults to 0.0.
summarize_grads – Include gradient summary information in tensorboard. Defaults to False.
val_size – Validation set size if int. Validation set size as percentage of all training data if float. Validation will not be run by default if n_examples < 50. If n_examples > 50, defaults to max(5, min(100, 0.05 * n_examples))
val_interval – Evaluate on validation set after val_interval batches. Defaults to 4 * val_size / batch_size to ensure that too much time is not spent on validation.
lm_temp – Language model temperature – a value of 0.0 corresponds to greedy maximum likelihood predictions while a value of 1.0 corresponds to random predictions. Defaults to 0.2.
seq_num_heads – Number of attention heads of final attention layer. Defaults to 16.
subtoken_predictions – Return predictions at subtoken granularity or token granularity? Defaults to False.
multi_label_sequences – Use a multi-labeling approach to sequence labeling to allow overlapping labels.
multi_label_threshold – Threshold of sigmoid unit in multi label classifier. Can be increased or lowered to trade off precision / recall. Defaults to 0.5.
autosave_path – Save current best model (as measured by validation loss) to this location. Defaults to None.
tensorboard_folder – Directory for tensorboard logs. Tensorboard logs will not be written unless tensorboard_folder is explicitly provided. Defaults to None.
log_device_placement – Log which device each operation is placed on for debugging purposes. Defaults to False.
allow_soft_placement – Allow tf to allocate an operation to a different device if a device is unavailable. Defaults to True.
save_adam_vars – Save adam parameters when calling model.save(). Defaults to True.
num_layers_trained – How many layers to finetune. Specifying a value less than 12 will train layers starting from model output. Defaults to 12.
train_embeddings – Should embedding layer be finetuned? Defaults to True.
class_weights – One of ‘log’, ‘linear’, or ‘sqrt’. Auto-scales gradient updates based on class frequency. Can also be a dictionary that maps from true class name to loss coefficient. Defaults to None.
oversample – Should rare classes be oversampled? Defaults to False.
params_device – Which device should gradient updates be aggregated on? If you are using a single GPU and have more than 4Gb of GPU memory you should set this to GPU PCI number (0, 1, 2, etc.). Defaults to “cpu”.
eval_acc – if True, calculates accuracy and writes it to the tensorboard summary files for valudation runs.
save_dtype – specifies what precision to save model weights with. Defaults to np.float32.
regression_loss – the loss to use for regression models. One of L1 or L2, defaults to L2.
prefit_init – if True, fit target model weigths before finetuning the entire model. Defaults to False.
debugging_logs – if True, output tensorflow logs and turn off TQDM logging. Defaults to False.
val_set – Where it is neccessary to use an explicit validation set, provide it here as a tuple (text, labels)
per_process_gpu_memory_fraction – fraction of the overall amount of memory that each visible GPU should be allocated, defaults to 1.0.
adapter_size – width of adapter module from ‘Parameter Efficient Transfer Learning’ paper, if defined. defaults to ‘None’.