Model Configuration Options¶

class finetune.config.Settings(**kwargs)[source]¶

Model configuration options.
Parameters:
 batch_size – Number of examples per batch, defaults to 2.
 visible_gpus – List of integer GPU ids to spread out computation across, defaults to all available GPUs.
 n_epochs – Number of iterations through training data, defaults to 3.
 random_seed – Random seed to use for repeatability purposes, defaults to 42.
 max_length – Maximum number of subtokens per sequence. Examples longer than this number will be truncated (unless chunk_long_sequences=True for SequenceLabeler models). Defaults to 512.
 weight_stddev – Standard deviation of initial weights. Defaults to 0.02.
 chunk_long_sequences – When True, use a sliding window approach to predict on examples that are longer than max_length. Defaults to False.
 low_memory_mode – When True, only store partial gradients on forward pass and recompute remaining gradients incrementally in order to save memory. Defaults to False.
 interpolate_pos_embed – Interpolate positional embeddings when max_length differs from its original value of 512. Defaults to False.
 embed_p_drop – Embedding dropout probability. Defaults to 0.1.
 attn_p_drop – Attention dropout probability. Defaults to 0.1.
 resid_p_drop – Residual layer fully connected network dropout probability. Defaults to 0.1.
 clf_p_drop – Classifier dropout probability. Defaults to 0.1.
 l2_reg – L2 regularization coefficient. Defaults to 0.01.
 vector_l2 – Whether to apply weight decay regularization to vectors (biases, normalization, etc.). Defaults to False.
 optimizer – Optimizer to use, current options include AdamW or AdamaxW.
 b1 – Adam b1 parameter. Defaults to 0.9.
 b2 – Adam b2 parameter. Defaults to 0.999.
 epsilon – Adam epsilon parameter. Defaults to 1e-8.
 lr_schedule – Learning rate schedule – see finetune/optimizers.py for more options.
 lr – Learning rate. Defaults to 6.25e-5.
 lr_warmup – Learning rate warmup (percentage of all batches to warmup for). Defaults to 0.002.
 max_grad_norm – Clip gradients larger than this norm. Defaults to 1.0.
 accum_steps – Number of updates to accumulate before applying. This is used to simulate a higher batch size.
 lm_loss_coef – Language modeling loss coefficient – a value between 0.0 and 1.0 that indicates how to trade off between language modeling loss and target model loss. Usually not beneficial to turn on unless dataset size exceeds a few thousand examples. Defaults to 0.0.
 summarize_grads – Include gradient summary information in tensorboard. Defaults to False.
 val_size – Validation set size if int. Validation set size as a percentage of all training data if float. Validation will not be run by default if n_examples < 50. If n_examples > 50, defaults to max(5, min(100, 0.05 * n_examples)).
 val_interval – Evaluate on validation set after val_interval batches. Defaults to 4 * val_size / batch_size to ensure that too much time is not spent on validation.
 lm_temp – Language model temperature – a value of 0.0 corresponds to greedy maximum likelihood predictions while a value of 1.0 corresponds to random predictions. Defaults to 0.2.
 seq_num_heads – Number of attention heads of final attention layer. Defaults to 16.
 subtoken_predictions – Return predictions at subtoken granularity or token granularity? Defaults to False.
 multi_label_sequences – Use a multilabeling approach to sequence labeling to allow overlapping labels.
 multi_label_threshold – Threshold of sigmoid unit in multi label classifier. Can be increased or lowered to trade off precision / recall. Defaults to 0.5.
 autosave_path – Save current best model (as measured by validation loss) to this location. Defaults to None.
 tensorboard_folder – Directory for tensorboard logs. Tensorboard logs will not be written unless tensorboard_folder is explicitly provided. Defaults to None.
 log_device_placement – Log which device each operation is placed on for debugging purposes. Defaults to False.
 allow_soft_placement – Allow tf to allocate an operation to a different device if a device is unavailable. Defaults to True.
 save_adam_vars – Save adam parameters when calling model.save(). Defaults to True.
 num_layers_trained – How many layers to finetune. Specifying a value less than 12 will train layers starting from model output. Defaults to 12.
 train_embeddings – Should embedding layer be finetuned? Defaults to True.
 class_weights – One of ‘log’, ‘linear’, or ‘sqrt’. Autoscales gradient updates based on class frequency. Can also be a dictionary that maps from true class name to loss coefficient. Defaults to None.
 oversample – Should rare classes be oversampled? Defaults to False.
 params_device – Which device should gradient updates be aggregated on? If you are using a single GPU and have more than 4 GB of GPU memory, you should set this to the GPU's PCI number (0, 1, 2, etc.). Defaults to “cpu”.
 eval_acc – if True, calculates accuracy and writes it to the tensorboard summary files for validation runs.
 save_dtype – specifies what precision to save model weights with. Defaults to np.float32.
 regression_loss – the loss to use for regression models. One of L1 or L2, defaults to L2.
 prefit_init – if True, fit target model weights before finetuning the entire model. Defaults to False.
 debugging_logs – if True, output tensorflow logs and turn off TQDM logging. Defaults to False.
 val_set – Where it is necessary to use an explicit validation set, provide it here as a tuple (text, labels).
 per_process_gpu_memory_fraction – fraction of the overall amount of memory that each visible GPU should be allocated, defaults to 1.0.
 adapter_size – width of the adapter module from the ‘Parameter Efficient Transfer Learning’ paper, if defined. Defaults to None.
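
The options above are supplied as keyword arguments and merged over the documented defaults. The sketch below is a minimal, illustrative model of that override behavior only – it is not the finetune implementation, and the helper name make_settings and the subset of defaults shown are assumptions chosen for the example.

```python
# Illustrative sketch of keyword-argument overrides over documented defaults.
# NOT the finetune library's Settings implementation; it only demonstrates
# the "defaults unless overridden" semantics described in the list above.

DEFAULTS = {
    "batch_size": 2,     # Number of examples per batch
    "n_epochs": 3,       # Number of iterations through training data
    "random_seed": 42,   # Random seed for repeatability
    "max_length": 512,   # Maximum number of subtokens per sequence
    "lr": 6.25e-5,       # Learning rate
    "epsilon": 1e-8,     # Adam epsilon parameter
}

def make_settings(**kwargs):
    """Return the documented defaults with any keyword overrides applied."""
    unknown = set(kwargs) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown settings: {sorted(unknown)}")
    return {**DEFAULTS, **kwargs}

# Override two options; every other option keeps its default.
settings = make_settings(batch_size=4, max_length=256)
```

Any key from the parameter list that is not passed explicitly retains its default, so here `settings["n_epochs"]` is still 3 while `settings["batch_size"]` is 4.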