Model Configuration Options
- batch_size – Number of examples per batch, defaults to 2.
- visible_gpus – List of integer GPU ids to spread out computation across, defaults to all available GPUs.
- n_epochs – Number of iterations through training data, defaults to 3.
- random_seed – Random seed to use for repeatability purposes, defaults to 42.
- max_length – Maximum number of subtokens per sequence. Examples longer than this number will be truncated (unless chunk_long_sequences=True for SequenceLabeler models). Defaults to 512.
- weight_stddev – Standard deviation of initial weights. Defaults to 0.02.
- chunk_long_sequences – When True, use a sliding window approach to predict on examples that are longer than max_length. Defaults to False.
- low_memory_mode – When True, only store partial gradients on forward pass and recompute remaining gradients incrementally in order to save memory. Defaults to False.
- interpolate_pos_embed – Interpolate positional embeddings when max_length differs from its original value of 512. Defaults to False.
- embed_p_drop – Embedding dropout probability. Defaults to 0.1.
- attn_p_drop – Attention dropout probability. Defaults to 0.1.
- resid_p_drop – Residual layer fully connected network dropout probability. Defaults to 0.1.
- clf_p_drop – Classifier dropout probability. Defaults to 0.1.
- l2_reg – L2 regularization coefficient. Defaults to 0.01.
- vector_l2 – Whether to apply weight decay regularization to vectors (biases, normalization, etc.). Defaults to False.
- optimizer – Optimizer to use. Current options include AdamW and AdamaxW.
- b1 – Adam b1 parameter. Defaults to 0.9.
- b2 – Adam b2 parameter. Defaults to 0.999.
- epsilon – Adam epsilon parameter. Defaults to 1e-8.
- lr_schedule – Learning rate schedule – see finetune/optimizers.py for more options.
- lr – Learning rate. Defaults to 6.25e-5.
- lr_warmup – Learning rate warmup, expressed as the fraction of all batches to warm up for. Defaults to 0.002.
- max_grad_norm – Clip gradients larger than this norm. Defaults to 1.0.
- accum_steps – Number of updates to accumulate before applying. This is used to simulate a larger batch size.
- lm_loss_coef – Language modeling loss coefficient – a value between 0.0 - 1.0 that indicates how to trade off between language modeling loss and target model loss. Usually not beneficial to turn on unless dataset size exceeds a few thousand examples. Defaults to 0.0.
- summarize_grads – Include gradient summary information in tensorboard. Defaults to False.
- val_size – If an int, the validation set size. If a float, the fraction of all training data to use for validation. Validation will not be run by default if n_examples < 50. If n_examples > 50, defaults to max(5, min(100, 0.05 * n_examples)).
- val_interval – Evaluate on validation set after val_interval batches. Defaults to 4 * val_size / batch_size to ensure that too much time is not spent on validation.
- lm_temp – Language model temperature – a value of 0.0 corresponds to greedy maximum likelihood predictions while a value of 1.0 corresponds to random predictions. Defaults to 0.2.
- seq_num_heads – Number of attention heads of final attention layer. Defaults to 16.
- subtoken_predictions – When True, return predictions at subtoken granularity rather than token granularity. Defaults to False.
- multi_label_sequences – Use a multi-labeling approach to sequence labeling to allow overlapping labels.
- multi_label_threshold – Threshold of sigmoid unit in multi label classifier. Can be increased or lowered to trade off precision / recall. Defaults to 0.5.
- autosave_path – Save current best model (as measured by validation loss) to this location. Defaults to None.
- tensorboard_folder – Directory for tensorboard logs. Tensorboard logs will not be written unless tensorboard_folder is explicitly provided. Defaults to None.
- log_device_placement – Log which device each operation is placed on for debugging purposes. Defaults to False.
- allow_soft_placement – Allow tf to allocate an operation to a different device if a device is unavailable. Defaults to True.
- save_adam_vars – Save Adam parameters when calling model.save(). Defaults to True.
- num_layers_trained – How many layers to finetune. Specifying a value less than 12 will train layers starting from model output. Defaults to 12.
- train_embeddings – Should embedding layer be finetuned? Defaults to True.
- class_weights – One of ‘log’, ‘linear’, or ‘sqrt’. Auto-scales gradient updates based on class frequency. Can also be a dictionary that maps from true class name to loss coefficient. Defaults to None.
- oversample – Should rare classes be oversampled? Defaults to False.
- params_device – Which device should gradient updates be aggregated on? If you are using a single GPU and have more than 4 GB of GPU memory, you should set this to the GPU PCI number (0, 1, 2, etc.). Defaults to “cpu”.
- eval_acc – If True, calculates accuracy and writes it to the tensorboard summary files for validation runs.
- save_dtype – Specifies what precision to save model weights with. Defaults to np.float32.
- regression_loss – The loss to use for regression models. One of L1 or L2, defaults to L2.
- prefit_init – If True, fit target model weights before finetuning the entire model. Defaults to False.
- debugging_logs – If True, output tensorflow logs and turn off TQDM logging. Defaults to False.
- val_set – Where it is necessary to use an explicit validation set, provide it here as a tuple (text, labels).
- per_process_gpu_memory_fraction – Fraction of the overall amount of memory that each visible GPU should be allocated. Defaults to 1.0.
- adapter_size – Width of the adapter module from the ‘Parameter Efficient Transfer Learning’ paper, if defined. Defaults to None.
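
Several of the defaults listed above are computed from other values rather than fixed. The sketch below works through that arithmetic for val_size, val_interval, and accum_steps in plain Python. The function names are hypothetical helpers written for illustration and are not part of the library. The sketch also assumes that the computed validation size is truncated to an integer and that a dataset of exactly 50 examples is treated like a larger one, neither of which the list above specifies.

```python
def default_val_size(n_examples):
    """Hypothetical helper: documented default for val_size."""
    if n_examples < 50:
        # Validation is not run by default on small datasets.
        return 0
    # Assumption: the result is truncated to a whole number of examples.
    return int(max(5, min(100, 0.05 * n_examples)))


def default_val_interval(val_size, batch_size):
    """Hypothetical helper: documented default for val_interval (in batches)."""
    return 4 * val_size / batch_size


def effective_batch_size(batch_size, accum_steps):
    """Hypothetical helper: batch size simulated by accumulating updates."""
    return batch_size * accum_steps


if __name__ == "__main__":
    for n in (40, 60, 1000, 10000):
        size = default_val_size(n)
        interval = default_val_interval(size, batch_size=2)
        print(f"n_examples={n}: val_size={size}, val_interval={interval}")
    print("effective batch size:", effective_batch_size(batch_size=2, accum_steps=8))
```

With the default batch_size of 2 and 1000 training examples, this gives a validation set of 50 examples evaluated every 100 batches, so validation is roughly a fifth of the examples processed between evaluations.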