Using Auxiliary Info

Our base models can also process arbitrary auxiliary information in addition to text, such as style (bolding, italics, etc.), semantics (part-of-speech tags, sentiment tags), or other forms, as long as they describe specific spans of text.

# First, define the extra features in a dictionary that maps each feature name to its default value.
# Tokens that are not explicitly labeled receive these defaults.
# Auxiliary info can take the form of strings, booleans, floats, or ints.
default = {'capitalized':False, 'part_of_speech':'unknown'}

# Next we create context tags in a similar format to SequenceLabeling labels, as a list of lists of dictionaries:
train_text = ['Intelligent process automation']
train_context = [[
    {'text': 'Intelligent', 'capitalized': True, 'end': 11, 'start': 0, 'part_of_speech': 'ADJ'},
    {'text': 'process automation', 'capitalized': False, 'end': 30, 'start': 12, 'part_of_speech': 'NOUN'},
]]

# The input to the model is now a list containing the text and the context.
trainX = [train_text, train_context]

# Examples with no context must have an empty list as their context
assert len(train_text) == len(train_context)

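# We indicate to the model that we are including auxiliary info by passing our default dictionary via the default_context kwarg.
# Classifier is imported from the finetune package; trainY (not shown here) holds one target label per entry in train_text.
from finetune import Classifier
model = Classifier(default_context=default)
model.fit(trainX, trainY)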
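
Because each context entry is tied to a span by its character offsets, it can be worth checking the annotations before training. The following is a minimal sketch in plain Python, not part of the library: `validate_context` is a hypothetical helper, and it assumes that `start` and `end` are character offsets with an exclusive `end` (as the example above suggests) and that every feature used in a span should also appear in the default dictionary.

```python
# Hypothetical helper (not a library function): sanity-check context annotations.
RESERVED_KEYS = {"text", "start", "end"}


def validate_context(texts, contexts, default):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # Every text needs a context list, even if that list is empty.
    if len(texts) != len(contexts):
        problems.append(
            f"{len(texts)} texts but {len(contexts)} context lists"
        )
        return problems
    for i, (text, spans) in enumerate(zip(texts, contexts)):
        for span in spans:
            start, end = span["start"], span["end"]
            # Assumption: offsets index characters and `end` is exclusive.
            if text[start:end] != span["text"]:
                problems.append(
                    f"example {i}: offsets {start}-{end} give "
                    f"{text[start:end]!r}, expected {span['text']!r}"
                )
            # Assumption: every feature should have a declared default.
            unknown = set(span) - set(default) - RESERVED_KEYS
            if unknown:
                problems.append(
                    f"example {i}: features without a default: {sorted(unknown)}"
                )
    return problems


default = {'capitalized': False, 'part_of_speech': 'unknown'}
train_text = ['Intelligent process automation']
train_context = [[
    {'text': 'Intelligent', 'capitalized': True, 'end': 11, 'start': 0, 'part_of_speech': 'ADJ'},
    {'text': 'process automation', 'capitalized': False, 'end': 30, 'start': 12, 'part_of_speech': 'NOUN'},
]]

print(validate_context(train_text, train_context, default))  # prints []
```

Running a check like this before calling `fit` catches off-by-one offsets and misspelled feature names early. Whether the library performs similar validation internally is not stated here.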