GLiNER

A language model focused on NER.
Published December 7, 2024

Before transformers were around, you needed tools like spaCy to do Named Entity Recognition (NER). Now you can use a transformer-based model like GLiNER (see also the original research article). Although generic models like Llama and GPT can extract entities, GLiNER is specifically designed for NER: it is faster than a general-purpose LLM and, unlike a classic spaCy pipeline, not limited to a fixed label set. spaCy remains a good choice for general NLP, and the gliner-spacy wrapper gives you the best of both worlds.
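As a taste of the wrapper, here is a minimal sketch following the conventions of the gliner-spacy README; installing the gliner-spacy package registers a gliner_spacy component with spaCy:

import spacy

# gliner-spacy registers the "gliner_spacy" factory via spaCy entry points
nlp = spacy.blank("en")
nlp.add_pipe("gliner_spacy", config={"labels": ["Person", "Date"]})

doc = nlp("John Field was born January 26, 1782.")
for ent in doc.ents:
    print(ent.text, "=>", ent.label_)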

Let’s take a look at how it works.

from gliner import GLiNER

# Load the medium-sized GLiNER model from the Hugging Face Hub
model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
model.eval()  # switch to inference mode

The above will, like any Hugging Face model, automatically download the necessary weights and configs on first use. The following extracts the entities with staggering speed:

import time
start_time = time.time()
text = """
- John Field was born January 26, 1782, and died January 23, 1837. He was an Irish pianist, composer, and teacher.
- James Clerk Maxwell was born June 13, 1831, and died November 5, 1879. He was a Scottish scientist in the field of mathematical physics.
"""
labels = ["Person", "Date"]
entities = model.predict_entities(text, labels, threshold=0.5)
end_time = time.time()
elapsed_time = end_time - start_time
for entity in entities:
    print(entity["text"], "=>", entity["label"])
print(f"Time taken: {elapsed_time:.2f} seconds")
John Field => Person
January 26, 1782 => Date
January 23, 1837 => Date
James Clerk Maxwell => Person
June 13, 1831 => Date
November 5, 1879 => Date
Time taken: 0.09 seconds
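If you have many documents, the same model can process them in one call. A minimal sketch, assuming your installed gliner version exposes the batch_predict_entities method:

# A sketch, assuming batch_predict_entities is available in your gliner version
texts = [
    "Ada Lovelace was born December 10, 1815.",
    "Alan Turing was born June 23, 1912.",
]
batched = model.batch_predict_entities(texts, ["Person", "Date"], threshold=0.5)
for text_entities in batched:
    for entity in text_entities:
        print(entity["text"], "=>", entity["label"])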

You can specify anything you like for the labels, but not every word is automatically a named entity. For instance, if you ask GLiNER to extract ‘Math’ or ‘Number’, this happens:

text = """
The transcendental number e ≈ 2.71828 is Euler's number, and it is the base of the natural logarithm. It's also crucial in calculus, e.g. $e^{i\pi}=-1$.
"""
labels = ["Person", "Number", "Math"]
for entity in model.predict_entities(text, labels, threshold=0.5):
    print(entity["text"], "=>", entity["label"])
e => Number
Euler => Person
natural logarithm => Math
calculus => Math
e => Number

It’s indeed semantically correct that ‘e’ is the symbol for a number, but ideally the actual value 2.71828 would have been extracted as well. It’s remarkable that ‘natural logarithm’ is correctly identified as a mathematical entity. The threshold expresses the minimum confidence required for a match; if you lower it:

for entity in model.predict_entities(text, labels, threshold=0.1):
    print(entity["text"], "=>", entity["label"])
The transcendental number => Number
e => Number
2 => Number
71828 => Number
Euler => Person
natural logarithm => Math
calculus => Math
e => Number
e => Number
i => Number
pi => Number
-1 => Number

It does not recognize 2.71828 as a single floating-point value: the decimal point is identified as the end of a sentence, so the number is split into 2 and 71828. The threshold is a trade-off between precision and recall; the default is 0.5.
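There is no universally best value, but a quick sweep over the same text makes the trade-off visible:

# Illustrative sweep: lower thresholds recall more spans, at the cost of precision
for threshold in (0.1, 0.3, 0.5, 0.7, 0.9):
    entities = model.predict_entities(text, labels, threshold=threshold)
    print(f"threshold={threshold}: {len(entities)} entities")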

GLiNER does not replace the need for sophisticated graph extraction, as explained in our Graph RAG article, but it can speed it up. Given GLiNER’s extraction speed, you can use it to extract the entities first and then hand them over to a more generic LLM to extract the relationships. The graph RAG prompt becomes less complex, which speeds up the graph extraction process. See also our NuExtract article for an alternative approach based on structured data extraction.
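A rough sketch of that division of labour; build_relation_prompt and call_llm are illustrative names, not part of GLiNER or any particular library:

# Sketch: seed a relationship-extraction prompt with GLiNER entities
def build_relation_prompt(text, entities):
    entity_list = "\n".join(f"- {e['text']} ({e['label']})" for e in entities)
    return (
        "List the relationships between the entities below as "
        "(subject, predicate, object) triples.\n\n"
        f"Text:\n{text}\n\nEntities:\n{entity_list}"
    )

entities = model.predict_entities(text, ["Person", "Date"], threshold=0.5)
prompt = build_relation_prompt(text, entities)
# response = call_llm(prompt)  # call_llm stands in for any LLM client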