GliNER

Before transformers were around you needed things like spaCy to do Named Entity Recognition (NER). Now you can use a transformer model like GliNER (see also the original research article). Although generic models like Llama and GPT can extract entities, GliNER is specifically designed for NER, and it’s faster and more flexible. spaCy remains a good choice for general NLP, and there is a gliner-spacy wrapper giving you the best of both worlds.

Let’s take a look at how it works.

from gliner import GLiNER

# Initialize GLiNER with the base model
model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
model.eval()
The above will, like all Hugging Face things, automatically download the necessary tensors and configs. The following extracts the entities with staggering speed:
import time

start_time = time.time()
text = """
- John Field was born January 26, 1782, and died January 23, 1837. He was an Irish pianist, composer, and teacher.
- James Clerk Maxwell was born June 13, 1831, and died November 5, 1879. He was a Scottish scientist in the field of mathematical physics.
"""
labels = ["Person", "Date"]
entities = model.predict_entities(text, labels, threshold=0.5)
end_time = time.time()
elapsed_time = end_time - start_time

for entity in entities:
    print(entity["text"], "=>", entity["label"])
print(f"Time taken: {elapsed_time:.2f} seconds")
John Field => Person
January 26, 1782 => Date
January 23, 1837 => Date
James Clerk Maxwell => Person
June 13, 1831 => Date
November 5, 1879 => Date
Time taken: 0.09 seconds
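Since each entity is a plain dict, post-processing takes nothing more than the standard library. A minimal sketch that groups the matches above by label (the entities list and the score values are copied by hand here for illustration; the exact keys GLiNER returns besides "text" and "label" are an assumption):

```python
from collections import defaultdict

# Entities as printed above; the "score" key is assumed/illustrative.
entities = [
    {"text": "John Field", "label": "Person", "score": 0.97},
    {"text": "January 26, 1782", "label": "Date", "score": 0.95},
    {"text": "January 23, 1837", "label": "Date", "score": 0.94},
    {"text": "James Clerk Maxwell", "label": "Person", "score": 0.98},
    {"text": "June 13, 1831", "label": "Date", "score": 0.96},
    {"text": "November 5, 1879", "label": "Date", "score": 0.93},
]

def group_by_label(entities):
    """Collect the entity texts per label."""
    grouped = defaultdict(list)
    for e in entities:
        grouped[e["label"]].append(e["text"])
    return dict(grouped)

grouped = group_by_label(entities)
print(grouped["Person"])  # ['John Field', 'James Clerk Maxwell']
```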
You can specify anything you like for the labels, but not every word is automatically a named entity. For instance, if you tell GliNER to extract ‘Math’ or ‘Number’ this will happen:
= """
text The transcendental number e ≈ 2.71828 is Euler's number, and it is the base of the natural logarithm. It's also crucial in calculus, e.g. $e^{i\pi}=-1$.
"""
= ["Person", "Number", "Math"]
labels for entity in model.predict_entities(text, labels, threshold=0.5):
print(entity["text"], "=>", entity["label"])
e => Number
Euler => Person
natural logarithm => Math
calculus => Math
e => Number
It’s indeed semantically correct that ‘e’ is the symbol for a number, but the actual value should have been extracted as well. It’s remarkable that ‘natural logarithm’ is correctly identified as a mathematical entity. The threshold expresses the minimum confidence required; if you lower it:
for entity in model.predict_entities(text, labels, threshold=0.1):
    print(entity["text"], "=>", entity["label"])
The transcendental number => Number
e => Number
2 => Number
71828 => Number
Euler => Person
natural logarithm => Math
calculus => Math
e => Number
e => Number
i => Number
pi => Number
-1 => Number
It does not see that 2.71828 is a single float: the dot is identified as the end of a sentence. The threshold is a trade-off between precision and recall; the default is 0.5.
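The trade-off can be mimicked with a plain filter over scored candidate spans: a high threshold keeps only confident spans (precision), a low one also lets borderline spans through (recall). A small illustrative sketch with made-up scores:

```python
# Hypothetical candidate spans with made-up confidence scores.
candidates = [
    ("e", "Number", 0.81),
    ("2.71828", "Number", 0.34),
    ("Euler", "Person", 0.77),
    ("natural logarithm", "Math", 0.64),
    ("i", "Number", 0.12),
]

def filter_by_threshold(candidates, threshold):
    """Keep only spans whose score meets the threshold."""
    return [(text, label) for text, label, score in candidates if score >= threshold]

print(filter_by_threshold(candidates, 0.5))  # fewer, more reliable spans
print(filter_by_threshold(candidates, 0.1))  # more spans, more noise
```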
GliNER does not replace the need for sophisticated graph extraction, as explained in our Graph RAG article, but it can speed it up. Given the NER extraction speed, GliNER can be used to first extract the entities and then hand them over to a more generic LLM to extract the relationships. This makes the graph RAG prompt less complex and speeds up the graph extraction process. See also our Nuextract article for an alternative approach based on structured data extraction.
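One way to wire this up is sketched below: let GliNER do the fast entity pass, then ask the LLM only for the relationships between the entities it found, which keeps the graph-extraction prompt small. The `call_llm` function is hypothetical and stands in for whatever LLM client you use:

```python
def build_relation_prompt(text, entities):
    """Build a compact relation-extraction prompt from pre-extracted entities."""
    entity_list = "\n".join(f"- {e['text']} ({e['label']})" for e in entities)
    return (
        "Given the text and the entities below, list the relationships "
        "between the entities as (subject, relation, object) triples.\n\n"
        f"Text:\n{text}\n\nEntities:\n{entity_list}\n"
    )

# In practice these would come from model.predict_entities(text, labels).
entities = [
    {"text": "John Field", "label": "Person"},
    {"text": "January 26, 1782", "label": "Date"},
]
prompt = build_relation_prompt("John Field was born January 26, 1782.", entities)
print(prompt)
# triples = call_llm(prompt)  # hypothetical LLM client, not part of GLiNER
```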