Structured Outputs with LLMs


Structured outputs in the context of Large Language Models (LLMs) refer to the generation of data in a predefined format or schema, rather than free-form text. This can include outputs like JSON objects, tables, lists, or any other structured data format that can be easily parsed and utilized by other systems or applications.

This is a crucial element in integrating classic programming with LLMs: raw model output is diverse, stochastic, and differs from one LLM to the next. Taming this variance is essential for downstream processing and turns the LLM output into something more akin to a web service.

Although many models can output JSON on request, structured output goes a step further: the response is made to comply with a predefined schema, specified here as a set of Pydantic models.
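For instance, a minimal Pydantic model already carries a machine-readable contract; the `Address` model and its field names below are illustrative (assuming Pydantic v2):

```python
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    state: str
    zip_code: str


# The JSON Schema derived from the model is the contract the output must satisfy
schema = Address.model_json_schema()
print(schema["required"])  # ['street', 'city', 'state', 'zip_code']
```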

Some models have explicit support for structured output, and even for those without it you can use prompt engineering like this:


from openai import OpenAI

client = OpenAI()

# Define the prompt with a one-shot example of the desired output format
prompt = """
Extract the address from the following input and provide it in JSON format:
Input: "Send the package to 123 Main St, Springfield, IL 62701."
Output:
{
  "street": "123 Main St",
  "city": "Springfield",
  "state": "IL",
  "zip_code": "62701"
}
"""
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=100,
)
print(response.choices[0].message.content.strip())
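Prompt engineering alone gives no guarantee that the reply is valid JSON, so the result should be parsed defensively. A minimal sketch (the `parse_address` helper is hypothetical):

```python
import json


def parse_address(raw: str):
    """Best-effort extraction of a JSON object from a model reply."""
    # Models often wrap JSON in prose or code fences; isolate the outermost braces
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        return None
    try:
        return json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return None


reply = 'Sure! Here it is:\n{"street": "123 Main St", "city": "Springfield"}'
print(parse_address(reply))  # {'street': '123 Main St', 'city': 'Springfield'}
```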

A generic solution comes from the instructor package, which acts as an adapter around any LLM client. To enable it, you simply wrap the client like so:

import instructor
from pydantic import BaseModel
from openai import OpenAI
import json
import logging
logging.basicConfig(level=logging.CRITICAL)

class Address(BaseModel):
    street: str
    zip: str
    city: str
    country: str
 

llm = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # local Ollama endpoint; any non-empty key works
client = instructor.from_openai(llm)

def gen(text: str):
    return client.chat.completions.create(
        model="qwen2.5:14b",
        response_model=Address,
        messages=[{"role": "user", "content": f"Extract the address from the following input: {text}"}],
    )
 
        
entity = gen("OpenAI has its headquarters at San Francisco, 3180 18th St, United States.")
if entity is not None:
    print(json.dumps(entity.model_dump(), indent=4))
else:
    print("Could not extract anything.")
{
    "street": "3180 18th St",
    "zip": "94110",
    "city": "San Francisco",
    "country": "United States"
}

Of course, this has an impact on performance, but that's the price to pay for reliable integration.
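Part of what you buy is automatic retrying: when the extracted object fails Pydantic validation, instructor can feed the error back to the model via the `max_retries` argument to `create`. You can also attach your own validators; the digits-only zip rule below is just an illustrative assumption:

```python
from pydantic import BaseModel, field_validator


class ValidatedAddress(BaseModel):
    street: str
    zip: str
    city: str
    country: str

    @field_validator("zip")
    @classmethod
    def zip_must_be_digits(cls, v: str) -> str:
        # A failing validator raises a ValueError whose message instructor
        # can send back to the model on the next retry
        if not v.isdigit():
            raise ValueError("zip code must contain only digits")
        return v
```

With this model, passing `response_model=ValidatedAddress, max_retries=3` to `client.chat.completions.create(...)` would re-prompt the model up to three times with the validation error in context.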

The instructor package offers more than this and supports a variety of use cases. For example, the rephrase-and-respond (RaR) pattern:

from pydantic import BaseModel
import instructor
from openai import OpenAI

llm = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # any non-empty key works locally
client = instructor.from_openai(llm)


class Response(BaseModel):
    rephrased_question: str
    answer: str


def rephrase_and_respond(query):
    return client.chat.completions.create(
        model="llama3.2",
        messages=[
            {
                "role": "user",
                "content": f"""{query}\nRephrase and expand the question, and respond.""",  
            }
        ],
        response_model=Response,
    )
query = "Take the last letters of the words in 'Edgar Bob' and concatenate them."
response = rephrase_and_respond(query)
print(response.rephrased_question)
print(response.answer)
What is the concatenated last letter of each word in the name Edgar Bob?
rg

Note that the model's answer here is actually wrong (the last letters of "Edgar" and "Bob" concatenate to "rb"): structured output guarantees the shape of the response, not its correctness.

Such structures can themselves be valuable inside agent logic. For example, the following splits an initial question into multiple sub-questions, answers them in parallel, and finally consolidates everything into one answer (following this paper by Meta):

import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel, Field
import asyncio
from typing import Optional
import nest_asyncio
nest_asyncio.apply()

client = instructor.from_openai(AsyncOpenAI())


class ReasoningAndResponse(BaseModel):
    intermediate_reasoning: str = Field(description="""Intermediate reasoning steps""")
    correct_answer: str


class MaybeResponse(BaseModel):
    result: Optional[ReasoningAndResponse] = None
    error: Optional[bool] = None
    error_message: Optional[str] = Field(
        default=None,
        description="""Informative explanation of why the reasoning chain was unable to generate a result""",
    )


class QueryDecomposition(BaseModel):
    queries: list[str] = Field(description="""A list of queries that need to be answered in order to derive the final answer""")


async def generate_queries(query: str):
    return await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """You are a helpful assistant that decomposes a query into multiple sub-queries.""",
            },
            {"role": "user", "content": query},
        ],
        response_model=QueryDecomposition,
    )


async def generate_reasoning_chain(query: str) -> MaybeResponse:
    return await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """
                Given a question and a context,
                answer the question step-by-step.

                Indicate the intermediate reasoning
                steps.
                """,
            },
            {"role": "user", "content": query},
        ],
        response_model=MaybeResponse,
    )


async def batch_reasoning_chains(
    queries: list[str],
) -> list[MaybeResponse]:
    coros = [generate_reasoning_chain(query) for query in queries]
    results = await asyncio.gather(*coros)
    return results


async def generate_response(query: str, context: list[MaybeResponse]):
    formatted_context = "\n".join(
        [
            f"""
            {item.result.intermediate_reasoning}
            {item.result.correct_answer}
            """
            for item in context
            if not item.error and item.result
        ]
    )

    return await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": """
                Given a question and a context, answer the question step-by-step.
                If you are unsure, answer Unknown.
                """,
            },
            {
                "role": "user",
                "content": f"""
                    <question>
                    {query}
                    </question>
                    <context>
                    {formatted_context}
                    </context>
                    """,
            },
        ],
        response_model=ReasoningAndResponse,
    )


query = """Would Arnold Schwarzenegger have been able to deadlift an adult Black rhinoceros at his peak strength?"""
decomposed_queries = asyncio.run(generate_queries(query))
for generated_query in decomposed_queries.queries:
    print(generated_query)
 
chains = asyncio.run(batch_reasoning_chains(
    decomposed_queries.queries))

for chain in chains:
    print(chain.model_dump_json(indent=2))  

response = asyncio.run(generate_response(query,
    chains))

print(response.model_dump_json(indent=2))
What is the maximum deadlift weight achieved by Arnold Schwarzenegger at his peak?
What is the average weight of an adult Black rhinoceros?
Can Arnold Schwarzenegger's deadlift capability surpass the average weight of an adult Black rhinoceros?
{
  "result": {
    "intermediate_reasoning": "Arnold Schwarzenegger, primarily known for his bodybuilding achievements, did not have a recorded deadlift competition weight that is well-publicized. However, it is widely reported that during his peak, he was able to deadlift approximately 700 pounds (around 317.5 kg). This figure is noted based on his overall strength and conditioning as part of his training regime while competing in bodybuilding. He was more focused on bodybuilding lifts such as squats and bench presses, but his deadlift was also quite significant.",
    "correct_answer": "Approximately 700 pounds (317.5 kg)"
  },
  "error": null,
  "error_message": null
}
{
  "result": {
    "intermediate_reasoning": "The average weight of an adult Black rhinoceros can vary depending on several factors such as their age, sex, and subspecies. Adult Black rhinos typically weigh between 800 to 1,400 pounds (363 to 635 kg). To find an average we can calculate the midpoint of this range.",
    "correct_answer": "The average weight of an adult Black rhinoceros is approximately 1,000 pounds (454 kg)."
  },
  "error": null,
  "error_message": null
}
{
  "result": {
    "intermediate_reasoning": "To determine if Arnold Schwarzenegger's deadlift capability can surpass the average weight of an adult black rhinoceros, we first need to know the figures involved. Arnold Schwarzenegger, during his peak bodybuilding years, had a deadlift maximum around 710 pounds (approximately 322 kg). On the other hand, the average adult black rhinoceros weighs between 1,800 to 2,200 pounds (approximately 816 to 998 kg). Since Schwarzenegger's deadlift maximum (710 lbs) is significantly lower than the minimum weight of an adult black rhinoceros (1,800 lbs), we conclude that his deadlift capability cannot surpass this weight.",
    "correct_answer": "No, Arnold Schwarzenegger's deadlift capability cannot surpass the average weight of an adult Black rhinoceros."
  },
  "error": null,
  "error_message": null
}
{
  "intermediate_reasoning": "Arnold Schwarzenegger's peak deadlift was around 710 pounds (approximately 322 kg). In contrast, the weight of an adult Black rhinoceros ranges from about 1,800 to 2,200 pounds (approximately 816 to 998 kg). Since Schwarzenegger's deadlift capability (710 lbs) is significantly lower than the minimum weight for an adult Black rhinoceros (1,800 lbs), he would not have been able to deadlift one.",
  "correct_answer": "No, Arnold Schwarzenegger would not have been able to deadlift an adult Black rhinoceros at his peak strength."
}