Token Usage Tracking

structx provides detailed token usage tracking across all steps of the extraction process, helping you monitor costs and optimize your queries.

Basic Usage

from structx import Extractor

# Initialize extractor
extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key"
)

# Extract structured data
result = extractor.extract(
    data="incident_report.txt",
    query="extract incident details"
)

# Access token usage information
usage = result.get_token_usage()
if usage:
    print(f"Total tokens used: {usage.total_tokens}")
    print(f"Prompt tokens: {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")

    # Print usage by step
    for step in usage.steps:
        print(f"{step.name}: {step.tokens} tokens")

Detailed Token Information

For more detailed information about extraction steps, use the detailed parameter:

# Get detailed token usage with extraction breakdowns
detailed_usage = result.get_token_usage(detailed=True)

# Access extraction details
extraction = next((s for s in detailed_usage.steps if s.name == "extraction"), None)
if extraction and hasattr(extraction, "steps"):
    print(f"Number of extraction steps: {len(extraction.steps)}")
    for i, step in enumerate(extraction.steps):
        print(f"  Extraction {i+1}: {step.tokens} tokens")

Understanding the Steps

Token usage is tracked across four main steps:

  1. Analysis: Analyzing your query to determine what to extract
  2. Refinement: Refining and expanding the query for better extraction
  3. Schema Generation: Generating the data model for extraction
  4. Extraction: Performing the actual data extraction (potentially multiple calls)
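The per-step counts should sum to the overall total. A minimal sketch of walking the steps, using plain dicts as stand-ins for structx's actual usage objects (the names and numbers below are hypothetical):

```python
# Hypothetical per-step token counts (stand-ins for usage.steps)
steps = [
    {"name": "analysis", "tokens": 450},
    {"name": "refinement", "tokens": 380},
    {"name": "schema_generation", "tokens": 620},
    {"name": "extraction", "tokens": 1550},
]

# Summing per-step tokens gives the overall total
total = sum(step["tokens"] for step in steps)
print(f"Total across steps: {total}")  # 3000
```

This mirrors the loop over `usage.steps` shown above and is useful for spotting which phase dominates your token budget.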

Token Usage with Multiple Queries

When using multiple queries, token usage is tracked for each query independently:

queries = ["extract dates", "extract names", "extract organizations"]
results = extractor.extract_queries(data="document.txt", queries=queries)

for query, result in results.items():
    usage = result.get_token_usage()
    if usage:
        print(f"Query: {query}")
        print(f"Total tokens: {usage.total_tokens}")
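Because each query carries its own usage, you can aggregate them yourself. A sketch using hypothetical per-query totals in place of real `result.get_token_usage()` values:

```python
# Hypothetical totals per query (stand-ins for usage.total_tokens)
usage_by_query = {
    "extract dates": 1200,
    "extract names": 950,
    "extract organizations": 1100,
}

grand_total = sum(usage_by_query.values())
most_expensive = max(usage_by_query, key=usage_by_query.get)
print(f"Grand total: {grand_total}")            # 3250
print(f"Most expensive query: {most_expensive}")  # extract dates
```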

Advanced Metrics

Some LLM providers offer additional metrics like thinking tokens or cached tokens. These metrics are included when available:

if usage.thinking_tokens:
    print(f"Thinking tokens: {usage.thinking_tokens}")

if usage.cached_tokens:
    print(f"Cached tokens: {usage.cached_tokens}")

Usage with Model Refinement

Token usage is also tracked when refining data models:

from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

enhanced_user = extractor.refine_data_model(
    model=User,
    instructions="Add email and address fields, with validation for email format"
)

# Access token usage information
usage = enhanced_user.usage.get_usage_summary()
print(f"Token usage for model refinement: {usage.total_tokens}")

Understanding Token Costs

Different LLM providers charge differently for tokens:

  • Prompt tokens: Text sent to the model (typically less expensive)
  • Completion tokens: Text generated by the model (typically more expensive)

By tracking both prompt and completion tokens separately, structx helps you understand your costs more precisely.
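The separate counts can be turned into a cost estimate. A minimal sketch with a hypothetical helper and made-up per-1k-token prices (always check your provider's current pricing):

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_price_per_1k, completion_price_per_1k):
    """Estimate dollar cost from separate prompt/completion token counts."""
    return (prompt_tokens / 1000) * prompt_price_per_1k \
         + (completion_tokens / 1000) * completion_price_per_1k

# Hypothetical prices per 1k tokens, not actual provider rates
cost = estimate_cost(12_000, 3_000,
                     prompt_price_per_1k=0.00015,
                     completion_price_per_1k=0.0006)
print(f"Estimated cost: ${cost:.4f}")
```

Plugging in `usage.prompt_tokens` and `usage.completion_tokens` from a result gives a per-extraction cost figure you can log or budget against.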

Next Steps