Configuration Options

structx provides flexible configuration options for extraction.

Configuration Methods

You can configure structx in several ways:

YAML Configuration

```yaml
# config.yaml
analysis:
  temperature: 0.2
  top_p: 0.1
  max_tokens: 2000

refinement:
  temperature: 0.1
  top_p: 0.05
  max_tokens: 2000

extraction:
  temperature: 0.0
  top_p: 0.1
  max_tokens: 2000
  frequency_penalty: 0.1
```

```python
from structx import Extractor

extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key",
    config="config.yaml"
)
```

Dictionary Configuration

```python
config = {
    "analysis": {
        "temperature": 0.2,
        "top_p": 0.1,
        "max_tokens": 2000
    },
    "refinement": {
        "temperature": 0.1,
        "top_p": 0.05,
        "max_tokens": 2000
    },
    "extraction": {
        "temperature": 0.0,
        "top_p": 0.1,
        "max_tokens": 2000,
        "frequency_penalty": 0.1
    }
}

extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key",
    config=config
)
```

ExtractionConfig Object

```python
from structx import ExtractionConfig, StepConfig

config = ExtractionConfig(
    analysis=StepConfig(
        temperature=0.2,
        top_p=0.1,
        max_tokens=2000
    ),
    refinement=StepConfig(
        temperature=0.1,
        top_p=0.05,
        max_tokens=2000
    ),
    extraction=StepConfig(
        temperature=0.0,
        top_p=0.1,
        max_tokens=2000,
        frequency_penalty=0.1
    )
)

extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key",
    config=config
)
```

Configuration Parameters

Step Configuration

Each step in the extraction process can be configured separately:

  1. Analysis: analyzes the query to determine what to extract
  2. Refinement: refines the query and generates the extraction model
  3. Extraction: performs the actual data extraction
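To make the mapping concrete, here is a minimal sketch (not structx internals, just an illustration) of how one parameter set per pipeline step could be organized; the values mirror the defaults documented below:

```python
# Illustrative only: one set of completion parameters per pipeline step.
from dataclasses import dataclass


@dataclass
class StepParams:
    temperature: float
    top_p: float
    max_tokens: int = 2000


# One parameter set per step, mirroring the documented defaults.
PIPELINE = {
    "analysis":   StepParams(temperature=0.2, top_p=0.1),
    "refinement": StepParams(temperature=0.1, top_p=0.05),
    "extraction": StepParams(temperature=0.0, top_p=0.1),
}


def params_for(step: str) -> dict:
    """Return the completion kwargs used for a given pipeline step."""
    p = PIPELINE[step]
    return {"temperature": p.temperature, "top_p": p.top_p, "max_tokens": p.max_tokens}
```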

Common Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `temperature` | float | varies | Sampling temperature (0.0-1.0) |
| `top_p` | float | varies | Nucleus sampling parameter (0.0-1.0) |
| `max_tokens` | int | 2000 | Maximum tokens in the completion |

Default Values

| Step | Temperature | Top P | Max Tokens |
| --- | --- | --- | --- |
| Analysis | 0.2 | 0.1 | 2000 |
| Refinement | 0.1 | 0.05 | 2000 |
| Extraction | 0.0 | 0.1 | 2000 |
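When you supply a partial configuration, unspecified settings keep these defaults. The hypothetical helper below (not part of structx) sketches how per-step overrides could be merged over the default values from the table above:

```python
# Documented per-step defaults.
DEFAULTS = {
    "analysis":   {"temperature": 0.2, "top_p": 0.1,  "max_tokens": 2000},
    "refinement": {"temperature": 0.1, "top_p": 0.05, "max_tokens": 2000},
    "extraction": {"temperature": 0.0, "top_p": 0.1,  "max_tokens": 2000},
}


def merge_config(overrides: dict) -> dict:
    """Overlay user-supplied step settings on the defaults (illustrative)."""
    merged = {step: dict(params) for step, params in DEFAULTS.items()}
    for step, params in overrides.items():
        merged.setdefault(step, {}).update(params)
    return merged


# Only extraction.frequency_penalty is overridden; everything else keeps its default.
config = merge_config({"extraction": {"frequency_penalty": 0.1}})
```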

Retry Configuration

You can configure the retry behavior for extraction:

```python
extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key",
    max_retries=5,      # Maximum number of retry attempts
    min_wait=2,         # Minimum seconds to wait between retries
    max_wait=30         # Maximum seconds to wait between retries
)
```

Retry Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_retries` | int | 3 | Maximum number of retry attempts |
| `min_wait` | int | 1 | Minimum seconds to wait between retries |
| `max_wait` | int | 10 | Maximum seconds to wait between retries |
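A common way to combine these three parameters is an exponential backoff clamped between `min_wait` and `max_wait`. The sketch below assumes doubling waits; structx's actual schedule may differ:

```python
# Illustrative backoff schedule: doubling waits, clamped to [min_wait, max_wait].
def backoff_schedule(max_retries: int, min_wait: float, max_wait: float) -> list:
    """Seconds to wait before each retry attempt."""
    return [min(max_wait, min_wait * (2 ** i)) for i in range(max_retries)]


# With the defaults (max_retries=3, min_wait=1, max_wait=10),
# retries would wait 1, 2, and 4 seconds.
```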

Processing Configuration

You can configure the processing behavior:

```python
extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key",
    max_threads=20,     # Maximum number of concurrent threads
    batch_size=50       # Size of processing batches
)
```

Processing Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `max_threads` | int | 10 | Maximum number of concurrent threads |
| `batch_size` | int | 100 | Size of processing batches |
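To illustrate how these two parameters might interact, here is a standard-library sketch (not structx's actual scheduler): items are split into batches of `batch_size`, and items within each batch are processed by up to `max_threads` worker threads:

```python
# Illustrative only: batching plus a bounded thread pool.
from concurrent.futures import ThreadPoolExecutor


def process_in_batches(items, process_one, batch_size=100, max_threads=10):
    """Split items into batches; process each batch's items concurrently."""
    results = []
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        for start in range(0, len(items), batch_size):
            batch = items[start:start + batch_size]
            # pool.map preserves input order within the batch.
            results.extend(pool.map(process_one, batch))
    return results
```

Smaller batches hold fewer intermediate results in memory at once, at the cost of more scheduling overhead, which matches the guidance under Best Practices below.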

Best Practices

  1. Temperature Settings:
     - Use lower temperatures (0.0-0.2) for consistent extraction
     - Higher temperatures may introduce variability
  2. Token Limits:
     - Ensure max_tokens is sufficient for your extraction needs
     - Complex extractions may require higher limits
  3. Batch Size:
     - Adjust based on your data size and memory constraints
     - Smaller batches use less memory but may be slower
  4. Thread Count:
     - Set based on your CPU capabilities
     - Too many threads can cause resource contention