Blacksuan19/structx

structx

Type-safe structured data extraction from text using LLMs.

structx is a Python library for extracting structured data from text using Large Language Models (LLMs). Given a natural language query, it generates a type-safe data model at runtime and returns extraction results that conform to that model, including complex nested data structures.

Features

  • 🔄 Dynamic model generation from natural language queries
  • 🎯 Automatic schema inference and generation
  • 📊 Support for complex nested data structures
  • 🔄 Model refinement with natural language instructions
  • 📈 Token usage tracking with detailed step-by-step metrics
  • 📄 Support for unstructured text and document processing
  • 🚀 Multi-threaded processing with async support
  • 🔌 Support for multiple LLM providers through litellm
  • 🔄 Automatic retry mechanism with exponential backoff
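
To make "dynamic model generation" concrete, here is a minimal sketch of building a typed record class at runtime from a field specification. It uses only the standard library and is not structx's implementation; `build_model` and the `Incident` fields are hypothetical names chosen for illustration.

```python
from dataclasses import asdict, fields, make_dataclass


def build_model(name, field_spec):
    """Create a dataclass at runtime from a {field_name: type} mapping.

    Hypothetical helper for illustration; structx generates its own models.
    """

    def check_types(self):
        # Reject values whose runtime type does not match the declared type.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f"{f.name} must be {f.type.__name__}, "
                    f"got {type(value).__name__}"
                )

    return make_dataclass(
        name,
        list(field_spec.items()),
        namespace={"__post_init__": check_types},
    )


# A field specification such as an LLM might infer from a query like
# "extract incident date and details" (assumed shape, for illustration).
Incident = build_model(
    "Incident",
    {"date": str, "server": str, "cpu_percent": float},
)

record = Incident(date="2024-01-15", server="server-01", cpu_percent=92.0)
print(asdict(record))
```

Constructing `Incident` with `cpu_percent="92"` raises `TypeError`, which is the kind of guarantee a type-safe model is meant to provide.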

Installation

pip install structx-llm

For document processing support, install the docs extra:

pip install "structx-llm[docs]"

Quick Start

from structx import Extractor

# Initialize extractor
extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key",
    max_retries=3,      # Automatically retry on transient errors
    min_wait=1,         # Start with 1 second wait
    max_wait=10         # Maximum 10 seconds between retries
)

# Extract structured data
result = extractor.extract(
    data="System check on 2024-01-15 detected high CPU usage (92%) on server-01.",
    query="extract incident date and details"
)

# Access results
print(f"Extracted {result.success_count} items")
print(result.data[0].model_dump_json(indent=2))

# Check token usage
usage = result.get_token_usage()
if usage:
    print(f"Total tokens: {usage.total_tokens}")
    print(f"By step: {[(s.name, s.tokens) for s in usage.steps]}")
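
The `max_retries`, `min_wait`, and `max_wait` arguments above suggest a capped exponential backoff. The following is a minimal sketch of that general technique, assuming the wait doubles after each failed attempt; it is not structx's retry code, and `backoff_delays` and `call_with_retry` are hypothetical helpers.

```python
import time


def backoff_delays(max_retries, min_wait, max_wait):
    """Return the wait before each retry: min_wait doubled each time, capped."""
    return [min(max_wait, min_wait * 2**attempt) for attempt in range(max_retries)]


def call_with_retry(
    func,
    max_retries=3,
    min_wait=1,
    max_wait=10,
    retry_on=(ConnectionError, TimeoutError),
    sleep=time.sleep,
):
    """Call func(), retrying on the given exception types with capped backoff."""
    delays = backoff_delays(max_retries, min_wait, max_wait)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the last error.
            sleep(delays[attempt])


print(backoff_delays(3, 1, 10))  # [1, 2, 4]
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without real delays.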

Documentation

For comprehensive documentation, examples, and guides, visit our documentation site.

Examples

Check out our example gallery for real-world use cases.

Supported File Formats

  • Structured: CSV, Excel, JSON, Parquet, Feather
  • Unstructured: TXT, PDF, DOCX, Markdown, and more

Contributing

Contributions are welcome! Please read our Contributing Guidelines for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.