Type-safe structured data extraction from text using LLMs.
structx is a Python library for extracting structured data from text
using Large Language Models (LLMs). It dynamically generates type-safe data
models and provides consistent, structured extraction with support for complex
nested data structures.
- 🔄 Dynamic model generation from natural language queries
- 🎯 Automatic schema inference and generation
- 📊 Support for complex nested data structures
- 🔄 Model refinement with natural language instructions
- 📈 Token usage tracking with detailed step-by-step metrics
- 📄 Support for unstructured text and document processing
- 🚀 Multi-threaded processing with async support
- 🔌 Support for multiple LLM providers through litellm
- 🔄 Automatic retry mechanism with exponential backoff
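To make "dynamic model generation" and "nested data structures" more concrete, here is a minimal sketch of building a typed, nested model at runtime from a schema. It is not structx's implementation: it uses the standard library's dataclasses as a stand-in, and `SCHEMA`, `build_model`, and `parse` are hypothetical names chosen for illustration.

```python
from dataclasses import asdict, fields, is_dataclass, make_dataclass

# Hypothetical schema, as might be inferred from the query
# "extract incident date and details"; not structx's real format.
SCHEMA = {
    "incident_date": str,
    "details": {"component": str, "cpu_usage_percent": float},
}


def build_model(name, schema):
    """Recursively build a dataclass type from a {field: type-or-dict} schema."""
    specs = []
    for field_name, field_type in schema.items():
        if isinstance(field_type, dict):
            # A nested dict becomes a nested model named after the field.
            nested_name = field_name.title().replace("_", "")
            field_type = build_model(nested_name, field_type)
        specs.append((field_name, field_type))
    return make_dataclass(name, specs)


def parse(model, raw):
    """Instantiate `model` from a plain dict, coercing each value to its declared type."""
    kwargs = {}
    for f in fields(model):
        value = raw[f.name]
        if is_dataclass(f.type):
            kwargs[f.name] = parse(f.type, value)  # recurse into nested models
        else:
            kwargs[f.name] = f.type(value)  # raises if the value cannot be coerced
    return model(**kwargs)


Incident = build_model("Incident", SCHEMA)
record = parse(Incident, {
    "incident_date": "2024-01-15",
    "details": {"component": "server-01", "cpu_usage_percent": "92"},
})
print(asdict(record))
```

The point of the sketch is only that the model type does not exist until the schema is known, yet the parsed record still has typed, attribute-style access (`record.details.cpu_usage_percent`).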
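pip install structx-llm
To install with the optional docs extra (quoted so that shells such as zsh do not expand the brackets):
pip install "structx-llm[docs]"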
from structx import Extractor

# Initialize extractor
extractor = Extractor.from_litellm(
    model="gpt-4o-mini",
    api_key="your-api-key",
    max_retries=3,  # Automatically retry on transient errors
    min_wait=1,     # Start with 1 second wait
    max_wait=10,    # Maximum 10 seconds between retries
)

# Extract structured data
result = extractor.extract(
    data="System check on 2024-01-15 detected high CPU usage (92%) on server-01.",
    query="extract incident date and details",
)

# Access results
print(f"Extracted {result.success_count} items")
print(result.data[0].model_dump_json(indent=2))

# Check token usage
usage = result.get_token_usage()
if usage:
    print(f"Total tokens: {usage.total_tokens}")
    print(f"By step: {[(s.name, s.tokens) for s in usage.steps]}")
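The `max_retries`, `min_wait`, and `max_wait` settings above describe retrying with exponential backoff. As a rough illustration of how such settings typically interact, the sketch below doubles the wait after each failure and caps it at `max_wait`. This is an assumption about the general technique, not structx's actual retry code; `backoff_delays` and `call_with_retry` are hypothetical helpers, and catching every exception is a simplification of "transient errors".

```python
import time


def backoff_delays(max_retries, min_wait, max_wait):
    """Wait (in seconds) before each retry: min_wait doubled each time, capped at max_wait."""
    return [min(max_wait, min_wait * 2 ** attempt) for attempt in range(max_retries)]


def call_with_retry(func, max_retries=3, min_wait=1, max_wait=10, sleep=time.sleep):
    """Call func(); on an exception, wait and retry up to max_retries times."""
    delays = backoff_delays(max_retries, min_wait, max_wait)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])


# Usage: a function that fails twice before succeeding.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


waits = []  # record the waits instead of actually sleeping
outcome = call_with_retry(flaky, sleep=waits.append)
print(outcome, waits)
```

Passing `sleep` as a parameter keeps the example fast to run and easy to test; production retry logic would usually also add random jitter and retry only on specific error types.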
For comprehensive documentation, examples, and guides, visit our documentation site.
- Getting Started
- Basic Extraction
- Unstructured Text Processing
- Async Operations
- Multiple Queries
- Custom Models
- Token Usage Tracking
- API Reference
Check out our example gallery for real-world use cases.
Supported file formats:
- Structured: CSV, Excel, JSON, Parquet, Feather
- Unstructured: TXT, PDF, DOCX, Markdown, and more
Contributions are welcome! Please read our Contributing Guidelines for details.
This project is licensed under the MIT License - see the LICENSE file for details.