Analysis API
The Analysis module provides tools for token counting, format comparison, and reporting.
Analysis module for token counting and format comparison.
- class toonverter.analysis.FormatComparator(model='gpt-4')
Bases: object
Compare token usage across multiple formats.
- Parameters:
model (str)
- __init__(model='gpt-4')
Initialize comparator.
- Parameters:
model (str) – Model name for token counting
- Return type:
None
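- compare_formats(data, formats, encode_options=None)
Compare token usage across formats.
- Parameters:
data – Data to encode in each format
formats – Names of the formats to compare
encode_options – Optional encoding options (default: None)
- Return type:
ComparisonReport
- Returns:
ComparisonReport with analysis for each format
- Raises:
FormatNotSupportedError – If a format is not supported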
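The comparison performed by compare_formats can be pictured as: encode the data once per requested format, count the tokens of each encoding, and report the format with the lowest count. This is a minimal sketch under stated assumptions: the encoder registry and its json/json-compact format names are hypothetical, approx_count_tokens is a regex stand-in for tiktoken (it will not reproduce real token counts), and UnsupportedFormat stands in for FormatNotSupportedError.

```python
import json
import re
from typing import Any, Callable


class UnsupportedFormat(ValueError):
    """Local stand-in for toonverter's FormatNotSupportedError."""


def approx_count_tokens(text: str) -> int:
    # Hypothetical stand-in for tiktoken: counts word runs, whitespace runs,
    # and single punctuation characters as separate tokens.
    return len(re.findall(r"\w+|\s+|[^\w\s]", text))


# Hypothetical encoder registry; these two format names are illustrative only.
ENCODERS: dict[str, Callable[[Any], str]] = {
    "json": lambda data: json.dumps(data, indent=2),
    "json-compact": lambda data: json.dumps(data, separators=(",", ":")),
}


def compare_formats_sketch(data: Any, formats: list[str]) -> dict[str, Any]:
    counts: dict[str, int] = {}
    for name in formats:
        encoder = ENCODERS.get(name)
        if encoder is None:
            raise UnsupportedFormat(f"Format not supported: {name}")
        # Encode once per format, then count the tokens of the encoded text.
        counts[name] = approx_count_tokens(encoder(data))
    # The best format is the one whose encoding needs the fewest tokens.
    best = min(counts, key=counts.__getitem__)
    return {"token_counts": counts, "best_format": best}


result = compare_formats_sketch({"name": "Alice", "age": 30}, ["json", "json-compact"])
print(result["best_format"])  # json-compact
```

Indentation and spacing cost tokens, which is why the compact encoding wins here even under this crude counter.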
- class toonverter.analysis.ReportFormatter
Bases: object
Format analysis reports for display.
- static format_analysis(analysis)
Format a single token analysis.
- Parameters:
analysis (TokenAnalysis) – TokenAnalysis to format
- Return type:
str
- Returns:
Formatted report string
- static format_comparison(report, detailed=False)
Format a comparison report.
- Parameters:
report (ComparisonReport) – ComparisonReport to format
detailed (bool) – Include detailed analysis for each format
- Return type:
str
- Returns:
Formatted report string
- static format_json(report)
Format a comparison report as a JSON-serializable dict.
- Parameters:
report (ComparisonReport) – ComparisonReport to format
- Return type:
dict
- Returns:
Dictionary representation
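ReportFormatter produces two views of the same report: a display string and a JSON-serializable dict. The sketch below illustrates both using a hypothetical SketchReport dataclass in place of ComparisonReport (whose fields, apart from best_format, are not documented here); the row layout is an illustrative choice, not the library's actual output.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class SketchReport:
    # Hypothetical report shape: only best_format appears in the documented
    # examples; token_counts is an assumption for illustration.
    token_counts: dict[str, int] = field(default_factory=dict)
    best_format: str = ""


def format_comparison_sketch(report: SketchReport) -> str:
    # One row per format, ordered from fewest to most tokens.
    rows = sorted(report.token_counts.items(), key=lambda item: item[1])
    lines = ["Format comparison"]
    for name, count in rows:
        marker = "  <- best" if name == report.best_format else ""
        lines.append(f"  {name:<6} {count:>5} tokens{marker}")
    return "\n".join(lines)


def format_json_sketch(report: SketchReport) -> dict:
    # asdict() converts the dataclass into plain dicts, which json.dumps accepts.
    return asdict(report)


report = SketchReport({"json": 20, "toon": 12}, "toon")
print(format_comparison_sketch(report))
```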
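- class toonverter.analysis.TiktokenCounter(model='gpt-4')
Bases: TokenCounter
Token counter implementation using the tiktoken library.
Provides exact counts for the OpenAI models listed in MODEL_ENCODINGS. The Claude entries are mapped to cl100k_base, an OpenAI encoding, so counts for Claude models are approximations.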
- Parameters:
model (str)
- MODEL_ENCODINGS: ClassVar[dict[str, str]] = {'claude-2': 'cl100k_base', 'claude-3': 'cl100k_base', 'gpt-3.5-turbo': 'cl100k_base', 'gpt-4': 'cl100k_base', 'gpt-4-turbo': 'cl100k_base', 'text-davinci-002': 'p50k_base', 'text-davinci-003': 'p50k_base'}
- __init__(model='gpt-4')
Initialize token counter.
- Parameters:
model (str) – Model name or encoding name
- Return type:
None
- analyze(text, format_name)
Analyze token usage for text.
- Parameters:
text (str) – Text to analyze
format_name (str) – Name of the format the text is encoded in
- Return type:
TokenAnalysis
- Returns:
TokenAnalysis with statistics
- Raises:
TokenCountError – If analysis fails
- count_tokens(text)
Count tokens in text.
- Parameters:
text (str) – Text to analyze
- Return type:
int
- Returns:
Number of tokens
- Raises:
TokenCountError – If counting fails
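TiktokenCounter accepts either a model name or an encoding name. One way that lookup could work, assuming model names are checked against MODEL_ENCODINGS first and bare encoding names are passed through, is sketched below. The ENCODING_NAMES set and the error raised for unknown names are assumptions, since the fallback behavior is not documented. With tiktoken installed, the resolved name would then go to tiktoken.get_encoding, and len(encoding.encode(text)) gives the token count.

```python
class TokenCountError(Exception):
    """Local stand-in for toonverter's TokenCountError."""


# Copied from the MODEL_ENCODINGS table documented above.
MODEL_ENCODINGS: dict[str, str] = {
    "claude-2": "cl100k_base",
    "claude-3": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-4": "cl100k_base",
    "gpt-4-turbo": "cl100k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-003": "p50k_base",
}

# Assumption: encoding names accepted as-is (all are real tiktoken encodings).
ENCODING_NAMES = {"r50k_base", "p50k_base", "cl100k_base", "o200k_base"}


def resolve_encoding(model: str) -> str:
    # Model names are looked up first; a bare encoding name is passed through.
    if model in MODEL_ENCODINGS:
        return MODEL_ENCODINGS[model]
    if model in ENCODING_NAMES:
        return model
    # Assumption: the real class's handling of unknown names is not documented.
    raise TokenCountError(f"Unknown model or encoding name: {model!r}")


print(resolve_encoding("gpt-4"))  # cl100k_base
```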
- toonverter.analysis.analyze_text(text, format_name, model='gpt-4')
Convenience function to analyze text.
- Parameters:
text (str) – Text to analyze
format_name (str) – Name of the format the text is encoded in
model (str) – Model name for token counting
- Return type:
TokenAnalysis
- Returns:
TokenAnalysis with statistics
Examples
>>> analysis = analyze_text('{"name": "Alice"}', "json")
>>> print(analysis.token_count)
7
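analyze_text returns a TokenAnalysis, whose token_count attribute the example reads. The other statistics it carries are not listed here, so the sketch below uses hypothetical fields (char_count and a derived chars_per_token) and takes the counting function as a parameter in place of TiktokenCounter. TokenAnalysisSketch and analyze_text_sketch are illustrative names, not library API.

```python
from dataclasses import dataclass
from typing import Callable


class TokenCountError(Exception):
    """Local stand-in for toonverter's TokenCountError."""


@dataclass(frozen=True)
class TokenAnalysisSketch:
    # Only token_count is confirmed by the example above; the other
    # fields are hypothetical.
    format_name: str
    token_count: int
    char_count: int

    @property
    def chars_per_token(self) -> float:
        # Guard against division by zero for empty input.
        return self.char_count / self.token_count if self.token_count else 0.0


def analyze_text_sketch(
    text: str, format_name: str, count: Callable[[str], int]
) -> TokenAnalysisSketch:
    try:
        tokens = count(text)
    except Exception as exc:
        # Wrap counter failures, mirroring the Raises clause of TiktokenCounter.analyze.
        raise TokenCountError(f"Token analysis failed for format {format_name!r}") from exc
    return TokenAnalysisSketch(format_name, tokens, len(text))
```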
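- toonverter.analysis.compare(data, formats, model='gpt-4', encode_options=None)
Convenience function to compare formats.
- Parameters:
data – Data to encode in each format
formats – Names of the formats to compare
model (str) – Model name for token counting
encode_options – Optional encoding options (default: None)
- Return type:
ComparisonReport
- Returns:
ComparisonReport with comparison results
Examples
>>> data = {"name": "Alice", "age": 30}
>>> report = compare(data, ["json", "yaml", "toon"])
>>> print(f"Best format: {report.best_format}")
Best format: toon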
- toonverter.analysis.count_tokens(text, model='gpt-4')
Convenience function to count tokens.
- Parameters:
text (str) – Text to analyze
model (str) – Model name for token counting
- Return type:
int
- Returns:
Number of tokens
Examples
>>> count_tokens("Hello, world!")
4
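The convenience functions take a model name on every call. Whether they reuse counters between calls is not documented here; if you call count_tokens many times with the same model, a wrapper that builds one counter per model and reuses it is a common pattern. The sketch below shows it with functools.lru_cache. HypotheticalCounter and its whitespace-split counting rule are placeholders for TiktokenCounter, not its behavior.

```python
from functools import lru_cache


class HypotheticalCounter:
    """Placeholder for a counter object such as TiktokenCounter."""

    instances_created = 0  # tracks constructions so the caching is observable

    def __init__(self, model: str) -> None:
        HypotheticalCounter.instances_created += 1
        self.model = model

    def count_tokens(self, text: str) -> int:
        # Placeholder rule (whitespace split); not a real tokenizer.
        return len(text.split())


@lru_cache(maxsize=None)
def get_counter(model: str) -> HypotheticalCounter:
    # The first call for a model builds the counter; later calls reuse it.
    return HypotheticalCounter(model)


def count_tokens_sketch(text: str, model: str = "gpt-4") -> int:
    return get_counter(model).count_tokens(text)
```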
- toonverter.analysis.format_report(report, format='text', detailed=False)
Format a comparison report.
- Parameters:
report (ComparisonReport) – ComparisonReport to format
format (str) – Output format ('text' or 'json')
detailed (bool) – Include detailed analysis
- Return type:
str
- Returns:
Formatted report string