Analysis API

The Analysis module provides tools for token counting, format comparison, and reporting.

class toonverter.analysis.FormatComparator(model='gpt-4')

Bases: object

Compare token usage across multiple formats.

Parameters:

model (str) – Model name for token counting

__init__(model='gpt-4')

Initialize comparator.

Parameters:

model (str) – Model name for token counting

Return type:

None

compare_formats(data, formats, encode_options=None)

Compare token usage across formats.

Parameters:
  • data (Any) – Data to encode and analyze

  • formats (list[str]) – List of format names to compare

  • encode_options (dict[str, EncodeOptions] | None) – Optional format-specific encoding options

Return type:

ComparisonReport

Returns:

ComparisonReport with analysis for each format

Raises:

FormatNotSupportedError – If a format is not supported
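
The comparison described above can be illustrated with a minimal, self-contained sketch. It is not toonverter's implementation: the regex-based `approx_token_count`, the `encode_compact` encoder, and the `ENCODERS` registry are hypothetical stand-ins, and a plain `ValueError` takes the place of FormatNotSupportedError. The idea is only to show the shape of the workflow: encode the same data in each format, count tokens per encoding, and pick the smallest.

```python
import json
import re


def approx_token_count(text: str) -> int:
    """Stand-in tokenizer (assumption): word runs and single punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def encode_compact(data: dict) -> str:
    """Hypothetical compact encoding: one 'key: value' pair per line."""
    return "\n".join(f"{key}: {value}" for key, value in data.items())


# Hypothetical registry mapping format names to encoder callables.
ENCODERS = {
    "json": json.dumps,
    "compact": encode_compact,
}


def compare_token_usage(data: dict, formats: list[str]) -> dict[str, int]:
    """Encode data in each requested format and count tokens per encoding."""
    counts = {}
    for name in formats:
        if name not in ENCODERS:
            # The documented API raises FormatNotSupportedError here.
            raise ValueError(f"Format not supported: {name}")
        counts[name] = approx_token_count(ENCODERS[name](data))
    return counts


data = {"name": "Alice", "age": 30}
counts = compare_token_usage(data, ["json", "compact"])
best_format = min(counts, key=counts.get)  # format with the fewest tokens
```

A real tokenizer will produce different counts than this stand-in, so only the relative comparison carries over.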

class toonverter.analysis.ReportFormatter

Bases: object

Format analysis reports for display.

static format_analysis(analysis)

Format single token analysis.

Parameters:

analysis (TokenAnalysis) – TokenAnalysis to format

Return type:

str

Returns:

Formatted report string

static format_comparison(report, detailed=False)

Format comparison report.

Parameters:
  • report (ComparisonReport) – ComparisonReport to format

  • detailed (bool) – Include detailed analysis for each format

Return type:

str

Returns:

Formatted report string

static format_json(report)

Format comparison report as JSON-serializable dict.

Parameters:

report (ComparisonReport) – ComparisonReport to format

Return type:

dict

Returns:

Dictionary representation
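
As a rough illustration of what a JSON-serializable dictionary representation can look like, the sketch below converts a report object into plain dicts and lists. The dataclasses are hypothetical stand-ins: only `best_format` and `token_count` appear in this reference, while `format_name`, `analyses`, and the sample numbers are placeholders invented for the example. The library's actual field names and layout may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TokenAnalysisSketch:
    """Hypothetical stand-in for TokenAnalysis."""
    format_name: str   # assumed field name
    token_count: int


@dataclass
class ComparisonReportSketch:
    """Hypothetical stand-in for ComparisonReport."""
    best_format: str
    analyses: list[TokenAnalysisSketch] = field(default_factory=list)  # assumed


def report_to_dict(report: ComparisonReportSketch) -> dict:
    """Recursively convert the report into plain dicts, lists, and scalars."""
    return asdict(report)


# Placeholder values, not measured token counts.
report = ComparisonReportSketch(
    best_format="toon",
    analyses=[
        TokenAnalysisSketch("json", 15),
        TokenAnalysisSketch("toon", 9),
    ],
)
payload = report_to_dict(report)
serialized = json.dumps(payload, sort_keys=True)  # succeeds: only plain types
```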

class toonverter.analysis.TiktokenCounter(model='gpt-4')

Bases: TokenCounter

Token counter implementation using tiktoken library.

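Supports various OpenAI models and provides accurate token counts for them. The Claude entries in MODEL_ENCODINGS map to cl100k_base, an OpenAI encoding, so counts for those models should be treated as approximations.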

Parameters:

model (str) – Model name or encoding name

MODEL_ENCODINGS: ClassVar[dict[str, str]] = {'claude-2': 'cl100k_base', 'claude-3': 'cl100k_base', 'gpt-3.5-turbo': 'cl100k_base', 'gpt-4': 'cl100k_base', 'gpt-4-turbo': 'cl100k_base', 'text-davinci-002': 'p50k_base', 'text-davinci-003': 'p50k_base'}

__init__(model='gpt-4')

Initialize token counter.

Parameters:

model (str) – Model name or encoding name

Return type:

None
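
Because the argument may be either a model name or an encoding name, some resolution step is needed. The sketch below shows one plausible approach, assuming that names missing from the mapping are passed through unchanged as encoding names. That fallback is an assumption, as is the helper name `resolve_encoding`; the mapping itself is copied from MODEL_ENCODINGS.

```python
# Copied from the documented MODEL_ENCODINGS class variable.
MODEL_ENCODINGS = {
    "claude-2": "cl100k_base",
    "claude-3": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-4": "cl100k_base",
    "gpt-4-turbo": "cl100k_base",
    "text-davinci-002": "p50k_base",
    "text-davinci-003": "p50k_base",
}


def resolve_encoding(model: str) -> str:
    """Map a known model name to its encoding.

    Assumption: anything not in the mapping is treated as an encoding
    name and returned unchanged.
    """
    return MODEL_ENCODINGS.get(model, model)


known_model = resolve_encoding("gpt-4")          # looked up in the mapping
legacy_model = resolve_encoding("text-davinci-003")
raw_encoding = resolve_encoding("p50k_base")     # passed through as-is
```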

analyze(text, format_name)

Analyze token usage for text.

Parameters:
  • text (str) – Text to analyze

  • format_name (str) – Format of the text

Return type:

TokenAnalysis

Returns:

TokenAnalysis with statistics

Raises:

TokenCountError – If analysis fails

count_tokens(text)

Count tokens in text.

Parameters:

text (str) – Text to analyze

Return type:

int

Returns:

Number of tokens

Raises:

TokenCountError – If counting fails

property model_name: str

Return the model name.

Returns:

Model identifier

toonverter.analysis.analyze_text(text, format_name, model='gpt-4')

Convenience function to analyze text.

Parameters:
  • text (str) – Text to analyze

  • format_name (str) – Format of the text

  • model (str) – Model name or encoding

Return type:

TokenAnalysis

Returns:

TokenAnalysis with statistics

Examples

>>> analysis = analyze_text('{"name": "Alice"}', "json")
>>> print(analysis.token_count)
7

toonverter.analysis.compare(data, formats, model='gpt-4', encode_options=None)

Convenience function to compare formats.

Parameters:
  • data (Any) – Data to analyze

  • formats (list[str]) – List of format names

  • model (str) – Model name for token counting

  • encode_options (dict[str, EncodeOptions] | None) – Format-specific encoding options

Return type:

ComparisonReport

Returns:

ComparisonReport with comparison results

Examples

>>> data = {"name": "Alice", "age": 30}
>>> report = compare(data, ["json", "yaml", "toon"])
>>> print(f"Best format: {report.best_format}")
Best format: toon

toonverter.analysis.count_tokens(text, model='gpt-4')

Convenience function to count tokens.

Parameters:
  • text (str) – Text to analyze

  • model (str) – Model name or encoding

Return type:

int

Returns:

Number of tokens

Examples

>>> count_tokens("Hello, world!")
4

toonverter.analysis.format_report(report, format='text', detailed=False)

Format comparison report.

Parameters:
  • report (ComparisonReport) – ComparisonReport to format

  • format (str) – Output format ('text' or 'json')

  • detailed (bool) – Include detailed analysis

Return type:

str

Returns:

Formatted report string
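
The dispatch on the format argument can be pictured with the following sketch. It is a guess at the general shape, not the library's code: the report is modeled as a plain dict with a hypothetical `token_counts` key, the text layout is invented, and raising `ValueError` for unknown formats is an assumption. It does show how both output formats can share a `str` return type, with the JSON variant returned as serialized text.

```python
import json


def format_report_sketch(report: dict, format: str = "text",
                         detailed: bool = False) -> str:
    """Render a report dict as text or JSON (hypothetical layout).

    The parameter name 'format' mirrors the documented signature and
    shadows the builtin inside this function.
    """
    if format == "json":
        return json.dumps(report, indent=2)
    if format == "text":
        lines = [f"Best format: {report['best_format']}"]
        if detailed:
            # 'token_counts' is an assumed key used only for this sketch.
            for name, tokens in report["token_counts"].items():
                lines.append(f"  {name}: {tokens} tokens")
        return "\n".join(lines)
    # Assumption: unknown formats are rejected.
    raise ValueError(f"Unknown output format: {format}")


# Placeholder values, not measured token counts.
report = {"best_format": "toon", "token_counts": {"json": 15, "toon": 9}}
summary = format_report_sketch(report)
full_text = format_report_sketch(report, detailed=True)
as_json = format_report_sketch(report, format="json")
```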

Deduplication