Core API

The Core module of TOON Converter contains the fundamental interfaces, types, and registry, following SOLID principles and clean architecture.

class toonverter.core.ComparisonReport(analyses, best_format, worst_format, recommendations=<factory>)

Bases: object

Comparative analysis of multiple formats.

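Parameters:
  • analyses (list[TokenAnalysis])

  • best_format (str)

  • worst_format (str)

  • recommendations (list[str])

analyses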

Token analyses for each format

best_format

Format with lowest token count

worst_format

Format with highest token count

recommendations

Optimization recommendations

__init__(analyses, best_format, worst_format, recommendations=<factory>)
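Parameters:
  • analyses (list[TokenAnalysis])

  • best_format (str)

  • worst_format (str)

  • recommendations (list[str])

Return type: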

None

property max_savings_percentage: float

Calculate maximum possible token savings.

Returns:

Percentage savings from worst to best format

analyses: list[TokenAnalysis]
best_format: str
worst_format: str
recommendations: list[str]
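
As a rough illustration of max_savings_percentage, the sketch below recomputes the figure with stand-in dataclasses instead of importing toonverter. The formula (worst minus best, relative to worst) and the zero-token guard are assumptions inferred from the description "percentage savings from worst to best format". The names TokenAnalysisSketch and ComparisonReportSketch are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenAnalysisSketch:
    """Hypothetical stand-in for toonverter.core.TokenAnalysis."""
    format: str
    token_count: int


@dataclass
class ComparisonReportSketch:
    """Hypothetical stand-in for toonverter.core.ComparisonReport."""
    analyses: list[TokenAnalysisSketch]
    best_format: str
    worst_format: str
    recommendations: list[str] = field(default_factory=list)

    @property
    def max_savings_percentage(self) -> float:
        # Assumed formula: tokens saved relative to the worst (largest) format.
        counts = {a.format: a.token_count for a in self.analyses}
        worst = counts[self.worst_format]
        best = counts[self.best_format]
        if worst == 0:
            return 0.0  # assumed guard against division by zero
        return 100.0 * (worst - best) / worst


report = ComparisonReportSketch(
    analyses=[TokenAnalysisSketch("json", 200), TokenAnalysisSketch("toon", 120)],
    best_format="toon",
    worst_format="json",
)
print(report.max_savings_percentage)
```

With 200 tokens for the worst format and 120 for the best, the sketch reports a saving of 40 percent.
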
exception toonverter.core.ConversionError

Bases: ToonConverterError

Raised when data conversion fails.

class toonverter.core.ConversionResult(success, source_format, target_format, source_tokens=None, target_tokens=None, savings_percentage=None, data=None, error=None, metadata=<factory>)

Bases: object

Result of a format conversion operation.

Parameters:
  • success (bool)

  • source_format (str)

  • target_format (str)

  • source_tokens (int | None)

  • target_tokens (int | None)

  • savings_percentage (float | None)

  • data (Any)

  • error (str | None)

  • metadata (dict[str, Any])

success

Whether conversion succeeded

source_format

Original format

target_format

Target format

source_tokens

Token count of source data

target_tokens

Token count of target data

savings_percentage

Percentage of tokens saved

data

Converted data (if successful)

error

Error message (if failed)

metadata

Additional conversion metadata

__init__(success, source_format, target_format, source_tokens=None, target_tokens=None, savings_percentage=None, data=None, error=None, metadata=<factory>)
Parameters:
  • success (bool)

  • source_format (str)

  • target_format (str)

  • source_tokens (int | None)

  • target_tokens (int | None)

  • savings_percentage (float | None)

  • data (Any)

  • error (str | None)

  • metadata (dict[str, Any])

Return type:

None

__post_init__()

Calculate savings percentage if token counts are available.

Return type:

None

data: Any = None
error: str | None = None
savings_percentage: float | None = None
source_tokens: int | None = None
target_tokens: int | None = None
success: bool
source_format: str
target_format: str
metadata: dict[str, Any]
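
The __post_init__ hook described above can be pictured with a stand-in dataclass. This is a minimal sketch, not the library's code: the exact rule (derive the percentage only when both counts are known, the source count is non-zero, and no value was passed in) is an assumption, and ConversionResultSketch is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ConversionResultSketch:
    """Hypothetical stand-in mirroring the documented fields."""
    success: bool
    source_format: str
    target_format: str
    source_tokens: Optional[int] = None
    target_tokens: Optional[int] = None
    savings_percentage: Optional[float] = None
    data: Any = None
    error: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumed rule: fill in the percentage only when it was not supplied,
        # both token counts are present, and the source count is non-zero.
        if (
            self.savings_percentage is None
            and self.source_tokens
            and self.target_tokens is not None
        ):
            saved = self.source_tokens - self.target_tokens
            self.savings_percentage = 100.0 * saved / self.source_tokens


ok = ConversionResultSketch(True, "json", "toon", source_tokens=250, target_tokens=150)
failed = ConversionResultSketch(False, "json", "toon", error="malformed input")
print(ok.savings_percentage, failed.savings_percentage)
```

A failed result without token counts keeps savings_percentage as None, so callers can tell "no saving" apart from "not measured".
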
class toonverter.core.DecodeOptions(strict=True, type_inference=True, delimiter=',')

Bases: object

Configuration options for decoding TOON format.

Parameters:
  • strict (bool)

  • type_inference (bool)

  • delimiter (Literal[',', '\t', '|', ';'])

strict

Raise errors on malformed input

type_inference

Automatically infer data types

delimiter

Expected field delimiter

__init__(strict=True, type_inference=True, delimiter=',')
Parameters:
  • strict (bool)

  • type_inference (bool)

  • delimiter (Literal[',', '\t', '|', ';'])

Return type:

None

delimiter: Literal[',', '\t', '|', ';'] = ','
strict: bool = True
type_inference: bool = True
exception toonverter.core.DecodingError

Bases: ToonConverterError

Raised when decoding from TOON format fails.

class toonverter.core.DefaultFormatRegistry

Bases: FormatRegistry

Default implementation of format adapter registry.

This class implements the Singleton pattern to ensure a single global registry instance, and it is thread-safe for concurrent access.

__init__()

Initialize instance attributes (idempotent for singleton).

Return type:

None

static __new__(cls)

Create or return the singleton instance.

Return type:

DefaultFormatRegistry

Returns:

Singleton registry instance

clear()

Clear all registered adapters.

Warning: This is primarily for testing. Use with caution.

Return type:

None

get(format_name)

Retrieve format adapter by name.

Parameters:

format_name (str) – Format identifier (case-insensitive)

Return type:

FormatAdapter

Returns:

FormatAdapter instance

Raises:

FormatNotSupportedError – If format not found

is_supported(format_name)

Check if format is supported.

Parameters:

format_name (str) – Format identifier (case-insensitive)

Return type:

bool

Returns:

True if format is registered

list_formats()

List all registered format names.

Return type:

list[str]

Returns:

Sorted list of format identifiers

register(format_name, adapter)

Register a format adapter.

Parameters:
  • format_name (str) – Format identifier (lowercase)

  • adapter (FormatAdapter) – FormatAdapter instance

Raises:

ValueError – If format already registered or invalid

Return type:

None

unregister(format_name)

Unregister a format adapter.

Parameters:

format_name (str) – Format identifier (case-insensitive)

Raises:

FormatNotSupportedError – If format not found

Return type:

None
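
The combination of a singleton __new__, an idempotent __init__, and case-insensitive lookups can be sketched as follows. This is one possible arrangement under stated assumptions, not the toonverter source: the double-checked locking, the lower-casing of names, and the names RegistrySketch and FormatNotSupportedSketch are all hypothetical.

```python
import threading


class FormatNotSupportedSketch(Exception):
    """Hypothetical stand-in for FormatNotSupportedError."""


class RegistrySketch:
    """Hypothetical singleton registry; not the toonverter implementation."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking so concurrent callers share one instance.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance

    def __init__(self):
        # __init__ runs on every RegistrySketch() call, so it must be idempotent.
        with self._lock:
            if not hasattr(self, "_adapters"):
                self._adapters = {}

    def register(self, format_name, adapter):
        key = format_name.strip().lower()
        with self._lock:
            if not key or key in self._adapters:
                raise ValueError(f"invalid or duplicate format: {format_name!r}")
            self._adapters[key] = adapter

    def get(self, format_name):
        with self._lock:
            try:
                return self._adapters[format_name.lower()]
            except KeyError:
                raise FormatNotSupportedSketch(format_name) from None

    def is_supported(self, format_name):
        with self._lock:
            return format_name.lower() in self._adapters

    def list_formats(self):
        with self._lock:
            return sorted(self._adapters)


first = RegistrySketch()
second = RegistrySketch()
first.register("JSON", object())  # placeholder adapter object
print(first is second, second.list_formats())
```

Because both names refer to the same object, an adapter registered through one is visible through the other.
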

class toonverter.core.EncodeOptions(indent=2, delimiter=',', length_marker=None, compact=False, sort_keys=False, ensure_ascii=False, max_line_length=None, token_budget=None, optimization_policy=None)

Bases: object

Configuration options for encoding data to TOON format.

This class provides preset configurations through the class methods create_compact(), readable(), and tabular(), and allows flexible customization of individual fields.

Parameters:
  • indent (int)

  • delimiter (Literal[',', '\t', '|', ';'])

  • length_marker (str | None)

  • compact (bool)

  • sort_keys (bool)

  • ensure_ascii (bool)

  • max_line_length (int | None)

  • token_budget (int | None)

  • optimization_policy (Any | None)

indent

Number of spaces for indentation (default: 2)

delimiter

Field delimiter for tabular data

length_marker

Optional length prefix for strings

compact

Use compact representation without whitespace

sort_keys

Sort dictionary keys alphabetically

ensure_ascii

Escape non-ASCII characters

max_line_length

Maximum line length before wrapping

token_budget

Maximum token count for output (active optimization)

optimization_policy

Rules for intelligent degradation

__init__(indent=2, delimiter=',', length_marker=None, compact=False, sort_keys=False, ensure_ascii=False, max_line_length=None, token_budget=None, optimization_policy=None)
Parameters:
  • indent (int)

  • delimiter (Literal[',', '\t', '|', ';'])

  • length_marker (str | None)

  • compact (bool)

  • sort_keys (bool)

  • ensure_ascii (bool)

  • max_line_length (int | None)

  • token_budget (int | None)

  • optimization_policy (Any | None)

Return type:

None

compact: bool = False
classmethod create_compact()

Create preset for compact encoding.

Return type:

EncodeOptions

Returns:

EncodeOptions configured for minimal token usage

delimiter: Literal[',', '\t', '|', ';'] = ','
ensure_ascii: bool = False
indent: int = 2
length_marker: str | None = None
max_line_length: int | None = None
optimization_policy: Any | None = None
classmethod readable()

Create preset for human-readable encoding.

Return type:

EncodeOptions

Returns:

EncodeOptions configured for readability

sort_keys: bool = False
classmethod tabular()

Create preset for tabular data encoding.

Return type:

EncodeOptions

Returns:

EncodeOptions optimized for DataFrame-like structures

token_budget: int | None = None
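
The preset class methods can be read as alternative constructors on an options dataclass. The sketch below shows that pattern with a reduced, hypothetical EncodeOptionsSketch. The field values each preset sets are guesses for illustration only, since this reference does not state them, and treating the options object as a dataclass is likewise an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class EncodeOptionsSketch:
    """Hypothetical, reduced stand-in for EncodeOptions."""
    indent: int = 2
    delimiter: str = ","
    compact: bool = False
    sort_keys: bool = False
    max_line_length: Optional[int] = None

    @classmethod
    def create_compact(cls) -> "EncodeOptionsSketch":
        # Guessed preset values: no indentation, compact output.
        return cls(indent=0, compact=True)

    @classmethod
    def readable(cls) -> "EncodeOptionsSketch":
        # Guessed preset values: wider indentation, stable key order.
        return cls(indent=4, sort_keys=True)

    @classmethod
    def tabular(cls) -> "EncodeOptionsSketch":
        # Guessed preset value: tab-delimited rows.
        return cls(delimiter="\t")


base = EncodeOptionsSketch.create_compact()
# dataclasses.replace copies the preset and overrides selected fields.
custom = replace(base, sort_keys=True)
print(base, custom)
```

Starting from a preset and overriding single fields keeps call sites short while leaving the preset itself unchanged.
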
exception toonverter.core.EncodingError

Bases: ToonConverterError

Raised when encoding to TOON format fails.

exception toonverter.core.FileOperationError

Bases: ToonConverterError

Raised when file read/write operations fail.

class toonverter.core.FormatAdapter

Bases: ABC

Abstract base class for format adapters.

Format adapters implement the Strategy pattern for different data format conversions.

abstractmethod decode(data_str, options=None)

Decode data from this format.

Parameters:
  • data_str (str) – String data in this format

  • options (DecodeOptions | None) – Decoding configuration options

Return type:

Any

Returns:

Decoded data (dict, list, or primitive types)

Raises:

DecodingError – If decoding fails

decode_stream(stream, **kwargs)

Decode data from a stream of strings.

Parameters:
  • stream (Iterator[str]) – An iterator yielding chunks of the encoded data

  • **kwargs (Any) – Additional decoding options

Returns:

An iterator yielding decoded objects

Return type:

Iterator[Any]

Raises:

NotImplementedError – If the adapter does not support streaming

abstractmethod encode(data, options=None)

Encode data to this format.

Parameters:
  • data (Any) – Data to encode (dict, list, or primitive types)

  • options (EncodeOptions | None) – Encoding configuration options

Return type:

str

Returns:

String representation in this format

Raises:

EncodingError – If encoding fails

encode_stream(data, **kwargs)

Encode data to the format as a stream of strings.

Parameters:
  • data (Any) – The data to encode

  • **kwargs (Any) – Additional encoding options

Returns:

An iterator yielding chunks of the encoded data

Return type:

Iterator[str]

Raises:

NotImplementedError – If the adapter does not support streaming

abstract property format_name: str

Return the format name (e.g., ‘json’, ‘yaml’, ‘toon’).

Returns:

Format identifier string

supports_streaming()

Check if the adapter supports streaming operations.

Returns:

True if streaming is supported, False otherwise.

Return type:

bool

abstractmethod validate(data_str)

Validate that string conforms to this format.

Parameters:

data_str (str) – String to validate

Return type:

bool

Returns:

True if valid, False otherwise
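
To show what a concrete adapter might look like, the sketch below implements the documented interface for JSON using only the standard library. FormatAdapterSketch is a hypothetical local copy of the interface so the example runs without toonverter installed. A real adapter would subclass toonverter.core.FormatAdapter and would presumably wrap failures in EncodingError and DecodingError, which this sketch does not do.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Iterator


class FormatAdapterSketch(ABC):
    """Hypothetical stand-in for the documented FormatAdapter interface."""

    @property
    @abstractmethod
    def format_name(self) -> str: ...

    @abstractmethod
    def encode(self, data: Any, options: Any = None) -> str: ...

    @abstractmethod
    def decode(self, data_str: str, options: Any = None) -> Any: ...

    @abstractmethod
    def validate(self, data_str: str) -> bool: ...

    def supports_streaming(self) -> bool:
        return False

    def encode_stream(self, data: Any, **kwargs: Any) -> Iterator[str]:
        raise NotImplementedError("streaming is not supported by this adapter")


class JsonAdapterSketch(FormatAdapterSketch):
    """One interchangeable strategy: JSON via the standard library."""

    @property
    def format_name(self) -> str:
        return "json"

    def encode(self, data: Any, options: Any = None) -> str:
        return json.dumps(data)

    def decode(self, data_str: str, options: Any = None) -> Any:
        return json.loads(data_str)

    def validate(self, data_str: str) -> bool:
        # Valid means "parses as JSON"; nothing else is checked here.
        try:
            json.loads(data_str)
        except json.JSONDecodeError:
            return False
        return True


adapter = JsonAdapterSketch()
round_trip = adapter.decode(adapter.encode({"items": [1, 2, 3]}))
print(adapter.format_name, round_trip, adapter.validate("{"))
```

Callers depend only on the abstract interface, so another format can be swapped in by providing a different subclass.
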

exception toonverter.core.FormatNotSupportedError

Bases: ToonConverterError

Raised when a format is not supported.

class toonverter.core.FormatRegistry

Bases: ABC

Abstract base class for format adapter registry.

Implements the Factory pattern for creating format adapters.

abstractmethod get(format_name)

Retrieve format adapter by name.

Parameters:

format_name (str) – Format identifier

Return type:

FormatAdapter

Returns:

FormatAdapter instance

Raises:

FormatNotSupportedError – If format not found

abstractmethod is_supported(format_name)

Check if format is supported.

Parameters:

format_name (str) – Format identifier

Return type:

bool

Returns:

True if format is registered

abstractmethod list_formats()

List all registered format names.

Return type:

list[str]

Returns:

List of format identifiers

abstractmethod register(format_name, adapter)

Register a format adapter.

Parameters:
  • format_name (str) – Format identifier

  • adapter (FormatAdapter) – FormatAdapter instance

Raises:

ValueError – If format already registered

Return type:

None

abstractmethod unregister(format_name)

Unregister a format adapter.

Parameters:

format_name (str) – Format identifier

Raises:

FormatNotSupportedError – If format not found

Return type:

None

class toonverter.core.Plugin

Bases: ABC

Abstract base class for plugins.

Plugins extend TOON Converter functionality without modifying the core codebase.

cleanup()

Optional cleanup hook called on shutdown.

Return type:

None

initialize()

Optional initialization hook called after registration.

Return type:

None

abstract property name: str

Return plugin name.

Returns:

Unique plugin identifier

abstractmethod register(registry)

Register plugin components with the registry.

Parameters:

registry (FormatRegistry) – Format registry instance

Raises:

PluginError – If registration fails

Return type:

None

abstract property version: str

Return plugin version.

Returns:

Version string (e.g., ‘1.0.0’)
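
The plugin lifecycle (register, then the optional initialize and cleanup hooks) can be sketched with local stand-ins. PluginSketch, DictRegistry, and CsvPlugin are hypothetical names, the placeholder adapter is a bare object, and the hooks are called by hand here because this reference does not describe how plugins are loaded.

```python
from abc import ABC, abstractmethod


class PluginSketch(ABC):
    """Hypothetical stand-in for the documented Plugin interface."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def version(self) -> str: ...

    @abstractmethod
    def register(self, registry) -> None: ...

    def initialize(self) -> None:  # optional hook, no-op by default
        pass

    def cleanup(self) -> None:  # optional hook, no-op by default
        pass


class DictRegistry:
    """Minimal hypothetical registry exposing only register()."""

    def __init__(self):
        self.adapters = {}

    def register(self, format_name, adapter):
        if format_name in self.adapters:
            raise ValueError(f"format already registered: {format_name}")
        self.adapters[format_name] = adapter


class CsvPlugin(PluginSketch):
    def __init__(self):
        self.events = []  # records the order in which hooks run

    @property
    def name(self) -> str:
        return "csv-plugin"

    @property
    def version(self) -> str:
        return "1.0.0"

    def register(self, registry) -> None:
        registry.register("csv", object())  # placeholder adapter
        self.events.append("register")

    def initialize(self) -> None:
        self.events.append("initialize")

    def cleanup(self) -> None:
        self.events.append("cleanup")


registry = DictRegistry()
plugin = CsvPlugin()
plugin.register(registry)  # contribute components
plugin.initialize()        # after registration, per the hook description
plugin.cleanup()           # on shutdown
print(plugin.name, plugin.version, plugin.events)
```
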

exception toonverter.core.PluginError

Bases: ToonConverterError

Raised when plugin loading or registration fails.

class toonverter.core.TokenAnalysis(format, token_count, model='cl100k_base', encoding='utf-8', metadata=<factory>)

Bases: object

Analysis of token usage for a single format.

Parameters:
  • format (str)

  • token_count (int)

  • model (str)

  • encoding (str)

  • metadata (dict[str, Any])

format

Data format analyzed

token_count

Number of tokens

model

Tokenizer model used

encoding

Specific encoding method

metadata

Additional analysis metadata

__init__(format, token_count, model='cl100k_base', encoding='utf-8', metadata=<factory>)
Parameters:
  • format (str)

  • token_count (int)

  • model (str)

  • encoding (str)

  • metadata (dict[str, Any])

Return type:

None

encoding: str = 'utf-8'
model: str = 'cl100k_base'
format: str
token_count: int
metadata: dict[str, Any]
exception toonverter.core.TokenCountError

Bases: ToonConverterError

Raised when token counting fails.

class toonverter.core.TokenCounter

Bases: ABC

Abstract base class for token counting implementations.

abstractmethod analyze(text, format_name)

Analyze token usage for text in given format.

Parameters:
  • text (str) – Text to analyze

  • format_name (str) – Format of the text

Return type:

TokenAnalysis

Returns:

TokenAnalysis with detailed statistics

Raises:

TokenCountError – If analysis fails

abstractmethod count_tokens(text)

Count tokens in text.

Parameters:

text (str) – Text to analyze

Return type:

int

Returns:

Number of tokens

Raises:

TokenCountError – If counting fails

abstract property model_name: str

Return the tokenizer model name.

Returns:

Model identifier (e.g., ‘cl100k_base’, ‘gpt-4’)
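
A counter implementation has to supply model_name, count_tokens, and analyze. The sketch below does so with a deliberately crude whitespace split, which is not a real tokenizer and will not match model token counts. TokenCounterSketch, AnalysisSketch, and WhitespaceCounter are hypothetical names used so the example runs on its own.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AnalysisSketch:
    """Hypothetical, reduced stand-in for TokenAnalysis."""
    format: str
    token_count: int
    model: str


class TokenCounterSketch(ABC):
    """Hypothetical stand-in for the documented TokenCounter interface."""

    @property
    @abstractmethod
    def model_name(self) -> str: ...

    @abstractmethod
    def count_tokens(self, text: str) -> int: ...

    @abstractmethod
    def analyze(self, text: str, format_name: str) -> AnalysisSketch: ...


class WhitespaceCounter(TokenCounterSketch):
    """Counts whitespace-separated words as a rough proxy for tokens."""

    @property
    def model_name(self) -> str:
        return "whitespace"

    def count_tokens(self, text: str) -> int:
        # str.split() with no arguments collapses runs of whitespace.
        return len(text.split())

    def analyze(self, text: str, format_name: str) -> AnalysisSketch:
        return AnalysisSketch(format_name, self.count_tokens(text), self.model_name)


counter = WhitespaceCounter()
analysis = counter.analyze("name: Ada\nrole: engineer", "toon")
print(analysis)
```
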

exception toonverter.core.ToonConverterError

Bases: Exception

Base exception for all TOON Converter errors.

exception toonverter.core.ValidationError

Bases: ToonConverterError

Raised when input validation fails.
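
Because every exception in this module derives from ToonConverterError, callers can handle specific failures first and fall back to the base class for the rest. The sketch below demonstrates that ordering with hypothetical stand-in classes, so it runs without toonverter. With the library installed, the documented classes from toonverter.core would take their place.

```python
class ToonConverterErrorSketch(Exception):
    """Hypothetical stand-in for the documented base exception."""


class DecodingErrorSketch(ToonConverterErrorSketch):
    """Hypothetical stand-in for DecodingError."""


class ValidationErrorSketch(ToonConverterErrorSketch):
    """Hypothetical stand-in for ValidationError."""


def describe_failure(operation):
    """Run operation and summarise any library-style error it raises."""
    try:
        operation()
    except ValidationErrorSketch as exc:
        # Most specific handler first.
        return f"invalid input: {exc}"
    except ToonConverterErrorSketch as exc:
        # Base-class handler catches every other subclass.
        return f"conversion problem: {exc}"
    return "ok"


def bad_decode():
    raise DecodingErrorSketch("unexpected delimiter")


def bad_input():
    raise ValidationErrorSketch("empty document")


print(describe_failure(bad_decode))
print(describe_failure(bad_input))
```
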

toonverter.core.get_registry()

Get the global format registry instance.

Return type:

FormatRegistry

Returns:

Global FormatRegistry singleton