TOON Format Specification v2.0
TOON (Token-Oriented Object Notation) is designed for maximum token efficiency while maintaining readability. This guide covers the TOON v2.0 specification as implemented in toonverter.
Overview
TOON reduces token usage by 30-60% compared to JSON through:
Minimal syntax (no unnecessary braces, quotes, commas)
Tabular format for uniform arrays
Indentation-based structure
Smart quoting rules
Three Root Forms
Every TOON document has one of three root forms:
1. Object Form (Default)
Key-value pairs, one per line:
name: Alice
age: 30
city: NYC
2. Array Form
Collection with length annotation:
[3]:
  - Alice
  - Bob
  - Charlie
3. Primitive Form
Single value (string, number, boolean, null):
Hello World
Three Array Forms
Arrays can be encoded in three different forms depending on their content:
1. Inline Array
For primitive values on a single line:
tags[3]: python,llm,optimization
Requirements:
- All elements must be primitives (string, number, boolean, null)
- No nested structures
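As a rough illustration of how the length annotation can be checked, the sketch below parses one inline array line and verifies that the declared count matches the number of elements. The function name `parse_inline_array` and the regular expression are assumptions made for this example, not part of toonverter; quoted values and type conversion are deliberately left out.

```python
import re

# Hypothetical header pattern: key, [length], colon, then the delimited values.
HEADER = re.compile(r"^(?P<key>\w+)\[(?P<length>\d+)\]:\s*(?P<body>.*)$")


def parse_inline_array(line, delimiter=","):
    """Split an inline array line into (key, raw string values)."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not an inline array line")
    body = match.group("body")
    # An empty body means an empty array, not one empty string.
    values = body.split(delimiter) if body else []
    declared = int(match.group("length"))
    if len(values) != declared:
        raise ValueError(f"declared {declared} elements, found {len(values)}")
    return match.group("key"), values


key, values = parse_inline_array("tags[3]: python,llm,optimization")
```

A mismatch between the annotation and the actual element count raises `ValueError`, which is one plausible way to surface truncated or corrupted data early.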
2. Tabular Array
For uniform objects with primitive values:
users[3]{name,age,city}:
  Alice,30,NYC
  Bob,25,LA
  Charlie,35,SF
Requirements:
- All elements must be objects
- All objects must have the same keys
- All values must be primitives (no nested objects/arrays)
Benefits:
- Highest compression ratio (40-60% savings)
- CSV-like efficiency
- Perfect for DataFrame-like data
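The tabular requirements can be sketched as a small check followed by an encoder. This is a minimal illustration, not toonverter's implementation: the names `can_be_tabular` and `encode_tabular` are hypothetical, rows are assumed to be indented by two spaces, and string quoting is skipped entirely.

```python
def is_primitive(value):
    """True for string, number, boolean, or null (None in Python)."""
    return value is None or isinstance(value, (str, int, float, bool))


def can_be_tabular(rows):
    """Check the three requirements: objects, same keys, primitive values."""
    if not rows or not all(isinstance(row, dict) for row in rows):
        return False
    keys = set(rows[0])
    return all(
        set(row) == keys and all(is_primitive(v) for v in row.values())
        for row in rows
    )


def format_cell(value):
    """Render one primitive; quoting rules are intentionally not applied here."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def encode_tabular(name, rows, indent="  "):
    """Emit a header line with length and field names, then one row per object."""
    if not can_be_tabular(rows):
        raise ValueError("rows do not meet the tabular requirements")
    keys = list(rows[0])
    header = name + "[" + str(len(rows)) + "]{" + ",".join(keys) + "}:"
    body = [indent + ",".join(format_cell(row[k]) for k in keys) for row in rows]
    return "\n".join([header] + body)


rows = [
    {"name": "Alice", "age": 30, "city": "NYC"},
    {"name": "Bob", "age": 25, "city": "LA"},
]
encoded = encode_tabular("users", rows)
```

The field names appear once in the header instead of once per row, which is where the savings over JSON come from.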
3. List Array
For complex or mixed structures:
items[2]:
  - name: Item1
    price: 19.99
    tags[2]: sale,new
  - name: Item2
    price: 29.99
    nested:
      key: value
Requirements:
- Used when inline or tabular forms don’t apply
- Supports nested objects and arrays
- Each item starts with - marker
Inline Objects: First field on dash line, remaining fields indented:
users[2]:
  - name: Alice
    age: 30
  - name: Bob
    age: 25
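The choice between the three array forms follows directly from the requirements listed for each. A minimal sketch, assuming Python lists and dicts as input; `choose_array_form` is a hypothetical helper, and treating an empty list as inline is an assumption the text does not settle.

```python
def is_primitive(value):
    """True for string, number, boolean, or null (None in Python)."""
    return value is None or isinstance(value, (str, int, float, bool))


def choose_array_form(items):
    """Return 'inline', 'tabular', or 'list' for a Python list."""
    # Inline: every element is a primitive (an empty list also lands here).
    if all(is_primitive(item) for item in items):
        return "inline"
    # Tabular: every element is an object with the same keys and primitive values.
    if all(isinstance(item, dict) for item in items):
        keys = set(items[0])
        uniform = all(set(item) == keys for item in items)
        flat = all(is_primitive(v) for item in items for v in item.values())
        if uniform and flat:
            return "tabular"
    # List: the fallback for nested or mixed content.
    return "list"
```

Checking inline first and tabular second means the list form is only used when neither compact form applies.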
String Quoting Rules
Strings need quotes in these cases:
Empty or Whitespace-Only
empty: ""
spaces: " "
Leading or Trailing Whitespace
text: " leading"
text: "trailing "
Reserved Words
value: "true"   # Would be parsed as boolean without quotes
value: "false"
value: "null"
Numeric-Looking Strings
id: "123"     # Would be parsed as number without quotes
code: "3.14"
ref: "-42"
Special Characters
path: "test:value"   # Contains colon
expr: "test[0]"      # Contains brackets
data: "test{key}"    # Contains braces
item: "a,b,c"        # Contains comma
cmd: "a|b"           # Contains pipe
Hyphen at Start
value: "-test"
value: "-"
value: "--option"
Contains Delimiter
The delimiter varies based on context (comma by default):
text: "a,b,c"     # Comma delimiter
text: "a\tb\tc"   # Tab delimiter
text: "a|b|c"     # Pipe delimiter
Strings That Don’t Need Quotes
# Simple strings
name: hello
value: test
# Alphanumeric with underscores
key: user_name
field: my_var123
# Hyphens in middle (not at start)
name: test-value
key: multi-word-string
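The quoting rules in this section can be collected into a single predicate. The sketch below is one possible reading of them; `needs_quotes` and the numeric pattern are assumptions for illustration, and cases the section does not mention (embedded quotes or control characters, for example) are not handled.

```python
import re

RESERVED = {"true", "false", "null"}
SPECIAL_CHARS = set(":[]{},|")
# Assumed definition of "numeric-looking": optional sign, digits,
# optional fraction, optional exponent.
NUMERIC = re.compile(r"-?\d+(\.\d+)?([eE][+-]?\d+)?")


def needs_quotes(text, delimiter=","):
    """Return True if the string must be quoted under the rules listed above."""
    if text.strip() == "":                 # empty or whitespace-only
        return True
    if text != text.strip():               # leading or trailing whitespace
        return True
    if text in RESERVED:                   # would parse as boolean or null
        return True
    if NUMERIC.fullmatch(text):            # would parse as a number
        return True
    if any(ch in SPECIAL_CHARS for ch in text):
        return True
    if text.startswith("-"):               # would look like a list marker
        return True
    if delimiter in text:                  # active delimiter (comma by default)
        return True
    return False
```

Passing the active delimiter explicitly keeps the check context-dependent, as the "Contains Delimiter" rule requires.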
Number Canonical Form
Numbers must follow canonical form rules:
Valid Numbers
count: 42
price: 19.99
negative: -3.14
zero: 0
Normalization Rules
These non-canonical forms are normalized as shown (they are not allowed in strict mode):
# 1.0 → 1 (remove unnecessary decimal)
# 1e5 → 100000 (no exponential notation)
# -0 → 0 (normalize negative zero)
# NaN → null (special values become null)
# Infinity → null
# -Infinity → null
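The normalization rules can be expressed as a short formatting function. This is a sketch under the assumption that numbers arrive as Python `int` or `float` values; `canonical_number` is a hypothetical name, and strict-mode rejection is not modeled.

```python
import math
from decimal import Decimal


def canonical_number(value):
    """Return the canonical text for a Python int or float."""
    if isinstance(value, bool):
        raise TypeError("booleans are not numbers")
    if isinstance(value, float) and (math.isnan(value) or math.isinf(value)):
        return "null"                      # NaN, Infinity, -Infinity
    if value == 0:
        return "0"                         # also covers -0.0
    # repr() gives the shortest round-trip digits; the 'f' format
    # expands any exponent into plain decimal notation.
    text = format(Decimal(repr(value)), "f")
    if "." in text:
        text = text.rstrip("0").rstrip(".")  # 1.0 -> 1, 2.50 -> 2.5
    return text
```

Going through `Decimal` avoids the exponent that `str()` produces for very large or very small floats.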
Delimiters
TOON supports three delimiters with different markers:
Comma (Default)
No marker needed:
a: 1
b: 2
c: 3
tags[3]: one,two,three
Tab
Marked with {TAB} at document start:
{TAB}
a: 1 b: 2 c: 3
tags[3]: one two three
Pipe
Marked with {PIPE} at document start:
{PIPE}
a: 1|b: 2|c: 3
tags[3]: one|two|three
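A reader of a TOON document has to look for the marker before it can split any values. The sketch below assumes the marker sits alone on the first line, as in the examples above; `detect_delimiter` is a hypothetical helper and not a toonverter function.

```python
# Marker text mapped to the delimiter character it selects.
MARKERS = {"{TAB}": "\t", "{PIPE}": "|"}


def detect_delimiter(document):
    """Return (delimiter, remaining text) for a TOON document string."""
    first_line, _, rest = document.partition("\n")
    marker = first_line.strip()
    if marker in MARKERS:
        return MARKERS[marker], rest
    # No marker: comma is the default and the whole text is the body.
    return ",", document


delimiter, body = detect_delimiter("{PIPE}\ntags[3]: one|two|three")
```

The returned delimiter can then be passed to whatever routine splits inline and tabular values.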
Escape Sequences
Only 5 escape sequences are allowed in TOON:
| Escape | Meaning |
|---|---|
| \\ | Backslash |
| \" | Double quote |
| \n | Newline |
| \r | Carriage return |
| \t | Tab |
Note: Other common escape sequences, such as Unicode escapes like \u0041, are not supported in TOON.
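A minimal escape and unescape pair, assuming the five sequences are the conventional ones for the meanings listed (backslash, double quote, newline, carriage return, tab). The function names are hypothetical; anything else after a backslash is rejected, in line with the note above.

```python
# Raw character -> escaped form.
ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
# Character following a backslash -> decoded character.
UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "r": "\r", "t": "\t"}


def escape(text):
    """Escape the five special characters for use inside a quoted string."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def unescape(text):
    """Decode the five escapes and reject every other backslash sequence."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text) or text[i + 1] not in UNESCAPES:
            raise ValueError(f"unsupported escape at position {i}")
        out.append(UNESCAPES[text[i + 1]])
        i += 2
    return "".join(out)
```

Rejecting unknown sequences rather than passing them through keeps round-tripping unambiguous.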
Indentation
TOON uses indentation to represent nesting:
Rules
Use spaces only (tabs forbidden as whitespace)
Default indent: 2 spaces
Must be consistent throughout document
Each nesting level adds one indent level
Example
user:
  name: Alice
  address:
    city: NYC
    zip: "10001"
  contacts[2]:
    - type: email
      value: alice@example.com
    - type: phone
      value: "555-1234"
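The indentation rules translate into a small per-line check. This sketch assumes the default two-space indent; `indent_level` is a hypothetical helper that returns the nesting depth of a line and raises on tabs or on widths that are not a multiple of the indent size.

```python
def indent_level(line, indent_size=2):
    """Return the nesting depth of a line, enforcing the indentation rules."""
    stripped = line.lstrip(" ")
    leading = len(line) - len(stripped)
    # A tab directly after the leading spaces means tabs were used as whitespace.
    if stripped.startswith("\t"):
        raise ValueError("tabs are not allowed as indentation")
    if leading % indent_size != 0:
        raise ValueError("indentation is not a multiple of the indent size")
    return leading // indent_size
```

For example, `indent_level("    city: NYC")` gives 2, matching the `address` block nested inside `user` when the document uses two-space indents.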
Type Annotations
Optional type annotations using pipe syntax:
count: 100|int
price: 19.99|float
active: true|bool
updated: 2025-01-15T10:30:00|datetime
data: null|None
Note: Most types are inferred automatically, so annotations are rarely needed.
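One way to read an annotated value is to split on the last pipe and dispatch on the type name. The sketch below covers only the five type names shown above; `parse_annotated` and the converter table are assumptions for illustration, and the interaction with the pipe delimiter is not addressed because the text does not specify it.

```python
from datetime import datetime


def _parse_bool(text):
    if text not in ("true", "false"):
        raise ValueError(f"not a boolean: {text!r}")
    return text == "true"


def _parse_none(text):
    if text != "null":
        raise ValueError(f"not null: {text!r}")
    return None


# Type name from the annotation -> converter for the raw text.
CONVERTERS = {
    "int": int,
    "float": float,
    "bool": _parse_bool,
    "datetime": datetime.fromisoformat,
    "None": _parse_none,
}


def parse_annotated(raw):
    """Convert 'value|type' into a typed Python value."""
    value, sep, type_name = raw.rpartition("|")
    if not sep:
        raise ValueError("no type annotation found")
    if type_name not in CONVERTERS:
        raise ValueError(f"unknown type annotation: {type_name!r}")
    return CONVERTERS[type_name](value)
```

Splitting on the last pipe rather than the first leaves room for values that themselves contain a pipe, although such values would need quoting under the rules above.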
Complete Example
# User database (TOON format)
users[3]{id,name,age,city}:
  1,Alice,30,NYC
  2,Bob,25,LA
  3,Charlie,35,SF
metadata:
  created: 2025-01-15T10:30:00
  version: "1.0"
  tags[4]: users,database,production,v1
settings:
  max_users: 1000
  enable_auth: true
  features[3]:
    - name: notifications
      enabled: true
    - name: analytics
      enabled: false
    - name: export
      enabled: true
Token Savings
Real-World Examples
Simple Object:
| Format | Tokens | Savings |
|---|---|---|
| JSON | 24 | 0% |
| YAML | 20 | 17% |
| TOON | 16 | 33% |
Tabular Data (100 rows, 3 columns):
| Format | Tokens | Savings |
|---|---|---|
| JSON | 1200 | 0% |
| YAML | 900 | 25% |
| TOON | 600 | 50% |
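The savings column is the reduction relative to the JSON token count. A small helper makes the arithmetic explicit; `savings_percent` is a hypothetical name, and the token counts are the ones quoted in the tables, not measurements made here.

```python
def savings_percent(baseline_tokens, tokens):
    """Percentage of tokens saved relative to the baseline (JSON) count."""
    return 100 * (baseline_tokens - tokens) / baseline_tokens


tabular_toon = savings_percent(1200, 600)   # TOON vs JSON, tabular data
tabular_yaml = savings_percent(1200, 900)   # YAML vs JSON, tabular data
simple_toon = savings_percent(24, 16)       # TOON vs JSON, simple object
```

Actual counts depend on the tokenizer in use, so figures like these are best re-measured on representative data.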
Reference
For the complete official specification:
See Also
Quick Start - Start using TOON format
Configuration - Configure encoding options
Tabular Data - Tabular array examples