Of course! Here's a comprehensive guide to using jsonschema in Python, covering everything from installation to advanced features.

What is JSON Schema?
JSON Schema is a powerful vocabulary that allows you to annotate and validate JSON documents. Think of it as a "blueprint" or a "contract" for your JSON data. It describes:
- Data Types: Is it a string, number, array, object, etc.?
- Structure: What keys are required? What keys are optional?
- Values: What are the allowed values for a key (e.g., via an enum)?
- Constraints: Does a string have a maximum length? Is a number within a specific range?
- Conditional Logic: If one key is present, another must also be present.
The jsonschema library is a widely used Python implementation of JSON Schema. It checks whether a given instance (parsed JSON, such as a Python dictionary) conforms to a given schema.
Installation
First, you need to install the library. It's available on PyPI.
pip install jsonschema
A Simple Example: The Basics
Let's start with a very basic example to understand the core workflow.

Step 1: Define your Schema
A schema is itself a JSON-like Python dictionary. It uses specific keywords to define rules.
# A schema that describes a simple "product"
product_schema = {
    "type": "object",  # The root must be a JSON object (a Python dict)
    "properties": {
        "name": {
            "type": "string"  # The "name" key must be a string
        },
        "price": {
            "type": "number"  # The "price" key must be a number
        },
        "in_stock": {
            "type": "boolean"  # The "in_stock" key must be a boolean
        }
    },
    "required": ["name", "price"]  # The "name" and "price" keys are mandatory
}
Step 2: Create Data to Validate
Now, let's create some Python dictionaries that represent our JSON data.
# A valid product
valid_product = {
    "name": "Laptop",
    "price": 1200.50,
    "in_stock": True
}
# An invalid product (missing 'price'; the extra 'category' key is allowed by default)
invalid_product_1 = {
    "name": "Mouse",
    "category": "Electronics"
}
# Another invalid product ('price' is a string, not a number)
invalid_product_2 = {
    "name": "Keyboard",
    "price": "75.99"
}
Step 3: Validate the Data
Use the validate() function from the jsonschema library.
from jsonschema import validate
from jsonschema.exceptions import ValidationError

print("--- Validating valid_product ---")
try:
    validate(instance=valid_product, schema=product_schema)
    print("✅ The data is valid!")
except ValidationError as e:
    # e.message is the short, one-line description of the failure
    print(f"❌ The data is invalid: {e.message}")

print("\n--- Validating invalid_product_1 ---")
try:
    validate(instance=invalid_product_1, schema=product_schema)
    print("✅ The data is valid!")
except ValidationError as e:
    print(f"❌ The data is invalid: {e.message}")

print("\n--- Validating invalid_product_2 ---")
try:
    validate(instance=invalid_product_2, schema=product_schema)
    print("✅ The data is valid!")
except ValidationError as e:
    print(f"❌ The data is invalid: {e.message}")
Output:

--- Validating valid_product ---
✅ The data is valid!

--- Validating invalid_product_1 ---
❌ The data is invalid: 'price' is a required property

--- Validating invalid_product_2 ---
❌ The data is invalid: '75.99' is not of type 'number'
Common Schema Keywords
Here are the most important keywords you'll use in your schemas.
| Keyword | Description | Example |
|---|---|---|
| type | The data type. Can be "string", "number", "integer", "boolean", "object", "array", or "null". | "type": "string" |
| properties | Defines the schema for each key in an object. | "properties": {"name": {"type": "string"}} |
| required | An array of strings listing the keys that are mandatory in an object. | "required": ["name", "id"] |
| items | Defines the schema for all items in an array. | "items": {"type": "number"} |
| additionalProperties | By default, extra keys not listed in properties are allowed. Set to False to forbid them, or provide a schema that the value of every extra key must match. | "additionalProperties": False |
| minimum / maximum | For numbers. The minimum/maximum inclusive value. | "minimum": 0 |
| exclusiveMinimum / exclusiveMaximum | For numbers. The minimum/maximum exclusive value. | "exclusiveMaximum": 100 |
| minLength / maxLength | For strings. The minimum/maximum length. | "minLength": 5 |
| pattern | For strings. A regular expression the string must match. | "pattern": "^[A-Za-z]+$" |
| enum | The value must be exactly one of the items in the provided list. | "enum": ["admin", "user", "guest"] |
| const | The value must be exactly the provided constant. | "const": "active" |
| anyOf | The data must be valid against at least one of the provided subschemas. | "anyOf": [{"type": "string"}, {"type": "boolean"}] |
| allOf | The data must be valid against all of the provided subschemas. | "allOf": [{"type": "string"}, {"minLength": 5}] |
| oneOf | The data must be valid against exactly one of the provided subschemas. | "oneOf": [{"type": "number"}, {"type": "string"}] |
| not | The data must not be valid against the provided schema. | "not": {"type": "null"} |
Handling Validation Errors
The validate() function raises a jsonschema.exceptions.ValidationError when validation fails. It's crucial to catch this exception to handle errors gracefully.
The ValidationError object is very informative and contains details about the error.
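from jsonschema import validate
from jsonschema.exceptions import ValidationError

# Two problems: 'price' is missing and 'name' is too short.
# validate() raises only the single most relevant error.
data_to_test = {"name": "A"}
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 5},
        "price": {"type": "number"}
    },
    "required": ["name", "price"]
}
try:
    validate(instance=data_to_test, schema=schema)
except ValidationError as e:
    print(f"Validation failed: {e.message}")
    print(f"Path to error: {list(e.path)}")  # Empty list means the root object
    print(f"Invalid value: {e.instance}")
    print(f"Schema rule: {e.schema}")  # The (sub)schema containing the failed keyword
Output:
Validation failed: 'price' is a required property
Path to error: []
Invalid value: {'name': 'A'}
Schema rule: {'type': 'object', 'properties': {'name': {'type': 'string', 'minLength': 5}, 'price': {'type': 'number'}}, 'required': ['name', 'price']}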
You can also check the e.validator field to see which keyword caused the failure (e.g., 'required', 'type', 'minLength').
Advanced Features
a) $id and Refs ($ref)
For large schemas, it's useful to break them into smaller, reusable parts. You can do this using $id and $ref.
- $id: A unique URI for the schema, allowing other schemas to reference it.
- $ref: A reference to another schema. The library resolves this reference and validates against the target schema.
Let's create a schema for a user that reuses an "address" schema.
# Define a reusable schema for an address
address_schema = {
    "$id": "https://example.com/schemas/address.json",
    "type": "object",
    "properties": {
        "street_address": {"type": "string"},
        "city": {"type": "string"},
        "state": {"type": "string"}
    },
    "required": ["street_address", "city", "state"]
}
# Define the main user schema, which references the address schema
user_schema = {
    "$id": "https://example.com/schemas/user.json",
    "type": "object",
    "properties": {
        "username": {"type": "string"},
        "email": {"type": "string", "format": "email"},
        "address": {"$ref": "https://example.com/schemas/address.json"}  # Reference the address schema
    },
    "required": ["username", "email", "address"]
}
# A valid user object
valid_user = {
    "username": "jane_doe",
    "email": "jane@example.com",
    "address": {
        "street_address": "123 Python Lane",
        "city": "Codeville",
        "state": "CA"
    }
}
# An invalid user object (invalid address)
invalid_user = {
    "username": "john_doe",
    "email": "john@example.com",
    "address": {
        "street_address": "456 Java Ave"  # Missing 'city' and 'state'
    }
}
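from jsonschema import Draft202012Validator
from jsonschema.exceptions import ValidationError
from referencing import Registry
from referencing.jsonschema import DRAFT202012

# The validator cannot find address_schema on its own, so register it under its $id.
# This uses the referencing package, which jsonschema 4.18 and later depend on.
registry = Registry().with_resource(
    uri=address_schema["$id"],
    resource=DRAFT202012.create_resource(address_schema),
)
validator = Draft202012Validator(user_schema, registry=registry)

print("--- Validating valid_user ---")
try:
    validator.validate(valid_user)
    print("✅ The data is valid!")
except ValidationError as e:
    print(f"❌ The data is invalid: {e.message}")

print("\n--- Validating invalid_user ---")
try:
    validator.validate(invalid_user)
    print("✅ The data is valid!")
except ValidationError as e:
    print(f"❌ The data is invalid: {e.message}")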
b) format keyword
The format keyword is for semantic validation, not just structural validation. It checks whether a string conforms to a common format.
- "email": Checks for a basic email pattern.
- "uri": Checks for a URI/URL.
- "date-time": Checks for an ISO 8601 date-time string.
- "ipv4", "ipv6": Checks for IP addresses.
Note: format is not enforced by default. validate() ignores it unless you pass a format_checker, and some formats (such as "uri" and "date-time") are only checked when optional dependencies are installed. Even then, the checks can be looser than you might expect: "email" only verifies that the string contains an "@". For critical applications, you might need a dedicated library for deep validation.
from jsonschema import FormatChecker, validate

user_schema_with_format = {
    "type": "object",
    "properties": {
        "email": {"type": "string", "format": "email"}
    },
    "required": ["email"]
}
valid_email_data = {"email": "test@example.com"}
invalid_email_data = {"email": "not-an-email"}
checker = FormatChecker()  # Without this argument, "format" is not checked
validate(instance=valid_email_data, schema=user_schema_with_format, format_checker=checker)  # Passes
validate(instance=invalid_email_data, schema=user_schema_with_format, format_checker=checker)  # Raises ValidationError
Best Practices
- Use Specific Types: Prefer "integer" over "number" if you only want whole numbers.
- Be Explicit with additionalProperties: Unless you want to allow any extra keys, set "additionalProperties": False so that typos in keys do not go unnoticed.
- Provide Clear Error Messages: When validation fails, the ValidationError is detailed. Use this information to give clear feedback to the user of your API or application.
- Reuse with $ref: For complex applications, break your schemas into logical, reusable components using $id and $ref. This keeps your main schemas clean and maintainable.
- Version Your Schemas: If your data structure might change over time, consider adding a $schema keyword to your schema to specify the JSON Schema version you are using (e.g., "https://json-schema.org/draft/2020-12/schema"). validate() uses it to pick the matching draft, so the schema keeps its meaning as newer drafts are released.
