Why not Pydantic?

Introduction

Pydantic is one of the most popular libraries for data serialization and deserialization. However, the principles it’s built on often make it harder to use than it should be.

In this article, we’ll explore how using adaptix instead of Pydantic can help manage common tasks more efficiently.

Note

This article is updated for pydantic==2.9.2, with code snippets run on CPython version 3.12. Some things may have changed since then, but probably not much.

The Main Thesis

Pydantic works smoothly only when you violate the Single Responsibility Principle (SRP). It wants to know about your domain layer and to penetrate it. Pydantic performs best when it serves simultaneously as the (de)serializer for incoming requests and as the domain model. Once you separate your code into layers, however, transferring data between them can become a challenge.

Let’s imagine that you have three nested domain dataclasses. Everything works fine for a while; then you add a new field to the deepest model and discover that Pydantic doesn’t include the UTC offset in the datetime string (or exhibits some other unwanted behavior).

Your possible options:

  • Completely duplicate the dataclass structure, including all nested models, into an equivalent Pydantic model with the necessary config.

  • Perform manual serialization.

  • Start using adaptix.

And what you get after switching:

  • Nothing invades your domain layer: the serialization logic lives entirely in the presentation layer.

  • Decreased code coupling.

  • You can use adaptix to convert models passing between layers.
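To make the manual-serialization option above concrete, here is a minimal stdlib-only sketch. The three nested classes (Order, Customer, Address) and the dump_* functions are hypothetical; the point is that every nested model needs its own hand-written function, and every new field, such as a timezone-aware timestamp in the deepest model, has to be threaded through by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


# Hypothetical domain models: three levels of nesting, no serialization logic inside.
@dataclass
class Address:
    city: str
    verified_at: datetime  # the new field added to the deepest model


@dataclass
class Customer:
    name: str
    address: Address


@dataclass
class Order:
    id: int
    customer: Customer


# Hand-written serializers living in the presentation layer.
def dump_address(address: Address) -> dict[str, Any]:
    # isoformat() keeps the UTC offset for timezone-aware datetimes
    return {"city": address.city, "verified_at": address.verified_at.isoformat()}


def dump_customer(customer: Customer) -> dict[str, Any]:
    return {"name": customer.name, "address": dump_address(customer.address)}


def dump_order(order: Order) -> dict[str, Any]:
    return {"id": order.id, "customer": dump_customer(order.customer)}


order = Order(
    id=1,
    customer=Customer(
        name="Anna",
        address=Address(
            city="Berlin",
            verified_at=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
        ),
    ),
)
print(dump_order(order))
```

This works, but the three functions mirror the model structure exactly and must be kept in sync with it by hand.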

Why can’t I just use Pydantic as the domain model? Well, let’s talk about that.

Pydantic’s pitfalls

Coupling instance creation and data parsing

Creating any model instance in Pydantic triggers data parsing. On one hand, this makes instance creation within a program significantly more resource-intensive, while on the other, it can lead to unexpected and undesirable behavior during instance creation. Let’s examine this in detail.

Instantiation penalty

Let’s take a model from the Pydantic tutorial:

from dataclasses import dataclass
from datetime import datetime

from pydantic import BaseModel, PositiveInt


class UserPydantic(BaseModel):
    id: int
    name: str = "John Doe"
    signup_ts: datetime | None
    tastes: dict[str, PositiveInt]


@dataclass(kw_only=True)
class UserDataclass:
    id: int
    name: str = "John Doe"
    signup_ts: datetime | None
    tastes: dict[str, PositiveInt]

And run a simple benchmark for creating instances:

from datetime import datetime
from timeit import timeit

from .instantiating_penalty_models import UserDataclass, UserPydantic

stmt = """
User(
    id=123,
    signup_ts=datetime(year=2019, month=6, day=1, hour=12, minute=22),
    tastes={'wine': 9, 'cheese': 7, 'cabbage': '1'},
)
"""
print("pydantic ", timeit(stmt, globals={"User": UserPydantic, "datetime": datetime}))
print("dataclass", timeit(stmt, globals={"User": UserDataclass, "datetime": datetime}))

Here are the results:

pydantic  2.3817247649421915
dataclass 0.9756000880151987

Creating a Pydantic model instance is roughly 2.4 times slower than creating a similar dataclass instance. You pay this overhead every time you create an object in your business logic.

But Pydantic has a method, .model_construct(), for creating instances without validation! And yet, it’s even slower:

from datetime import datetime
from timeit import timeit

from .instantiating_penalty_benchmark import UserPydantic, stmt

print(
    "pydantic (model_construct)",
    timeit(stmt, globals={"User": UserPydantic.model_construct, "datetime": datetime}),
)
pydantic (model_construct) 2.8749908979516476

Some notes on the benchmarks

In fact, a significant portion of the time in the benchmark above is spent creating a datetime object. If we remove this object creation, the situation becomes even more dramatic:

from datetime import datetime
from timeit import timeit

from .instantiating_penalty_models import UserDataclass, UserPydantic

stmt = """
User(
    id=123,
    signup_ts=dt,
    tastes={'wine': 9, 'cheese': 7, 'cabbage': '1'},
)
"""
dt = datetime(year=2019, month=6, day=1, hour=12, minute=22)
print(
    "pydantic                  ",
    timeit(stmt, globals={"User": UserPydantic, "dt": dt}),
)
print(
    "pydantic (model_construct)",
    timeit(stmt, globals={"User": UserPydantic.model_construct, "dt": dt}),
)
print(
    "dataclass                 ",
    timeit(stmt, globals={"User": UserDataclass, "dt": dt}),
)
pydantic                   1.8139039139496163
pydantic (model_construct) 2.155639562988654
dataclass                  0.4947519419947639

Now Pydantic is about 3.7 times slower than a standard dataclass, and about 4.4 times slower if you try to disable validation with .model_construct().

Pydantic’s slowdown factor will vary depending on the complexity of the validation and the time required to create other classes.
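The slowdown factors quoted above are simply ratios of the reported timeit results. A quick check using the numbers printed by the benchmark runs (the absolute timings will of course differ on other machines):

```python
# Timings (seconds) copied from the benchmark output above.
first_run = {  # datetime created inside the timed statement
    "pydantic": 2.3817247649421915,
    "pydantic (model_construct)": 2.8749908979516476,  # timed in a separate script
    "dataclass": 0.9756000880151987,
}
second_run = {  # datetime created once, outside the timed statement
    "pydantic": 1.8139039139496163,
    "pydantic (model_construct)": 2.155639562988654,
    "dataclass": 0.4947519419947639,
}


def slowdown(run: dict[str, float]) -> dict[str, float]:
    """Ratio of each variant's time to the plain dataclass time, rounded to 0.1."""
    baseline = run["dataclass"]
    return {name: round(t / baseline, 1) for name, t in run.items() if name != "dataclass"}


print(slowdown(first_run))   # {'pydantic': 2.4, 'pydantic (model_construct)': 2.9}
print(slowdown(second_run))  # {'pydantic': 3.7, 'pydantic (model_construct)': 4.4}
```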

Fused validation

Validating invariants within the model is reasonable, but validation should be split between layers: business invariants belong to the domain model, while checks on external input belong to the representation layer.

For example, type checking prevents most type-related errors, and basic tests catch most of the rest. Do you really need runtime type checks every time you create a model instance? What if the model contains large lists?

Let’s look at how attrs approaches this issue. attrs models can’t convert themselves to JSON or load themselves from JSON; external tools (such as adaptix or cattrs) handle this.

Within the model, you can declare validators to enforce business invariants, while adaptix can perform additional checks when loading data from an untrusted source.

You can also use the __post_init__ method in dataclasses for necessary validation.
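A minimal sketch of that split, assuming a hypothetical Product domain model: __post_init__ enforces only a business invariant and does no type coercion, so the check stays cheap, and parsing of untrusted input can be left to the presentation layer.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    id: int
    amount: Decimal

    def __post_init__(self) -> None:
        # Business invariant only: no isinstance checks, no implicit conversions.
        if self.amount < 0:
            raise ValueError(f"amount must be non-negative, got {self.amount}")


Product(id=1, amount=Decimal("14.6"))  # passes the invariant
try:
    Product(id=2, amount=Decimal("-1"))
except ValueError as exc:
    print(exc)  # amount must be non-negative, got -1
```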

As a result, with Pydantic you can either run checks you don’t need every time, or skip validation entirely with .model_construct() (which, as the benchmarks above show, can be even slower).
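To see what skipping validation means in practice, here is a short sketch with a hypothetical Item model: .model_construct() stores whatever it is given, so even values that violate the declared types and constraints end up in the instance.

```python
from pydantic import BaseModel, PositiveInt, ValidationError


class Item(BaseModel):
    id: int
    quantity: PositiveInt


# Regular construction validates and rejects the negative quantity.
try:
    Item(id=1, quantity=-5)
except ValidationError as exc:
    print(exc.error_count(), "validation error")

# model_construct() performs no validation at all: both wrong values are kept as-is.
item = Item.model_construct(id="not-a-number", quantity=-5)
print(repr(item.id), item.quantity)  # 'not-a-number' -5
```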

Implicit conversions

The next issue lies in the fact that implicit type conversion logic, suitable for parsing, is often inappropriate for creating an object via a constructor.

For a parser, it’s entirely reasonable to perform implicit conversions (following the Tolerant Reader pattern). However, the same behavior can lead to errors when objects are created inside the program.

For example, if you pass a float value to a field of type Decimal, Pydantic implicitly converts it instead of raising an error. As a result, the mistake of using floats for monetary calculations can go unnoticed and cause inaccuracies.
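Why this matters can be shown with the standard library alone (no Pydantic involved): a float that already carries a rounding error stays wrong after conversion to Decimal, whereas arithmetic done in Decimal from the start is exact.

```python
from decimal import Decimal

# A monetary sum accidentally computed with floats.
float_total = 0.1 + 0.2
print(float_total)  # 0.30000000000000004

# Converting the float afterwards keeps the error (through its shortest repr)...
print(Decimal(str(float_total)))  # 0.30000000000000004

# ...and converting a float directly exposes its exact binary value.
print(Decimal(14.6) == Decimal("14.6"))  # False

# Doing the arithmetic in Decimal from the start gives the expected result.
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```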

Possible loss of accuracy
from decimal import Decimal

from pydantic import BaseModel


class Product(BaseModel):
    id: int
    amount: Decimal


assert Product(id=1, amount=14.6) == Product(id=1, amount=Decimal("14.6"))

There is a way to work around this issue: enable strict mode in the model config, then explicitly disable it (strict=False) every time you parse external data.

Necessary workaround to avoid loss of accuracy
from decimal import Decimal

from pydantic import BaseModel, ConfigDict, ValidationError


class Product(BaseModel):
    id: int
    amount: Decimal

    model_config = ConfigDict(strict=True)


try:
    Product(id=1, amount=14.6)
except ValidationError:
    pass

assert (
    Product.model_validate({"id": 1, "amount": 14.6}, strict=False)
    ==
    Product(id=1, amount=Decimal("14.6"))
)

Aliasing mess

The essence of aliases is that a field has an external name and an internal name, and the external name is unsuitable for use within the program. However, Pydantic mixes the two representations into a ball of mud.

By default, the constructor only accepts fields by their aliases (i.e., their external names). You can change this with the populate_by_name configuration option, which lets the constructor accept internal field names while still accepting the external representation. However, the same option also affects parsing: external data may now use field names alongside aliases.
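A short sketch of the default behavior and of what populate_by_name changes, assuming the Pydantic 2.x defaults described above; the two model names are hypothetical.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class AliasOnlyUser(BaseModel):
    name: str = Field(alias="full_name")


class PopulatedUser(BaseModel):
    model_config = ConfigDict(populate_by_name=True)

    name: str = Field(alias="full_name")


# Default: the constructor only understands the external name.
print(AliasOnlyUser(full_name="Anna").name)  # Anna
try:
    AliasOnlyUser(name="Anna")  # the name keyword is ignored, full_name is missing
except ValidationError:
    print("internal name rejected")

# populate_by_name=True: both names work in the constructor...
print(PopulatedUser(name="Anna").name)  # Anna
# ...and the internal name is now also accepted when parsing external data.
print(PopulatedUser.model_validate({"name": "Anna"}).name)  # Anna
```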

Extra field is parsed as usual field
from pydantic import BaseModel, ConfigDict, Field


class User(BaseModel):
    model_config = ConfigDict(populate_by_name=True, extra="allow")

    name: str = Field(alias="full_name")
    age: int


data = {"name": "name_value", "age": 20}
assert User.model_validate(data).model_extra == {}

Mistakes silencing

One of the biggest issues with Pydantic’s approach is that, by default, unknown keyword arguments passed to the constructor are silently ignored. As a result, a typo in a field name does not show up immediately; it stays in the program until tests or users discover it.

Static analyzers can reduce the number of such errors, but this does not always work due to the dynamic nature of Python.

You can forbid additional fields by setting extra='forbid', though this will also affect the parser.

Extra field is ignored
from pydantic import BaseModel


class SomeModel(BaseModel):
    a: int
    b: int


SomeModel(
    a=1,
    b=2,
    c=3,  # unknown field!
)
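A sketch of the extra="forbid" trade-off (StrictModel and SomeDataclass are hypothetical names): the setting catches the typo in the constructor, but it applies to parsing as well, whereas a plain dataclass rejects unknown keyword arguments with no configuration at all.

```python
from dataclasses import dataclass

from pydantic import BaseModel, ConfigDict, ValidationError


class StrictModel(BaseModel):
    model_config = ConfigDict(extra="forbid")

    a: int
    b: int


@dataclass
class SomeDataclass:
    a: int
    b: int


# extra="forbid" catches the unknown field in the constructor...
try:
    StrictModel(a=1, b=2, c=3)
except ValidationError:
    print("constructor: extra field rejected")

# ...but the same setting now also applies when parsing external data.
try:
    StrictModel.model_validate({"a": 1, "b": 2, "c": 3})
except ValidationError:
    print("parser: extra field rejected")

# A plain dataclass rejects unknown keyword arguments out of the box.
try:
    SomeDataclass(a=1, b=2, c=3)
except TypeError:
    print("dataclass: unexpected keyword argument")
```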

Locking ecosystem

Pydantic’s primary purpose is data serialization and deserialization. Instead of using standard library models (@dataclass, NamedTuple, TypedDict), Pydantic introduces a new model type, even though these tasks don’t necessitate a new model type.

Pydantic models come with unique semantics, requiring special support from development tools like type checkers and IDEs. Most importantly, external libraries that don’t care about the serialization method still must add support for Pydantic models, creating dependencies on these integrations.

Pydantic does support standard library models, but this support is very limited. For example, you can’t alter parsing or serialization logic in an existing class.

You can avoid these issues by restricting Pydantic to the layer responsible for communication with the outer world. However, this requires duplicating classes and manually writing converters. Pydantic offers a from_attributes=True mode, which allows you to create model instances from other objects, though it has significant limitations.

Underdone class mapping

Pydantic offers limited support for transforming one model into another. It behaves like a regular validation mode for an unknown source, except instead of referencing dictionary keys, it accesses object attributes.

Model mapping in Pydantic
from dataclasses import dataclass

from pydantic import BaseModel


@dataclass
class Person:
    name: str
    age: float


class PersonDTO(BaseModel):
    name: str
    age: float


person = Person(name="Anna", age=20)
person_dto = PersonDTO.model_validate(person, from_attributes=True)
assert person_dto == PersonDTO(name="Anna", age=20)

This results in several issues:

First, the from_attributes=True mode uses the same aliases as parsing. You cannot configure transformations without affecting the logic for external interactions (like JSON parsing).

Second, mapping does not account for type hints from the source class, leading to unnecessary type checks. For example, if both classes contain fields with values of type list[str] with hundreds of elements, Pydantic will check the type of each value.

Third, you can’t customize class mapping so that the conversion logic differs from parsing from an unknown source. You are forced to either find workarounds or change interactions with the outside world.

Fourth, there are no checks that the mapping between the source and the target class is defined correctly. Many such mistakes surface in tests as exceptions, but some are noticeable only by carefully comparing results, for example when the affected field in the target model has a default value.

Skipped error
from dataclasses import dataclass

from pydantic import BaseModel


@dataclass
class Book:
    title: str
    author: str


class BookDTO(BaseModel):
    title: str
    writer: str | None = None  # should be read from Book.author, but the alias is forgotten!


book = Book(title="Fahrenheit 451", author="Ray Bradbury")
book_dto = BookDTO.model_validate(book, from_attributes=True)
# No error is raised: writer silently falls back to its default
assert book_dto == BookDTO(title="Fahrenheit 451", writer=None)

Hint

You can use adaptix’s class conversion with Pydantic models, eliminating all the problems listed above (except for the second point). See conversion tutorial and Supported model kinds for details.

One presentation ought to be enough for anybody

Pydantic tightly binds parsing rules to the model itself. This causes major problems when the same model has to be loaded or dumped differently depending on the use case.

For example, you might load a config from various formats. While the structure of the config is generally similar, it may differ in how certain types are loaded and in field naming conventions.

Or consider having a common user model, but needing to return a different field set for different clients.

The only way around this is to pass data through the context parameter and write dispatch logic inside validators (or serializers).
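For illustration, a minimal sketch of that workaround on the serialization side, assuming a Pydantic version whose model_dump() accepts context (such as the 2.9 release this article targets); the User model and the "client" context key are hypothetical. Every presentation ends up encoded inside the model itself.

```python
from typing import Any

from pydantic import BaseModel, SerializationInfo, SerializerFunctionWrapHandler, model_serializer


class User(BaseModel):
    id: int
    name: str
    email: str

    @model_serializer(mode="wrap")
    def dispatch_by_client(
        self, handler: SerializerFunctionWrapHandler, info: SerializationInfo
    ) -> dict[str, Any]:
        data = handler(self)  # the default field-by-field serialization
        client = (info.context or {}).get("client")
        if client == "public":  # every new client type means another branch here
            data.pop("email", None)
        return data


user = User(id=1, name="Anna", email="anna@example.com")
print(user.model_dump())                              # all three fields
print(user.model_dump(context={"client": "public"}))  # {'id': 1, 'name': 'Anna'}
```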

Pydantic is written in Rust, so Pydantic is fast?

As benchmarks show, this is far from true.

To put it cautiously, Pydantic’s speed is roughly on par with libraries that are written in Python and use code generation.

To put it more boldly: in some cases adaptix is twice as fast as Pydantic without losing in any benchmark, and running it on PyPy can speed it up significantly.

For more detail, see Benchmarks.

About Adaptix

The Philosophy

adaptix does not offer a new special model that requires IDE and type checker support. It works with any model you like (@dataclass, attrs, and even Pydantic, see full list at Supported model kinds).

adaptix does not affect the model definition. You create a special object that implements the loading and dumping of models. This object is called a Retort (the name of a chemical device used to distill substances).

For each presentation format, you create a new retort instance. You can extend and combine instances to eliminate code duplication.

So, you have:

  • Models defined inside your business logic layer (these classes know nothing about the serialization mechanism);

  • Retorts that know how to transform these classes into external formats.

This separation keeps your code clean and simple. Moreover, a single retort instance can handle dozens of classes that follow the same conventions for their external representation.

See loading and dumping tutorial for details.

But that’s not all. adaptix can generate object mappers. Such converters are vital for layered applications, but they are tedious to write and error-prone to maintain. For identical or similar models, you can produce a converter with one line of code. The converter knows about both models but does not affect them.

See conversion tutorial for details.

Unmentioned advantages

All the issues mentioned above are problems that don’t arise when using adaptix. There are also aspects that are not shortcomings of Pydantic as such but still set adaptix apart.

Firstly, adaptix has a predicate system that allows granular customization of behavior. You can adjust behavior for groups of classes or for a type only if it is within a specific class. You can also configure logic separately for dictionary keys and values, even if they share the same type. See Predicate system for details.

Secondly, adaptix is designed to provide the maximum number of opportunities to follow the DRY (Don’t Repeat Yourself) principle.

  • You can override behavior for entire groups of fields and types using the predicate system mentioned earlier.

  • You can inherit rule groups, reducing code duplication.

  • You can separate rules into several isolated layers, simplifying complex transformation cascades.

For more information on these capabilities, see Retort extension and combination.

Migrating from Pydantic

adaptix provides several tools for a gradual migration from Pydantic.

First, adaptix supports Pydantic models. You can load and dump Pydantic models just as you would with @dataclass, NamedTuple, TypedDict, and others. This method ignores alias settings within the model itself, with all transformation logic defined in the retort. adaptix parses the input data and passes it to the model’s constructor. See Supported model kinds for details.

Loading and dumping Pydantic model
from adaptix import Retort
from pydantic import BaseModel


class Book(BaseModel):
    title: str
    price: int


data = {
    "title": "Fahrenheit 451",
    "price": 100,
}

retort = Retort()
book = retort.load(data, Book)
assert book == Book(title="Fahrenheit 451", price=100)
assert retort.dump(book) == data

Second, you can delegate handling of specific types directly to Pydantic with integrations.pydantic.native_pydantic. Using the built-in predicate system, you can control behavior more granularly than Pydantic itself allows (see Predicate system for details).

Delegating to Pydantic
from adaptix import Retort
from adaptix.integrations.pydantic import native_pydantic
from pydantic import BaseModel, Field


class Book(BaseModel):
    title: str = Field(alias="name")
    price: int


data = {
    "name": "Fahrenheit 451",
    "price": 100,
}

retort = Retort(
    recipe=[
        native_pydantic(Book, to_python={"by_alias": True}),
    ],
)

book = retort.load(data, Book)
assert book == Book(name="Fahrenheit 451", price=100)
assert retort.dump(book) == data

Conclusion

While Pydantic has been a popular choice for data serialization and validation in Python, it comes with notable drawbacks that can complicate software development, particularly in layered architectures. Its tight coupling of validation, serialization, and domain modeling often violates the Single Responsibility Principle, leading to issues such as unnecessary complexity, inefficiency, and loss of flexibility.

adaptix, by contrast, offers a more modular and developer-friendly approach. By decoupling serialization logic from domain models, it allows for cleaner code, easier maintenance, and more efficient operations. Whether it’s class mapping, custom validation, or handling diverse data formats, adaptix delivers robust solutions that avoid the pitfalls commonly encountered with Pydantic.