Benchmarks
Measurement principles
These benchmarks aim to make a complete, fair, and reliable comparison between different libraries across different Python versions.
If you find a mistake in the benchmarking methods or want to add another library to the comparison, please create a new issue.
All benchmarks are run with pyperf, an advanced library used to measure the performance of Python interpreters. It takes care of calibration, warm-up, and measurement.
To handle the vast number of benchmark variations and make the pyperf API more convenient,
a new internal framework was created. It adds no overhead and is intended only to orchestrate pyperf runs.
All measurements exclude the time required to initialize and generate the conversion function.
Each library is tested with different options that may affect performance.
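Keeping setup out of the measurement can be illustrated with a small sketch. It uses the standard library's timeit as a simplified stand-in for pyperf, which also handles calibration and warm-up. A hypothetical make_loader function takes the place of a real library's converter generation.

```python
import timeit
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int


def make_loader(cls):
    """Hypothetical stand-in for a library generating its conversion function."""
    names = [f.name for f in fields(cls)]

    def load(data: dict):
        return cls(**{name: data[name] for name in names})

    return load


# Setup: the loader is built once and excluded from the measurement.
loader = make_loader(Point)
payload = {"x": 1, "y": 2}

# Measurement: only calls to the ready-made loader are timed.
runs = 10_000
elapsed = timeit.timeit(lambda: loader(payload), number=runs)
print(f"{elapsed / runs * 1e9:.0f} ns per call")
```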
All benchmarks listed below were produced with libraries:
Benchmark analysis
Important
Serialization and deserialization libraries have many options that customize the conversion process. These parameters may greatly affect performance, but there is no way to benchmark every combination of them. So, performance in your specific case may differ.
Simple Structures (loading)
This benchmark examines the loading of basic structures natively supported by all the libraries.
Each library has to produce models from a dict:
from dataclasses import dataclass

@dataclass
class Review:
    id: int
    title: str
    rating: float
    content: str  # renamed to 'text'

@dataclass
class Book:
    id: int
    name: str
    reviews: list[Review]  # contains 100 items
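For orientation, the work each library does here roughly matches the hand-written loader below. It builds nested dataclass instances and maps the external key text onto the content field. This is an illustrative sketch only, with no type checking or error reporting, and not code from any benchmarked library. load_review and load_book are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Review:
    id: int
    title: str
    rating: float
    content: str  # external key is 'text'


@dataclass
class Book:
    id: int
    name: str
    reviews: list[Review]


def load_review(data: dict) -> Review:
    # Map the external 'text' key onto the 'content' field.
    return Review(
        id=data["id"],
        title=data["title"],
        rating=data["rating"],
        content=data["text"],
    )


def load_book(data: dict) -> Book:
    # Nested models are built recursively from the list of dicts.
    return Book(
        id=data["id"],
        name=data["name"],
        reviews=[load_review(item) for item in data["reviews"]],
    )


raw = {
    "id": 1,
    "name": "Some book",
    "reviews": [
        {"id": i, "title": f"Review {i}", "rating": 4.5, "text": "..."}
        for i in range(100)
    ],
}
book = load_book(raw)
print(len(book.reviews), book.reviews[0].content)  # 100 ...
```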
Cases description
dv indicates that the Converter option detailed_validation is enabled (doc)
dp denotes that the debug_path parameter of Factory is set to True (doc)
lc signifies that the lazy_compilation flag of the model Config is activated (doc)
strict means that the strict parameter in model_config is turned on (doc)
Notes about implementation:
marshmallow cannot create an instance of a dataclass or another model, so the @post_load hook was used (doc)
msgspec cannot be built for PyPy
Simple Structures (dumping)
This benchmark studies the dumping of basic structures natively supported by all the libraries.
Each library has to convert the model instance to the dict used in the loading benchmark:
from dataclasses import dataclass

@dataclass
class Review:
    id: int
    title: str
    rating: float
    content: str  # renamed to 'text'

@dataclass
class Book:
    id: int
    name: str
    reviews: list[Review]  # contains 100 items
Cases description
dt_all, dt_first, and dt_disable express that the debug_trail parameter of Retort is set to DebugTrail.ALL, DebugTrail.FIRST, and DebugTrail.DISABLE, respectively (doc)
no_gc indicates that models have the gc option disabled (doc)
dv indicates that the Converter option detailed_validation is enabled (doc)
lc signifies that the lazy_compilation flag of the model Config is activated (doc)
strict means that the strict parameter in model_config is turned on (doc)
the standard library function dataclasses.asdict was used
Notes about implementation:
asdict does not support renaming, so the produced dict contains the original field names
msgspec cannot be built for PyPy
pydantic requires using the json mode of the model_dump method to produce a JSON-serializable dict (doc)
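The renaming note can be seen directly. A hypothetical hand-written dumper emits the external text key, while dataclasses.asdict keeps the Python field name content. Both are sketches for illustration, not the benchmark code.

```python
from dataclasses import asdict, dataclass


@dataclass
class Review:
    id: int
    title: str
    rating: float
    content: str  # external key is 'text'


@dataclass
class Book:
    id: int
    name: str
    reviews: list[Review]


def dump_book(book: Book) -> dict:
    # Hypothetical hand-written dumper: renames 'content' to the external 'text' key.
    return {
        "id": book.id,
        "name": book.name,
        "reviews": [
            {"id": r.id, "title": r.title, "rating": r.rating, "text": r.content}
            for r in book.reviews
        ],
    }


book = Book(id=1, name="Some book", reviews=[Review(1, "Great", 5.0, "Loved it")])

print(dump_book(book)["reviews"][0])
# {'id': 1, 'title': 'Great', 'rating': 5.0, 'text': 'Loved it'}

# dataclasses.asdict recurses into nested dataclasses and lists,
# but keeps the Python field name 'content'.
print(asdict(book)["reviews"][0])
# {'id': 1, 'title': 'Great', 'rating': 5.0, 'content': 'Loved it'}
```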
GitHub Issues (loading)
This benchmark examines libraries using real-world examples. It involves handling a slice of a snapshot of CPython repository issues fetched via the GitHub REST API.
Each library has to produce models from a dict:
Processed models
The original endpoint returns an array of objects. Some libraries have no sane way to process a list of models,
so the root-level list is wrapped with the GetRepoIssuesResponse model.
These models represent most of the fields returned by the endpoint,
but some data are skipped.
For example, milestone is omitted because the CPython repo does not use it.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class IssueState(str, Enum):
    OPEN = "open"
    CLOSED = "closed"

class StateReason(str, Enum):
    COMPLETED = "completed"
    REOPENED = "reopened"
    NOT_PLANNED = "not_planned"

class AuthorAssociation(str, Enum):
    COLLABORATOR = "COLLABORATOR"
    CONTRIBUTOR = "CONTRIBUTOR"
    FIRST_TIMER = "FIRST_TIMER"
    FIRST_TIME_CONTRIBUTOR = "FIRST_TIME_CONTRIBUTOR"
    MANNEQUIN = "MANNEQUIN"
    MEMBER = "MEMBER"
    NONE = "NONE"
    OWNER = "OWNER"

@dataclass
class SimpleUser:
    login: str
    id: int
    node_id: str
    avatar_url: str
    gravatar_id: str | None
    url: str
    html_url: str
    followers_url: str
    following_url: str
    gists_url: str
    starred_url: str
    subscriptions_url: str
    organizations_url: str
    repos_url: str
    events_url: str
    received_events_url: str
    type: str
    site_admin: bool
    name: str | None = None
    email: str | None = None
    starred_at: datetime | None = None

@dataclass
class Label:
    id: int
    node_id: str
    url: str
    name: str
    description: str | None
    color: str
    default: bool

@dataclass
class Reactions:
    url: str
    total_count: int
    plus_one: int  # renamed to '+1'
    minus_one: int  # renamed to '-1'
    laugh: int
    confused: int
    heart: int
    hooray: int
    eyes: int
    rocket: int

@dataclass
class PullRequest:
    diff_url: str | None
    html_url: str | None
    patch_url: str | None
    url: str | None
    merged_at: datetime | None = None

@dataclass
class Issue:
    id: int
    node_id: str
    url: str
    repository_url: str
    labels_url: str
    comments_url: str
    events_url: str
    html_url: str
    number: int
    state: IssueState
    state_reason: StateReason | None
    title: str
    user: SimpleUser | None
    labels: list[Label]
    assignee: SimpleUser | None
    assignees: list[SimpleUser] | None
    locked: bool
    active_lock_reason: str | None
    comments: int
    closed_at: datetime | None
    created_at: datetime | None
    updated_at: datetime | None
    author_association: AuthorAssociation
    reactions: Reactions | None = None
    pull_request: PullRequest | None = None
    body_html: str | None = None
    body_text: str | None = None
    timeline_url: str | None = None
    body: str | None = None

@dataclass
class GetRepoIssuesResponse:
    data: list[Issue]
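Here is a trimmed-down sketch of what loading these models involves. The reaction keys +1 and -1 must be renamed because they are not valid Python identifiers, enum fields are looked up by value, and the bare JSON array is wrapped into the root model. The models are cut down to a few fields, the loader functions are hypothetical, and datetime parsing is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class IssueState(str, Enum):
    OPEN = "open"
    CLOSED = "closed"


@dataclass
class Reactions:
    total_count: int
    plus_one: int  # external key '+1'
    minus_one: int  # external key '-1'


@dataclass
class Issue:  # heavily trimmed version of the model above
    number: int
    state: IssueState
    reactions: Reactions | None = None


@dataclass
class GetRepoIssuesResponse:
    data: list[Issue]


def load_reactions(raw: dict) -> Reactions:
    # '+1' and '-1' are not valid Python identifiers, hence the renaming.
    return Reactions(
        total_count=raw["total_count"],
        plus_one=raw["+1"],
        minus_one=raw["-1"],
    )


def load_issue(raw: dict) -> Issue:
    reactions = raw.get("reactions")
    return Issue(
        number=raw["number"],
        state=IssueState(raw["state"]),  # look up the enum member by value
        reactions=None if reactions is None else load_reactions(reactions),
    )


def load_response(raw_list: list) -> GetRepoIssuesResponse:
    # The endpoint returns a bare JSON array; wrap it in the root model.
    return GetRepoIssuesResponse(data=[load_issue(item) for item in raw_list])


resp = load_response([
    {"number": 1, "state": "open", "reactions": {"total_count": 3, "+1": 2, "-1": 1}},
    {"number": 2, "state": "closed"},
])
print(resp.data[0].state is IssueState.OPEN, resp.data[1].reactions)  # True None
```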
Cases description
Notes about implementation:
marshmallow cannot create an instance of a dataclass or another model, so the @post_load hook was used (doc)
msgspec cannot be built for PyPy
pydantic strict mode accepts only enum instances for enum fields, so it cannot be used in this benchmark (doc)
cattrs cannot process datetime out of the box. A custom structure hook
lambda v, tp: datetime.fromisoformat(v) was used. This function does not produce a descriptive error, so a production implementation could be slower.
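For comparison, here is a hypothetical, slightly more defensive hook with the same (value, type) signature as the lambda above. It is a plain function so it runs without cattrs installed; registering it with a converter is not shown. It reports which value failed and normalizes a trailing Z, which datetime.fromisoformat accepts only since Python 3.11. GitHub timestamps use that suffix (for example 2023-01-01T12:30:00Z). Extra checks like these are the kind of work that could make a production hook slower.

```python
from datetime import datetime


def structure_datetime(value, tp):
    """Hypothetical hook with the same (value, type) signature as the lambda."""
    if not isinstance(value, str):
        raise TypeError(
            f"expected an ISO 8601 string for {tp!r}, got {type(value).__name__}"
        )
    # Python < 3.11 rejects a trailing 'Z', so replace it with an explicit UTC offset.
    text = value[:-1] + "+00:00" if value.endswith("Z") else value
    try:
        return datetime.fromisoformat(text)
    except ValueError as exc:
        # Re-raise with the offending value so the error is descriptive.
        raise ValueError(f"invalid datetime {value!r} for {tp!r}") from exc


dt = structure_datetime("2023-01-01T12:30:00Z", datetime)
print(dt.isoformat())  # 2023-01-01T12:30:00+00:00
```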
GitHub Issues (dumping)
This benchmark examines libraries using real-world examples. It involves handling a slice of a snapshot of CPython repository issues fetched via the GitHub REST API.
Each library has to convert the model instance to the dict used in the loading benchmark:
Processed models
The original endpoint returns an array of objects. Some libraries have no sane way to process a list of models,
so the root-level list is wrapped with the GetRepoIssuesResponse model.
These models represent most of the fields returned by the endpoint,
but some data are skipped.
For example, milestone is omitted because the CPython repo does not use it.
The GitHub API distinguishes nullable fields from optional fields.
So, fields that hold their default value must be omitted when dumping,
but fields of type Optional[T] without a default must always be present.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class IssueState(str, Enum):
    OPEN = "open"
    CLOSED = "closed"

class StateReason(str, Enum):
    COMPLETED = "completed"
    REOPENED = "reopened"
    NOT_PLANNED = "not_planned"

class AuthorAssociation(str, Enum):
    COLLABORATOR = "COLLABORATOR"
    CONTRIBUTOR = "CONTRIBUTOR"
    FIRST_TIMER = "FIRST_TIMER"
    FIRST_TIME_CONTRIBUTOR = "FIRST_TIME_CONTRIBUTOR"
    MANNEQUIN = "MANNEQUIN"
    MEMBER = "MEMBER"
    NONE = "NONE"
    OWNER = "OWNER"

@dataclass
class SimpleUser:
    login: str
    id: int
    node_id: str
    avatar_url: str
    gravatar_id: str | None
    url: str
    html_url: str
    followers_url: str
    following_url: str
    gists_url: str
    starred_url: str
    subscriptions_url: str
    organizations_url: str
    repos_url: str
    events_url: str
    received_events_url: str
    type: str
    site_admin: bool
    name: str | None = None
    email: str | None = None
    starred_at: datetime | None = None

@dataclass
class Label:
    id: int
    node_id: str
    url: str
    name: str
    description: str | None
    color: str
    default: bool

@dataclass
class Reactions:
    url: str
    total_count: int
    plus_one: int  # renamed to '+1'
    minus_one: int  # renamed to '-1'
    laugh: int
    confused: int
    heart: int
    hooray: int
    eyes: int
    rocket: int

@dataclass
class PullRequest:
    diff_url: str | None
    html_url: str | None
    patch_url: str | None
    url: str | None
    merged_at: datetime | None = None

@dataclass
class Issue:
    id: int
    node_id: str
    url: str
    repository_url: str
    labels_url: str
    comments_url: str
    events_url: str
    html_url: str
    number: int
    state: IssueState
    state_reason: StateReason | None
    title: str
    user: SimpleUser | None
    labels: list[Label]
    assignee: SimpleUser | None
    assignees: list[SimpleUser] | None
    locked: bool
    active_lock_reason: str | None
    comments: int
    closed_at: datetime | None
    created_at: datetime | None
    updated_at: datetime | None
    author_association: AuthorAssociation
    reactions: Reactions | None = None
    pull_request: PullRequest | None = None
    body_html: str | None = None
    body_text: str | None = None
    timeline_url: str | None = None
    body: str | None = None

@dataclass
class GetRepoIssuesResponse:
    data: list[Issue]
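The rule about nullable versus optional fields can be sketched with the standard dataclasses module. A field with a default is omitted while it still equals that default, whereas a field without a default is always emitted, even when it is None. The dump_omitting_defaults helper is hypothetical and handles flat dataclasses only: it does not recurse, ignores default_factory, and does not convert datetime values.

```python
from __future__ import annotations

from dataclasses import MISSING, dataclass, fields
from datetime import datetime


@dataclass
class PullRequest:
    diff_url: str | None  # nullable, no default: always dumped
    html_url: str | None
    patch_url: str | None
    url: str | None
    merged_at: datetime | None = None  # optional: omitted while equal to the default


def dump_omitting_defaults(obj) -> dict:
    """Hypothetical helper: dump a flat dataclass, skipping fields left at their default."""
    result = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        # Only plain defaults are considered; default_factory is ignored in this sketch.
        if f.default is not MISSING and value == f.default:
            continue  # optional field still at its default: omit the key
        result[f.name] = value  # field without a default: always emitted, even if None
    return result


pr = PullRequest(diff_url=None, html_url=None, patch_url=None, url="u")
print(dump_omitting_defaults(pr))
# {'diff_url': None, 'html_url': None, 'patch_url': None, 'url': 'u'}

merged = PullRequest(None, None, None, None, merged_at=datetime(2023, 1, 1))
print(list(dump_omitting_defaults(merged)))
# ['diff_url', 'html_url', 'patch_url', 'url', 'merged_at']
```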
Cases description
dt_all, dt_first, and dt_disable express that the debug_trail parameter of Retort is set to DebugTrail.ALL, DebugTrail.FIRST, and DebugTrail.DISABLE, respectively (doc)
no_gc indicates that models have the gc option disabled (doc)
dv indicates that the Converter option detailed_validation is enabled (doc)
lc signifies that the lazy_compilation flag of the model Config is activated (doc)
strict means that the strict parameter in model_config is turned on (doc)
the standard library function dataclasses.asdict was used
Notes about implementation:
asdict does not support renaming, so the produced dict contains the original field names
msgspec cannot be built for PyPy
pydantic requires using the json mode of the model_dump method to produce a JSON-serializable dict (doc)
cattrs cannot process datetime out of the box. A custom unstructure hook datetime.isoformat was used.
marshmallow cannot skip None values for specific fields out of the box. A @post_dump hook is used to remove these fields.