Benchmarks#
Measure principles#
These benchmarks aim to provide a complete, fair, and reliable comparison of different libraries across different versions of Python.
If you find a mistake in the benchmarking methods, or you want to add another library to the comparison, please create a new issue.
All benchmarks are made via pyperf – an advanced library used to measure the performance of Python interpreters. It takes care of calibration, warming up, and gauging.
To handle the large number of benchmark variations and to make the pyperf API more convenient, a new internal framework was created. It adds no overhead and is intended only to orchestrate pyperf runs.
All measurements exclude the time required to initialize and generate the conversion function.
Each library is tested with different options that may affect performance.
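The rule that setup time is excluded can be illustrated with a minimal sketch. It uses the standard library timeit module as a stand-in for pyperf, and the Point model and the make_loader helper are hypothetical names introduced only for this illustration; the internal framework itself is not shown here.

```python
import timeit
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: int
    y: int


def make_loader(cls):
    """Build a loader once (stands in for a library generating its conversion function)."""
    field_names = tuple(f.name for f in fields(cls))

    def load(data):
        return cls(**{name: data[name] for name in field_names})

    return load


# Setup phase: executed once and NOT part of the measurement.
loader = make_loader(Point)
sample = {"x": 1, "y": 2}

# Measured phase: only calls of the already-built function are timed.
calls_per_run = 10_000
timings = timeit.repeat(lambda: loader(sample), number=calls_per_run, repeat=5)
best_per_call = min(timings) / calls_per_run
print(f"best time per call: {best_per_call * 1e9:.0f} ns")
```

Taking the minimum over several repeats is a common way to reduce noise in such a hand-rolled measurement; pyperf applies its own, more careful calibration and warm-up.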
All benchmarks listed below were produced with libraries:
Benchmarks analysis#
Important
Serialization and deserialization libraries have many options that customize the conversion process. These options may greatly affect performance, but it is not feasible to create benchmarks for every combination of them. So, performance in your specific case may be different.
Simple Structures (loading)#
This benchmark examines the loading of basic structures natively supported by all the libraries.
The library has to produce models from dict:
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    id: int
    title: str
    rating: float
    content: str  # renamed to 'text'


@dataclass
class Book:
    id: int
    name: str
    reviews: List[Review]  # contains 100 items
Cases description
dv indicates that the Converter option detailed_validation is enabled (doc)
dp denotes that the debug_path parameter of Factory is set to True (doc)
lc signifies that the lazy_compilation flag of the model Config is activated (doc)
strict means that the strict parameter of model_config is turned on (doc)
Notes about implementation:
marshmallow cannot create an instance of a dataclass or another model, so a @post_load hook was used (doc)
msgspec cannot be built for PyPy
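As a point of reference, the loading task in this section can be sketched as a hand-written baseline. This is only an illustration of the input and output shapes, not the code used by any benchmarked library; the load_review and load_book helpers and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    id: int
    title: str
    rating: float
    content: str  # stored under the key 'text' in the input dict


@dataclass
class Book:
    id: int
    name: str
    reviews: List[Review]


def load_review(data: dict) -> Review:
    return Review(
        id=data["id"],
        title=data["title"],
        rating=data["rating"],
        content=data["text"],  # apply the 'text' -> 'content' rename
    )


def load_book(data: dict) -> Book:
    return Book(
        id=data["id"],
        name=data["name"],
        reviews=[load_review(item) for item in data["reviews"]],
    )


# Made-up input with 100 reviews, matching the size noted in the models.
raw_book = {
    "id": 1,
    "name": "Example book",
    "reviews": [
        {"id": i, "title": f"Review {i}", "rating": 4.5, "text": "Nice"}
        for i in range(100)
    ],
}
book = load_book(raw_book)
```

Unlike the benchmarked libraries, this sketch performs no type validation, so it should not be read as a fair lower bound.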
Simple Structures (dumping)#
This benchmark studies the dumping of basic structures natively supported by all the libraries.
The library has to convert the model instance to the dict used in the loading benchmark:
from dataclasses import dataclass
from typing import List


@dataclass
class Review:
    id: int
    title: str
    rating: float
    content: str  # renamed to 'text'


@dataclass
class Book:
    id: int
    name: str
    reviews: List[Review]  # contains 100 items
Cases description
dt_all, dt_first and dt_disable express that the debug_trail parameter of Retort is set to DebugTrail.ALL, DebugTrail.FIRST, and DebugTrail.DISABLE respectively (doc)
no_gc indicates that the models have the gc option disabled (doc)
dv indicates that the Converter option detailed_validation is enabled (doc)
lc signifies that the lazy_compilation flag of the model Config is activated (doc)
strict means that the strict parameter of model_config is turned on (doc)
The standard library function dataclasses.asdict was used
Notes about implementation:
asdict does not support renaming, so the produced dict contains the original field names
msgspec cannot be built for PyPy
pydantic requires using the json mode of the model_dump method to produce a JSON-serializable dict (doc)
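The note about asdict can be checked with a short sketch that compares it with a hand-written dumper applying the rename. The dump_review and dump_book helpers and the sample values are hypothetical and serve only as an illustration.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Review:
    id: int
    title: str
    rating: float
    content: str  # should appear as 'text' in the output dict


@dataclass
class Book:
    id: int
    name: str
    reviews: List[Review]


def dump_review(review: Review) -> dict:
    return {
        "id": review.id,
        "title": review.title,
        "rating": review.rating,
        "text": review.content,  # apply the 'content' -> 'text' rename
    }


def dump_book(book: Book) -> dict:
    return {
        "id": book.id,
        "name": book.name,
        "reviews": [dump_review(review) for review in book.reviews],
    }


book = Book(
    id=1,
    name="Example book",
    reviews=[Review(id=1, title="Great", rating=5.0, content="Loved it")],
)

renamed = dump_book(book)  # uses the key 'text'
plain = asdict(book)       # keeps the original field name 'content'
```

The two results differ only in the key of the renamed field, which is why the asdict case does not produce exactly the same dict as the other cases.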
GitHub Issues (loading)#
This benchmark examines libraries using real-world examples. It involves handling a slice of a CPython repository issues snapshot fetched via the GitHub REST API.
The library has to produce models from dict:
Processed models
The original endpoint returns an array of objects. Some libraries have no sane way to process a list of models, so the root-level list is wrapped in the GetRepoIssuesResponse model.
These models represent most of the fields returned by the endpoint, but some data is skipped. For example, milestone is omitted because the CPython repo does not use it.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List, Optional


class IssueState(str, Enum):
    OPEN = "open"
    CLOSED = "closed"


class StateReason(str, Enum):
    COMPLETED = "completed"
    REOPENED = "reopened"
    NOT_PLANNED = "not_planned"


class AuthorAssociation(str, Enum):
    COLLABORATOR = "COLLABORATOR"
    CONTRIBUTOR = "CONTRIBUTOR"
    FIRST_TIMER = "FIRST_TIMER"
    FIRST_TIME_CONTRIBUTOR = "FIRST_TIME_CONTRIBUTOR"
    MANNEQUIN = "MANNEQUIN"
    MEMBER = "MEMBER"
    NONE = "NONE"
    OWNER = "OWNER"


@dataclass
class SimpleUser:
    login: str
    id: int
    node_id: str
    avatar_url: str
    gravatar_id: Optional[str]
    url: str
    html_url: str
    followers_url: str
    following_url: str
    gists_url: str
    starred_url: str
    subscriptions_url: str
    organizations_url: str
    repos_url: str
    events_url: str
    received_events_url: str
    type: str
    site_admin: bool
    name: Optional[str] = None
    email: Optional[str] = None
    starred_at: Optional[datetime] = None


@dataclass
class Label:
    id: int
    node_id: str
    url: str
    name: str
    description: Optional[str]
    color: str
    default: bool


@dataclass
class Reactions:
    url: str
    total_count: int
    plus_one: int  # renamed to '+1'
    minus_one: int  # renamed to '-1'
    laugh: int
    confused: int
    heart: int
    hooray: int
    eyes: int
    rocket: int


@dataclass
class PullRequest:
    diff_url: Optional[str]
    html_url: Optional[str]
    patch_url: Optional[str]
    url: Optional[str]
    merged_at: Optional[datetime] = None


@dataclass
class Issue:
    id: int
    node_id: str
    url: str
    repository_url: str
    labels_url: str
    comments_url: str
    events_url: str
    html_url: str
    number: int
    state: IssueState
    state_reason: Optional[StateReason]
    title: str
    user: Optional[SimpleUser]
    labels: List[Label]
    assignee: Optional[SimpleUser]
    assignees: Optional[List[SimpleUser]]
    locked: bool
    active_lock_reason: Optional[str]
    comments: int
    closed_at: Optional[datetime]
    created_at: Optional[datetime]
    updated_at: Optional[datetime]
    author_association: AuthorAssociation
    reactions: Optional[Reactions] = None
    pull_request: Optional[PullRequest] = None
    body_html: Optional[str] = None
    body_text: Optional[str] = None
    timeline_url: Optional[str] = None
    body: Optional[str] = None


@dataclass
class GetRepoIssuesResponse:
    data: List[Issue]
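The wrapping of the root-level list can be sketched as follows. The Issue class here is a cut-down stand-in for the full Issue model above, and the payload values and the load_response helper are made up for the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    # Cut-down stand-in: the real model has many more fields.
    id: int
    title: str


@dataclass
class GetRepoIssuesResponse:
    data: List[Issue]


# Shape of the endpoint response: a bare array of issue objects.
endpoint_payload = [
    {"id": 1, "title": "First issue"},
    {"id": 2, "title": "Second issue"},
]

# Wrap the array so that the top-level value is an object with a single field.
wrapped_payload = {"data": endpoint_payload}


def load_response(payload: dict) -> GetRepoIssuesResponse:
    """Hand-written loader, used here only to show the resulting shape."""
    return GetRepoIssuesResponse(
        data=[Issue(**item) for item in payload["data"]],
    )


response = load_response(wrapped_payload)
```

With this wrapper every library receives a single model as the root type, so the same input can be passed to all of them.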
Cases description
Notes about implementation:
marshmallow cannot create an instance of a dataclass or another model, so a @post_load hook was used (doc)
msgspec cannot be built for PyPy
pydantic strict mode accepts only enum instances for enum fields, so it cannot be used in this benchmark (doc)
cattrs cannot process datetime out of the box. The custom structure hook lambda v, tp: datetime.fromisoformat(v) was used. This function does not generate a descriptive error, so a production implementation could be slower.
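The behaviour of the datetime hook from the last note can be examined in isolation with the standard library alone. The sketch below defines a function with the same (value, target type) signature as the quoted lambda and calls it directly; registering it in cattrs is not shown, and the sample timestamp is made up.

```python
from datetime import datetime, timezone


def structure_datetime(value, target_type):
    """Same logic as the quoted hook: parse an ISO 8601 string."""
    return datetime.fromisoformat(value)


# An explicit UTC offset is used because Python versions before 3.11
# reject a trailing 'Z' in datetime.fromisoformat.
parsed = structure_datetime("2023-05-01T12:30:00+00:00", datetime)

# Invalid input raises a bare ValueError that carries no information
# about which model field was being processed.
error_message = None
try:
    structure_datetime("not a timestamp", datetime)
except ValueError as exc:
    error_message = str(exc)
```

A production hook would typically catch this exception and re-raise it with the field path attached, which adds some cost on the error path.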
GitHub Issues (dumping)#
This benchmark examines libraries using real-world examples. It involves handling a slice of a CPython repository issues snapshot fetched via the GitHub REST API.
The library has to convert the model instance to the dict used in the loading benchmark:
Processed models
The original endpoint returns an array of objects. Some libraries have no sane way to process a list of models, so the root-level list is wrapped in the GetRepoIssuesResponse model.
These models represent most of the fields returned by the endpoint, but some data is skipped. For example, milestone is omitted because the CPython repo does not use it.
The GitHub API distinguishes between nullable fields and optional fields. So, fields holding their default values must be omitted when dumping, but fields of type Optional[T] without a default must always be present.
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List, Optional


class IssueState(str, Enum):
    OPEN = "open"
    CLOSED = "closed"


class StateReason(str, Enum):
    COMPLETED = "completed"
    REOPENED = "reopened"
    NOT_PLANNED = "not_planned"


class AuthorAssociation(str, Enum):
    COLLABORATOR = "COLLABORATOR"
    CONTRIBUTOR = "CONTRIBUTOR"
    FIRST_TIMER = "FIRST_TIMER"
    FIRST_TIME_CONTRIBUTOR = "FIRST_TIME_CONTRIBUTOR"
    MANNEQUIN = "MANNEQUIN"
    MEMBER = "MEMBER"
    NONE = "NONE"
    OWNER = "OWNER"


@dataclass
class SimpleUser:
    login: str
    id: int
    node_id: str
    avatar_url: str
    gravatar_id: Optional[str]
    url: str
    html_url: str
    followers_url: str
    following_url: str
    gists_url: str
    starred_url: str
    subscriptions_url: str
    organizations_url: str
    repos_url: str
    events_url: str
    received_events_url: str
    type: str
    site_admin: bool
    name: Optional[str] = None
    email: Optional[str] = None
    starred_at: Optional[datetime] = None


@dataclass
class Label:
    id: int
    node_id: str
    url: str
    name: str
    description: Optional[str]
    color: str
    default: bool


@dataclass
class Reactions:
    url: str
    total_count: int
    plus_one: int  # renamed to '+1'
    minus_one: int  # renamed to '-1'
    laugh: int
    confused: int
    heart: int
    hooray: int
    eyes: int
    rocket: int


@dataclass
class PullRequest:
    diff_url: Optional[str]
    html_url: Optional[str]
    patch_url: Optional[str]
    url: Optional[str]
    merged_at: Optional[datetime] = None


@dataclass
class Issue:
    id: int
    node_id: str
    url: str
    repository_url: str
    labels_url: str
    comments_url: str
    events_url: str
    html_url: str
    number: int
    state: IssueState
    state_reason: Optional[StateReason]
    title: str
    user: Optional[SimpleUser]
    labels: List[Label]
    assignee: Optional[SimpleUser]
    assignees: Optional[List[SimpleUser]]
    locked: bool
    active_lock_reason: Optional[str]
    comments: int
    closed_at: Optional[datetime]
    created_at: Optional[datetime]
    updated_at: Optional[datetime]
    author_association: AuthorAssociation
    reactions: Optional[Reactions] = None
    pull_request: Optional[PullRequest] = None
    body_html: Optional[str] = None
    body_text: Optional[str] = None
    timeline_url: Optional[str] = None
    body: Optional[str] = None


@dataclass
class GetRepoIssuesResponse:
    data: List[Issue]
Cases description
dt_all, dt_first and dt_disable express that the debug_trail parameter of Retort is set to DebugTrail.ALL, DebugTrail.FIRST, and DebugTrail.DISABLE respectively (doc)
no_gc indicates that the models have the gc option disabled (doc)
dv indicates that the Converter option detailed_validation is enabled (doc)
lc signifies that the lazy_compilation flag of the model Config is activated (doc)
strict means that the strict parameter of model_config is turned on (doc)
The standard library function dataclasses.asdict was used
Notes about implementation:
asdict does not support renaming, so the produced dict contains the original field names
msgspec cannot be built for PyPy
pydantic requires using the json mode of the model_dump method to produce a JSON-serializable dict (doc)
cattrs cannot process datetime out of the box. The custom unstructure hook datetime.isoformat was used.
marshmallow cannot skip None values for specific fields out of the box. A @post_dump hook is used to remove these fields.
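The dumping rule for this benchmark (omit fields that still hold their default, always emit Optional[T] fields that have no default) can be sketched with the standard library. The dump_omitting_defaults helper and the sample URL are hypothetical, and the sketch handles only plain defaults, not default_factory or nested models.

```python
from dataclasses import MISSING, dataclass, fields
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class PullRequest:
    diff_url: Optional[str]
    html_url: Optional[str]
    patch_url: Optional[str]
    url: Optional[str]
    merged_at: Optional[datetime] = None


def dump_omitting_defaults(obj: Any) -> Dict[str, Any]:
    """Dump a flat dataclass, skipping fields that still hold their default."""
    result: Dict[str, Any] = {}
    for field in fields(obj):
        value = getattr(obj, field.name)
        # A field with a default is optional: omit it while it equals the default.
        if field.default is not MISSING and value == field.default:
            continue
        # Same conversion as the datetime.isoformat unstructure hook.
        if isinstance(value, datetime):
            value = value.isoformat()
        result[field.name] = value
    return result


unmerged = PullRequest(
    diff_url=None,
    html_url="https://example.com/pr/1",
    patch_url=None,
    url=None,
)
merged = PullRequest(
    diff_url=None,
    html_url="https://example.com/pr/1",
    patch_url=None,
    url=None,
    merged_at=datetime(2023, 5, 1, 12, 30),
)

unmerged_dump = dump_omitting_defaults(unmerged)  # no 'merged_at' key
merged_dump = dump_omitting_defaults(merged)      # 'merged_at' as ISO string
```

Nullable fields without a default, such as diff_url, stay in the output even when they are None, while merged_at disappears as long as it equals its default.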