Upgrading Binary(Set)Attribute
Warning
The behavior of BinaryAttribute and BinarySetAttribute has changed in backwards-incompatible ways as of the 6.0 release of PynamoDB. To prevent data corruption, use legacy_encoding=True for existing binary attributes.
Context
PynamoDB version 5 (and lower) had two bugs in the way it handled binary attributes, both of which were addressed in PynamoDB 6:
Top-level binary attributes (i.e. directly within a Model) were being Base64-encoded twice. For a BinarySetAttribute, each element was being encoded twice. This behavior was an oversight and resulted in larger item sizes and non-standard semantics.
For example, the 4 bytes CA FE F0 0D should have been sent over the wire as yv7wDQ== (a single round of Base64 encoding). Server-side, they would have been decoded and stored as 4 bytes. Instead, they were put through an extra round of Base64 encoding, so eXY3d0RRPT0= was sent over the wire. Server-side, this decoded into yv7wDQ== (in bytes, 79 76 37 77 44 51 3D 3D) and was stored as 8 bytes.
Nested binary attributes (i.e. within a MapAttribute or ListAttribute) were being wrapped in an additional layer of Base64 encoding on every serialization roundtrip. Not only did this prevent them from being deserialized correctly, but the item would also grow in size exponentially until it hit the DynamoDB item size limit of 400KB. For this reason we conclude that BinaryAttribute and BinarySetAttribute were not used in practice within maps and lists before PynamoDB 6.0, and thus there is no practical reason you would want legacy_encoding=True for them.
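The byte values in the example above can be checked with Python's standard base64 module. This is a minimal sketch that does not use PynamoDB itself. The helper name roundtrip_sizes is hypothetical and only imitates the effect of one extra Base64 layer being added per roundtrip, as described for nested attributes.

```python
import base64
from typing import List

raw = bytes.fromhex("CAFEF00D")  # the 4 bytes from the example above

# Intended behavior: a single round of Base64 on the wire.
once = base64.b64encode(raw)
# Legacy top-level behavior: a second round applied to the first result.
twice = base64.b64encode(once)

print(once)       # b'yv7wDQ=='
print(twice)      # b'eXY3d0RRPT0='
print(len(once))  # 8, the number of bytes stored under the legacy behavior


def roundtrip_sizes(value: bytes, roundtrips: int) -> List[int]:
    """Hypothetical helper: size of a value after each extra Base64 layer."""
    sizes = [len(value)]
    for _ in range(roundtrips):
        value = base64.b64encode(value)
        sizes.append(len(value))
    return sizes


# Each layer multiplies the size by roughly 4/3.
print(roundtrip_sizes(raw, 5))  # [4, 8, 12, 16, 24, 32]
```

Because every extra layer multiplies the size by about 4/3, even a small nested value would approach the item size limit after enough roundtrips.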
Guidance
Top-level binary attributes
In models existing at the time of an upgrade from PynamoDB 5 (or lower), use legacy_encoding=True.
Note
In PynamoDB 6 we require this new parameter to be set explicitly to prevent inadvertent data corruption during upgrades. By setting it to True during an upgrade, the developer marks the attribute as pre-existing and thus requiring legacy handling.
For example:
     class SomeExistingModel(Model):
    -    my_binary = BinaryAttribute()
    +    my_binary = BinaryAttribute(legacy_encoding=True)

     class SomeExistingModel(Model):
    -    my_binary = BinarySetAttribute()
    +    my_binary = BinarySetAttribute(legacy_encoding=True)
After the version upgrade is complete, you can consider adding a new binary attribute and migrating the data.
In new models, use legacy_encoding=False.
    class NewModel(Model):
        my_binary = BinaryAttribute(legacy_encoding=False)
        my_binary_set = BinarySetAttribute(legacy_encoding=False)
Nested binary attributes
In maps, use legacy_encoding=False.
    class MyMap(MapAttribute):
        binary = BinaryAttribute(legacy_encoding=False)
        binary_set = BinarySetAttribute(legacy_encoding=False)
In raw maps, normal (non-legacy) encoding will be used.
    class MyModel(Model):
        my_raw_map = MapAttribute()

    my_model = MyModel()
    my_model.my_raw_map = MapAttribute(binary=b'foo')
In lists, normal (non-legacy) encoding will be used.
This applies to both ListAttribute(of=BinaryAttribute) and of=BinarySetAttribute, as well as when of=... is not specified (for bytes and Set[bytes] elements).
For example:
    class MyModel(Model):
        binary_list = ListAttribute(of=BinaryAttribute)
        binary_set_list = ListAttribute(of=BinarySetAttribute)
        mixed_list = ListAttribute()

    model = MyModel()
    model.binary_list = [b'\xCA', b'\xFE']
    model.binary_set_list = [{b'\xCA', b'\xFE'}, {b'\xF0', b'\x0D'}]
    model.mixed_list = [
        b'\xCA\xFE',
        {b'\xF0', b'\x0D'},
    ]
Migrating
Since PynamoDB 6 is compatible with existing data through legacy_encoding=True, you do not need to migrate data during an upgrade. Whether you want to migrate data depends on your use case.
Advantages include smaller item sizes and more standardized serialization. However, for large tables,
there might be significant cost and engineering complexity involved.
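To get a rough feel for the size advantage, the sketch below estimates how many bytes a single binary value occupies under the legacy encoding compared with its raw length. It relies only on the fact that padded Base64 emits 4 characters for every 3 input bytes. The function name legacy_stored_size is hypothetical, and the estimate ignores attribute names and all other attributes in the item.

```python
import math


def legacy_stored_size(n: int) -> int:
    """Bytes stored for an n-byte value under the legacy (double) encoding."""
    # Padded Base64 emits 4 characters per 3 input bytes, rounded up.
    return 4 * math.ceil(n / 3)


for n in (4, 100, 3000):
    legacy = legacy_stored_size(n)
    saving = 1 - n / legacy
    print(f"{n} raw bytes: {legacy} stored with legacy encoding, "
          f"{saving:.0%} smaller after migration")
```

For values of more than a few bytes, the saving settles at about a quarter of the stored size of that attribute, which can be weighed against the cost of rewriting the table.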
Warning
Be sure to have an up-to-date backup of your data.
These are the typical steps to migrate an attribute:
Double-write to both the old and new attribute. Read from the new, falling back to the old.
    class SomeExistingModel(Model):
        _my_binary_v1 = BinaryAttribute(legacy_encoding=True, attr_name='my_binary')
        _my_binary_v2 = BinaryAttribute(legacy_encoding=False, attr_name='my_binary_v2')

        @property
        def my_binary(self) -> bytes:
            # Prefer the new attribute, falling back to the old one
            return self._my_binary_v1 if self._my_binary_v2 is None else self._my_binary_v2

        @my_binary.setter
        def my_binary(self, value: bytes) -> None:
            # Double-write to both attributes
            self._my_binary_v1 = value
            self._my_binary_v2 = value

        def save(self, *args, **kwargs):
            # Populate the new attribute for items loaded before it existed
            self._my_binary_v2 = self._my_binary_v1
            return super().save(*args, **kwargs)
Change the old attribute to be optional:
     class SomeExistingModel(Model):
    -    _my_binary_v1 = BinaryAttribute(legacy_encoding=True, attr_name='my_binary')
    +    _my_binary_v1 = BinaryAttribute(legacy_encoding=True, attr_name='my_binary', null=True)
and rather than double-write to it, unset it by assigning None:
     @my_binary.setter
     def my_binary(self, value: bytes) -> None:
    -    self._my_binary_v1 = value
    +    self._my_binary_v1 = None
         self._my_binary_v2 = value

     def save(self, *args, **kwargs):
    -    self._my_binary_v2 = self._my_binary_v1
    +    if self._my_binary_v1 is not None:
    +        self._my_binary_v2 = self._my_binary_v1
    +        self._my_binary_v1 = None
         return super().save(*args, **kwargs)
At this point, you can either let natural migration run its course (as your online system re-saves models), or you can perform a one-time migration by scanning the table and re-saving each item.
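A one-time migration of the kind mentioned above could look roughly like the following sketch. The helper resave_all is hypothetical and deliberately generic: it accepts any iterable of objects that have a save() method, so that it can be fed the result of the model's scan() call. It assumes the overridden save() from the previous step performs the actual copy, and it omits error handling, throttling and progress tracking, which a large table would likely need.

```python
from typing import Any, Iterable


def resave_all(items: Iterable[Any]) -> int:
    """Hypothetical helper: re-save every item and return how many were saved.

    Each item's save() is expected to move the old attribute's value
    to the new attribute, as in the migration step above.
    """
    count = 0
    for item in items:
        item.save()
        count += 1
    return count


# Assumed usage with the model from the steps above:
# migrated = resave_all(SomeExistingModel.scan())
```

Since save() writes the whole item, consider whether re-saving could overwrite concurrent updates made by your online system, and run the migration accordingly.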
Once migration is done, remove the old attribute and all migration logic.
    class SomeExistingModel(Model):
        my_binary = BinaryAttribute(legacy_encoding=False, attr_name='my_binary_v2')