[ZODB-Dev] Space used by IOBTrees
Tim Peters
tim@zope.com
Fri, 28 Feb 2003 13:13:18 -0500
[Andreas Jung]
> ...
> Another question: I had a closer look at the pickles themselves using
> pickletools. The PCDATA parts of the XML document were stored inside
> the tree as unicode strings. Inside the disassembled pickle
> they were "marked" as BINUNICODE. What encoding is used to pickle
> unicode strings (looks like utf-8 rather than UCS-2)?
Yes, it's UTF-8. Note that pickletools.py is meant to be "executable
documentation": there's little about pickles you can't learn from reading
it. If you search the source file for BINUNICODE, you'll find this:
    I(name='BINUNICODE',
      code='X',
      arg=unicodestring4,
      stack_before=[],
      stack_after=[pyunicode],
      proto=1,
      doc="""Push a Python Unicode string object.
      There are two arguments: the first is a 4-byte little-endian
      signed int giving the number of bytes in the string. The second is
      that many bytes, and is the UTF-8 encoding of the Unicode string.
      """),
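The layout in that docstring is easy to check by hand. Below is a minimal sketch, assuming a modern Python 3 (this thread predates it; there `str` plays the role of the old `unicode` type). The helper `decode_binunicode` is hypothetical and exists only for illustration: it reads the opcode byte `X`, the 4-byte little-endian length, and then decodes that many bytes as UTF-8. It pickles with protocol 1 because protocols 4 and later emit SHORT_BINUNICODE for short strings instead, and it cross-checks the result against `pickletools.genops`.

```python
import pickle
import pickletools
import struct


def decode_binunicode(data, offset=0):
    """Hypothetical helper: decode one BINUNICODE opcode at data[offset].

    Layout, per the pickletools docstring: the opcode byte b'X', a 4-byte
    little-endian signed length, then that many bytes of UTF-8.
    Returns (decoded_string, offset_just_past_the_payload).
    """
    if data[offset:offset + 1] != b"X":
        raise ValueError("no BINUNICODE opcode at offset %d" % offset)
    (nbytes,) = struct.unpack_from("<i", data, offset + 1)
    start = offset + 5  # 1 opcode byte + 4 length bytes
    payload = data[start:start + nbytes]
    if len(payload) != nbytes:
        raise ValueError("truncated BINUNICODE payload")
    return payload.decode("utf-8"), start + nbytes


text = "Gr\u00f6\u00dfe \u20ac"  # 7 characters, 11 bytes in UTF-8
# Protocol 1 writes no PROTO header, so the pickle starts with the opcode.
data = pickle.dumps(text, protocol=1)

decoded, end = decode_binunicode(data)
print(decoded == text)      # True
print(end - 5, len(text))   # byte count vs. character count: 11 7

# Cross-check with pickletools' own opcode parser.
opcode, arg, pos = next(pickletools.genops(data))
print(opcode.name, arg == text)  # BINUNICODE True
```

Note that the length prefix counts UTF-8 bytes, not characters, so non-ASCII text costs more than its character count suggests.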
It took an enormous amount of time to reverse-engineer and document all this
stuff, so I'm keen that people know they don't have to do that from scratch
every time anymore <wink>.