Google's protocol buffers are widely used for RPC services. I managed to (rather extensively) patch the included Python library to support Python 3.3 – and in such a way that you can use Python 2.7 on the exact same source (all tests pass on both platforms).
I migrated the Subversion repository to Git, applied the patches and pushed the result to github.com/malthe/google-protobuf. Please report any bugs to the issue tracker.
Note that some patches were adapted from Charles Law's python3-protobuf branch (also on git).
Migration techniques
Since Python 3.3, the u"..." literal has been made available as an alias for the regular "..." syntax. This makes it a lot more feasible to write a library that targets both Python 2.7 and 3.3 using a single codebase.
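As a small illustration of the point above, the following sketch (with made-up values) runs unchanged on Python 2.7 and 3.3+. On Python 3.0 through 3.2 the u"..." prefix is a syntax error.

```python
# Hypothetical values, for illustration only.
label = u"caf\u00e9"                  # text (unicode) on both 2.7 and 3.3+
payload = label.encode("utf-8")       # explicit encoding yields a byte string
roundtrip = payload.decode("utf-8")   # and decoding gives the text back
```

The text has four characters while the encoded payload has five bytes, which is the distinction that matters once the data goes over the wire.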
In certain situations – especially for module imports – you need to branch on e.g. sys.version_info[0], but it's not much trouble. There's the six library, and while it might help (and "standardize"), I didn't want the dependency.
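A minimal sketch of such a version check, assuming a module that needs an in-memory byte buffer and a name for the text type. The names PY3 and text_type are illustrative and not taken from the patched library.

```python
import sys

PY3 = sys.version_info[0] == 3

if PY3:
    # Python 3: io.BytesIO is the in-memory byte buffer, str is the text type.
    from io import BytesIO
    text_type = str
else:
    # Python 2: use the C implementation and the separate unicode type.
    from cStringIO import StringIO as BytesIO
    text_type = unicode  # only evaluated on Python 2

buf = BytesIO(b"\x08\x96\x01")
first = buf.read(1)
```

The rest of the module can then use BytesIO and text_type without caring which interpreter is running.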
I'm not sure I fully support the decision to make unicode the default (and for the most part only) string type in Python 3, but it's obviously here to stay. One important observation is that bytes coming in over the wire are still bytes – as in the bytes type. This means that quite a few libraries that use byte strings can be ported without much difficulty, because the buffer interface is still determined by the wire protocol (which isn't unicode – it's not even an encoding).
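To make this concrete, here is a sketch of decoding a base-128 varint, the integer encoding used by the protocol buffers wire format. The function name decode_varint is hypothetical and this is not the library's own decoder. It wraps the input in a bytearray, which yields integers when indexed on both Python 2.7 and 3.

```python
def decode_varint(data, pos=0):
    """Decode one varint from a byte string; return (value, next_position)."""
    buf = bytearray(data)  # indexing a bytearray gives ints on 2.7 and 3.x
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # the low 7 bits carry the payload
        if not byte & 0x80:               # a clear high bit marks the last byte
            return result, pos
        shift += 7


value, end = decode_varint(b"\xac\x02")
```

No text decoding is involved at any point: the input is bytes and the output is an integer, so the same code serves both versions.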
Minor difficulties
There are a couple of "gotchas". The ones I spent most time with were:
- No 'string_escape' codec!

  I had to rewrite s.decode('string_escape') into this:

      import re

      # A backslash followed by: another backslash, an octal or hex escape,
      # a single-character escape, any other character, or end of input.
      regex = re.compile(b'\\\\(\\\\|[0-7]{1,3}|x.[0-9a-f]?|[\'"abfnrt]|.|$)')

      def replace(m):
          b = m.group(1)
          if len(b) == 0:
              raise ValueError("Invalid character escape: '\\'.")
          i = b[0]  # an integer on Python 3
          if i == 120:  # 'x': hexadecimal escape
              v = int(b[1:], 16)
          elif 48 <= i <= 55:  # '0' to '7': octal escape
              v = int(b, 8)
          elif i == 34: return b'"'
          elif i == 39: return b"'"
          elif i == 92: return b'\\'
          elif i == 97: return b'\a'
          elif i == 98: return b'\b'
          elif i == 102: return b'\f'
          elif i == 110: return b'\n'
          elif i == 114: return b'\r'
          elif i == 116: return b'\t'
          else:
              s = b.decode('ascii')
              raise UnicodeDecodeError(
                  'stringescape', text, m.start(), m.end(), "Invalid escape: %r" % s
                  )
          return bytes((v, ))

      # text is the escaped byte string to decode
      result = regex.sub(replace, text)

- Indexing a byte string on Python 3 returns an integer!

  The trick is to make a trivial slice:

      s[x:x + 1]  # Instead of s[x]

- There's no chr for byte strings!

  But you can use the bytes constructor creatively:

      bytes((x, ))  # Instead of chr(x)

- The syntax for using meta-classes has changed.

  I wrote a class decorator and had it applied to all the auto-generated classes that provide a __metaclass__:

      import sys

      if sys.version_info[0] == 3:
          # Python 3 ignores __metaclass__, so re-create the class through it.
          def decorator(cls):
              return cls.__metaclass__(
                  cls.__name__, cls.__bases__, cls.__dict__.copy()
                  )
      else:
          # Python 2 has already applied __metaclass__.
          def decorator(cls):
              return cls

  This meant that I had to make only very minor changes to the C++ code that generates Python.
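The last three gotchas can be tried out together. The sketch below assumes hypothetical helper names (byte_at, int2byte) and a toy metaclass (Registry) and class (Message) that stand in for the generated protobuf classes; only the decorator follows the post. Note that bytes((x, )) works only on Python 3, so the helper falls back to chr on Python 2.

```python
import sys

PY3 = sys.version_info[0] == 3


def byte_at(s, x):
    """Return the byte at index x as a length-1 byte string on 2.7 and 3.x."""
    return s[x:x + 1]


if PY3:
    def int2byte(x):
        return bytes((x,))  # a one-element tuple of ints becomes one byte

    def decorator(cls):
        # Re-create the class through the metaclass named in its body.
        return cls.__metaclass__(
            cls.__name__, cls.__bases__, cls.__dict__.copy()
        )
else:
    int2byte = chr  # on Python 2, chr already returns a byte string

    def decorator(cls):
        return cls  # Python 2 honours __metaclass__ by itself


class Registry(type):
    """Toy metaclass: records the public attribute names of each class."""

    def __new__(mcs, name, bases, namespace):
        namespace["fields"] = tuple(
            sorted(k for k in namespace if not k.startswith("_"))
        )
        return super(Registry, mcs).__new__(mcs, name, bases, namespace)


@decorator
class Message(object):
    __metaclass__ = Registry
    foo = 1
    bar = 2
```

One caveat on the decorator: copying cls.__dict__ also carries over the __dict__ and __weakref__ descriptors of the original class. The six library's add_metaclass removes those two entries before re-creating the class, which may be worth doing if instances of the decorated classes need a working __dict__.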
Additional resources
The protobuf library is not the only implementation of protocol buffers. Josh Haberman wrote upb, the "small, low-level protocol buffer library":
The dynamic nature of upb is especially useful in the context of dynamic or interpreted languages. upb is specifically designed to be an ideal target for dynamic language extensions.
It has Python 2.x bindings, available through an extension library – and porting to Python 3.x should be feasible.