So you want to be a Python expert?

With the global pandemic resulting in a worldwide quarantine, interest in Python as a language is peaking. Social media, developer blogs and other platforms are flooded with posts, videos, podcasts, etc. about either getting started with Python or upskilling from beginner to intermediate or expert Python developer. With such ever-growing interest in the Python community, I figured this was a good time to touch on some key concepts by bringing to everyone's attention one of the best PyData talks I have ever come across, delivered by James Powell back in 2017. I believe it is important to revisit such masterpieces time and again to keep them alive and pass them on to all the curious cats out there. This blog post is a brief account of the key takeaways from the talk.

What does it take to be good at Python?

Python originated as a scripting language. Its main purpose was to let developers write simple scripts to orchestrate code written in languages such as C, to patch together different constructs, and to get things done quickly. Since then, however, it has evolved into a full-fledged general-purpose programming language. Along the way, Python has become opinionated, with its own way of thinking about some core programming concepts.

To be good at Python, one must have a good understanding of a few things that the language comes with: the built-in data types, the built-in functions, a little understanding of what's available in the standard library, etc. Pretty basic stuff. However, to really become an 'expert' at Python, one must understand what the 'next step' after this is. What does it really take to be effective at Python rather than just being good at it?

Data-models

All data in Python is represented by objects or by relationships between objects. In conformance with Von Neumann's model of a "stored-program computer", code is also represented by objects. Python, at its core, is unbelievably consistent. After working with Python for a while, it becomes intuitive and you are able to start making well-informed guesses about features that are new to you.

Guido’s sense of the aesthetics of language design is amazing. I’ve met many fine language designers who could build theoretically beautiful languages that no one would ever use, but Guido is one of those rare people who can build a language that is just slightly less theoretically beautiful but thereby is a joy to write programs in.

  • Jim Hugunin, Creator of Jython, cocreator of AspectJ, architect of the .Net DLR

The idea of the data model is that, by implementing the dunder / special / magic methods, our objects can behave like the built-in types, thereby enabling the expressive coding style that the community considers Pythonic. Python and Ruby are the same in this regard. Both languages empower their users with a rich metaobject protocol, a.k.a. magic methods, that lets users leverage the same tools that are available to the core developers. This is in interesting contrast to a language like JavaScript. Objects in JavaScript do have features that are magic, but historically you could not leverage them in user-defined objects. For example, before JavaScript 1.8.5, having read-only attributes in a user-defined object was not possible, even though some built-in objects always had read-only attributes. It was not until ECMAScript 5 was published in 2009 that users gained the ability to define read-only attributes on their own objects. Having said that, the metaobject protocol of JavaScript is evolving, but historically it has been more limited than that of Python or Ruby. To put it simply, the data model is nothing but an API for core language constructs.
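As a minimal sketch of this idea, the hypothetical Polynomial class below (its name and its definition of "length" are assumptions made for illustration) implements a few special methods so that repr(), len(), the + operator and call syntax all work on it, just as they do on built-in types:

```python
class Polynomial:
    """Illustrative polynomial type; coefficients are stored highest degree first."""

    def __init__(self, *coeffs):
        self.coeffs = coeffs

    def __repr__(self):
        # Invoked by repr() and by the interactive prompt.
        return f"Polynomial{self.coeffs!r}"

    def __len__(self):
        # Invoked by len(); this sketch defines length as the degree.
        return len(self.coeffs) - 1

    def __add__(self, other):
        # Invoked by the + operator. Assumes both operands have the
        # same number of coefficients.
        return Polynomial(*(a + b for a, b in zip(self.coeffs, other.coeffs)))

    def __call__(self, x):
        # Invoked by call syntax, p(x); evaluates with Horner's rule.
        result = 0
        for coeff in self.coeffs:
            result = result * x + coeff
        return result


p1 = Polynomial(1, 2, 3)   # x^2 + 2x + 3
p2 = Polynomial(3, 4, 3)   # 3x^2 + 4x + 3
total = p1 + p2            # dispatches to Polynomial.__add__
```

None of the call sites mention a dunder method by name. The top-level syntax (+, len(), parentheses) is translated by the interpreter into the corresponding special-method call.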

Decorators

At the implementation level, Python decorators do not resemble the classic Decorator design pattern, but an analogy can be made. The Decorator design pattern allows behavior to be added to individual objects, dynamically, without affecting the behavior of other objects from the same class. Quoting directly from Design Patterns: Elements of Reusable Object-Oriented Software:

The decorator conforms to the interface of the component it decorates so that its presence is transparent to the component’s clients. The decorator forwards requests to the component and may perform additional actions (such as drawing a border) before or after forwarding. Transparency lets you nest decorators recursively, thereby allowing an unlimited number of added responsibilities.

In Python, the decorator function plays the role of the concrete Decorator subclass, and the inner function it returns is the decorator instance. The returned function wraps the function to be decorated, which is analogous to the component in the design pattern. The returned function is transparent because it conforms to the interface of the component by accepting the same arguments. It forwards calls to the component and may perform additional operations either before or after it. This also provides the ability to recursively add nested decorators enabling additional responsibilities.
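A small sketch of that description follows. The decorator names logged and doubled are hypothetical, chosen only for illustration. Each one returns an inner function that accepts the same arguments as the component, forwards the call, and does some extra work before or after it. Stacking the two shows the nesting:

```python
import functools


def logged(func):
    """Hypothetical decorator: records each call before forwarding it."""
    @functools.wraps(func)          # keep the component's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))   # extra work before the call
        return func(*args, **kwargs)           # forward to the component
    wrapper.calls = []
    return wrapper


def doubled(func):
    """Hypothetical decorator: doubles whatever the component returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return 2 * func(*args, **kwargs)       # extra work after the call
    return wrapper


@logged
@doubled
def add(a, b):
    """Return a + b."""
    return a + b


result = add(1, 2)   # logged -> doubled -> add
```

The @ syntax is shorthand for add = logged(doubled(add)), so decorators are applied bottom-up and invoked top-down.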

Generators

Iteration is fundamental to data processing. And when scanning datasets that don’t fit in memory, we need a way to fetch the items lazily, that is, one at a time and on demand. This is what the Iterator pattern is about. Python does not have macros like Lisp, so abstracting away the Iterator pattern required changing the language: the yield keyword was added in Python 2.2 (2001). The yield keyword allows the construction of generators, which work as iterators.

Every generator is an iterator. Generators fully implement the iterator interface. But an iterator — as defined in the Gang of Four book — retrieves items from a collection, while a generator can produce items “out of thin air.” Python 3 uses generators in many places. Even the range() built-in now returns a generator-like object instead of a full-blown list like before.

Any Python function that has the yield keyword in its body is a generator function: a function which, when called, returns a generator object. In other words, a generator function is a generator factory. The only syntax distinguishing a plain function from a generator function is the fact that the latter has a yield keyword somewhere in its body. Some argued that a new keyword like gen should be used for generator functions instead of def, but Guido did not agree. His arguments are in PEP 255.

A generator function builds a generator object that wraps the body of the function. When we invoke next(...) on the generator object, execution advances to the next yield in the function body, and the next(...) call evaluates to the value yielded when the function body is suspended. Finally, when the function body returns, the enclosing generator object raises StopIteration, in accordance with the Iterator protocol.
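The mechanics described above can be sketched with a hypothetical countdown generator function (the name is an assumption for illustration). Driving it by hand with next() shows the body being suspended at each yield and StopIteration being raised once the body returns:

```python
def countdown(n):
    """Generator function: lazily yields n, n-1, ..., 1."""
    while n > 0:
        yield n      # suspend here and hand n to the caller
        n -= 1       # resume here on the following next() call


gen = countdown(2)       # no body code has run yet
first = next(gen)        # runs the body up to the first yield
second = next(gen)       # resumes and runs up to the second yield

try:
    next(gen)            # the body returns, so StopIteration is raised
    exhausted = False
except StopIteration:
    exhausted = True

# A for loop (or a comprehension) calls next() and handles StopIteration for us.
collected = [value for value in countdown(3)]
```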

Context Managers

Context manager objects exist to control a with statement, just like iterators exist to control a for statement. The with statement was designed to simplify the try/finally pattern, which guarantees that some operation is performed after a block of code, even if the block is aborted because of an exception, a return or sys.exit() call. The code in the finally clause usually releases a critical resource or restores some previous state that was temporarily changed. The context manager protocol consists of the __enter__ and __exit__ methods. At the start of the with, __enter__ is invoked on the context manager object. The role of the finally clause is played by a call to __exit__ on the context manager object at the end of the with block.
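To make the protocol concrete, here is a minimal sketch with a hypothetical Tracked class (not from the talk) that records when each hook runs. It shows that __exit__ is still called when the block is aborted by an exception, which is the role a finally clause would otherwise play:

```python
class Tracked:
    """Hypothetical context manager that logs when each hook runs."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append("enter")
        return self                  # bound to the name after "as", if any

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append("exit")
        return False                 # a falsy result lets the exception propagate


log = []
try:
    with Tracked(log):
        log.append("body")
        raise ValueError("boom")     # abort the block
except ValueError:
    log.append("handled")
```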

The @contextmanager decorator reduces the boilerplate of creating a context manager. Instead of writing a whole class with __enter__/__exit__ methods, you just implement a generator with a single yield that should produce whatever you want the __enter__ method to return.
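As a small sketch of the standard-library decorator, the hypothetical override helper below (its name and parameters are assumptions for illustration) temporarily sets a key in a dictionary. Everything before the yield plays the role of __enter__, the yielded value is what the as clause receives, and the finally clause plays the role of __exit__:

```python
from contextlib import contextmanager


@contextmanager
def override(mapping, key, value):
    """Hypothetical helper: temporarily set mapping[key] to value."""
    missing = object()                       # sentinel for "key was absent"
    previous = mapping.get(key, missing)
    mapping[key] = value                     # set-up
    try:
        yield mapping                        # value bound by "as"
    finally:
        # Tear-down: runs even if the with block raises.
        if previous is missing:
            del mapping[key]
        else:
            mapping[key] = previous


settings = {"debug": False}
with override(settings, "debug", True) as active:
    inside = active["debug"]
after = settings["debug"]
```

Wrapping the yield in try/finally matters here: contextlib re-raises an exception from the with block at the point of the yield, so without the finally clause the tear-down would be skipped.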

The Ultimate Example

The following example is taken straight from the PyData talk. James uses it at the end to connect all the dots and show how the concepts discussed above work together in practice.

Let's say that we want to connect to a database, create a table, insert some entries, print those entries and finally, drop the table. The most basic way of doing it would be:

from sqlite3 import connect

with connect('test.db') as conn:

    cur = conn.cursor()

    cur.execute('create table points(x int, y int)')

    cur.execute('insert into points (x, y) values (1, 2)')
    cur.execute('insert into points (x, y) values (3, 4)')
    cur.execute('insert into points (x, y) values (5, 6)')

    for row in cur.execute('select x, y from points'):
        print(row)

    cur.execute('drop table points')

For the sake of convenience, let's assume that SQLite does not support transactions, and that we need to implement the basic setup and tear-down action of the database, irrespective of any errors that we may encounter in between. We can do this by implementing a custom context manager:

from sqlite3 import connect

class contextmanager:

    def __init__(self, cur):
        self.cur = cur

    def __enter__(self):
        self.cur.execute('create table points(x int, y int)')

    def __exit__(self, *args):
        self.cur.execute('drop table points')

with connect('test.db') as conn:

    cur = conn.cursor()

    with contextmanager(cur):

        cur.execute('insert into points (x, y) values (1, 2)')
        cur.execute('insert into points (x, y) values (3, 4)')
        cur.execute('insert into points (x, y) values (5, 6)')

        for row in cur.execute('select x, y from points'):
            print(row)

Here, we used Python's data model to implement a custom context manager that creates our table on entry and drops it on exit. It is worth understanding how the __enter__ and __exit__ methods are called. By adding something as simple as print() calls inside these methods, we can see that __enter__ is always called before __exit__. This implies a specific sequence: set-up first, tear-down second. We can enforce this sequence with the help of a generator function:

from sqlite3 import connect

class contextmanager:

    def __init__(self, cur):
        self.cur = cur

    def __enter__(self):
        self.gen = temptable(self.cur)
        next(self.gen)

    def __exit__(self, *args):
        next(self.gen, None)

def temptable(cur):
    cur.execute('create table points(x int, y int)')
    yield
    cur.execute('drop table points')

with connect('test.db') as conn:

    cur = conn.cursor()

    with contextmanager(cur):

        cur.execute('insert into points (x, y) values (1, 2)')
        cur.execute('insert into points (x, y) values (3, 4)')
        cur.execute('insert into points (x, y) values (5, 6)')

        for row in cur.execute('select x, y from points'):
            print(row)

The only potential problem with the above code snippet is that it is not generic: we have hard-coded the generator function that is called from __enter__. We can make it generic by introducing another data-model method, __call__, so that the generator function and its arguments are supplied from outside:

from sqlite3 import connect

class contextmanager:

    def __init__(self, gen):
        self.gen = gen

    def __call__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs
        return self

    def __enter__(self):
        self.gen_inst = self.gen(*self.args, **self.kwargs)
        next(self.gen_inst)

    def __exit__(self, *args):
        next(self.gen_inst, None)

def temptable(cur):
    cur.execute('create table points(x int, y int)')
    yield
    cur.execute('drop table points')

with connect('test.db') as conn:

    cur = conn.cursor()

    with contextmanager(temptable)(cur):

        cur.execute('insert into points (x, y) values (1, 2)')
        cur.execute('insert into points (x, y) values (3, 4)')
        cur.execute('insert into points (x, y) values (5, 6)')

        for row in cur.execute('select x, y from points'):
            print(row)

This is much better, except that the call to our custom context manager looks a bit cluttered. We can refactor it a bit:

from sqlite3 import connect

class contextmanager:

    def __init__(self, gen):
        self.gen = gen

    def __call__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs
        return self

    def __enter__(self):
        self.gen_inst = self.gen(*self.args, **self.kwargs)
        next(self.gen_inst)

    def __exit__(self, *args):
        next(self.gen_inst, None)

def temptable(cur):
    cur.execute('create table points(x int, y int)')
    yield
    cur.execute('drop table points')

tmptable = contextmanager(temptable)

with connect('test.db') as conn:

    cur = conn.cursor()

    with tmptable(cur):

        cur.execute('insert into points (x, y) values (1, 2)')
        cur.execute('insert into points (x, y) values (3, 4)')
        cur.execute('insert into points (x, y) values (5, 6)')

        for row in cur.execute('select x, y from points'):
            print(row)

All that we did here was take the temptable() generator function and wrap it in our context manager class. Since we're talking about wrapping functions, this should immediately remind us of decorators. We can apply our class as a decorator on temptable():

from sqlite3 import connect

class contextmanager:

    def __init__(self, gen):
        self.gen = gen

    def __call__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs
        return self

    def __enter__(self):
        self.gen_inst = self.gen(*self.args, **self.kwargs)
        next(self.gen_inst)

    def __exit__(self, *args):
        next(self.gen_inst, None)

@contextmanager
def temptable(cur):
    cur.execute('create table points(x int, y int)')
    yield
    cur.execute('drop table points')

with connect('test.db') as conn:

    cur = conn.cursor()

    with temptable(cur):

        cur.execute('insert into points (x, y) values (1, 2)')
        cur.execute('insert into points (x, y) values (3, 4)')
        cur.execute('insert into points (x, y) values (5, 6)')

        for row in cur.execute('select x, y from points'):
            print(row)

There! Right here we have an example that leverages all of the core Python concepts discussed in this blog post: the data model, generators, context managers and, finally, decorators. In this example, each of these four features has a very clear conceptual meaning. A context manager is merely a piece of code that pairs set-up and tear-down actions. A generator is merely a form of syntax that allows us to enforce sequencing and interleaving. A decorator wraps the generator function in our context manager class at definition time. And all of this is done using the data model. We use all these core features and combine them to write what can more or less be called expert-level Python code.
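For comparison, the standard library ships a decorator of the same name in contextlib. The sketch below is an assumption about how the same example might look with it, not code from the talk. It uses an in-memory database (':memory:') instead of test.db so it leaves no file behind, and it wraps the yield in try/finally so the table is dropped even if the block raises:

```python
from contextlib import contextmanager
from sqlite3 import connect


@contextmanager
def temptable(cur):
    cur.execute('create table points(x int, y int)')   # set-up
    try:
        yield
    finally:
        cur.execute('drop table points')                # tear-down


with connect(':memory:') as conn:
    cur = conn.cursor()

    with temptable(cur):
        cur.executemany(
            'insert into points (x, y) values (?, ?)',
            [(1, 2), (3, 4), (5, 6)],
        )
        rows = list(cur.execute('select x, y from points order by x'))

    # After the block, the table should no longer be listed in the schema.
    remaining = list(cur.execute(
        "select name from sqlite_master where type = 'table' and name = 'points'"
    ))
```

The hand-written class in this post calls next() a second time in __exit__, while the standard-library version additionally forwards exceptions into the generator, which is why the try/finally is needed there.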

Final Thoughts

What we must take away from this is that expert-level Python code is not code that uses every single feature. It is not even code that uses some large number of Python's features. It is code that shows clarity about where and when a feature should be used. It is code that wastes neither the time of the person writing it nor the time of the person reading it. It is code that does not carry a lot of additional mechanism. It does not have people creating their own protocols or their own frameworks, because the language itself provides the core tools to every developer. One merely has to understand what those core pieces are, what they mean, what they need and how to assemble them. Of course, the syntax, argument ordering, order of data-model dispatch, etc. do matter, but they all come second to an actual understanding of the core concepts. What matters most in reaching expert level in Python is to remember what the core features are. The syntax of various features is bound to change, and many implementations will be revised through bug fixes and enhancements. But these core features have long been part of Python and are likely to last. The core meaning behind these features is what is important, and it is what will guide us in writing expert-level Python code.
