Prevent Idle In Transaction

Avoiding idle-in-transaction connection states with SQLAlchemy

Gorgias Engineering
Oct 4, 2023

Table of contents

Prevent Idle In Transaction

Avoiding idle-in-transaction connection states with SQLAlchemy

Gorgias Engineering
Oct 4, 2023

As you work with SQLAlchemy, over time, you might have a performance nightmare brewing in the background that you aren’t even aware of.

In this lesser-known issue, which strikes primarily in larger projects, normal usage leads to an ever-growing number of idle-in-transaction database connections. These open connections can kill the overall performance of the application.

While you can fix this issue down the line, when it begins to take a toll on your performance, it takes much less work to mitigate the problem from the start.

At Gorgias, we learned this lesson the hard way. After testing different approaches, we solved the problem by extending the high-level SQLAlchemy classes (namely sessions and transactions) with functionality that allows working with "live" DB (database) objects for limited periods of time, expunging them after they are no longer needed.

This analysis covers everything you need to know to close those unnecessary open DB connections and keep your application humming along.

The problem: your database connection states are monopolizing unnecessary resources

Leading Python web frameworks such as Django come with an integrated ORM (object-relational mapping) that handles all database access, separating most of the low-level database concerns from the actual user code. The developer can write their code focusing on the actual logic around models, rather than thinking of the DB engine, transaction management or isolation level.

While this scenario seems enticing, big frameworks like Django may not always be suitable for our projects. What happens if we want to build our own starting from a microframework (instead of a full-stack framework) and augment it only with the components that we need?

In Python, the extra packages we would use to build ourselves a full-fledged framework are fairly standard: They will most likely include Jinja2 for template rendering, Marshmallow for dealing with schemas and SQLAlchemy as ORM.

Request-response paradigm vs. background tasks

Not all projects are web applications (following a request-response pattern) and among web applications, most of them deal with background tasks that have nothing to do with requests or responses.

This is important to understand because in request-response paradigms, we usually open a DB transaction upon receiving a request and we close it when responding to it. This allows us to associate the number of concurrent DB transactions with the number of parallel HTTP requests handled. A transaction stays open for as long as a request is being processed, and that must happen relatively quickly — users don't appreciate long loading times.

Transactions opened and closed by background tasks are a totally different story: There's no clear and simple rule on how DB transactions are managed at a code level, there's no easy way to tell how long tasks (should) last, and there usually isn't any upper limit to the execution time.

This could lead to potentially long transaction times, during which the process effectively holds a DB connection open without actually using it for the majority of the time period. This state is known as an idle-in-transaction connection state and should be avoided as much as possible, because it blocks DB resources without actively using them.

The limitations of SQLAlchemy with PEP-249

To fully understand how database access transpires in a SQLAlchemy-based app, one needs to understand the layers responsible for the execution.

Layers of execution in an SQLAlchemy app

At the highest level, we code our DB interaction using high-level SQLAlchemy queries on our defined models. The query is then transformed into one or more SQL statements by SQLAlchemy's ORM which is passed on to a database engine (driver) through a common Python DB API defined by PEP-249. (PEP-249 is a Python Enhancement Proposal dedicated to standardizing Python DB server access.) The database engine communicates with the actual database server.

At first glance, everything looks good in this stack. However there's one tiny problem: The DB API (defined by PEP-249) does not provide an explicit way of managing transactions. In fact, it mandates the use of a default transaction regardless of the operations you're executing, so even the simplest select will open a transaction if none are open on the current connection.

SQLAlchemy builds on top of PEP-249, doing its best to stay out of driver implementation details. That way, any Python DB driver claiming PEP-249 compatibility could work well with it.

While this is generally a good idea, SQLAlchemy has no choice but to inherit the limitations and design choices made at the PEP-249 level. More precisely (and importantly), it will automatically open a transaction for you upon the very first query, regardless whether it’s needed. And that's the root of the issue we set out to solve: In production, you'll probably end up with a lot of unwanted transactions, locking up on DB resources for longer than desired.

Also, SQLAlchemy uses sessions (in-memory caches of models) that rely on transactions. And the whole SQLAlchemy world is built around sessions. While you could technically ditch them to avoid the idle-in-transactions problem with a “lower-level” interface to the DB, all of the examples and documentation you’ll find online uses the “higher-level” interface (i.e. sessions). It’s likely that you will feel like you are trying to swim against the tide to get that workaround up and running.

Postgres and the different types of autocommits

Some DB servers, most notably Postgres, default to an autocommit mode. This mode implies atomicity at the SQL statement level — something developers are likely to expect. But they prefer to explicitly open a transaction block when needed and operate outside of one by default.

If you're reading this, you have probably already Googled for "sqlalchemy autocommit" and may have found their official documentation on the (now deprecated) autocommit mode. Unfortunately this functionality is a "soft" autocommit and is implemented purely in SQLAlchemy, on top of the PEP-249 driver; it doesn't have anything to do with DB's native autocommit mode.

This version works by simply committing the opened transaction as soon as SQLAlchemy detects an SQL statement that modifies data. Unfortunately, that doesn't fix our problem; the pointless, underlying DB transaction opened by non-modifying queries still remains open.

When using Postgres, we could in theory play with the new AUTOCOMMIT isolation level option introduced in psycopg2 to make use of the DB-level autocommit mode. However this is far from ideal as it would require hooking into SQLAlchemy's transaction management and adjusting the isolation level each time as needed. Additionally, "autocommit" isn't really an isolation level and it’s not desirable to change the connection's isolation level all the time, from various parts of the code. You can find more details on this matter, along with a possible implementation of this idea in Carl Meyer's article “PostgreSQL Transactions and SQLAlchemy.”

At Gorgias, we always prefer explicit solutions to implicit assumptions. By including all details, even common ones that most developers would assume by default, we can be more clear and leave less guesswork later on. This is why we didn't want to hack together a solution behind the scenes, just to get rid of our idle-in-transactions problem. We decided to dig deeper and come up with a proper, explicit, and (almost) hack-free method to fix it.

Visualizing an idle-in-transaction case

The following chart shows the profile of an idle-in-transaction case over a period of two weeks, before and after fixing the problem.

Visualizing idle-in-transaction, before and after

As you can see, we’re talking about tens of seconds during which connections are being held in an unusable state. In the context of a user waiting for a page to load, that is an excruciatingly long period of time.

The solution: expunged objects and frozen models

Expunging objects to prevent long-lasting idle connections

SQLAlchemy works with sessions that are, simply put, in-memory caches of model instances. The code behind these sessions is quite complex, but usage boils down to either explicit session reference...

...or implicit usage.

Both of these approaches will ensure a transaction is opened and will not close it until a later ***session.commit()***or session.rollback(). There's actually nothing wrong with calling session.commit() when you need to explicitly close a transaction that you know is opened and you’re done with using the DB, in that particular scope.

To address the idle-in-transaction problem generated by such a line, we must keep the code between the query and the commit relatively short and fast (i.e. avoid blocking calls or CPU-intensive operations).

It sounds simple enough, but what happens if we access an attribute of a DB model after session.commit()? It will open another transaction and leave it hanging, even though it might not need to hit the DB at all.

While we can't foresee what a developer will do with the DB object afterward, we can prevent usage that would hit the DB (and open a new transaction) by expunging it from the session. An expunged object will raise an exception if any unloaded (or expired) attributes are accessed. And that’s what we actually want here: to make it crash if misused, rather than leaving idle-in-transaction connections behind to block DB resources.

Building an expunging context manager to handle transactions and connections

When working with multiple objects and complex queries, it’s easy to overlook the necessary expunging of those objects. It only takes one un-expunged object to trigger the idle-in-transaction problem, so you need to be consistent.

Objects can't be used for any kind of DB interaction after being expunged. So how do we make it clear and obvious that certain objects are to be used in within a limited scope? The answer is a Python context manager to handle SQLAlchemy transactions and connections. Not only does it allow us to visually limit object usage to a block, but it will also ensure everything is prepared for us and cleaned up afterwards.

The construct above normally opens a transaction block associated to a new SQLAlchemy session, but we've added a new expunge keyword to the begin method, instructing SQLAlchemy to automatically expunge objects associated with block's session (the tx.session). To get this kind of behavior from a session, we need to override the begin method (and friends) in a subclass of SQLAlchemy's Session.

We want to keep the default behavior and use a new ExpungingTransaction instead of SQLAlchemy's SessionTransaction, but only when explicitly instructed to by the expunge=True argument.

You can use the class_ argument of sessionmaker to instruct it to build am ExpungingSession instead of a regular Session.

The last piece of the puzzle is the ExpungingTransaction code, which is responsible for two important things: committing the session so the underlying transaction gets closed and expunging objects so that we don't accidentally reopen the transaction.

By following these steps, you get a useful context manager that forces you to group your DB interaction into a block and notifies you if you mistakenly use (unloaded) objects outside of it.

Using frozen models to deal with expunged objects

What if we really need to access DB models outside of an expunging context?

Simply passing models to functions as arguments helps in achieving a great goal: the decoupling of models retrieval from their actual usage. However, such functions are no longer in control of what happens to those models afterwards

We don't want to forbid all usage of models outside of this context, but we need to somehow inform the user that the model object comes “as is,” with whatever loaded attributes it has. It's disconnected from the DB and shouldn't be modified.

In SQLAlchemy, when we modify a live model object, we expect the change to be pushed to the DB as soon as commit or flush is called on the owning session. With expunged objects this is not the case, because they don't belong to a session. So how does the user of such an object know what to expect from a certain model object? The user needs to ensure that she:

  • Doesn't access an unloaded attribute of a live DB object, as it may open an unwanted transaction
  • Doesn't modify attributes of an expunged object, as it won't be saved

To safely and explicitly pass along these kind of model objects, we introduced frozen objects. Frozen objects are basically proxies to expunged models that won't allow any modification.

To work with these frozen objects, we added a freeze method to our ExpungingSession:

So now our code would look something like this:

Now, what if we want to modify the object outside of this context, later on, (e.g. after a long-lasting HTTP request)? As our frozen object is completely disconnected from any session (and from the DB), we need to fetch a warm instance associated to it from the DB and make our changes to that instance. This is done by adding a helper fetch_warm_instance method to our session...

...and then our code that modifies the object would say something like this.

When the second context manager exits, it will call commit on tx.session, and changes to my_model will be committed to the DB right away.

Frozen Relationships

We now have a way of safely dealing with models without generating idle-in-transaction problems, but the code quickly becomes a mess if we have to deal with relationships: We need to freeze them separately and pass them along as if they aren’t related. This could be overcome by telling the freeze method to freeze all related objects, recursively walking the relationships.

We'll have to make some adjustments to our frozen proxy class as well.

Now, we can fetch, freeze, and use frozen objects with any preloaded relationships.

Additional recommendations and caveats

  • Don't call session.commit() inside an expunging context manager's block. In fact, avoid using session at all and use tx.session instead. The context manager will take care of flushing and committing the session when exited.
  • Avoid nested sessions inside the context block.
  • Try to use one single query inside a context manager. If you need multiple queries, it often makes sense to use separate context blocks for each one.
  • If you don't need to pass along an entire model object, you don't need to freeze it. Imagine that you only need an object's id or name attribute; you can simply store it in a variable while inside the expunging context block.

Avoid idle-in-transaction connection states to preserve DB resources

While the code to access the DB with SQLAlchemy may look simple and straightforward, one should always pay close attention to transaction management and the subtleties that arise from the various layers of the persistence stack.

We learned this the hard way, when our services eventually started to exhaust the DB resources many years into development.

If you recently decided to use a software stack similar to ours, you should consider writing your DB access code in such a way that it avoids idle-in-transaction issues, even from the first days of your project. The problem may not be obvious at the beginning, but it becomes painfully apparent as you scale.

If your project is mature and has been in development for years, you should  consider planning changes to your code to avoid or to minimize idle-in-transaction issues, while the situation is still under control. You can start writing new idle-in-transaction-proof code while planning to gradually update existing code, according to the capacity of your development team.


Gorgias Engineering
Author Profile
Gorgias Engineering
Author Profile

Table of contents