Pushing Python to Its Limits

The following post is about a challenge that we encountered at work while extending an in-house-built program that provides automation for 14,000 network devices in our infrastructure.

Although we will cover synchronous, asynchronous, and threaded methods of executing code, the intention of this post is not to go into the details of how these methods operate under the hood, but rather to show how they behave in real-world I/O-bound scenarios.

I will try to provide as much detail as possible without compromising my NDA. To that end, some code snippets and details are slightly modified without altering the context.

The Scenario

Our network infrastructure is split into three main regions: NASA, EMEA, and APAC, with a presence in almost every country and a total of a little over 14,000 network devices such as switches, routers, and firewalls. New sites and devices (greenfield and brownfield) are constantly added, expanded, refreshed, or removed around the clock.

We have a very large number of wired and wireless IT, IoT, and OT devices with different functionalities spread across the globe, keeping our factories running. Since these devices require network connectivity to operate, their onboarding and decommissioning should happen quickly in a secure and standard way without compromising the integrity of our network as a whole.

Additionally, business users should be able to onboard and decommission these devices based on their functionality without opening tickets or the involvement of network engineers.

This has three benefits:

  • Less overhead for the NetOps team
  • Business being in charge of their devices
  • A much faster process, while the network remains conformant to our standards

This process is made possible by pulling and pushing configuration to network devices and Cloud services at different layers. For instance, for a single node with functionality X to gain access to the network, eight systems have to interact, be configured, and be updated.

That is why we need a proper and automated way to manage our fleet in unison.

The Program

We have developed a program called the EDO (Endpoint Device Onboarding) tool with its backend written in Python to help us onboard and decommission over a million (and counting) IT, IoT, and OT devices into our network in a secure, standard, and audited way, regardless of the site and country an endpoint is located in.

The program consists of two main branches, or threads:

  1. The main thread that uvicorn (an ASGI web server) uses to serve the FastAPI application, acting as an API server that talks to many different systems, making the onboarding/decommissioning happen.
  2. The site processing thread, which fetches the ever-growing device list from a global inventory every 2 hours, checks each device for NETCONF support (specific capabilities), and, if successful, creates a JSON hierarchy of countries with their sites and devices to be consumed later by other parts of the program and the frontend (a simplified sketch of this split follows below).
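
Conceptually, the split looks something like this (a heavily simplified sketch with hypothetical names, not our actual startup code):

import threading
import time

import uvicorn
from fastapi import FastAPI

app = FastAPI()


def process_sites_forever() -> None:
    """Hypothetical site processing loop: refresh the inventory every 2 hours."""
    while True:
        # fetch the inventory, check NETCONF support, rebuild the JSON hierarchy...
        time.sleep(2 * 60 * 60)


threading.Thread(target=process_sites_forever, daemon=True).start()
uvicorn.run(app, host="0.0.0.0", port=8000)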

The JSON hierarchy loosely looks like this:

{
  "Argentina": {
    "ARG001": {
      "DEVICE001": "1.2.3.4",
      "DEVICE100": "4.3.2.1"
    }
  },
  "Belgium": {
    "BEL001": {
      "DEVICE001": "5.6.7.8",
      "DEVICE500": "8.7.6.5"
    },
    "BEL002": {
      "DEVICE001": "1.3.5.7",
      "DEVICE200": "7.5.3.1"
    }
  },
  ...
}
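
For context, building such a hierarchy from a flat inventory list is straightforward. Here is a minimal sketch; the Country and SiteCode field names are hypothetical, while DeviceName and IPAddress match the fields used in the snippets below:

from collections import defaultdict


def build_hierarchy(sites: list[dict]) -> dict:
    """Group a flat inventory into country -> site -> device -> IP."""
    hierarchy: defaultdict = defaultdict(lambda: defaultdict(dict))

    for site in sites:
        # e.g., {"Country": "Belgium", "SiteCode": "BEL001",
        #        "DeviceName": "DEVICE001", "IPAddress": "5.6.7.8"}
        hierarchy[site["Country"]][site["SiteCode"]][site["DeviceName"]] = site["IPAddress"]

    return hierarchy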

An inventory refresh every two hours hits the sweet spot since NetOps and the business users work around the clock and things change rapidly. This way, we are almost always guaranteed to have a “live” view of the network infrastructure in the tool.

Up until recently, the site processing logic was not required. But the business needs change, and the applications follow.

Dealing with so many network devices in a fast and efficient manner is challenging, so we decided to do our initial tests on a batch of 1,000 devices to get a rough idea of the beast that we would be slaying.

The development and testing of this program are done on WSL using Python 3.11.3 in a venv, but in production, we use Docker containers to run it in a distributed fashion.

Let’s explore some of the methods we can use to achieve this goal.

The Synchronous Implementation

The first, or naive, implementation is to fetch all network devices from the global inventory and connect to each device in each site, one after another, to see if it supports the NETCONF protocol.

Searching NETCONF and Python on Google yields the famous ncclient package. A very bare-bones version of what we wrote is as follows:

import logging

from ncclient import manager

# USER and PASSWORD are assumed to be defined elsewhere (e.g., loaded from the environment)
logger = logging.getLogger(__name__)


def check_netconf_support(sites: list[dict]) -> list[dict]:
    def check_connectivity(site: dict) -> dict:
        caption = site["Caption"]
        ip_address = site["IPAddress"]

        try:
            logger.info("Checking NETCONF connectivity on %s: %s", caption, ip_address)

            conn = manager.connect(
                host=ip_address,
                username=USER,
                password=PASSWORD,
                hostkey_verify=False,
                timeout=6,
            )
            # Accessing server_capabilities forces the capability exchange
            _ = conn.server_capabilities

            logger.info("NETCONF is supported on %s: %s", caption, ip_address)

            conn.close_session()
            site["NetconfSupport"] = True
        except Exception as err:
            logger.error("Failed connecting to %s: %s", ip_address, err)
            site["NetconfSupport"] = False

        return site

    return [check_connectivity(site) for site in sites]

Here, we iterate through the sites, open a NETCONF session to each device, invoke the server_capabilities property, and close the connection. Finally, we add a new field, NetconfSupport, to our sites data structure for further processing.

Getting the server (network device) NETCONF capabilities is important to verify if it actually has the required models for our operations.

On a server with a good network connection, this takes around 30 minutes to complete for 1,000 devices. So, for 14k devices, we are looking at roughly seven hours to complete the task.

Keep in mind that our network devices are spread across the world. These timings will fluctuate a bit based on the connection quality of our links and the internet at any given time.

This method that we just employed is called synchronous programming. We interact with one system, wait for it to respond, process the response, and then move on to the next one. During the time that we are waiting on I/O (which is the majority of the time), our program is just idling. Obviously, this is too slow and resource-wasteful.

This is the reality of I/O bound programs. Our program’s speed is constrained by the speed limits of the network and devices it is talking to rather than being CPU bound or limited by the computation capacity of our processor.

Thankfully, there is a better way.

The Asynchronous Implementation

Since this section of our program is purely I/O bound, we can rewrite that part in an asynchronous fashion.

At the time of this writing, ncclient does not ship with async support, so we will use the scrapli_netconf library that enables us to implement the same logic as the initial implementation but in an async fashion.

Here is the revised version:

import asyncio
import contextlib

from scrapli_netconf import AsyncNetconfDriver


async def check_netconf_support(sites: list[dict]) -> list[dict]:
    async def check_connectivity(site: dict) -> dict:
        caption = site["Caption"]
        ip_address = site["IPAddress"]

        # Instantiating the driver does not connect yet; open() does
        conn = AsyncNetconfDriver(
            host=ip_address,
            auth_username=USER,
            auth_password=PASSWORD,
            auth_strict_key=False,
            transport="asyncssh",
            timeout_socket=10,
            timeout_transport=10,
            timeout_ops=10,
        )

        try:
            logger.info("Checking NETCONF connectivity on %s: %s", caption, ip_address)

            await conn.open()

            logger.info("Successfully connected to %s: %s", caption, ip_address)

            site["NetconfSupport"] = True
        except Exception as err:
            logger.error("Failed connecting to %s: %s", ip_address, err)
            site["NetconfSupport"] = False
        finally:
            # Suppress errors from closing a session that never opened
            with contextlib.suppress(Exception):
                await conn.close()

        return site

    tasks = [asyncio.create_task(check_connectivity(site)) for site in sites]

    return await asyncio.gather(*tasks)

NOTE: when using the AsyncNetconfDriver, we should set the transport method to asyncssh using the transport option. asyncssh is a Python package that can be installed with pip install asyncssh.

The main logic is very similar to the previous approach with a few differences that are caused by using a different library.

We connect to every network device and check if it supports NETCONF by calling the open method, which in turn calls the _get_server_capabilities method for us; then we close the connection and, finally, add a new field, NetconfSupport, to our sites data structure for further processing.

There are a bunch of additions: the use of asyncio from Python’s standard library, two new keywords, and two new functions:

  • The async keyword in front of the function definitions, which essentially converts them to coroutines
  • The await keyword that, as its name suggests, waits for a result and lets the runtime do other things while the computation finishes
  • The creation of Tasks using the create_task function
  • The use of the gather function to schedule the coroutines’ execution and wait for their completion

A Task is similar to a Thread, but it is scheduled by the language runtime instead of the operating system. Tasks are lighter and more efficient than Threads.

You might be wondering what the benefit of this approach is if the main logic is so similar to the previous one.

Well, in the async version, we first create Tasks (check_connectivity) for each site, and then Python’s event loop runs them all almost at the same time. Here, Python fires off a lot of connections and handles them simultaneously rather than connecting to one device, waiting for a response, and moving on to the next one sequentially.

This way, it should take just a few seconds for the entire process to finish.
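
As a toy illustration (with made-up device names), three one-second waits complete in roughly one second total instead of three, because the event loop overlaps them:

import asyncio


async def fetch(device: str) -> str:
    await asyncio.sleep(1)  # stand-in for slow network I/O
    return f"{device}: done"


async def main() -> None:
    # All three Tasks are scheduled on the event loop and run concurrently
    tasks = [asyncio.create_task(fetch(d)) for d in ("dev1", "dev2", "dev3")]
    print(await asyncio.gather(*tasks))


asyncio.run(main())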

The Problems

While running the async version, we faced more problems than anticipated, which caused a lot of head-scratching for us.

Let me expand on them.

Running on WSL

I ran the program and to my surprise, each and every connection failed:

2024-05-03 23:19:04 ERROR  Failed connecting to DEVICE001: timed out opening connection to device
2024-05-03 23:19:04 ERROR  Failed connecting to DEVICE002: timed out opening connection to device
2024-05-03 23:19:04 ERROR  Failed connecting to DEVICE003: timed out opening connection to device
2024-05-03 23:19:04 ERROR  Failed connecting to DEVICE004: timed out opening connection to device
...

I could see the connections being successfully established in the output of watch -d -n1 "ss -tp | grep ':830'":

State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB  31     0      WSL:52812          DEVICE001:830     users:(("python",pid=15188,fd=31))
ESTAB  31     0      WSL:22004          DEVICE002:830     users:(("python",pid=15188,fd=6))
ESTAB  31     0      WSL:26190          DEVICE003:830     users:(("python",pid=15188,fd=44))
ESTAB  21     0      WSL:36464          DEVICE004:830     users:(("python",pid=15188,fd=30))
ESTAB  31     0      WSL:34930          DEVICE005:830     users:(("python",pid=15188,fd=47))
ESTAB  31     0      WSL:41576          DEVICE006:830     users:(("python",pid=15188,fd=20))
ESTAB  31     0      WSL:31500          DEVICE007:830     users:(("python",pid=15188,fd=23))
ESTAB  31     0      WSL:22440          DEVICE008:830     users:(("python",pid=15188,fd=35))
ESTAB  31     0      WSL:46740          DEVICE009:830     users:(("python",pid=15188,fd=46))
...

But for some odd reason, the program deemed these connections failures. Something was clearly messing with the program here, and after some more tests, my suspicions pointed towards WSL (Ubuntu 20.04.6 - Kernel 5.15.146.1), where I develop the application.

My machine is a beast: 32 GB of memory (although WSL uses half of it by default, which is still more than enough) and a 28-core CPU. So, this cannot be a hardware limitation.

The local port range was also not a problem; we have 28,232 ports to use.

TheGrayNode.io ~> sysctl net.ipv4.ip_local_port_range

net.ipv4.ip_local_port_range = 32768    60999

The maximum number of file descriptors (fd) was not a problem either:

TheGrayNode.io ~> ulimit -aH | grep -i descriptor
Maximum number of open file descriptors    (-n) 1048576

In the short amount of time we spent troubleshooting WSL with packet captures and poking around OS internals, we could not find anything meaningful as to why this is happening.

We decided to run the program on a development server to see if it exhibits the same behavior.

Running on a Dev Server

We moved the program to a dev server with 8 GB of RAM and 2 CPU cores and retried. This time we started to see “Successfully connected to …” logs.

But now, we have a brand new problem: 11 devices cause the program to hang forever by keeping their TCP connections open after they are done with their data exchange.

TheGrayNode.io ~> ss -tp | grep ':830'

State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB  0      0      DEV_SERVER:48608   DEVICE100:830     users:(("python3",pid=1456887,fd=599))
ESTAB  0      0      DEV_SERVER:49072   DEVICE240:830     users:(("python3",pid=1456887,fd=602))
ESTAB  0      0      DEV_SERVER:38814   DEVICE500:830     users:(("python3",pid=1456887,fd=600))
ESTAB  0      0      DEV_SERVER:37412   DEVICE800:830     users:(("python3",pid=1456887,fd=601))
ESTAB  0      0      DEV_SERVER:53220   DEVICE406:830     users:(("python3",pid=1456887,fd=894))
ESTAB  0      0      DEV_SERVER:51434   DEVICE901:830     users:(("python3",pid=1456887,fd=644))
ESTAB  0      0      DEV_SERVER:53020   DEVICE560:830     users:(("python3",pid=1456887,fd=880))
ESTAB  0      0      DEV_SERVER:52806   DEVICE790:830     users:(("python3",pid=1456887,fd=598))
ESTAB  0      0      DEV_SERVER:35574   DEVICE320:830     users:(("python3",pid=1456887,fd=817))
ESTAB  0      0      DEV_SERVER:46566   DEVICE657:830     users:(("python3",pid=1456887,fd=640))
ESTAB  0      0      DEV_SERVER:49402   DEVICE539:830     users:(("python3",pid=1456887,fd=643))

As a cherry on top, resource utilization jumps from 192 MB of memory and 0.2% CPU to 260 MB of memory and 54.0% overall CPU utilization (100% utilization on the core running Python). Additionally, all HTTP response times increase from 80~110 ms to 1.5~1.7 seconds!

Huh! Why? They are totally fine in the sync version. Nothing hangs.

However, we were using ncclient for the sync and scrapli_netconf for the async version. Could that be it?

To rule out the possibility of any potential bugs in the scrapli_netconf library, we re-wrote the sync version to also use the scrapli_netconf library.

I will spare you the details but the only changes are dropping the use of async/await, utilizing NetconfDriver instead of AsyncNetconfDriver and omitting the transport option. Now, the program runs to completion without any hiccups!
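
For reference, a bare-bones sketch of that synchronous scrapli_netconf variant could look like this (mirroring the async snippet above; not our exact production code):

import contextlib

from scrapli_netconf import NetconfDriver


def check_connectivity(site: dict) -> dict:
    conn = NetconfDriver(
        host=site["IPAddress"],
        auth_username=USER,
        auth_password=PASSWORD,
        auth_strict_key=False,
        # no transport option: the default synchronous transport is used
        timeout_socket=10,
        timeout_transport=10,
        timeout_ops=10,
    )

    try:
        conn.open()
        site["NetconfSupport"] = True
    except Exception:
        site["NetconfSupport"] = False
    finally:
        with contextlib.suppress(Exception):
            conn.close()  # safe even if open() never succeeded

    return site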

Going back to the async version, if we slightly modify our check_netconf_support coroutine to exclude these devices with a simple filtering logic like:

# previous code

ODD_SWITCHES = (
    "DEVICE100",
    "DEVICE240",
    "DEVICE500",
    ...
)

tasks = [
    asyncio.create_task(check_connectivity(site))
    for site in sites
    if site["DeviceName"] not in ODD_SWITCHES
]

# previous code

the program successfully runs to completion in 10 seconds. That is 180 times faster than the sync version; in other words, a 99.44% improvement!

Nonetheless, we are far away from celebrating. The mystery remains. What is causing these devices to stall our program forever?

It cannot be a hardware or a NOS bug since hundreds of other devices are of the same type and are running the same NOS version as these devices. For what it’s worth, they are also located in different geographic locations.

Could this be a deadlock or a race condition? Perhaps a bug in scrapli_netconf’s AsyncNetconfDriver?

Going Down The Rabbit Hole

Let’s name those 11 devices the culprits and quickly recap what we have observed so far:

  1. The sync version using both ncclient and scrapli_netconf works fine for all devices
  2. The async version hangs forever, thanks to the culprits
  3. The async version works fine if we exclude the culprits
  4. There are hundreds of devices with the same NOS and hardware as the culprits and they work just fine
  5. In the async version, the culprits initially communicate like the rest but end up keeping the TCP session open forever, unless we kill the program; then the connections terminate gracefully.
  6. In the async version, and with the culprits included, the CPU utilization goes through the roof. The CPU core running the Python process gets 100% utilized, to the point that it can noticeably slow down all I/O operations on a system with 2 CPU cores; you can feel it when interacting with the terminal.

It is very unlikely, but maybe Python cannot handle this many connections concurrently. What if we flip the scenario and exclude all devices but the culprits?

Running Only Against The Culprits

Running the program only against the culprits still yields the same result. So, clearly, these boxes are at fault, or are they? Why only in the async version then?

If we take a step back and look at the arguments we are passing to the AsyncNetconfDriver, there are three options related to timeout:

conn = AsyncNetconfDriver(
    host=ip_address,
    auth_username=USER,
    auth_password=PASSWORD,
    auth_strict_key=False,
    transport="asyncssh",
    timeout_socket=10,      # <-- 1
    timeout_transport=10,   # <-- 2
    timeout_ops=10,         # <-- 3
)

How come the timeouts are not effective? They should cancel/time out the operation at the 10-second mark.

Maybe there is a bug in the scrapli_netconf library. But first, we need to understand its inner workings.

Timeout Parameters

The AsyncNetconfDriver class has these timeout options with default values in seconds:

timeout_socket: float = 15.0,
timeout_transport: float = 30.0,
timeout_ops: float = 30.0,

Let’s first understand what each of them does.

  1. timeout_socket is the timeout for establishing the socket of the SSH session
  2. timeout_transport is the timeout for individual read operations on the TCP connection; that is, time out if a device has not sent any data for x seconds
  3. timeout_ops is the timeout for individual SSH channel operations, e.g. send

These are described here and in scrapli docs as well.

Ok, the timeout_socket is definitely not what we are looking for, but which one of the remaining two is actually responsible for timing out our sessions?

Follow The Timeout Rabbit

The timeout arguments are passed to AsyncNetconfDriver’s parent/super classes using the super().__init__(...) call.

One of its parent classes is AsyncDriver, which takes the keyword arguments (kwargs) passed to it and in turn passes them to its parent class, BaseDriver, using super().__init__(**kwargs).

BaseDriver consumes these options in two ways:

  1. _base_channel_args takes the timeout_ops
  2. _base_transport_args takes both the timeout_socket and timeout_transport

Here is the visual form of what was just described:

[Figure: how the timeout arguments propagate through the driver class hierarchy]
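
In plain Python, that propagation pattern boils down to something like this (a heavily simplified sketch, not scrapli’s actual code):

class BaseDriver:
    def __init__(
        self,
        timeout_socket: float = 15.0,
        timeout_transport: float = 30.0,
        timeout_ops: float = 30.0,
        **kwargs,
    ) -> None:
        # timeout_ops ends up in the channel args...
        self._base_channel_args = {"timeout_ops": timeout_ops}
        # ...while timeout_socket and timeout_transport end up in the transport args
        self._base_transport_args = {
            "timeout_socket": timeout_socket,
            "timeout_transport": timeout_transport,
        }


class AsyncDriver(BaseDriver):
    def __init__(self, **kwargs) -> None:
        super().__init__(**kwargs)  # forwards everything up the chain


class AsyncNetconfDriver(AsyncDriver):
    def __init__(self, host: str, **kwargs) -> None:
        self.host = host
        super().__init__(**kwargs)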

Ok, we now know how and where these timeouts propagate. After all, who doesn’t love some OOP inheritance… But seriously, how are they supposed to take effect?

Time out, Please

I think the fastest and easiest way to understand how timeouts are supposed to take effect is to just map out the logic hierarchy:

[Figure: the logic hierarchy of scrapli’s timeout handling]

I would like to draw your attention to _get_server_capabilities which is decorated by the @timeout_wrapper.

The timeout_wrapper uses the _get_transport_logger_timeout to get:

  1. transport object, in our case asyncssh
  2. logger, which we do not care about
  3. and the timeout value, either from the channel or the transport object

The third point might be a bit confusing. Why do we need both?

Well, remember that we discussed the initial socket establishment, the SSH channel operations, and the TCP read operations? They all need a way to time out if things are not functioning as they should.

Ok, we are finally getting somewhere.

The timeout_transport value should time out the operation, since we are not receiving anything from any of the devices; plus, as depicted in the ss output, the send and receive queues are empty.

Furthermore, if we set the log level to debug, scrapli’s read function keeps printing read: b"", meaning that it did not read anything from the wire, further proving our point.

Alright, we are going into the deep waters. As we saw in the logic hierarchy map, scrapli’s timeout_wrapper decorator gets the timeout value and passes it to Python’s asyncio wait_for function, basically offloading the task of timing out to Python’s event loop.
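
Conceptually, the decorator reduces to something like the following (a simplified sketch, not scrapli’s exact implementation):

import asyncio
from functools import wraps


def timeout_wrapper(coro_func):
    @wraps(coro_func)
    async def wrapper(self, *args, **kwargs):
        # resolve the applicable timeout from the channel/transport arguments
        timeout = self._base_transport_args["timeout_transport"]
        # delegate the actual timing out to asyncio's event loop
        return await asyncio.wait_for(coro_func(self, *args, **kwargs), timeout)

    return wrapper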

Is this a bug in Python after all?!

Peeking Into The Deep Waters

The way the wait_for function is written is not very straightforward, and while searching for similar issues, I came upon an interesting comment from the legend himself, Guido van Rossum, the creator of Python:

[Image: Guido van Rossum’s comment on asyncio’s wait_for]

Ok, this is not looking promising at all.

Although it is mentioned that some issues are fixed in Python version 3.12, we cannot use that version because an incompatible database driver prevents us from upgrading past version 3.11.3.

At this point, I am questioning my life choices…

Perhaps we should have written a Python extension module, or just rewritten this part, in another language like Golang, Nim, or Rust; languages that arguably have better concurrency models and are not constrained by something like the GIL.

However, it is unreasonable to expect all network engineers, who are the authors of this application, to know multiple programming languages. On top of that, we really do not know for sure that another language would solve this issue.

RESTCONF is another option, but enabling the HTTP service across 14,000 devices with various NOS versions could open another can of worms, given the notorious history of HTTP vulnerabilities on networking gear.

Additionally, a lot of refactoring would have to be done throughout the whole codebase to make this transition possible.

We do not have many options left here.

Hmmm, could we possibly borrow the fixed wait_for from Python 3.12.3 and port it to 3.11.3? Just to see how things behave. Promise.

The wait_for Implementation In Python 3.12

The wait_for function in Python 3.12.3 looks much cleaner and more understandable than the previous version:

async def wait_for(fut, timeout):
    if timeout is not None and timeout <= 0:
        fut = ensure_future(fut)

        if fut.done():
            return fut.result()

        await _cancel_and_wait(fut)
        try:
            return fut.result()
        except exceptions.CancelledError as exc:
            raise TimeoutError from exc

    async with timeouts.timeout(timeout):
        return await fut

At a quick glance, the main change seems to be that the timeout handling is now delegated to the timeouts.timeout context manager.

Nonetheless, the quickest way to put this to the test is to use Python 3.12.3 and run only the code related to the site processing logic.

Oh, New Gifts?

We ran the code and it seems like this is a gift that keeps on giving.

Now, there are two new issues:

  1. We get a beautiful, yet aggressive, memory leak. This leak causes the OS to go belly-up in less than five minutes.
  2. The receive queues of these connections never get emptied:

TheGrayNode.io ~> ss -tp | grep ':830'

State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB  1048   0      DEV_SERVER:48608   DEVICE100:830     users:(("python3",pid=3992684,fd=16))
ESTAB  592    0      DEV_SERVER:49072   DEVICE240:830     users:(("python3",pid=3992684,fd=11))
ESTAB  1048   0      DEV_SERVER:38814   DEVICE500:830     users:(("python3",pid=3992684,fd=7))
ESTAB  0      0      DEV_SERVER:37412   DEVICE800:830     users:(("python3",pid=3992684,fd=15))
ESTAB  48     0      DEV_SERVER:53220   DEVICE406:830     users:(("python3",pid=3992684,fd=14))
ESTAB  592    0      DEV_SERVER:51434   DEVICE901:830     users:(("python3",pid=3992684,fd=8))
ESTAB  848    0      DEV_SERVER:53020   DEVICE560:830     users:(("python3",pid=3992684,fd=13))
ESTAB  592    0      DEV_SERVER:52806   DEVICE790:830     users:(("python3",pid=3992684,fd=9))
ESTAB  1048   0      DEV_SERVER:35574   DEVICE320:830     users:(("python3",pid=3992684,fd=17))
ESTAB  592    0      DEV_SERVER:46566   DEVICE657:830     users:(("python3",pid=3992684,fd=10))
ESTAB  848    0      DEV_SERVER:49402   DEVICE539:830     users:(("python3",pid=3992684,fd=12))

I wonder why the Out-Of-Memory (OOM) killer is not kicking in here to prevent the system from completely grinding to a halt.

We are already deep down the rabbit hole; let’s not dig further by getting into Linux kernel internals.

The Show Must Go On

It is clear that we cannot proceed with Python’s asyncio for our use case at this time, which is a bit disappointing.

However, there is still hope:

Since we are dealing with I/O, threading is a better and cheaper option than multiprocessing.

Although the GIL prevents threads from executing Python bytecode in parallel (it is released while a thread waits on blocking I/O), and threads are a bit more expensive than the async method, we are still fine; more than fine.
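
A quick toy example to convince yourself that threads overlap blocking I/O despite the GIL:

import threading
import time


def blocking_io() -> None:
    time.sleep(1)  # the GIL is released while a thread waits like this


start = time.perf_counter()
threads = [threading.Thread(target=blocking_io) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"10 blocking calls in {time.perf_counter() - start:.1f}s")  # ~1s, not ~10s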

The Threading Implementation

The threading method is very easy to implement. Here, we utilize the ThreadPoolExecutor, which provides a pool of reusable threads for running tasks, and simply leverage the existing synchronous implementation.

from concurrent.futures import Future, ThreadPoolExecutor, as_completed

BATCH_SIZE = 10  # the sweet spot from the benchmark below

netconf_supported_sites: list[dict] = []

with ThreadPoolExecutor(max_workers=20) as tpe:
    futures: set[Future] = set()

    # Submit each batch of sites to the pool as a separate task
    for start in range(0, len(sites), BATCH_SIZE):
        end = min(start + BATCH_SIZE, len(sites))

        futures.add(tpe.submit(check_netconf_support, sites[start:end]))

    # Collect the results as the batches complete
    for future in as_completed(futures):
        netconf_supported_sites.extend(future.result())

In the first loop, we create tasks and submit them to the pool using the submit() function, passing in the function we want to execute on a separate thread alongside its arguments.

The submit() function returns a Future object that lets us check on the status of the task and get its result once it completes, via the as_completed function in the second loop.

There are two things to pay attention to here:

  1. The max_workers option, which specifies how many threads are to be spawned. After doing many tests and benchmarks, 20 seems to hit the sweet spot on a 2-core machine.
  2. The batching logic, which speeds up the program significantly.

But why do we need a batching logic? We never did this in the previous methods.

The Need For Batching

When we submit a large batch of tasks to the ThreadPoolExecutor, each task gets queued for execution. A higher batch size means a larger number of sites are processed in one go, leading to a larger number of threads being managed simultaneously. This can introduce significant overhead in terms of context switching and management of threads. In other words, there is a lot of resource contention.
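
For the curious, numbers like the ones below can be gathered with a rough harness along these lines (a sketch that assumes the check_netconf_support function and sites list from the earlier snippets; not the exact harness we used):

import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def time_batch_size(batch_size: int, sites: list[dict]) -> float:
    """Run the thread-pool workflow for one batch size and return elapsed seconds."""
    start = time.perf_counter()

    results: list[dict] = []
    with ThreadPoolExecutor(max_workers=20) as tpe:
        futures = {
            tpe.submit(check_netconf_support, sites[i : i + batch_size])
            for i in range(0, len(sites), batch_size)
        }
        for future in as_completed(futures):
            results.extend(future.result())

    return time.perf_counter() - start


for size in (10, 50, 100, 300):
    print(size, time_batch_size(size, sites))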

Here is a little benchmark to showcase the difference in resource utilization and completion time when using different libraries and batch sizes:

Batch Size | Library         | Time (seconds) | Memory Utilization (MB) | CPU Utilization (%)
-----------|-----------------|----------------|-------------------------|--------------------
10         | scrapli_netconf | 65             | 90                      | 15.0
10         | ncclient        | 126            | 100                     | 50.0
50         | scrapli_netconf | 102            | 80                      | 10.0
50         | ncclient        | 165            | 102                     | 40.0
100        | scrapli_netconf | 180            | 68                      | 2.5
100        | ncclient        | 224            | 94                      | 30.0
300        | scrapli_netconf | 401            | 60                      | 1.5
300        | ncclient        | 467            | 77                      | 14.0

As depicted in the table above, lower batch sizes utilize the CPU more efficiently. This is because smaller batches allow for better CPU scheduling and less idle time waiting for I/O operations to complete. As batch size increases, CPU utilization drops, indicating that more time is spent waiting rather than processing.

The Results

With a batch size of 10 and using the scrapli_netconf library, the entire process takes 65 seconds to complete.

In comparison to the synchronous implementation, the program has gotten approximately 28 times faster, which is a 96.39% improvement.

It is not quite as fast as the async implementation (10 seconds completion time), but it is acceptable. On top of that, we can run the program on WSL again without any issues.

The Summary

In this post, we went through synchronous, asynchronous, and multithreaded methods of interacting with 14,000 network devices, thanks to an in-house-built tool that helps us onboard IT, IoT, and OT devices into our global network infrastructure.

We also did some fun troubleshooting, and although it did not result in a concrete fix, we got to learn how the tools we use work and how to find our way around when unexpected issues arise.

Maybe there will be a follow-up post in the future if we decide to invest more time into this and potentially produce a fix.

This post got quite a bit longer than I thought, and if you have made it this far, very well done!

Stay tuned!

Acknowledgements

If you would like to learn more about Threading in Python, I highly recommend this site.