A thread-monitor, often also referred to as a watchdog, is extremely helpful when building multi-threaded and reliable applications. In its simplest form, a watchdog should detect when one or more threads hang or crash, and it should restart the problematic threads if necessary. Depending on your use-case, you could implement this helper in a variety of ways, and you could add many more features such as a heartbeat function that allows each thread to report its progress to the monitor.
Writing a custom worker thread using Python
As with pretty much everything in Python, there’s no one-size-fits-all solution to creating threads. I decided to write a new class that is a subclass of threading.Thread:
import time
from threading import Thread


class CustomWorker(Thread):
    def __init__(self, startValue):
        super().__init__()
        self.stopped = False
        self.counter = startValue

    def run(self):
        while self.counter > 0 and not self.stopped:
            print(self.name + ": Remaining tasks: " + str(self.counter))
            self.counter -= 1
            time.sleep(0.5)
As you can see, the custom Thread has a very simple task. The caller supplies it with a starting value and the worker counts down that value roughly twice a second. When the counter reaches zero, the thread stops. The watchdog should detect when this happens, and it should then restart the worker.
Note that this is a very simple example. In reality, the thread would most likely not end like this. Instead, it would probably crash due to a fault condition such as an unhandled exception. Either way, I also added a variable that allows the watchdog to shut the thread down gracefully. The thread-monitor can instruct a worker to quit by setting the worker’s stopped variable to True.
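To illustrate the crash scenario, here is a minimal sketch of a worker that fails partway through its countdown. The class name CrashingWorker, its error attribute, and the simulated RuntimeError are hypothetical and not part of the original code. The point is that is_alive() returns False once run() has ended, whether the thread finished normally or stopped because of an exception, so the same check covers both cases.

```python
import threading
import time


class CrashingWorker(threading.Thread):
    """Hypothetical worker that fails partway through its countdown."""

    def __init__(self, start_value):
        super().__init__()
        self.counter = start_value
        self.error = None  # Holds the exception that ended the thread, if any

    def run(self):
        try:
            while self.counter > 0:
                if self.counter == 2:
                    # Simulated fault condition (an assumption for this demo)
                    raise RuntimeError("simulated failure")
                self.counter -= 1
                time.sleep(0.01)
        except Exception as exc:
            # Record the error so a monitor can inspect it later
            self.error = exc


worker = CrashingWorker(3)
worker.start()
worker.join()  # Wait until the thread has ended

print("Alive:", worker.is_alive())
print("Error:", repr(worker.error))
```

Storing the exception on the worker is only one option. It lets a monitor log why a thread died before deciding whether to restart it.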
Creating a watchdog that monitors the custom threads
Now to the thread-monitor itself. As noted above, this is a simple implementation that spawns a single thread and periodically checks whether that thread is still running. If the thread stopped (for whatever reason), the watchdog spawns a new thread:
# Simple Python watchdog that detects if a thread stopped (e.g., due to an error)
# and restarts the thread if necessary.
import time
from CustomWorker import CustomWorker

# Main entry point
# Start the watchdog here
if __name__ == '__main__':
    # The thread this watchdog controls
    t = False
    try:
        # Run the watchdog endlessly
        while True:
            # If t is False then the thread was either stopped or never started at all
            # Therefore, create a new Instance and assign it to t, then start the thread
            if not t:
                t = CustomWorker(5)
                t.start()
                print("Started the thread!")
            # Check whether t stopped
            if t and not t.is_alive():
                print("Thread is not running!")
                print("Restarting the thread...")
                t = False
            # If t is running, just wait and let the other threads work
            else:
                time.sleep(1.0)
    # Users can exit the watchdog by sending a keyboard interrupt (Ctrl + C)
    except KeyboardInterrupt:
        print("Stopping all worker threads...")
        wait_cycles = 0
        # If t is currently running, send a stop signal and wait for the
        # thread to finish.
        if t and t.is_alive():
            t.stopped = True
            # Make sure that your custom threads don't block, otherwise the
            # watchdog will never exit
            while t.is_alive():
                print("Waiting for a worker to stop...")
                time.sleep(0.5)
        print("Stopped all workers! Stopping the watchdog...")
I’d like to specifically draw your attention to the except KeyboardInterrupt block. Here, the watchdog exits when users send a keyboard interrupt. Before it does that, the watchdog notifies the worker thread that it should stop. Then, the thread-monitor waits for the thread to finish. Here, it’s important that the threads don’t block (e.g., due to file access, deadlocks, etc.), because otherwise the waiting loop never ends. A more sophisticated watchdog could count the number of times it has waited for a thread and force-quit itself (and all child threads) after a certain threshold.
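The bounded wait described above could look roughly like the following sketch. The helper stop_worker, its parameters max_wait_cycles and cycle_seconds, and the two demo classes are hypothetical names introduced here. Python has no supported way to kill a thread from the outside, so this sketch assumes the workers are daemon threads, which do not keep the interpreter alive once the main thread exits.

```python
import threading
import time


def stop_worker(worker, max_wait_cycles=10, cycle_seconds=0.5):
    """Ask a worker to stop and wait for a bounded number of cycles.

    Returns True if the worker ended in time, False otherwise.
    Assumes the worker exposes a boolean `stopped` attribute.
    """
    worker.stopped = True
    wait_cycles = 0
    while worker.is_alive() and wait_cycles < max_wait_cycles:
        # join() with a timeout returns when the thread ends
        # or when the timeout expires, whichever comes first
        worker.join(timeout=cycle_seconds)
        wait_cycles += 1
    return not worker.is_alive()


class PoliteWorker(threading.Thread):
    """Hypothetical worker that checks the stop flag regularly."""

    def __init__(self):
        super().__init__(daemon=True)
        self.stopped = False

    def run(self):
        while not self.stopped:
            time.sleep(0.01)


class StubbornWorker(threading.Thread):
    """Hypothetical worker that ignores the stop flag (simulates blocking)."""

    def __init__(self):
        super().__init__(daemon=True)  # Daemon: does not block interpreter exit
        self.stopped = False

    def run(self):
        time.sleep(5)  # Stands in for a blocking call


polite = PoliteWorker()
stubborn = StubbornWorker()
polite.start()
stubborn.start()

polite_ok = stop_worker(polite, max_wait_cycles=5, cycle_seconds=0.1)
stubborn_ok = stop_worker(stubborn, max_wait_cycles=3, cycle_seconds=0.1)
print("Polite worker stopped in time:", polite_ok)
print("Stubborn worker stopped in time:", stubborn_ok)
```

When stop_worker returns False, the watchdog can log the problem and exit anyway. Because the workers in this sketch are daemon threads, the process ends even though one of them is still blocked.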
Download the source code
You can download the source code from this GitHub repository. Feel free to share the code as you like, but please share this article along with the code if it helped you!
Tips and tricks
Make sure the watchdog is as simple as possible. There should be little to no chance that the watchdog thread itself halts, crashes, or blocks under normal circumstances.
Make sure that the worker threads are non-blocking or at least provide a way to stop them gracefully (e.g., using events).
This is a simple implementation, and you can make it as complex as you wish. I recommend keeping it simple, though.
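As an example of the event-based approach mentioned in the tips, here is a minimal sketch that swaps the boolean stopped flag for a threading.Event. The class EventWorker, its stop() method, and the stop_requested attribute are hypothetical names and not part of the original code. One benefit of this variant is that Event.wait() acts as an interruptible sleep, so the worker reacts to a stop request right away instead of finishing its current pause first.

```python
import threading


class EventWorker(threading.Thread):
    """Hypothetical countdown worker that stops via a threading.Event."""

    def __init__(self, start_value):
        super().__init__()
        self.counter = start_value
        self.stop_requested = threading.Event()

    def stop(self):
        # Called by the watchdog to request a graceful shutdown
        self.stop_requested.set()

    def run(self):
        while self.counter > 0 and not self.stop_requested.is_set():
            self.counter -= 1
            # Pauses for up to 0.5 s, but returns early once the event is set
            self.stop_requested.wait(timeout=0.5)


worker = EventWorker(1000)
worker.start()
worker.stop()
worker.join(timeout=2.0)
print("Alive after stop:", worker.is_alive())
```

In the watchdog, the line t.stopped = True would then become t.stop(), assuming the worker is built like this sketch.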