October 1, 2019
Many programming languages support multithreading and multiprocessing as a means of executing code in parallel. This form of programming allows work to be split into groups of tasks that can be executed concurrently, which can lead to faster execution times for tasks that are not blocked by other operations. There are, however, both advantages and disadvantages to this form of programming.
Multithreading and multiprocessing can allow for better performance when executing certain operations. There are many different implementations of each, so it is important to know the limitations of each implementation and to consider such things as:
Multithreading is the ability to run concurrent tasks within the same process. This style of concurrent programming allows the threads to share state and execute from the same memory pools. The advantages of this form of concurrency are:
While the advantages of this style of concurrency are clear, the shared memory and resources add complexity when it comes to ensuring data consistency. For example, sharing memory and resources can result in data from one thread 'leaking' into another thread. In most languages that support this style of operation, these errors are guarded against (as best as they can be) by using locks and synchronizers. These locks aim to prevent other threads from accessing the resources while a lock is held by a thread.
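As a minimal sketch of the locking idea described above, the following Python example guards a shared counter with a lock. The names `SafeCounter` and `count_with_threads` are hypothetical and chosen only for illustration; the example assumes the standard `threading` module and is not tied to any particular application.

```python
import threading


class SafeCounter:
    """A counter whose updates are guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # 'with' acquires the lock and always releases it, even on error.
        with self._lock:
            self.value += 1


def count_with_threads(thread_count=4, increments_per_thread=10_000):
    """Increment one shared counter from several threads and return the total."""
    counter = SafeCounter()

    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_threads())  # 4 threads x 10,000 increments = 40000
```

Because every update happens while the lock is held, no increment is lost, regardless of how the threads are scheduled.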
The usage of these locks and synchronizers can complicate working with threads, as you now have to know which threads hold locks and ensure that the locks are released when they are no longer needed. Mistakes in these areas can result in threads waiting a long time for a lock to be released, effectively removing the advantages of the multithreaded environment. In the worst case these errors can lead to a 'deadlock', a situation where threads are waiting for each other to release locks and none of them can make progress. When a process gets into this state it is very difficult (if not impossible) for the system to recover, meaning a restart of the process will be required. It can also be very difficult to know that this has occurred, because the process will continue to run and might still respond to some requests. For these situations it is important to have tools in place (such as FusionReactor) to alert you.
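Two of the points above, always releasing a lock and noticing when a thread has waited too long for one, can be sketched in Python as follows. This is only an illustration under stated assumptions: `run_with_lock` and `demo` are hypothetical helper names, and the timeout values are arbitrary.

```python
import threading


def run_with_lock(lock, action, timeout_seconds=1.0):
    """Run action while holding lock.

    Returns False, without running action, if the lock could not be
    obtained within timeout_seconds.
    """
    if not lock.acquire(timeout=timeout_seconds):
        return False  # the caller can log or alert instead of waiting forever
    try:
        action()
    finally:
        lock.release()  # released even if action raises
    return True


def demo():
    lock = threading.Lock()
    outcomes = []

    # The lock is free, so this attempt succeeds.
    outcomes.append(run_with_lock(lock, lambda: None))

    # The main thread now holds the lock, so the worker's attempt times out.
    def worker():
        outcomes.append(run_with_lock(lock, lambda: None, timeout_seconds=0.1))

    with lock:
        thread = threading.Thread(target=worker)
        thread.start()
        thread.join()
    return outcomes


if __name__ == "__main__":
    print(demo())  # [True, False]
```

A timeout does not remove the underlying locking mistake, but it turns a silent, indefinite wait into a failure that can be reported.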
The duration and throughput of concurrent operations should also be considered, as these can quickly lead to issues with executor pools queuing operations. In languages such as Java, a common approach to multithreaded execution is to use an executor pool: a collection of threads that execute tasks from a queue. This approach reduces the overhead of creating new threads, as the threads are reused. Problems can arise from the number of executors available for the tasks. Suppose, for example, that two tasks are submitted to the queue every second and you have an executor pool of 2 threads (so you can execute 2 tasks at the same time). If the tasks take an average of 1.5 seconds to complete, the pool can finish only about 1.33 tasks per second, so the queue will steadily grow because there are not enough executors to complete the tasks before the next tasks are added.
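The arithmetic in this example can be checked with a small simulation. The sketch below is written in Python rather than Java and does not use a real executor pool; it only models one, assuming tasks arrive in pairs exactly on each second and every task takes exactly 1.5 seconds. The function name `simulate_backlog` and its parameters are hypothetical.

```python
def simulate_backlog(total_seconds, tasks_per_second=2, workers=2,
                     task_ms=1500, step_ms=500):
    """Return how many tasks are still queued after total_seconds."""
    queued = 0
    busy_until = [0] * workers  # time (ms) at which each worker becomes free

    for now in range(0, total_seconds * 1000, step_ms):
        # New tasks are submitted at the start of every whole second.
        if now % 1000 == 0:
            queued += tasks_per_second
        # Every free worker takes one task from the queue.
        for index in range(workers):
            if busy_until[index] <= now and queued > 0:
                queued -= 1
                busy_until[index] = now + task_ms
    return queued


if __name__ == "__main__":
    print(simulate_backlog(60))             # 40 tasks waiting after one minute
    print(simulate_backlog(60, workers=3))  # a third worker keeps up
```

With 2 workers the pool completes 2 / 1.5 ≈ 1.33 tasks per second against 2 arriving, so the backlog grows by roughly 0.67 tasks per second, which matches the 40 queued tasks after 60 seconds. Adding a third worker raises capacity to 2 tasks per second and the queue stops growing.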
In some programming languages there are options to use what are known as green threads. These simulate a multithreaded environment without actually using more native threads. In Python, for example, the greenlet library provides these as greenlets, which allow the programmer to control when the process can switch to another thread to process another task. As with native threads, there are advantages to this form of concurrency:
While there are advantages to this form of concurrency, caution is needed because green threads cannot yield (allow other threads to execute) when native blocking operations are invoked. This means that if you are reading from or writing to a network (waiting on a database, for example) you can easily get into a situation where the entire process blocks until the operation completes. To prevent such situations, more complex non-blocking IO is needed so that the process does not block on the operation. This increased complexity can quickly outweigh the advantages of using green threads.
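The blocking problem can be illustrated with Python's standard `asyncio` module. Note the assumption: `asyncio` coroutines are used here as a stand-in for green threads because they are also cooperatively scheduled on a single native thread; this is not the greenlet API. `time.sleep` plays the role of a native blocking call, and the function names are hypothetical.

```python
import asyncio
import time


async def cooperative(name, log):
    for step in range(2):
        log.append(f"{name}{step}")
        await asyncio.sleep(0.01)  # yields, so other tasks can run


async def blocking(name, log):
    for step in range(2):
        log.append(f"{name}{step}")
        time.sleep(0.01)  # blocks the native thread and never yields


async def run_pair(task_function):
    """Run two tasks 'a' and 'b' together and return the order of their steps."""
    log = []
    await asyncio.gather(task_function("a", log), task_function("b", log))
    return log


if __name__ == "__main__":
    # Cooperative tasks interleave: 'b' starts while 'a' is waiting.
    print(asyncio.run(run_pair(cooperative)))
    # Blocking tasks do not: 'a' finishes completely before 'b' starts.
    print(asyncio.run(run_pair(blocking)))
```

In the blocking version task `b` cannot start until task `a` has finished all of its waits, which is the situation described above where the entire process stalls on one operation.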
Multiprocessing, also known as process forking, is another way of running multiple tasks at the same time. It differs from multithreading in that the whole process is duplicated, which also duplicates the memory and resource requirements. This method can be used to obtain the benefits of a multithreaded environment without having to deal with the concurrency issues that come with those environments. There are a few advantages to this approach:
In a multiprocessing environment each process has its own memory, which is not shared with the other processes. In modern environments multiprocessing is used in a few different ways. Runtimes such as Node.js support process forking to run multiple instances of the code on the same machine at the same time, simulating thread pools. This duplicates the master process and launches it as a new process on the machine, running the code as if it had just started. This has the advantage of separating all the memory usage, allowing for easier code development; however, it has a high overhead (when compared to multithreading), as the entire process is duplicated and executed.
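The memory separation can be seen in a short Python sketch. It uses Python's `multiprocessing` module rather than Node.js, and it assumes a Unix-like system where the "fork" start method is available. The names `state`, `child_task` and `run_demo` are hypothetical.

```python
import multiprocessing

# Module-level state that a thread would share, but a forked process copies.
state = {"count": 0}


def child_task(results):
    # This changes only the child's copy of 'state'.
    state["count"] += 1
    results.put(state["count"])


def run_demo():
    """Return (value seen by the child, value seen by the parent)."""
    context = multiprocessing.get_context("fork")  # assumption: Unix-like OS
    results = context.Queue()
    child = context.Process(target=child_task, args=(results,))
    child.start()
    child_value = results.get(timeout=10)
    child.join()
    return child_value, state["count"]


if __name__ == "__main__":
    print(run_demo())  # (1, 0): the parent's copy is untouched
```

The child has to send its result back explicitly (here through a queue), because nothing it writes to its own memory is visible to the parent.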
Another way of using multiprocessing is to deploy multiple instances of the process behind a load balancer and forward requests to the instances. This is a common approach when dealing with web applications, and when there is a need to scale up and down quickly. There are many ways to do this, including managed services such as AWS Elastic Beanstalk, or Kubernetes clusters. This method of multiprocessing can be combined with multithreading to get the benefits of both methodologies.
There are advantages to both multithreading and multiprocessing, and both can be used to improve the performance and reliability of applications. There are many things to consider when choosing which is best for the application, such as:
Ultimately, neither approach is better; each has advantages and disadvantages, and which is best depends on the workload of the application. As a quick guide: multithreading is good when dealing with remote systems, as there is no need to block the entire process while it waits for native operations (such as IO). Multiprocessing is good when dealing with known, quick operations that do not need external systems, or tasks that need to execute various operations in a specific order without sharing data with other executors.
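To illustrate the first half of that guide, here is a minimal Python sketch of threads overlapping their waits on a remote system. No real network call is made: `fake_remote_call` is a hypothetical stand-in that simply sleeps, and the 0.2 second wait and pool size of 4 are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_remote_call(request_id, wait_seconds=0.2):
    time.sleep(wait_seconds)  # stand-in for waiting on a remote system
    return request_id * 2


def fetch_all(request_ids, max_workers=4):
    """Run the calls on a thread pool and return results in request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_remote_call, request_ids))


if __name__ == "__main__":
    started = time.perf_counter()
    results = fetch_all([1, 2, 3, 4])
    elapsed = time.perf_counter() - started
    # Run one after another, the four waits would take about 0.8 seconds;
    # on the pool they overlap, so the total is much closer to one wait.
    print(results, f"{elapsed:.2f}s")
```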
Experienced developer in various languages, currently a product owner of nerd.vision leading the back end architecture.