Threading | By Napster
I’ve been browsing the internet and found many threads and questions about threading vs forking, thread vs process in Java, and thread vs process in Python. There are also many questions about which of the two should be used in an application. I wrote this post to clarify the differences between them and to offer some guidelines for deciding which one to use in your applications and scripts.
What is Fork/Forking?
A fork creates a new process. The child looks identical to the parent, but it is still a completely separate process with its own address space. Parent and child have the same code segment but execute independently of each other.
Forking is what happens when you run a command in a shell on Linux. The shell forks a child process each time the user issues a command. Once the task is complete, the shell returns the prompt to the user.
Conceptually, the fork system call copies the pages of the parent process into a separate memory location for the child. In some cases this copy is unnecessary: when the child immediately calls execv, which replaces the calling process’s address space with a new program, the parent’s pages never need to be copied. Most modern systems therefore share the pages copy-on-write and copy a page only when one side writes to it.
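The fork-then-exec pattern a shell uses can be sketched in a few lines of Python on a Unix-like system. This is a minimal illustration, not how any particular shell is implemented; the helper name `run_command` is made up for this example.

```python
import os
import sys


def run_command(argv):
    """Fork a child, exec argv in it, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: execv replaces this process image with the new program,
        # so the pages inherited from the parent are simply discarded.
        try:
            os.execv(argv[0], argv)
        finally:
            # Reached only if execv failed (for example, file not found).
            os._exit(127)
    # Parent: wait for the child, the way a shell waits for a command.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


if __name__ == "__main__":
    # The interpreter itself is used as the command so the sketch does not
    # depend on any particular program being installed.
    print(run_command([sys.executable, "-c", "raise SystemExit(3)"]))
```

The child calls `os._exit` rather than returning, so a failed exec can never fall back into the parent's code path.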
Here are some things you should know about forking:
- Each child will have its own process ID.
- Each child process has its own copy of the parent’s file descriptors.
- The child process does not inherit file locks set by the parent process.
- Any semaphores that are open in the parent process are also open in the child process.
- The child process has its own copy of the parent’s message queue descriptors.
- Each child will have its own memory and address space.
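A short Python sketch (Unix-like systems only) makes a few of these points visible: the child gets its own process ID, inherits the parent's pipe descriptors, and changes only its own copy of memory. The function name `fork_and_modify` and the `counter` variable are illustrative assumptions.

```python
import os

counter = 0  # a global in the parent's address space


def fork_and_modify():
    """Fork, change `counter` in the child only, and report what each side sees."""
    global counter
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: works on its own copy of the parent's memory.
        try:
            os.close(read_fd)
            counter += 100
            os.write(write_fd, f"{os.getpid()} {counter}".encode())
        finally:
            os._exit(0)
    # Parent: the pipe is the only way to learn what the child did.
    os.close(write_fd)
    child_pid, child_counter = map(int, os.read(read_fd, 64).decode().split())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return {"parent_pid": os.getpid(), "child_pid": child_pid,
            "child_counter": child_counter, "parent_counter": counter}
```

The child reports a counter of 100, while the parent's counter is still 0 after the child has exited.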
For the following reasons, forking is more widely accepted than threading:
- Fork-based implementations make development much simpler.
- Code that is based on forks is easier to maintain.
- Forking is safer and more secure because each forked process runs in its own virtual address space. If one process crashes or suffers a buffer overrun, the other processes are unaffected.
- Threaded code is harder to debug than forked code.
- Fork-based code is more portable across Unix-like systems than threaded code.
- On a single CPU, forking can be quicker than threading because there is no locking overhead.
Examples of applications that use forking include telnetd (FreeBSD), vsftpd, proftpd, and Apache 1.3.
Pitfalls in Fork
- With fork, each new process must get its own memory/address space, which means longer startup and shutdown times.
- If you fork, you have two separate processes that must communicate with each other. This inter-process communication can be very costly.
- A ghost (orphan) process results when the parent exits before the forked child. Threads make this much simpler: the parent can easily suspend, resume, and end its threads, and if the parent exits suddenly, its threads end automatically.
- The fork system call can fail if there is not enough memory available.
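To give a feel for the communication cost, here is a rough Python sketch (Unix-like systems only) in which the parent hands a list to a forked child over a pipe and reads the result back. The function name `sum_in_child` and the choice of JSON as the wire format are assumptions made for this example.

```python
import json
import os


def sum_in_child(numbers):
    """Send a list to a forked child over a pipe and read back its sum."""
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only its ends of the two pipes.
        try:
            os.close(to_child_w)
            os.close(to_parent_r)
            with os.fdopen(to_child_r) as src, os.fdopen(to_parent_w, "w") as dst:
                values = json.load(src)  # reads until the parent closes its end
                json.dump(sum(values), dst)
        finally:
            os._exit(0)
    # Parent: the data has to be serialized, copied through the kernel,
    # and parsed again on the other side.
    os.close(to_child_r)
    os.close(to_parent_w)
    with os.fdopen(to_child_w, "w") as dst:
        json.dump(numbers, dst)
    with os.fdopen(to_parent_r) as src:
        result = json.load(src)
    # Reap the child so it does not linger after it exits.
    os.waitpid(pid, 0)
    return result
```

Even this tiny exchange needs two pipes, careful closing of unused descriptors, and a wait call to reap the child.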
What is Threads/Threading?
Threads are lightweight processes (LWPs). Traditionally, a “thread” is just a CPU state (plus some other minimal state), while the process holds everything else (such as data and I/O). Because the system does not set up a new virtual memory space and environment for a thread, creating one requires less overhead than “forking” a process. Threads are most effective on multiprocessor systems, where the flow of execution can be scheduled on another processor and gain speed through parallel processing. Uniprocessor systems benefit too, because threads can exploit the latency of I/O operations that would otherwise halt execution.
Threads that are part of the same process share:
- Instructions for the process
- Most data
- Open files (descriptors)
- Signals and signal handlers
- Current working directory
- User and group ID
Each thread has its own:
- Thread ID
- Set of registers, stack pointer
- Stack for local variables and return addresses
- Signal mask
- Return value: errno
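The split between shared and per-thread state can be sketched with Python's `threading` module. The function name `run_workers` is invented for this example, and the barrier is there only to keep all threads alive at the same moment, since thread IDs may be reused once a thread has exited.

```python
import threading


def run_workers(count=3):
    """Start `count` threads and return (thread IDs, per-thread totals)."""
    results = []                          # shared: every thread sees this list
    barrier = threading.Barrier(count)    # keeps all threads alive at once

    def worker(n):
        barrier.wait()                    # IDs are only unique among live threads
        local_total = n * 2               # local: private to this thread's stack
        results.append((threading.get_ident(), local_total))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    ids = {ident for ident, _ in results}
    totals = sorted(total for _, total in results)
    return ids, totals
```

Every worker appends to the same list, yet each one computes with its own local variable and reports its own thread ID.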
Here are some points to be aware of when threading is discussed:
- Threads are most efficient on multi-processor and multi-core systems.
- Only one process/thread table and one scheduler are needed.
- All threads in a process share one address space.
- A thread does not keep a list of the threads it created, nor does it know which thread created it.
- By sharing basic parts, threads reduce overhead.
- Because threads use the same memory block as their parent process, they are more efficient in memory use than newly created processes.
There are many pitfalls in threads
- Race conditions: Multiple threads can work on the same data at the same time, and nothing protects that data from being modified by another thread. This is a race condition. Although the code may appear to run in the order you wrote it, the operating system schedules threads unpredictably. Threads may not execute in the order in which they were created, and they may run at different speeds. When threads race to finish, the results can be unexpected. To get a predictable order of execution and the expected outcome, use mutexes and joins.
- Thread-safe code: Threaded routines must call only functions that are “thread safe.” Thread-safe code has no global or static variables that other threads could read or clobber on the assumption of single-threaded operation. If a function uses global or static variables, either protect them with mutexes or rewrite the function to avoid them. In C, local variables are allocated on the stack, so a function that uses no static data or other shared resources is thread safe. A thread-unsafe function may be used by only one thread at a time in a program, and the program must guarantee that exclusivity. Many non-reentrant functions return a pointer to static data; this can be avoided by returning dynamically allocated data or by using caller-provided storage. An example of a non-thread-safe function is strtok, which is not reentrant. Its “thread safe” counterpart is the reentrant strtok_r.
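As a small illustration of the mutex-and-join advice, the Python sketch below protects a shared counter with a lock and joins every thread before reading the result. The `Counter` class and `count_with_threads` function are hypothetical names used only here.

```python
import threading


class Counter:
    """A shared counter whose updates are serialized by a mutex."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write is not atomic, so hold the lock around it.
        with self._lock:
            self.value += 1


def count_with_threads(num_threads=4, per_thread=10_000):
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    # Join: do not read the result until every thread has finished.
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place the total is always `num_threads * per_thread`; without it, increments from different threads could overwrite each other.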
There are many advantages to threads
- Because threads share the same memory space, sharing data between them is much faster. Communication between threads is therefore far faster than inter-process communication (IPC).
- Properly designed and implemented threads can give you more speed, because a multi-threaded application needs no process-level context switching.
- Threads can be started and ended quickly.
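To show how cheap communication between threads can be, here is a minimal Python sketch that passes work to a thread and gets results back through `queue.Queue`. The function name `square_in_worker` and the use of `None` as a stop marker are choices made for this example.

```python
import queue
import threading


def square_in_worker(numbers):
    """Hand numbers to a worker thread and collect their squares."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            n = tasks.get()
            if n is None:         # stop marker chosen for this sketch
                break
            results.put(n * n)    # the object is shared directly, not copied

    t = threading.Thread(target=worker)
    t.start()
    for n in numbers:
        tasks.put(n)
    tasks.put(None)
    t.join()
    # A single worker handles tasks in order, so results come out in order.
    return [results.get() for _ in numbers]
```

Nothing is serialized or copied through the kernel here; both threads simply refer to the same objects in the same address space.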
Examples of applications that use threading include MySQL and Firebird.
Thread vs process
Each process provides the resources needed to run a program. A process has a virtual address space, executable code, open handles to system objects, a security context, a unique process identifier, and environment variables. A process starts with a single thread, commonly called the primary thread, but it can create additional threads from any of its threads.
A thread is the entity within a process that can be scheduled for execution. All threads of a process share its virtual address space and system resources. In addition, each thread maintains its own exception handlers. The thread context includes the thread’s set of machine registers, the kernel stack, a thread environment block, and a user stack in the address space of the thread’s process. A thread can also have its own security context, which can be used to impersonate clients.
Microsoft Windows supports preemptive multitasking, which creates the effect of simultaneous execution of multiple threads from multiple processes. On a multiprocessor computer, the system can simultaneously execute as many threads as there are processors.
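The idea of a primary thread that spawns further threads inside the same process is not specific to Windows. The sketch below uses Python's portable `threading` module rather than the Windows API, and the function name `describe_threads` is made up for illustration.

```python
import os
import threading


def describe_threads():
    """Start one extra thread and compare it with the primary thread."""
    main = threading.main_thread()
    seen = {}

    def record():
        current = threading.current_thread()
        seen["name"] = current.name
        seen["is_main"] = current is main
        seen["pid"] = os.getpid()   # threads live inside the same process

    t = threading.Thread(target=record, name="worker-1")
    t.start()
    t.join()
    return {"main_name": main.name, "worker_name": seen["name"],
            "worker_is_main": seen["is_main"],
            "same_process": seen["pid"] == os.getpid()}
```

The worker is a different thread from the primary one, yet it reports the same process ID.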
1. Which one should I use for my application?
Answer: It depends on many factors. Forking is heavier than threading and has higher startup and shutdown costs. Interprocess communication (IPC) is also slower and harder than interthread communication; threads clearly win the race when it comes to communication. On the other hand, a thread that crashes takes down all the other threads in its process, and a buffer overrun in one thread opens a security hole in all of them.
Threads share the same address space as the parent process and need only a reduced context switch, which makes switching between them more efficient.
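The isolation side of this trade-off can be sketched in Python on a Unix-like system: a forked child dies abruptly, and the parent carries on and inspects what happened. The function name `crash_in_child` is hypothetical, and killing the child with SIGKILL merely stands in for a real crash.

```python
import os
import signal


def crash_in_child():
    """Kill a forked child abruptly and return the signal the parent observes."""
    pid = os.fork()
    if pid == 0:
        # Child: simulate a fatal crash by killing this process outright.
        os.kill(os.getpid(), signal.SIGKILL)
        os._exit(1)  # not expected to run; a safety net only
    # Parent: still running, and able to inspect how the child died.
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
```

Had the same failure happened in a thread, the whole process, including every other thread, would have gone down with it.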
2. Which is better: threading or forking?
Answer: It all depends on what you’re looking for. In modern Linux (2.6.x), there is little difference in performance between a context switch of a process and that of a thread; only the MMU work is extra for the process. The real issue is the shared address space: a faulty pointer in one thread can corrupt the memory of the parent process or of another thread in the same address space.
3. What kinds of things should be threaded or multitasked?
Answer: If you’re a programmer who wants to take advantage of multithreading, the natural question is which parts of your program should and should not be threaded. Here are some rules of thumb (if you answer “yes” to these, have fun!):
- Is there a group of long operations that doesn’t depend on any other processing (such as painting a window, printing a document, responding to a mouse click, calculating a spreadsheet column, or handling signals)?
- Will there be few locks on data (that is, is the amount of shared data identifiable and “small”)?
- Are you prepared to deal with locking (mutual exclusion of data regions from other threads), deadlocks (a condition in which two threads of execution have each locked data the other is trying to obtain), and race conditions (a serious and difficult problem in which data is not locked properly and is corrupted by threaded reads and writes)?
- Can the task be broken down into different “responsibilities”? For example, one thread could handle signals while another handles the GUI.
To sum up:
- It all depends on your application.
- Threads are more powerful than events, but that power is not always needed.
- Threads are harder to program than forking, so threading is best left to experts.
- Threads are best used for applications that require high performance.
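For the first rule of thumb, a group of long operations that do not depend on one another, a thread pool is often the simplest starting point. The sketch below is only an illustration: `slow_task` stands in for real I/O-bound work, and both function names are invented for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(name):
    """Stand-in for an independent, I/O-bound operation."""
    time.sleep(0.05)
    return f"{name}: done"


def run_all(names):
    """Run each task on a pool thread and return results in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(slow_task, names))
```

Because the tasks share no data, no locks are needed, which is exactly the situation in which threading is easiest to get right.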