Breaking the Windows Limits: Processes and Threads


This is the fourth article in the "Breaking the Windows Limits" series, in which I discuss the limits that exist for fundamental resources in Windows. This time I'm going to cover the limits on the maximum number of threads and processes that Windows supports. I'll briefly describe the difference between a thread and a process, survey the thread limits, and then move on to the process-related limits. I start with thread limits because every active process has at least one thread (a process that has exited but is still referenced by a handle held by another process has no threads), so the process limits depend directly on the underlying thread limits.

Unlike some UNIX variants, most Windows resources do not have a fixed limit compiled into the operating system; instead, they are limited by the underlying resources available to the OS, which I have talked about earlier. Processes and threads, for example, require physical memory, virtual memory, and pool memory, so the number of processes and threads that can be created on a given Windows system is ultimately determined by one of these resources, depending on how the processes or threads were created and which of the underlying resource limits is reached first. I therefore recommend reading my previous articles if you haven't already, because the rest of this one relies on concepts such as reserved memory, allocated (committed) memory, and the system commit limit that I covered there:

Processes and Threads
A Windows process is essentially a container that holds the code of an executable image. It is represented by a kernel process object, and Windows uses this process object and its associated data structures to store and maintain information about the executing application. For example, a process has a virtual address space in which its private and shared data is stored and into which the executable image and its associated DLLs are mapped. Windows records the process's resource usage for accounting and query purposes, and it records the process's references to operating system objects in the process handle table. Processes run with a security context, called a token, that identifies the user account, group accounts, and privileges assigned to the process.

A process includes one or more threads that actually execute the code in the process (technically it is threads, not processes, that run), and threads are represented in the system by kernel thread objects. There are several reasons why applications create threads in addition to their initial thread: 1) processes with a user interface usually create additional threads to do their work while keeping the main thread responsive to user input and window management; 2) applications that want to use multiple processors to scale performance, or that want to keep doing work while some threads are blocked waiting for I/O, create additional threads to take advantage of concurrency.

Thread Limitations
In addition to the basic information about a thread, including the state of its CPU registers, its assigned priority, and information about its resource usage, each thread has a portion of the process address space allocated to it, called the stack, which the thread uses as working memory as code executes: for passing function parameters, storing local variables, and holding function return addresses. To avoid wasting system virtual memory, only part of the stack is initially allocated (committed) to the thread, and the rest is simply reserved. Because stacks grow downward in memory, the system places so-called "guard pages" just beyond the allocated part of the stack, which trigger the automatic allocation of additional memory (called stack expansion) when it is needed. The following illustration shows how the allocated stack area grows and how the guard pages move as the stack expands in a 32-bit address space:

The Portable Executable (PE) structures of an executable image specify the amount of address space reserved and initially allocated for a thread's stack. By default the linker reserves 1MB and allocates one page (4KB), but developers can change these values either by changing the PE values when they link their program or, for an individual thread, in a call to the CreateThread function. You can use a utility such as Dumpbin, which comes with Visual Studio, to view the settings of an executable. Here are the results of running Dumpbin with the /headers option on the executable generated by a new Visual Studio project:
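The command itself looks roughly like this (the executable name is illustrative); the lines to look for in the output are "size of stack reserve" and "size of stack commit" among the optional header values:

dumpbin /headers MyApp.exe | findstr /i "stack"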

Converting the numbers from hexadecimal, you can see that the stack reserve size is 1MB and the initially allocated region is 4KB. Using the Sysinternals utility VMMap, you can attach to this process and view its address space, and thereby see the stack page initially allocated to the process's thread, the guard page, and the rest of the reserved stack memory:

Because each thread consumes a portion of the process's address space, processes have a basic limit on the number of threads they can create, equal to the size of their address space divided by the size of the thread's stack.
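To make this concrete, here is a minimal sketch in C (assuming a trivial worker function) of creating a thread whose stack reserve is reduced from the default 1MB to 64KB; the STACK_SIZE_PARAM_IS_A_RESERVATION flag tells CreateThread to treat the size as the reservation rather than the initial commit:

#include <windows.h>
#include <stdio.h>

// Trivial worker: each thread just sleeps so its stack stays in use.
static DWORD WINAPI ThreadProc(LPVOID param)
{
    Sleep(INFINITE);
    return 0;
}

int main(void)
{
    // Reserve only 64KB of address space for the new thread's stack
    // instead of the 1MB default taken from the PE header.
    HANDLE h = CreateThread(NULL,
                            64 * 1024,
                            ThreadProc,
                            NULL,
                            STACK_SIZE_PARAM_IS_A_RESERVATION,
                            NULL);
    if (h == NULL)
        printf("CreateThread failed: %lu\n", GetLastError());
    else
        WaitForSingleObject(h, INFINITE);
    return 0;
}

With a smaller per-thread reservation, the same address space can hold correspondingly more thread stacks, which is exactly the effect Testlimit's -n option demonstrates later in the article.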

32-bit Thread Limitations
Even if a process had no code or data at all and its entire address space could be used for stacks, a 32-bit process with the default 2GB address space could create at most 2048 threads. Here are the results of Testlimit running on 32-bit Windows with the -t (create threads) option, confirming this limit:

Once again, since some of the address space was already used for code and initial heap memory, not all of the 2GB was available for thread stacks, so the total number of threads created could not reach the theoretical limit of 2048 threads.

I tried running Testlimit with an additional option that gives the application an extended address space, hoping that if it were given more than 2GB of address space (on 32-bit systems this is achieved by booting with the /3GB or /USERVA Boot.ini option, or the equivalent increaseuserva BCD option on Vista and later), it would use it. 32-bit processes are given 4GB of address space when they run on 64-bit Windows, so how many threads can the 32-bit Testlimit create when run on 64-bit Windows? Based on what we've discussed so far, the answer should be 4096 (4GB divided by 1MB), but in practice the number is significantly lower. Here is 32-bit Testlimit running on 64-bit Windows XP:

The reason for this discrepancy is that when you run a 32-bit application on 64-bit Windows, it is actually a 64-bit process that executes 64-bit code on behalf of the 32-bit threads, so memory regions are reserved per thread for both a 64-bit and a 32-bit stack. The 64-bit stack gets a 256KB reservation (except on OS versions released before Vista, where the initial 64-bit thread stack size is 1MB). Because every 32-bit thread begins its life in 64-bit mode, and the stack space it uses at startup exceeds a page, you will typically see at least 16KB of the 64-bit stack allocated. Here is an example of the 64-bit and 32-bit stacks of a 32-bit thread (the 32-bit stack is labeled "Wow64"):

32-bit Testlimit was able to create 3204 threads on 64-bit Windows, which is explained by the fact that each thread uses 1MB + 256KB of address space for its stacks (again, the exception is Windows versions prior to Vista, where 1MB + 1MB is used). However, I got a different result when running 32-bit Testlimit on 64-bit Windows 7:

The difference between the results on Windows XP and Windows 7 is due to the more random nature of the address space allocation scheme introduced in Windows Vista, Address Space Layout Randomization (ASLR), which leads to some fragmentation. Randomizing the load addresses of DLLs, thread stacks, and heap allocations helps improve protection against malware. As you can see in the following VMMap snapshot, on the test system there is still 357MB of address space available, but the largest free block is 128KB, which is less than the 1MB required for a 32-bit stack:

As I noted, a developer can override the default stack reserve size. One possible reason to do so is to avoid wasting address space when it is known in advance that a thread's stack will always use less than the default 1MB. The Testlimit PE image uses a 64KB stack reserve size by default, and when you specify the -n option along with the -t option, Testlimit creates threads with 64KB stacks. Here is the result of running this utility on a system with 32-bit Windows XP and 256MB of RAM (I deliberately ran this test on a weak system to highlight this limit):

Note that a different error occurred this time, which means that in this case the cause is not address space. In fact, 64KB stacks should allow approximately 32,000 threads (2GB/64KB = 32,768). So what limit was hit in this case? Looking at the likely candidates, including allocated (committed) memory and pool, none of them give any clue, since they are all below their limits:

We can find the answer in the kernel debugger's additional memory information, which shows us the relevant limit: resident available memory, which has been completely exhausted:

Resident available memory is the physical memory that can be assigned to data or code that must stay in RAM. Nonpaged pool and nonpaged drivers count against it, as does, for example, memory locked in RAM for I/O operations. Every thread has not only the user-mode stack I have already discussed, but also a privileged-mode (kernel-mode) stack that is used when the thread runs in kernel mode, for example while executing system calls. When a thread is active, its kernel stack is locked into memory so that the thread can execute kernel code for which page faults are not allowed.

A base kernel stack takes up 12KB on 32-bit Windows and 24KB on 64-bit Windows. 14,225 threads require approximately 170MB of resident memory, which is exactly the amount of it that is free on this system when Testlimit is not running:

Once the resident available memory limit is reached, many basic operations begin to fail. For example, here is the error I got when I double-clicked the Internet Explorer shortcut on the desktop:

As expected, when running on 64-bit Windows with 256MB of RAM, Testlimit was able to create 6,600 threads - about half as many as it could create on 32-bit Windows with 256MB of RAM - before running out of resident available memory:

The reason I used the term "base" kernel stack earlier is that a thread that performs graphics or windowing functions gets a "large" kernel stack when it makes its first such call, equal to (or larger than) 20KB on 32-bit Windows and 48KB on 64-bit Windows. Testlimit threads do not call any such APIs, so they have base kernel stacks.
64-bit Thread Limitations

Like 32-bit threads, 64-bit threads have a 1MB default stack reserve, but 64-bit processes have a much larger user address space (8TB), so address space should not be a problem when it comes to creating large numbers of threads. Yet resident available memory clearly remains a potential limiter. The 64-bit version of Testlimit (Testlimit64.exe) was able to create, with and without the -n option, approximately 6,600 threads on a system with 64-bit Windows XP and 256MB of RAM, exactly as many as the 32-bit version created, because the resident available memory limit was reached. However, on a system with 2GB of RAM, Testlimit64 was able to create only 55,000 threads, far fewer than it could create if resident available memory were the limiting factor (2GB/24KB = 89,000):

In this case the cause is the initial allocated stack memory of each thread, which exhausts virtual memory and produces errors about insufficient paging file space. Once the amount of allocated memory reaches the size of RAM, the rate at which new threads can be created drops dramatically because the system begins to thrash: previously created thread stacks get paged out to the paging file to make room for new ones, and the paging file has to grow. With the -n option the results are the same, since the initial amount of allocated stack memory does not change.

Process Limitations
The number of processes Windows supports is obviously smaller than the number of threads, because each process has at least one thread and a process itself consumes additional resources. 32-bit Testlimit running on a system with 64-bit Windows XP and 2GB of system memory creates about 8,400 processes:

If you look at the result of the kernel debugger, it becomes clear that in this case the limit of resident available memory is reached:

If the only resident available memory each process consumed were its thread's privileged-mode (kernel) stack, Testlimit would be able to create far more than 8,400 processes on a 2GB system. The amount of resident available memory on this system when Testlimit is not running is 1.9GB:

Dividing the amount of resident memory used by Testlimit (1.9GB) by the number of processes it created shows that each process takes about 230KB of resident memory. Since a 64-bit kernel stack takes 24KB, that leaves roughly 206KB per process unaccounted for. Where is the rest of the resident memory going? When a process is created, Windows reserves enough physical memory to provide a minimum working set of pages. This guarantees that, in any situation, the process will have enough physical memory available to hold the data needed for its minimum working set. The default minimum working set size is 200KB, which is easy to verify by adding the Minimum Working Set column in Process Explorer:

The remaining 6KB is resident available memory charged for additional nonpageable memory in which the process's own structures are stored. A process on 32-bit Windows uses slightly less resident memory because its privileged-mode thread stack is smaller.

As with user-mode thread stacks, processes can override their default minimum working set size, in this case using the SetProcessWorkingSetSize function. Testlimit supports a -n option which, combined with -p, causes the child processes of the main Testlimit process to set their minimum working set to the smallest possible size, 80KB. Because it takes a little time for the child working sets to shrink, Testlimit pauses once it can no longer create processes and then tries again, giving its children a chance to make the adjustment. Launched with the -n option on a system with Windows 7 and 4GB of RAM, Testlimit runs into a different limit than resident available memory: the allocated system memory limit (the system commit limit):
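As an illustration of the mechanism (a sketch, not Testlimit's actual source, which is not shown here), a process can shrink its own working set bounds in C like this; the 80KB and 200KB figures simply mirror the values mentioned above:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Ask the memory manager to keep only 80KB guaranteed resident for this
    // process (down from the default 200KB minimum working set), capping the
    // maximum working set at 200KB.
    if (!SetProcessWorkingSetSize(GetCurrentProcess(),
                                  80 * 1024,     // minimum working set, bytes
                                  200 * 1024))   // maximum working set, bytes
    {
        printf("SetProcessWorkingSetSize failed: %lu\n", GetLastError());
    }
    return 0;
}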

In the screenshot below you can see that the kernel debugger reports not only that the allocated system memory limit has been reached, but also that, after the limit was hit, there were thousands of allocation errors, both for virtual memory and for paged pool (the allocated system memory limit was actually reached several times, because each time an error occurred due to insufficient paging file space, the paging file grew, pushing the limit back):

Before Testlimit ran, the allocated (commit) charge was approximately 1.5GB, so the processes consumed about 8GB of allocated memory. Each process therefore consumed approximately 8GB/6,600, or 1.2MB. The output of the kernel debugger's !vm command, which shows each process's private memory usage, confirms this calculation:

The initial stack allocation of each thread, described earlier, makes up only a small part of this; the rest comes from the allocations needed for the process's address space data structures, page table entries, handle table, process and thread objects, and the private data the process creates as it initializes.

How many processes and threads will be enough?
So the answers to the questions "how many threads does Windows support?" and "how many processes can you run simultaneously on Windows?" are interconnected. Apart from the nuances of how threads specify their stack sizes and how processes specify their minimum working sets, the two main factors that determine the answers on any particular system are the amount of physical memory and the system commit limit (the allocated system memory limit). In any case, if an application creates enough threads or processes to approach these limits, its developer should reconsider the application's design, because there are always ways to achieve the same result with a reasonable number of processes. For example, a major goal when scaling an application is to keep the number of running threads equal to the number of CPUs, and one way to achieve this is to move from synchronous to asynchronous I/O using completion ports.

17 Dec 2010


In this article I will describe the fundamental differences between Apache and Nginx, the frontend-backend architecture, installing Apache as a backend and Nginx as a frontend. I will also describe a technology that allows you to speed up the operation of a web server: gzip_static+yuicompressor.

Nginx is a lightweight server; it starts a specified number of processes (usually equal to the number of cores), and each process runs a loop that accepts new connections and processes the current ones. This model makes it possible to serve a large number of clients. However, with this model you cannot perform lengthy operations while processing a request (for example mod_php), because that would essentially hang the server. On each pass through the loop the process essentially performs two operations: read a block of data from somewhere, write it somewhere. "Somewhere" can be a connection to a client, a connection to another web server or a FastCGI process, the file system, or a buffer in memory. The processing model is configured by two main parameters:
worker_processes – number of processes to run. Usually set equal to the number of processor cores.
worker_connections – the maximum number of connections processed by one process. Directly depends on the maximum number of open file descriptors on the system (1024 by default on Linux).
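For example, a minimal sketch of these two directives in nginx.conf (the values are illustrative):

worker_processes 4;            # usually equal to the number of CPU cores

events {
    worker_connections 1024;   # per worker; bounded by the OS open file descriptor limit
}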

Apache is a heavyweight server (it should be noted that, if desired, it can be slimmed down considerably, but this will not change its architecture); it has two main processing models: prefork and worker.
When using the prefork model, Apache creates a new process to handle each request, and that process does all the work: it accepts the request, generates the content, and serves it to the user. This model is configured with the following parameters:


MinSpareServers – the minimum number of idle spare processes. This is needed so that when a request arrives it can start being processed sooner; the web server will launch additional processes to maintain this number.
MaxSpareServers – the maximum number of idle spare processes. This is needed so as not to tie up extra memory; the web server will kill the unnecessary processes.
MaxClients – the maximum number of clients served in parallel. The web server will not launch more than this number of processes.
MaxRequestsPerChild – the maximum number of requests a process will handle before the web server kills it. Again, this is to save memory, since memory in the processes will gradually "leak".

This model was the only one supported by Apache 1.3. It is stable, does not require multi-threading from the system, but consumes a lot of resources and is slightly inferior in speed to the worker model.
When using the worker model, Apache creates several processes, each with several threads, and each request is processed entirely in a separate thread. It is slightly less stable than prefork, because a thread crash can bring down the whole process, but it runs somewhat faster while consuming fewer resources. This model is configured with the following parameters:

StartServers – sets the number of processes to start when the web server starts.
MinSpareThreads – the minimum number of threads hanging idle in each process.
MaxSpareThreads – The maximum number of threads hanging idle in each process.
ThreadsPerChild – sets the number of threads launched by each process when the process starts.
MaxClients – maximum number of parallel clients served. In this case, it specifies the total number of threads in all processes.
MaxRequestsPerChild – the maximum number of requests that the process will process before the web server kills it.
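For example, a sketch of a worker-MPM section with illustrative values (roughly the stock Apache 2.2 defaults):

<IfModule mpm_worker_module>
    StartServers          2
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>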

Frontend-backend

The main problem with Apache is that a separate process (or at least a thread) is tied up for each request; that process is also loaded with various modules and consumes a lot of resources. On top of that, the process stays in memory until it has delivered all the content to the client. If the client has a slow connection and the content is large, this can take a long time. For example, the server may generate the content in 0.1 seconds but spend 10 seconds delivering it to the client, occupying system resources the whole time.
The frontend-backend architecture is used to solve this problem. Its essence is that the client's request arrives at a lightweight server with an Nginx-style architecture (the frontend), which proxies the request to a heavyweight server (the backend). The backend generates the content, hands it to the frontend very quickly, and frees its resources. The frontend puts the backend's response into its buffer and can then deliver it to the client slowly and patiently, consuming far fewer resources than the backend would. In addition, the frontend can handle requests for static files (css, js, images, and so on) on its own, manage access, check authorization, and so forth.

Setting up Nginx (frontend) + Apache (backend)

It is assumed that you already have Nginx and Apache installed. You need to configure the servers so that they listen on different ports. Moreover, if both servers are installed on the same machine, it is better to bind the backend only to the loopback interface (127.0.0.1). In Nginx this is configured with the listen directive:
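(A minimal sketch; port 80 matches the frontend configuration shown further below.)

server {
    listen 80;
    ...
}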

In Apache this is configured with the Listen directive:

Listen 127.0.0.1:81

Next, you need to tell Nginx to proxy requests to the backend. This is done with the directive proxy_pass http://127.0.0.1:81;. That is the entire minimum configuration. However, as we said above, it is better to let Nginx serve static files itself. Suppose we have a typical PHP site. Then we only need to proxy requests for .php files to Apache and handle everything else on Nginx (if your site uses mod_rewrite, the rewrites can also be done in Nginx, and the .htaccess files can simply be thrown out). You also have to take into account that the client's request arrives at Nginx while the request to Apache is made by Nginx itself, so the Host HTTP header will not be the original one and Apache will see the client address (REMOTE_ADDR) as 127.0.0.1. Substituting the Host header is easy, but Apache determines REMOTE_ADDR on its own. This problem is solved with mod_rpaf for Apache. It works as follows: Nginx knows the client's IP and adds it to a certain HTTP header (for example X-Real-IP); mod_rpaf reads this header and writes its contents into Apache's REMOTE_ADDR variable. That way, PHP scripts executed by Apache will see the client's real IP.
Now the configuration will become more complicated. First make sure that both Nginx and Apache have the same virtual host, with the same root. Example for Nginx:

server {
    listen 80;
    server_name site;
    root /var/www/site/;
}

Example for Apache:

<VirtualHost *:81>
    ServerName site
    DocumentRoot "/var/www/site/"
</VirtualHost>

Now let's set things up for the scheme described above:
Nginx:
server {
    listen 80;
    server_name site;
    location / {
        root /var/www/site/;
        index index.php;
    }
    location ~ \.php($|\/) {
        proxy_pass http://127.0.0.1:81;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
    }
}


Apache:

# mod_rpaf settings
RPAFenable On
RPAFproxy_ips 127.0.0.1
RPAFheader X-Real-IP


<VirtualHost *:81>
    DocumentRoot "/var/www/site/"
    ServerName site
</VirtualHost>

The regular expression \.php($|\/) covers two cases: a request for *.php and a request for *.php/foo/bar. The second variant is needed for many CMSs to work. A request for the site root will be converted into a request for /index.php (since we have defined an index file) and will therefore also be proxied to Apache.

Let's speed up: gzip_static+yuicompressor

Gzip on the web is a good thing. Text files compress very well, traffic is saved, and content reaches the user faster. Nginx can compress on the fly, so there is no problem there. However, compressing files costs a certain amount of time, including CPU time. This is where the Nginx gzip_static directive comes to the rescue. It works as follows: if, when a file is requested, Nginx finds a file with the same name plus the additional extension ".gz" - for example, style.css and style.css.gz - then instead of compressing style.css it reads the already compressed style.css.gz from disk and serves it as the compressed form of style.css.
Nginx settings will look like this:

http {
    ...
    gzip_static on;
    gzip on;
    gzip_comp_level 9;
    gzip_types application/x-javascript text/css;
    ...
}

Great, we will generate the .gz file once so that Nginx will serve it many times. Additionally, we will compress css and js using YUI Compressor. This utility minimizes css and js files as much as possible, removing spaces, shortening names, etc.
You can make all of this compress, and even update, automatically using cron and a small script. Add the following command to cron to run once a day:

/usr/bin/find /var/www -mmin -1500 \( -iname "*.js" -or -iname "*.css" \) | xargs -i -n 1 -P 2 packfile.sh {}

In the -P 2 parameter specify the number of cores in your processor; do not forget to give the full path to packfile.sh and change /var/www to your web directory.
Write it to the packfile.sh file.
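A sketch of what packfile.sh might contain (the path to the YUI Compressor jar is an assumption; adjust it for your system): minify the file, then write a pre-compressed copy next to it for gzip_static.

#!/bin/sh
# $1 is the path to a .css or .js file passed in by find | xargs
f="$1"
# Minify with YUI Compressor (the type is inferred from the file extension),
# then replace the original with the minified version.
java -jar /usr/local/lib/yuicompressor.jar -o "$f.tmp" "$f" && mv "$f.tmp" "$f"
# Create the pre-compressed copy that gzip_static will serve.
gzip -9 -c "$f" > "$f.gz"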

Multiprocess Firefox: testing continues

Version 48 of the Mozilla Firefox browser was released a few hours ago. Compared with the previous version 47, it contains more than what could be described as "minor changes under the hood": for the first time, multiprocess support officially appears in a release version of Firefox, enabled by default for some users.

Electrolysis

For a modern browser, multiprocess support is a matter of good manners. All browsers have tabs, and in multiprocess mode each tab is a separate, isolated process. The advantages of this approach are not limited to extra security measures and ruling out a number of attacks. Tabs, add-ons and plugins such as Adobe Flash Player can be moved into separate processes. If one of the processes crashes, you can keep working without restarting the whole browser. Various memory leaks are eliminated, and performance improves thanks to work being parallelized across several processor cores. If one tab consumes a lot of resources, the interface does not freeze but keeps responding.

Almost all popular browsers already have multiprocess support: Google Chrome, Internet Explorer, Microsoft Edge, Apple Safari. Mozilla Firefox is catching up with them. Electrolysis, or e10s, is the multiprocess technology in Firefox. In its first versions there are two processes: one for browser tabs and one for the interface. In future versions there will be several content processes.

In version 48, Electrolysis is included in a release version for the first time - but not for everyone, only for one percent of users. If Mozilla is satisfied with the test results, then in ten days the share will be raised to approximately half of all users.

Multiprocess mode can also be enabled manually. To check which variant you got, type about:support in the address bar and look for the Multiprocess Windows line.

Not all extensions are compatible with Electrolysis. Lists of the most popular add-ons, with their compatibility status in Firefox's multiprocess mode, can be found on the Are We e10s Yet website. You can turn Electrolysis on right away and turn it off later if important add-ons break.

The relevant preference in about:config (type it in the address bar and press Enter) is browser.tabs.remote.autostart. Set its value to true by double-clicking it.

After the browser restarts, Electrolysis should start working. Sometimes an add-on will prevent it from being enabled.

To work around this, create a new boolean preference browser.tabs.remote.force-enable in about:config and set it to true.

Now Electrolysis will work in forced multiprocess mode. (Please note that this may affect the performance of some add-ons.) In about:support, 1/1 (Enabled by user) will appear next to Multiprocess Windows.

Mandatory signing of extensions

Electrolysis can be enabled or disabled via about:config. But there is no longer any way to disable mandatory signing of installed extensions: the option has disappeared, as planned.

So mandatory, non-disableable signing arrives only in the current version 48. Extensions receive their signatures from addons.mozilla.org (AMO) regardless of whether the extension is published on AMO or not. The purpose of requiring a signature from AMO is user security: the system filters out malicious extensions in blacklist mode.

Other changes

Firefox 49 drops support for Android 2.3 (Gingerbread) and for OS X 10.6 (Snow Leopard), 10.7 (Lion) and 10.8 (Mountain Lion). These OS versions came out 4-6 years ago.

What's next?

Firefox Hello is a collaboration and communication tool for audio and video chat built on WebRTC technology. The service has been built into Firefox since version 34. Hello may disappear in version 49: the bug tracker is already discussing removing it from the browser in the very next version, the stated reason being a change in development priorities. Hello has already disappeared from the Nightly 51 builds and from early Aurora 50; at the time of writing, Beta 49 is not yet available.

Previously, themes and tab groups were removed from Firefox due to lack of use. Support for less popular features slowed down the release of new versions. Perhaps they want to remove Hello for the same reason.

Firefox will continue to drop support for older systems. Starting with version 49, the SSE2 CPU instruction set will be mandatory for Firefox on Windows. In effect this means abandoning processors from before the Pentium 4 and Athlon 64 era.

In general, if you can avoid running Apache at all, don't run it. Consider whether lighttpd or thttpd can handle the tasks you need. These web servers come in very handy in situations where system resources are not enough for everyone, yet the job still has to get done. I repeat once again: we are talking about situations in which the functionality of these products is sufficient for the tasks at hand (by the way, lighttpd can work with PHP). In situations where there is simply no way around Apache, you can usually still free up a lot of system resources by redirecting requests for static content (JavaScript, graphics) from Apache to a lightweight HTTP server. Apache's biggest problem is its large appetite for RAM. In this article I will look at methods that help speed it up and reduce the amount of memory it consumes:

  • processing a smaller number of parallel requests;
  • recycling processes;
  • using KeepAlives that are not too long;
  • reducing the timeout;
  • reducing logging intensity;
  • disabling hostname resolution;
  • disabling the use of .htaccess;
  • loading fewer modules.

    The first step is to stop loading unnecessary modules. Review the config files and determine which modules you are loading. Do you need all of them? Find the ones that are not used and disable them; this will save you some memory.

    Process fewer concurrent requests

    The more processes Apache is allowed to run simultaneously, the more concurrent requests it can handle. By increasing that number you also increase the amount of memory Apache consumes. Using top, you can see that each individual Apache process takes up rather little memory, because shared libraries are used. In Debian 5 with Apache 2 the default configuration is this:

    StartServers 5
    MinSpareServers 5
    MaxSpareServers 10
    MaxClients 20
    MaxRequestsPerChild 0

    The StartServers directive determines the number of server processes launched initially, right after the server starts. The MinSpareServers and MaxSpareServers directives determine the minimum and maximum number of "spare" Apache child processes; such processes wait for incoming requests and are not unloaded, which makes it possible to respond to new requests faster. The MaxClients directive defines the maximum number of parallel requests the server will handle at the same time; when the number of simultaneous connections exceeds this number, new connections are queued for processing. In effect, MaxClients also determines the maximum allowed number of simultaneously running Apache child processes. The MaxRequestsPerChild directive defines the number of requests an Apache child process handles before it is terminated; if it is set to zero, the process never "expires".

    For my home server, with its modest needs, I adjusted the configuration as follows:

    StartServers 1
    MinSpareServers 1
    MaxSpareServers 1
    MaxClients 5
    MaxRequestsPerChild 300

    Of course, the above configuration is completely unsuitable for use on highly loaded servers, but for home, in my opinion, it’s just right.

    Process recycling

    As you can see, I changed the value of the directive MaxRequestsPerChild. By limiting the lifetime of child processes in this way by the number of requests processed, you can avoid accidental memory leaks caused by poorly written scripts.

    Using KeepAlives that are not too long

    KeepAlive is a mechanism for maintaining a persistent connection between client and server. The HTTP protocol was originally designed without persistent connections in mind: when a web page was sent to a client, all of its parts (images, frames, JavaScript) were transferred over separately established connections. With the advent of KeepAlive, browsers can request a persistent connection and, once it is established, download data over that single connection, which gives a significant performance boost. However, by default Apache uses too long a timeout before closing the connection: 15 seconds. This means that after all the content has been served to a client that requested KeepAlive, the child process will wait another 15 seconds for further requests. That is a bit much; it is better to reduce this timeout to 2-3 seconds.

    KeepAliveTimeout 2

    Decrease timeout

    Alternatively, you can reduce the value of the TimeOut directive, which specifies how long to wait for individual requests to complete. By default its value is 300; in your case it may make sense to reduce or increase it. I personally left it as is for now.

    Reducing logging intensity

    On the way to better server performance, you can also try reducing the intensity of logging. Modules such as mod_rewrite can write debugging information to the log; if you don't need it, turn that output off.

    Disabling Hostname Resolution

    In my opinion, there is no need to reverse-resolve IP addresses to hostnames. If you really need hostnames when analyzing logs, you can resolve them at the analysis stage rather than while the server is running. The HostnameLookups directive is responsible for hostname resolution; it is actually set to Off by default, but check it if you want to be sure the lookups are disabled.

    HostnameLookups Off

    Disabling use of .htaccess

    Apache processes .htaccess files every time data is requested. Not only does Apache have to read the file, it also spends considerable time and resources processing it. Take a look at your web server and reconsider whether you need .htaccess files at all. If you need different settings for different directories, perhaps they can be placed in the main server configuration file instead. Processing of .htaccess can be disabled with a directive in the server configuration, as shown below.
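    For example, a minimal sketch (the directory path is illustrative); AllowOverride None is the standard directive that turns off .htaccess processing for a directory tree:

    <Directory /var/www/>
        AllowOverride None
    </Directory>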

    Php-cgi processes eat up memory, multiplying rapidly, and do not want to die after the FcgidMaxRequestsPerProcess limit is reached; eventually php-cgi starts actively pushing everything into swap and the system begins returning "502 Bad Gateway".

    To limit the number of forked php-cgi processes it is not enough to set FcgidMaxRequestsPerProcess, the number of requests after which a process is supposed to die - processes do not always do so voluntarily.

    The situation is painfully familiar: php-cgi processes (children) keep multiplying and eating up memory, and you cannot force them to die - they want to live! :) Reminds me of the problem of the Earth's overpopulation, doesn't it? ;)

    The eternal imbalance between parents and children can be resolved by limiting the number of php-cgi children and their lifetime (genocide) and by controlling their spawning activity (contraception).

    Limiting the number of php-cgi processes for mod_fcgid

    The directives below probably play the main role in limiting the number of php-cgi processes, and in most cases the default values shown here are harmful for servers with less than 5-10 GB of RAM:

    • FcgidMaxProcesses 1000 - the maximum number of processes that can be active at the same time;
    • FcgidMaxProcessesPerClass 100 - the maximum number of processes in one class (segment), i.e. the maximum number of processes allowed to be spawned through the same wrapper;
    • FcgidMinProcessesPerClass 3 - the minimum number of processes in one class (segment), i.e. the minimum number of processes launched through the same wrapper that remain available after all requests have been completed;
    • FcgidMaxRequestsPerProcess 0 - a FastCGI process should "kick the bucket" after handling this number of requests.

    What number of php-cgi processes is optimal? To determine it, you need to take into account the total amount of RAM and the amount of memory PHP is allowed by memory_limit (php.ini), which each php-cgi process can consume while executing a PHP script. For example, if we have 512 MB, of which 150-200 MB go to the OS itself and another 50-100 MB to the database server, mail MTA and so on, and memory_limit=64, then with the remaining 200-250 MB of RAM we can run 3-4 php-cgi processes simultaneously without much harm.
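    Translated into mod_fcgid directives, that example might look like this (a sketch; the per-class value simply follows the 3-4 process estimate above):

    # Global cap across all wrappers
    FcgidMaxProcesses 8
    # Per-wrapper cap, matching the 3-4 process estimate
    FcgidMaxProcessesPerClass 4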

    PHP-cgi process lifetime and timeout settings

    Actively multiplying php-cgi children that eat up RAM can live almost forever, and that is fraught with cataclysms. Below is a list of directives that help limit the lifetime of php-cgi processes and release the resources they occupy in good time:

    • FcgidIOTimeout 40 - the time (in seconds) that the mod_fcgid module will wait for I/O while communicating with the script;
    • FcgidProcessLifeTime 3600 - if a process exists longer than this time (in seconds), it will be marked for termination during the next process scan, whose interval is set by the FcgidIdleScanInterval directive;
    • FcgidIdleTimeout 300 - if the number of processes exceeds FcgidMinProcessesPerClass, a process that has handled no requests for this time (in seconds) will be marked for killing during the next process scan, whose interval is set by the FcgidIdleScanInterval directive;
    • FcgidIdleScanInterval 120 - the interval at which the mod_fcgid module searches for processes that have exceeded FcgidIdleTimeout or FcgidProcessLifeTime;
    • FcgidBusyTimeout 300 - if a process has been busy handling a request for longer than this time (in seconds), it will be marked for killing during the next scan, whose interval is set by FcgidBusyScanInterval;
    • FcgidBusyScanInterval 120 - the interval at which the module scans for busy processes that have exceeded the FcgidBusyTimeout limit;
    • FcgidErrorScanInterval 3 - the interval (in seconds) at which the mod_fcgid module kills processes that are pending termination, including those that have exceeded FcgidIdleTimeout or FcgidProcessLifeTime. The kill is performed by sending the process SIGTERM; if the process remains active, it is killed with SIGKILL.

    Keep in mind that a process that has exceeded FcgidIdleTimeout or FcgidBusyTimeout can live for up to another FcgidIdleScanInterval or FcgidBusyScanInterval, respectively, before it is marked for termination.

    It is better to set ScanIntervals with a difference of several seconds, for example, if FcgidIdleScanInterval 120, then FcgidBusyScanInterval 117 - i.e. so that processes are not scanned at the same time.

    PHP-cgi process spawning activity

    If none of the above helped, which would be surprising, you can also try tweaking how actively php-cgi processes are spawned...

    Besides the limits on the number of requests, the number of php-cgi processes and their lifetime, there is also the matter of how actively child processes are spawned, which is regulated by the FcgidSpawnScore, FcgidTerminationScore, FcgidTimeScore and FcgidSpawnScoreUpLimit directives. Roughly, spawning and terminating processes increase an internal spawn score (by FcgidSpawnScore and FcgidTerminationScore respectively), the score decays over time at a rate tied to FcgidTimeScore, and while the score is above FcgidSpawnScoreUpLimit no new child processes of the application will be spawned - all spawn requests have to wait until an existing process becomes free or until the score falls below this limit.

    If my translation of the description and understanding of the above parameters is correct, then to reduce the activity of php-cgi process spawning, you should lower the value of the FcgidSpawnScoreUpLimit directive or increase the values ​​of FcgidSpawnScore and FcgidTerminationScore.

    Results

    I hope I have listed and discussed in detail most of the mod_fcgid directives that help limit the number of php-cgi processes and their lifetime, and also reduce resource consumption. Below is the complete mod_fcgid configuration for a successfully working server with a 2500 MHz processor and 512 MB of RAM:
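    As a sketch assembled from the directives and values discussed above (the process and request caps are assumptions for a 512 MB server; treat it as a starting point rather than an exact configuration):

    <IfModule mod_fcgid.c>
        FcgidMaxProcesses 8
        FcgidMaxProcessesPerClass 4
        FcgidMinProcessesPerClass 0
        FcgidMaxRequestsPerProcess 300
        FcgidIOTimeout 40
        FcgidProcessLifeTime 3600
        FcgidIdleTimeout 300
        FcgidIdleScanInterval 120
        FcgidBusyTimeout 300
        FcgidBusyScanInterval 117
        FcgidErrorScanInterval 3
    </IfModule>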

    Oleg Golovsky





    
