How to limit the number of php-cgi processes for mod_fcgid? The process monitoring template.


Applicable to: System Center 2012 R2 Operations Manager, System Center 2012 - Operations Manager, System Center 2012 SP1 - Operations Manager

The process monitoring template allows you to track whether a particular process is running on a computer. Two basic scenarios can be implemented with it: a process may be required by a specific application, and you want an alert if it is not running; or you may want an alert if an unwanted process turns out to be running. In addition to monitoring whether an application is running, you can collect data about the CPU time and memory used by the process.

Scenarios

Use the process monitoring template in any scenario where you need to monitor a process running on an agent-managed Windows computer. The template can monitor the following kinds of processes.

Critical process

A process that must be running at all times. Use the template to make sure that this process is running on the computers where it is installed, and to measure its performance.

Unwanted process

A process that should not be running. This may be a known third-party process that can cause damage, or a process that is started automatically when an application fails. The template can watch for this process and send an alert if it is found running.

Long-running process

A process that normally runs only for a short time. If the process runs too long, it may indicate a problem. The template can track how long the process has been running and send an alert if the run time exceeds a certain duration.

Monitoring performed by the process monitoring template

Depending on your selection in the Monitoring Wizard, the monitoring performed by the monitors and rules you create may include any of the following options.

Monitors

Number of desired processes
Enabled when you select monitoring of a wanted process on the Process to Monitor page and Number of processes on the Running Processes page.

Required process run time
Enabled when you select monitoring of a wanted process on the Process to Monitor page and Duration on the Running Processes page.

Unwanted process running
Enabled when you select the monitoring scenario for an unwanted process on the Process to Monitor page.

Process CPU usage
Enabled when you select monitoring of a wanted process on the Process to Monitor page and the CPU usage alert on the Performance Data page.

Process memory usage
Enabled when you select monitoring of a wanted process on the Process to Monitor page and the memory usage alert on the Performance Data page.

Data Collection Rules

Process CPU usage collection
Enabled when you select monitoring of a wanted process on the Process to Monitor page and the CPU usage alert on the Performance Data page.

Process memory usage collection
Enabled when you select monitoring of a wanted process on the Process to Monitor page and the memory usage alert on the Performance Data page.

View monitoring data

All data collected by the process monitoring template is available in the Process State view, located in the process monitoring and Windows services folders. An object appears in this view for each agent in the selected group. Even if an agent is not running the monitored process, it is still listed, and its monitor reflects the state of a process that is not running.

You can view the state of the individual process monitors by opening Health Explorer for a process object. You can view performance data by opening the performance view for a process object.

The same process objects that appear in the Process State view are included under the health of the computer on which the process runs, so the health of the process monitors rolls up into the health of the computer.

Wizard Options

When you run the process monitoring template, you must provide values for the parameters listed in the following tables. Each table corresponds to a separate page of the wizard.

General properties

The common options on the General Properties page of the wizard.

Process for tracking

The following options are available on the Process to Monitor page of the wizard.

Parameter

Description

Monitoring scenario

Select Monitor whether and how a process is running to monitor a required (wanted) process and set the monitor to a critical state when the process is not running. Select Monitor only whether the process is running to monitor an unwanted process and set the monitor to a critical state when the process is running.

Process name

The full name of the process. This is the process name as it appears in Task Manager; it should not include the path to the executable file. You can type the name or click the ellipsis (...) button to browse for the file name.

Target group

The process is monitored on all computers that are included in the specified group.

Running processes

The following options are available on running processes wizard page.

Parameter

Description

Generate an alert if the number of processes is below the minimum value or above the maximum value for longer than the specified period

When checked, the monitor is set to a critical state and an alert is generated if the number of instances of the specified process is below the specified minimum or above the specified maximum for longer than the specified period.

To ensure that one and only one instance of the process is running, set both the minimum and the maximum to 1.

Minimum number of processes

The minimum number of processes that must be started.

Maximum number of processes

The maximum number of processes that should be running.

Duration

Specifies how long the number of running processes must be outside the specified range before the monitor is set to a critical state. Do not set this value to less than 1 minute.

Generate an alert if the process runs longer than the specified duration

When checked, the monitor is set to a critical state and an alert is generated if a single instance of the process runs for longer than the specified duration.

Performance Data

The following options are available on performance data wizard page.

Parameter

Description

Generate an alert if CPU usage exceeds a given threshold

Specifies whether CPU usage of the process should be monitored. A monitor is created that sets an error state on the object and generates an alert when the specified threshold is exceeded. A rule is also created that collects CPU usage for analysis and reporting.

CPU (percentage)

If CPU usage is monitored, this parameter sets the threshold. If the process's percentage of total CPU usage exceeds the threshold, the object is set to an error state and an alert is generated.

Generate an alert if memory usage exceeds the specified threshold

Specifies whether memory used by the process should be monitored. A monitor is created that sets an error state on the object and generates an alert when the specified threshold is exceeded. A rule is also created that collects memory usage for analysis and reporting.

Memory (MB)

If memory usage is monitored, this parameter sets the threshold. If the memory used by the process, in megabytes (MB), exceeds the threshold, the object is set to an error state and an alert is generated.

Number of samples

If CPU or memory usage is monitored, this parameter specifies the number of consecutive performance samples that must exceed the threshold before the object is set to an error state and an alert is generated.

A value greater than 1 reduces noise by not generating an alert when the process exceeds the threshold only briefly. The larger the value, the longer it takes before you are notified of a problem. A typical value is 2 or 3.

Sampling interval

If you are monitoring CPU or memory usage, specify the time between performance samples.

A lower value reduces the time needed to detect a problem but increases the load on the agent and the amount of data collected for reporting. A typical value is 5 to 15 minutes.

Additional Monitoring Features

In addition to performing the specified monitoring, the template creates a target class that you can use for additional monitors and workflows. A monitor or rule that uses this class as its target runs on any agent-managed computer in the group specified in the template. For example, if the process writes Windows events that indicate an error, you can create a monitor or rule that watches for that particular event and uses the process class as its target.

This is the fourth article in the series "Pushing the Limits of Windows", in which I discuss the limits that exist for fundamental resources in Windows. This time I am going to cover the limits on the maximum number of threads and processes that Windows supports. I'll briefly describe the difference between a thread and a process, go over the thread limits, and then move on to the process-related limits. I start with thread limits because every active process has at least one thread (a process that has exited but is still referenced by a handle held by another process has no threads), so the process limits depend directly on the underlying thread limits.

Unlike some variants of UNIX, most Windows resources do not have a fixed limit compiled into the operating system, but instead derive their limits from the underlying resources available to the OS, which I covered earlier. Processes and threads, for example, require physical memory, virtual memory, and pool memory, so the number of processes and threads that can be created on a given Windows system is ultimately determined by one of these resources, depending on how the processes or threads are created and which underlying resource limit is hit first. I therefore recommend reading my previous articles if you haven't already, because in what follows I will refer to concepts such as reserved memory, committed memory, and the system commit limit that I discussed there.

Processes and Threads
A Windows process is essentially a container that hosts the execution of an executable image file. It is represented by a kernel process object, and Windows uses this process object and its associated data structures to store and maintain information about the image's execution. For example, a process has a virtual address space that holds the process's private and shared data and into which the executable image and its associated DLLs are mapped. Windows records the process's resource usage for accounting and for querying by diagnostic tools, and it registers the process's references to operating system objects in the process's handle table. Processes operate with a security context, called a token, that identifies the user account, group accounts, and privileges assigned to the process.

A process includes one or more threads, which actually execute the code in the process (technically, it is threads that run, not processes) and are represented in the system by kernel thread objects. Applications create threads in addition to their initial thread for several reasons: 1) processes with a user interface usually create threads so they can do their work while the main thread stays responsive to user input and window management; 2) applications that want to use multiple processors to scale performance, or that want to keep making progress while some threads are blocked waiting for synchronous I/O, create additional threads to take advantage of multithreading.

Thread limits

Besides basic information about a thread, including the state of its CPU registers, its scheduling priority, and its resource usage accounting, every thread has a portion of the process address space assigned to it, called the stack, which the thread can use as working memory while it executes code, to pass function parameters and to hold local variables and function return addresses. To avoid wasting system virtual memory, only part of the stack is initially committed to the thread and the rest is merely reserved. Because stacks grow downward in memory, the system places a so-called "guard page" just beyond the committed part of the stack, which ensures that additional memory is committed automatically (called stack expansion) when it is needed. The following illustration shows how the committed region of the stack deepens and the guard page moves as the stack grows in a 32-bit address space:

The Portable Executable (PE) structures of an executable image specify the amount of address space that is reserved and initially committed for a thread's stack. By default the linker reserves 1MB and commits one page (4KB), but developers can change these values either by changing the PE settings when they link their program or by passing values to the CreateThread function for an individual thread. You can use a utility such as Dumpbin, which ships with Visual Studio, to view the settings of an executable. Here is the output of running Dumpbin with the /headers option on the executable generated by a new Visual Studio project:

Converting the numbers from hexadecimal, you can see that the stack reserve size is 1MB and the initially committed region is 4KB. Using the new Sysinternals utility VMMap, you can attach to this process and view its address space, and thereby see the thread's initially committed stack page, the guard page, and the rest of the reserved stack memory:

Because each thread consumes a portion of the process's address space, processes have a basic limit on the number of threads they can create, equal to the size of their address space divided by the size of the thread's stack.

32-bit thread limits

Even if a process had no code or data at all and the entire address space could be used for stacks, a 32-bit process with the default 2GB address space could create at most 2,048 threads (2GB / 1MB). Here are the results of running Testlimit on 32-bit Windows with the -t option (create threads), confirming this limit:

Again, because part of the address space was already used for code and the initial heap, not all of the 2GB was available for thread stacks, so the total number of threads created could not reach the theoretical limit of 2,048.

I tried running Testlimit with an additional option that marks the application as large-address-space aware, hoping that, if given more than 2GB of address space (on 32-bit systems this is achieved by booting with the /3GB or /USERVA Boot.ini option, or the equivalent increaseuserva BCD option on Vista and later), it would use it. 32-bit processes are given a 4GB address space when they run on 64-bit Windows, so how many threads can the 32-bit Testlimit create when it runs on 64-bit Windows? Based on what we've discussed so far, the answer should be 4,096 (4GB divided by 1MB), but in practice the number is significantly lower. Here is 32-bit Testlimit running on 64-bit Windows XP:

The reason for this discrepancy is that when you run a 32-bit application on 64-bit Windows, it is actually a 64-bit process that executes 64-bit code on behalf of the 32-bit threads, and therefore each thread has memory regions reserved for both a 64-bit and a 32-bit stack. The 64-bit stack gets a reservation of 256KB (except on OS versions released before Vista, where the initial 64-bit stack size is 1MB). Because every 32-bit thread begins its life in 64-bit mode, and the stack space it commits at startup is larger than a page, in most cases you will see at least 16KB committed for a thread's 64-bit stack. Here is an example of the 64-bit and 32-bit stacks of a 32-bit thread (the 32-bit stack is labeled "Wow64"):

32-bit Testlimit was able to create 3,204 threads on 64-bit Windows, which is explained by the fact that each thread uses 1MB + 256KB of address space for its stacks (again, except on Windows versions before Vista, where 1MB + 1MB is used). However, I got a different result when running 32-bit Testlimit on 64-bit Windows 7:

The difference between the results on Windows XP and Windows 7 is caused by the more random address space layout introduced in Windows Vista, Address Space Layout Randomization (ASLR), which leads to some fragmentation. Randomizing where DLLs are loaded and where thread stacks and heap allocations are placed helps improve protection against malware. As you can see in the following snapshot from VMMap, the test system still has 357MB of address space available, but the largest free block is 128KB, which is less than the 1MB required for a 32-bit stack:

As I noted, a developer can override the default stack reserve size. One possible reason to do so is to avoid wasting address space when it is known in advance that a thread's stack will never need the default 1MB. The Testlimit PE image uses a 64KB stack reserve by default, and when you specify the -n option together with -t, Testlimit creates threads with 64KB stacks. Here is the result of running it on a system with 32-bit Windows XP and 256MB of RAM (I deliberately ran this test on a weak system to highlight this limit):

Note that a different error occurred this time, which means that in this case the cause is not address space: 64KB stacks should allow roughly 32,000 threads (2GB / 64KB = 32,768). So which limit was hit here? Looking at the likely candidates, including committed memory and the pools, gives no clue, since all of these values are below their limits:

We can find the answer in the additional memory information from the kernel debugger, which points to the relevant limit, resident available memory, all of which has been exhausted:

Resident available memory is the physical memory available for data or code that must reside in RAM. The nonpaged pool and nonpaged drivers are accounted for separately, as is, for example, memory locked in RAM for I/O operations. Every thread has a user-mode stack, as I've already mentioned, but it also has a privileged-mode (kernel-mode) stack that is used when the thread runs in kernel mode, for example while executing system calls. When a thread is active, its kernel stack is locked into memory so that the thread can execute kernel code during which page faults are not allowed.

A base kernel stack occupies 12KB on 32-bit Windows and 24KB on 64-bit Windows. 14,225 threads require approximately 170MB of resident memory, which is exactly the amount of memory available on this system when Testlimit is not running:

Once the resident available memory limit is reached, many basic operations begin to fail. For example, here is the error I got when I double-clicked the Internet Explorer shortcut on the desktop:

As expected, when running on 64-bit Windows with 256MB of RAM, Testlimit was able to create 6,600 threads, about half as many as it could create on 32-bit Windows with 256MB of RAM, before running out of resident available memory:

The reason I used the term "base" kernel stack earlier is that a thread that works with graphics and windowing functions gets a "large" stack when it makes its first such call, equal to (or greater than) 20KB on 32-bit Windows and 48KB on 64-bit Windows. Testlimit threads do not call any such APIs, so they have base kernel stacks.
64-bit thread limits

Like 32-bit threads, 64-bit threads have a 1MB stack reserve by default, but 64-bit threads have a much larger user address space (8TB), so address space should not be a problem when it comes to creating large numbers of threads. Yet resident available memory is clearly still a potential limiter. The 64-bit version of Testlimit (Testlimit64.exe) was able to create, with and without the -n option, approximately 6,600 threads on a system with 64-bit Windows XP and 256MB of RAM, exactly as many as the 32-bit version, because it hit the resident available memory limit. However, on a system with 2GB of RAM, Testlimit64 could create only about 55,000 threads, far fewer than it could have created if resident available memory were the limiting factor (2GB / 24KB = 89,000):

In this case the limiting factor is the committed memory for each thread's initial stack, which exhausts virtual memory and produces an error about insufficient paging file capacity. Once the amount of committed memory reaches the size of RAM, the rate at which new threads are created drops significantly because the system starts to thrash: previously created thread stacks get paged out to the paging file to make room for new ones, and the paging file has to grow. With the -n option the results are the same, since the amount of initially committed stack memory is unchanged.

Process limits

The number of processes Windows supports must obviously be lower than the number of threads, because each process has at least one thread and a process itself incurs additional resource usage. 32-bit Testlimit running on a system with 64-bit Windows XP and 2GB of system memory creates about 8,400 processes:

Looking at the kernel debugger output makes it clear that in this case the resident available memory limit is reached:

If the only resident available memory charge for each process were its thread's privileged-mode stack, Testlimit would be able to create far more than 8,400 processes on a 2GB system. The amount of resident available memory on this system when Testlimit is not running is 1.9GB:

Dividing the amount of resident memory used by Testlimit (1.9GB) by the number of processes it created shows that each process accounts for about 230KB of resident memory. Since a 64-bit kernel stack takes up 24KB, about 206KB per process is left unaccounted for. Where does the rest of the resident memory go? When a process is created, Windows reserves enough physical memory to back the process's minimum working set. This guarantees that, whatever the circumstances, the process will have enough physical memory at its disposal to hold the data required by its minimum working set. The default minimum working set size is 200KB, which is easy to verify by adding the Minimum Working Set column in Process Explorer:

The remaining 6KB of resident available memory charged per process covers additional nonpageable memory in which the process's own data structures are stored. A process on 32-bit Windows uses slightly less resident memory because its privileged-mode thread stack is smaller.

As with user-mode thread stacks, processes can override their default minimum working set size by using the SetProcessWorkingSetSize function. Testlimit supports a -n option which, combined with -p, makes the child processes of the main Testlimit process set their minimum working set to the smallest possible size, 80KB. Because the child processes need time to shrink their working sets, Testlimit, once it can no longer create processes, pauses and then tries to continue, giving its children a chance to run. Launched with the -n option on a system with Windows 7 and 4GB of RAM, Testlimit runs into a different limit than resident available memory: the system commit limit:

In the screenshot below you can see that the kernel debugger reports not only that the system commit limit has been reached, but also that after the limit was reached there were thousands of failed memory allocations, both of virtual memory and of paged pool (the commit limit was actually reached several times, because each time an error occurred due to insufficient paging file capacity, the paging file grew, pushing the limit back):

Before Testlimit ran, committed memory averaged approximately 1.5GB, so the processes consumed about 8GB of committed memory, or roughly 8GB / 6,600 = 1.2MB per process. The output of the kernel debugger's !vm command, which shows each process's private memory, confirms this calculation:

The initially committed thread stack, described earlier, accounts for only a small part of that; the rest comes from the memory required for the process's address space data structures, page table entries, the handle table, the process and thread objects, and the private data the process creates as it initializes.

How many processes and threads are enough?

So the answers to the questions "how many threads does Windows support?" and "how many processes can run simultaneously on Windows?" are closely related. Aside from the nuances of how threads specify their stack sizes and how processes specify their minimum working sets, the two main factors that determine the answer on any particular system are the amount of physical memory and the system commit limit. In any case, if an application creates enough threads or processes to approach these limits, its developer should reconsider the application's design, since there are always ways to achieve the same result with a reasonable number of them. For example, a key goal when scaling an application is to keep the number of running threads equal to the number of CPUs, and one way to do that is to move from synchronous to asynchronous I/O using completion ports, which helps keep the number of running threads in line with the number of CPUs.

Why does the browser need multiple processes? Multi-process architecture increases security and stability: if a failure occurs somewhere, it will not drag down everything else at once.

In fact, the multi-process technique has been used by other browsers for a long time, and much more aggressively than Firefox. For example, Chrome and all Chromium-based browsers (modern Opera, Yandex.Browser and others) can even show dozens of processes in memory in the task manager if you have many tabs loaded.

There is one serious negative point in this: many processes can put a lot of stress on a weak computer, and if you are used to working with a large number of tabs or have many extensions installed, then even a PC with relatively up-to-date specifications can become strained.

Does Firefox create fewer processes than Chrome?

As we have already said, Mozilla approached the issue of multiple processes much more carefully than Google itself.

Initially, the developers added only one extra process to Firefox, which hosted plugins (not to be confused with extensions): plugin-container.exe. That gave Firefox two processes for the first time.

However, as time went on, the company needed to keep up with its competitors in terms of stability and security. As a result, the long-tested, full-fledged multi-process architecture of Firefox was finally completed this year.

Firefox does not lose its advantage of lower memory consumption even when it uses its multiprocessing to the maximum (8 CPUs, 8 content processes).

Some users of the stable Firefox releases were first able to try multiprocessing this summer, starting with Firefox 54. The final stage was the autumn release of Firefox 57, which no longer supported legacy extensions. Some of those extensions could previously block multi-process mode, forcing Firefox to use only one process.

However, processes in Firefox still do not work quite the way they do in Chrome. Where Google's browser runs practically everything in separate processes (every tab, every extension), Firefox splits various elements into groups. As a result, there are not as many processes as in its main competitor.

This results in noticeably lower memory consumption and, in some cases, lower CPU load; after all, a huge number of processes in Chromium-based browsers can put a heavy load on the processor. Mozilla eventually settled on a compromise that is, in our opinion, the most reasonable solution.

Additionally, Firefox uses a different on-demand tabs mechanism than Chrome and Chromium-based browsers.

Where those browsers automatically load the previous session's tabs in the background one after another, the "fire fox" loads a tab only when you explicitly switch to (click) it, and so does not create processes that are not yet needed. This also helps keep resource consumption down.

How to reduce the number of Firefox processes?

Unlike Google, Mozilla actually lets the user control how many processes the browser keeps in memory.

If you see several firefox.exe processes (or firefox.exe *32 if you use the 32-bit version) sitting in Task Manager and want to reduce or disable them, no problem. Open Settings and scroll down the "General" section until you reach the "Performance" subsection:

If you uncheck the "Use recommended performance settings" option, you will be presented with a setting for the number of content processes.

There are options from 1 to 7 processes to choose from (if you have more than 8 GB of memory, then more than 7 processes may be offered):

At this point it is worth making several important clarifications.

Firstly, we are talking about content processes. If you specify, for example, only 1 process here, the total number of processes in memory will decrease, but you still won't end up with just one copy of firefox.exe, because in addition to content, Firefox also runs the interface in separate processes.

Secondly, reducing the number of processes makes sense only on computers with little RAM and extremely weak hardware. On reasonably capable hardware, multiprocessing will not hurt performance; on the contrary, it helps, albeit at the cost of higher memory consumption.

Is there any benefit to reducing the number of processes?

To take our own example: on a PC with 8 GB of RAM, 4 content processes were offered by default, while up to 7 processes could appear in memory when a large number of tabs were open.

When we set the number of content processes to 1, restarted the browser and re-clicked on all the tabs to load them, predictably only 4 processes remained in memory.

Of these, 3 are intended for the browser itself and 1 process is specifically for processing content, and the latter is easy to distinguish, because when you open a decent number of tabs, it begins to take up much more memory than the others:

In Firefox, we had 15 different sites open. In the original mode (7 processes), the total memory consumption was about 1.5 GB. When there were only four processes left, in total they took about 1.4 GB (see screenshots above).

We repeated the experiment several times, and each time the RAM "savings" amounted to only 100-150 MB. Keep in mind that browser performance may also suffer from switching to a single content process. So, as you can see, there is little point in reducing the number of processes.

In general, if you can avoid running Apache, don't run it. Consider whether lighttpd or thttpd can handle the tasks you need. These web servers come in very handy in situations where there are not enough system resources to go around but the server still has to work. I repeat: this applies to situations where the functionality of these products is sufficient for the job (lighttpd, by the way, can work with PHP). In situations where there is simply no way around Apache, you can usually still free up a lot of system resources by redirecting requests for static content (JavaScript, images) from Apache to a lightweight HTTP server. Apache's biggest problem is its large appetite for RAM. In this article I will look at methods that help speed it up and reduce the amount of memory it uses:

  • processing fewer parallel requests;
  • recycling processes;
  • using KeepAlives that are not too long;
  • reducing the timeout;
  • reducing logging intensity;
  • disabling hostname resolution;
  • disabling the use of .htaccess;
  • loading fewer modules.

    Loading fewer modules

    The first step is to stop loading unnecessary modules. Review the configuration files and determine which modules you are loading. Do you need all of them? Find the ones that are not used and disable them; this will save you some memory.
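    As an illustration (the module names and paths below are only examples, not a recommendation of what to disable), in a monolithic httpd.conf this amounts to commenting out the corresponding LoadModule lines; on Debian the a2dismod helper does the same job:

    # disable modules you have verified you do not need
    #LoadModule status_module /usr/lib/apache2/modules/mod_status.so
    #LoadModule autoindex_module /usr/lib/apache2/modules/mod_autoindex.so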

    Process fewer concurrent requests

    The more processes Apache is allowed to run simultaneously, the more concurrent requests it can handle. By increasing this number you also increase the amount of memory consumed by Apache. Using top, you can see that each Apache process takes up relatively little memory, because shared libraries are used. In Debian 5 with Apache 2 the default configuration is this:

    StartServers 5
    MinSpareServers 5
    MaxSpareServers 10
    MaxClients 20
    MaxRequestsPerChild 0

    The StartServers directive determines the number of server processes launched initially, right after the server starts. The MinSpareServers and MaxSpareServers directives determine the minimum and maximum number of "spare" Apache child processes; such processes wait for incoming requests and are not unloaded, which allows the server to respond to new requests faster. The MaxClients directive defines the maximum number of parallel requests the server will handle simultaneously; when the number of concurrent connections exceeds this number, new connections are queued for processing. In effect, MaxClients also determines the maximum allowed number of Apache child processes running at the same time. The MaxRequestsPerChild directive defines the number of requests an Apache child process handles before it terminates; if it is set to zero, the process never "expires".

    For my home server, with its correspondingly modest needs, I changed the configuration to the following:

    StartServers 1
    MinSpareServers 1
    MaxSpareServers 1
    MaxClients 5
    MaxRequestsPerChild 300

    Of course, the configuration above is completely unsuitable for heavily loaded servers, but for a home server it is, in my opinion, just right.

    Process recycling

    As you can see, I changed the value of the MaxRequestsPerChild directive. By limiting the lifetime of child processes to a number of handled requests in this way, you can avoid the effects of accidental memory leaks caused by poorly written scripts.

    Using KeepAlives that are not too long

    KeepAlive is a mechanism for maintaining a persistent connection between the client and the server. The HTTP protocol was originally not designed around persistent connections: when a web page was sent to a client, all of its parts (images, frames, JavaScript) were transferred over separately established connections. With the advent of KeepAlive, browsers can request a persistent connection and, once it is established, download data over that single connection, which gives a significant performance boost. However, by default Apache uses too long a timeout before closing the connection, 15 seconds. This means that after all the content has been served to a client that requested KeepAlive, the child process will wait another 15 seconds for further requests. That is a bit much; it is better to reduce this timeout to 2-3 seconds.

    KeepAliveTimeout 2

    Decrease timeout

    You can also reduce the value of the Timeout directive, which specifies how long to wait for individual requests to complete. Its default value is 300; in your case it may make sense to lower (or raise) this value. I have personally left it as is for now.
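    If you do decide to change it, it is a single directive; the value below is only an example:

    Timeout 60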

    Reducing logging intensity

    Another step toward better server performance is to reduce the amount of logging. Modules such as mod_rewrite can write debugging information to the log; if you do not need it, turn that output off.
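    For example, on Apache 2.2 the mod_rewrite debug log can be kept switched off and the overall log level left at a moderate setting (the values here are just an illustration):

    LogLevel warn
    RewriteLogLevel 0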

    Disabling Hostname Resolution

    In my opinion, there is no need to reverse-resolve IP addresses into hostnames. If you really need hostnames when analyzing the logs, you can resolve them at analysis time rather than while the server is running. Hostname resolution is controlled by the HostnameLookups directive, which is actually set to Off by default, but check it if you want to be sure resolution is disabled.

    HostnameLookups Off

    Disabling use of .htaccess

    Apache processes .htaccess files every time data is requested. Not only does Apache have to read this file, it also takes considerable time and resources to process it. Take a look at your web server and reconsider whether you need .htaccess files. If you need different settings for different directories, perhaps they could be placed in the main server configuration file instead? Processing of .htaccess can be disabled with a directive in the server configuration, as shown below.
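    A sketch of what that looks like (the directory path is an example; AllowOverride None tells Apache not to look for .htaccess files under it):

    <Directory /var/www/>
        AllowOverride None
    </Directory>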

    Php-cgi processes eat up memory, multiplying rapidly, and refuse to die after the FcgidMaxRequestsPerProcess limit has been reached; php-cgi then starts actively dumping everything into swap and the system begins returning "502 Bad Gateway".

    To limit the number of forked php-cgi processes it is not enough to set FcgidMaxRequestsPerProcess, the number of requests after which a process should die, because the processes do not always do so voluntarily.

    The situation is painfully familiar: php-cgi processes (the children) keep breeding, eating up memory, and you cannot force them to die - the rascals want to live! :) Reminds you of the problem of the Earth being overpopulated by "people", doesn't it? ;)

    The eternal imbalance between parents and children can be resolved by limiting the number of php-cgi children and their lifetime (genocide) and by controlling how actively they reproduce (contraception).

    Limiting the number of php-cgi processes for mod_fcgid

    The directives below probably play the main role in limiting the number of php-cgi processes, and in most cases the default values shown here are harmful for servers with less than 5-10 GB of RAM:

    • FcgidMaxProcesses 1000 - the maximum number of processes that can be active at the same time;
    • FcgidMaxProcessesPerClass 100 - the maximum number of processes in one class (segment), i.e. the maximum number of processes allowed to be spawned through the same wrapper;
    • FcgidMinProcessesPerClass 3 - the minimum number of processes in one class (segment), i.e. the minimum number of processes spawned through the same wrapper that remain available after all requests have been completed;
    • FcgidMaxRequestsPerProcess 0 - a FastCGI process should "kick the bucket" after handling this number of requests.

    What number of php-cgi processes is optimal? To determine it, you need to take into account the total amount of RAM and the amount of memory allowed for PHP by memory_limit (php.ini), which each php-cgi process may consume while executing a PHP script. For example, if we have 512 MB of RAM, of which 150-200 MB goes to the OS itself and another 50-100 MB to the database server, mail MTA and so on, and memory_limit=64, then with the remaining 200-250 MB of RAM we can run 3-4 php-cgi processes simultaneously without much harm.
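    For a machine like the 512 MB example above, the corresponding limits might look roughly like this (the numbers are illustrative and should be recalculated from your own RAM and memory_limit):

    # allow only as many PHP processes as the spare RAM can hold
    FcgidMaxProcesses 4
    FcgidMaxProcessesPerClass 4
    FcgidMinProcessesPerClass 0
    FcgidMaxRequestsPerProcess 500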

    PHP-cgi process lifetime and timeout settings

    When php-cgi children breed actively and eat up RAM, they can live almost forever, and that is fraught with cataclysms. Below is a list of "GMO" directives that will help shorten the lifetime of php-cgi processes and free the resources they hold in good time:

    • FcgidIOTimeout 40 - the time (in seconds) during which the mod_fcgid module will wait for the script to perform its I/O;
    • FcgidProcessLifeTime 3600 - if a process exists longer than this time (in seconds), it will be marked for termination during the next process scan, whose interval is set by the FcgidIdleScanInterval directive;
    • FcgidIdleTimeout 300 - if the number of processes exceeds FcgidMinProcessesPerClass, a process that has not handled any requests for this long (in seconds) will be marked for killing during the next process scan, whose interval is set by the FcgidIdleScanInterval directive;
    • FcgidIdleScanInterval 120 - the interval at which the mod_fcgid module looks for processes that have exceeded FcgidIdleTimeout or FcgidProcessLifeTime;
    • FcgidBusyTimeout 300 - if a process has been busy handling a request for longer than this time (in seconds), it will be marked for killing during the next scan, whose interval is set by FcgidBusyScanInterval;
    • FcgidBusyScanInterval 120 - the interval at which the scan for busy processes that have exceeded FcgidBusyTimeout is performed;
    • FcgidErrorScanInterval 3 - the interval (in seconds) at which the mod_fcgid module kills processes that are pending termination, including those that have exceeded FcgidIdleTimeout or FcgidProcessLifeTime. The kill is performed by sending the process a SIGTERM signal; if the process remains active, it is finished off with SIGKILL.

    Keep in mind that a process that has exceeded FcgidIdleTimeout or FcgidBusyTimeout can live for up to another FcgidIdleScanInterval or FcgidBusyScanInterval before it is marked for termination.

    It is better to set the ScanIntervals a few seconds apart, for example FcgidIdleScanInterval 120 and FcgidBusyScanInterval 117, so that the two scans do not run at the same time.
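    Put together, the lifetime-related part of such a configuration might look like this (values taken from the examples above; tune them to your own load):

    FcgidIOTimeout 40
    FcgidProcessLifeTime 3600
    FcgidIdleTimeout 300
    FcgidIdleScanInterval 120
    FcgidBusyTimeout 300
    FcgidBusyScanInterval 117
    FcgidErrorScanInterval 3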

    PHP-cgi process spawning activity

    If none of the above has helped, which would be surprising, you can also try tinkering with how actively php-cgi processes are spawned...

    Besides the limits on the number of requests, the number of php-cgi processes and their lifetime, there is also the matter of how actively child processes are spawned, which is regulated by directives such as FcgidSpawnScore, FcgidTerminationScore, FcgidTimeScore and FcgidSpawnScoreUpLimit, whose descriptions I have translated from the English documentation as best I could. Each process spawn and each termination adds to an internal "score" that decays over time; once this score exceeds FcgidSpawnScoreUpLimit, no new child processes of the application will be spawned, and all spawn requests have to wait until an existing process becomes free or until the score falls below this limit.

    If my translation of the descriptions and my understanding of these parameters are correct, then to reduce the rate at which php-cgi processes are spawned you should lower the value of the FcgidSpawnScoreUpLimit directive or raise the values of FcgidSpawnScore and FcgidTerminationScore.
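    For reference, the stock defaults for these directives, as I read the mod_fcgid documentation, are the following; raising the two scores (or lowering the up-limit) slows spawning down:

    FcgidSpawnScore 1
    FcgidTerminationScore 2
    FcgidTimeScore 1
    FcgidSpawnScoreUpLimit 10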

    Results

    I hope I have listed and discussed in detail most of the mod_fcgid directives that help limit the number of php-cgi processes and their lifetime, and thereby reduce resource consumption. Below is the complete mod_fcgid configuration for a server that runs successfully on a 2500 MHz processor with 512 MB of RAM:

    Oleg Golovsky





    
