LINUX > Debugging hardware errors
systemd-coredump

systemd-coredump collects and displays kernel core dumps, for analyzing application crashes. When a process crashes (or all processes belonging to an application), its default is to log the core dump to the systemd journal, including a backtrace if possible, and to store the core dump in a file in /var/lib/systemd/coredump. You also have the option to examine the dump file with other tools such as gdb or crash (see Section 17.8, "Analyzing the Crash Dump"). There is an option to not store core dumps, but to log only to the journal, which may be useful to minimize the collection and storage of sensitive information.

18.1 Use and Configuration

systemd-coredump is enabled and ready to run by default. The default configuration is in /etc/systemd/coredump.conf:

[Coredump]
#Storage=external
#Compress=yes
#ProcessSizeMax=2G
#ExternalSizeMax=2G
#JournalSizeMax=767M
#MaxUse=
#KeepFree=

The following example shows how to use Vim for simple testing, by creating a segfault to generate journal entries and a core dump.

PROCEDURE 18.1: CREATING A CORE DUMP WITH VIM

- Enable the debuginfo-pool and debuginfo-update repositories.
- Install vim-debuginfo.
- Launch vim testfile and type a few characters.
- Get the PID and generate a segfault:

tux > ps ax | grep vim
2345 pts/3 S+ 0:00 vim testfile

root # kill -s SIGSEGV 2345

Vim will emit error messages:

Vim: Caught deadly signal SEGV
Vim: Finished.
Segmentation fault (core dumped)

- List your core dumps, then examine them:

root # coredumpctl
TIME                          PID  UID  GID SIG PRESENT EXE
Wed 2019-11-12 11:56:47 PST  2345 1000  100  11 *       /bin/vim

root # coredumpctl info
           PID: 2345 (vim)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 11 (SEGV)
     Timestamp: Wed 2019-11-12 11:58:05 PST
  Command Line: vim testfile
    Executable: /bin/vim
 Control Group: /user.slice/user-1000.slice/session-1.scope
          Unit: session-1.scope
         Slice: user-1000.slice
       Session: 1
     Owner UID: 1000 (tux)
       Boot ID: b5c251b86ab34674a2222cef102c0c88
    Machine ID: b43c44a64696799b985cafd95dc1b698
      Hostname: linux-uoch
      Coredump: /var/lib/systemd/coredump/core.vim.0.b5c251b86ab34674a2222cef102
       Message: Process 2345 (vim) of user 0 dumped core.

                Stack trace of thread 2345:
                #0  0x00007f21dd87e2a7 kill (libc.so.6)
                #1  0x000000000050cb35 may_core_dump (vim)
                #2  0x00007f21ddbfec70 __restore_rt (libpthread.so.0)
                #3  0x00007f21dd92ea33 __select (libc.so.6)
                #4  0x000000000050b4e3 RealWaitForChar (vim)
                #5  0x000000000050b86b mch_inchar (vim)
                [...]

When you have multiple core dumps, coredumpctl info displays all of them. Filter them by PID, COMM (command), or EXE (full path to the executable), for example, all core dumps for Vim:

root # coredumpctl info /bin/vim

See a single core dump by PID:

root # coredumpctl info 2345

Output the selected core to gdb:

root # coredumpctl gdb 2345

The asterisk in the PRESENT column indicates that a stored core dump is present. If the field is empty there is no stored core dump, and coredumpctl retrieves crash information from the journal. You can control this behavior in /etc/systemd/coredump.conf with the Storage option:

- Storage=none: core dumps are logged in the journal, but not stored. This is useful to minimize collecting and storing sensitive information, for example for General Data Protection Regulation (GDPR) compliance.
- Storage=external: cores are stored in /var/lib/systemd/coredump.
- Storage=journal: cores are stored in the systemd journal.

A new instance of systemd-coredump is invoked for every core dump, so configuration changes are applied with the next core dump, and there is no need to restart any services.

Core dumps are not preserved after a system restart. You may save them permanently with coredumpctl. The following example filters by the PID and stores the core in vim.dump:

root # coredumpctl -o vim.dump dump 2345

See man systemd-coredump, man coredumpctl, and man coredump.conf for complete command and option listings.
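For example, to keep crash information only in the journal and avoid storing core files on disk (the Storage=none behavior described above), one way is to set the option in the default configuration file shown earlier. This is a minimal sketch; since a new systemd-coredump instance is started for every dump, no service restart is needed:

# /etc/systemd/coredump.conf
[Coredump]
Storage=none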
—
sudo apt install rasdaemon
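rasdaemon logs hardware error events (RAS: Reliability, Availability and Serviceability, such as memory EDAC and machine-check errors) reported by the kernel. As a minimal follow-up, assuming the stock packaging that ships the ras-mc-ctl helper, you could start the service and query the recorded errors like this:

sudo systemctl enable --now rasdaemon    # start collecting hardware error events
ras-mc-ctl --status                      # check that error-reporting drivers are loaded
sudo ras-mc-ctl --summary                # per-category counts of recorded errors
sudo ras-mc-ctl --errors                 # full list of recorded hardware errors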
—
How to troubleshoot Linux server memory issues
Some unexpected behaviour on the server side may at times be caused by system resource limitations. By design, Linux aims to use all of the available physical memory as efficiently as possible; in practice, the kernel follows the basic rule that a page of free RAM is wasted RAM. The system holds a lot more in RAM than just application data, most importantly mirrored data from storage drives for faster access. This debugging guide explains how to identify how much of the resources are actually being used, and how to recognise real resource outage issues.
Process stopped unexpectedly
Suddenly killed tasks are often the result of the system running out of memory, which is when the so-called Out-of-memory (OOM) killer steps in. If a task gets killed to save memory, it gets logged in various log files stored under /var/log/.
You can search the logs for out-of-memory alerts.
sudo grep -i -r 'out of memory' /var/log/
Grep searches through all logs under the directory, so it will show at least the command you just ran (from /var/log/auth.log). Actual log entries for OOM-killed processes look something like the following.
kernel: Out of memory: Kill process 9163 (mysqld) score 511 or sacrifice child
The entry shows that the killed process was mysqld, with PID 9163 and an OOM score of 511 at the time it was killed. Your log messages may vary depending on the Linux distribution and system configuration.
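On systemd-based systems, the same kernel messages can also be read from the kernel ring buffer or the journal instead of the files under /var/log/. A small sketch using the standard dmesg and journalctl tools with the same search pattern:

sudo dmesg -T | grep -i 'out of memory'
sudo journalctl -k | grep -i 'out of memory'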
If, for example, a process crucial to your web application was killed as a result of an out-of-memory situation, you have a few options: reduce the amount of memory the process requests, disallow processes from overcommitting memory, or simply add more memory to your server configuration.
Current resource usage
Linux comes with a few handy tools for tracking processes that can help with identifying possible resource outages. You can track memory usage for example with the command below.
free -h
The command prints out the current memory statistics; for example, on a 1 GB system the output looks something like the example below.
                    total       used       free     shared    buffers     cached
Mem:                 993M       738M       255M       5.7M        64M       439M
-/+ buffers/cache:               234M       759M
Swap:                  0B         0B         0B
Here it is important to distinguish between memory used by applications, buffers, and caches. On the Mem line of the output it would appear that nearly 75% of the RAM is in use, but over half of that used memory is occupied by cached data.
The difference is that while applications reserve memory for their own use, the cache is simply frequently used disk data that the kernel keeps in RAM for faster access; at the application level it is considered free memory.
Keeping that in mind, it is easier to understand why used and free memory are listed twice: the second line shows the actual memory usage once the memory occupied by buffers and cache is taken into account.
In this example, the system is using merely 234MB of the total available 993MB, and no process is in immediate danger of being killed to free up resources.
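If you would rather watch these numbers evolve than take a single snapshot, the procps version of free can repeat its output at a fixed interval, for example:

free -h -s 5    # print the same statistics every 5 seconds, stop with Ctrl+C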
Another useful tool for memory monitoring is 'top', which displays continuously updated information about processes' memory and CPU usage, runtime, and other statistics. It is particularly useful for identifying resource-intensive tasks.
top
You can scroll the list using the Page Up and Page Down keys on your keyboard. The program runs in the foreground until you quit by pressing 'q'. The resource usage is shown in percentages and gives an easy overview of your system's workload.
top - 17:33:10 up 6 days,  1:22,  2 users,  load average: 0.00, 0.01, 0.05
Tasks:  72 total,   2 running,  70 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.0 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   1017800 total,   722776 used,   295024 free,    66264 buffers
KiB Swap:        0 total,        0 used,        0 free.   484748 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0   33448   2784   1448 S   0.0  0.3   0:02.91 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.02 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S   0.0  0.0   0:01.92 kworker/u2:0
    7 root      20   0       0      0      0 S   0.0  0.0   0:05.48 rcu_sched
In the example output shown above, the system is idle and the memory usage is nominal.
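For a one-shot list of the biggest memory consumers instead of the interactive view, a common alternative is to sort the ps output by resident memory usage (procps syntax):

ps aux --sort=-%mem | head -n 10    # show the top memory-consuming processes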
Check if your process is at risk
If your server's memory gets used up to the extent that it can threaten system stability, the Out-of-memory killer chooses which process to eliminate based on many variables, such as the amount of work that would be lost and the total memory freed. Linux keeps a score for each running process, which represents the likelihood that the process will be killed in an OOM situation.
This score is stored in the file /proc/<pid>/oom_score, where <pid> is the identification number of the process you are looking into. The PID can easily be found using the following command.
ps aux | grep <process name>
The output of the command when searching for mysql, for example, would be similar to the example below.
mysql 5872 0.0 5.0 623912 51236 ? Ssl Jul16 2:42 /usr/sbin/mysqld
Here the process ID is the number in the second column, 5872 in this case, which can then be used to get further information on this particular task.
cat /proc/5872/oom_score
The readout gives a single numerical value for the chance of the process being killed by the OOM killer. The higher the number, the more likely the task is to be chosen if an out-of-memory situation should arise.
If an important process has a very high OOM score, it is possible that the process is wasting memory and should be looked into. However, a high OOM score alone, if memory usage otherwise remains nominal, is no reason for concern. The OOM killer can be disabled, but this is not recommended, as it might leave out-of-memory situations unhandled, possibly leading to a kernel panic or even a system halt.
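To get a system-wide view instead of checking one PID at a time, a small shell sketch using only the standard /proc files mentioned above can rank processes by their current score:

# Print the ten processes with the highest OOM scores, together with their names
for d in /proc/[0-9]*; do
    printf '%s %s\n' "$(cat "$d/oom_score" 2>/dev/null)" "$(cat "$d/comm" 2>/dev/null)"
done | sort -rn | head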
Disable overcommit
In major Linux distributions, the kernel by default allows processes to request more memory than is currently free in the system, in order to improve memory utilization. This is based on the heuristic that processes rarely use all of the memory they request. However, if your system is at risk of running out of memory and you wish to avoid losing tasks to the OOM killer, it is possible to disallow memory overcommit.
To change how the system handles overcommit, Linux provides the 'sysctl' utility, which is used to modify kernel parameters at runtime. You can list all sysctl-controlled parameters using the following command.
sudo sysctl -a
The particular parameters that control memory overcommit are the very imaginatively named vm.overcommit_memory and vm.overcommit_ratio. To change the overcommit mode, use the command below.
sudo sysctl -w vm.overcommit_memory=2
This parameter can take three different values:
- 0 means "Estimate if we have enough RAM"
- 1 means "Always allow"
- 2, which is used here, tells the kernel to "Say no if the system doesn't have the memory"
The important part of changing the overcommit mode is to remember to also change overcommit_ratio. When overcommit_memory is set to 2, the committed address space is not permitted to exceed swap space plus this percentage of physical RAM. To be able to use all of the system's memory, use the next command.
sudo sysctl -w vm.overcommit_ratio=100
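To verify the effect of these settings, the kernel exposes both the resulting commit limit and the currently committed address space in /proc/meminfo; the parameter and field names below are standard:

sysctl vm.overcommit_memory vm.overcommit_ratio    # confirm the active overcommit settings
grep -E 'CommitLimit|Committed_AS' /proc/meminfo   # limit vs. currently committed memory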
These changes are applied immediately but will only persist until the next system reboot. To make the changes permanent, the same parameter values need to be added to the sysctl.conf file. Open the configuration file for editing.
sudo nano /etc/sysctl.conf
Add the same lines to the end of the file.
vm.overcommit_memory=2
vm.overcommit_ratio=100
Save the changes (Ctrl+O) and exit (Ctrl+X) the editor. Your server will read the configuration at every boot and prevent applications from overcommitting memory.
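The same values can also be loaded from the file immediately, without waiting for a reboot:

sudo sysctl -p    # re-read /etc/sysctl.conf and apply the settings now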
Add more memory to your server
The safest and most future-proof option for solving out-of-memory issues is adding more memory to your system. In a traditional server environment you would need to order new memory modules, wait for them to arrive, and install them in your system, but with cloud servers all you have to do is increase the amount of RAM you wish to have available in your UpCloud control panel.
Log in to your UpCloud control panel, browse to the Server Listing, and open your server's details by clicking on its description. In the Server General Settings tab, there is a section on the right named CPU and Memory Settings. While your server is running, you will notice that these options are greyed out; this is because they can only be safely changed while the server is shut down.
Proceed by turning off your server with the Shutdown request option on the left of the same page, and click OK in the confirmation dialogue. It usually takes a moment for the server to shut down completely, but once it has, the CPU and Memory Settings will become available without you having to refresh the page.
Now you will have two options to increase the amount of RAM in your system:
- Select a larger preconfigured instance from the Configuration drop-down menu.
- Select the Custom configuration from the same box and then use the slider underneath.
The slider allows you to select a value in increments of 1GB to change the RAM to the desired configuration. Changing your server's configuration also affects the pricing of your server. To see the prices corresponding to each preconfigured option or custom configuration, check the server configuration options at new server deployment.
Once you’ve selected the new server configuration, simply press the Update button on the right and the changes will be made immediately. Then you can start your server again with the increased RAM.
If you selected a larger preconfigured option, refer to our resizing storage guide on how to allocate the newly added disk space.