Solaris · Lesson 12

System Monitoring

Monitor CPU, memory and disk in Solaris: an overview of the main performance tools, how to identify bottlenecks, and how to analyse log files.

Why system monitoring matters

System monitoring is about answering a few key questions quickly: Is the system healthy? What is slow? Which component is the bottleneck? As a Solaris admin, you will use a combination of CPU, memory, disk and log tools to build this picture.

In this lesson we focus on the core monitoring tools: top, prstat, iostat, df, du, uptime and the system logs.

Key monitoring dimensions

Load

uptime and top give you a quick summary of load averages, logged-in users and overall system stress.

CPU & memory

prstat shows per-process and per-user CPU/memory usage and lets you sort or filter quickly.

Disk & I/O

iostat tells you if disks and controllers are saturated or error-prone, which often explains slow I/O.

Space & logs

df, du and logs show you where space is used and what errors the system is reporting.

Step-by-step monitoring commands

Walk through these flows in your lab. Open two terminals: one to run monitoring commands, another to start/stop test workloads and see their impact.

1. Overall system load: uptime, top, prstat -a

First get a quick feel for system load and the active processes using uptime, top and prstat.

terminal — monitoring
solaris-lab
[root@solaris ~]# uptime
11:15am up 10 day(s), 3:42, 4 users, load average: 0.24, 0.18, 0.15
 
[root@solaris ~]# top
last pid: 1234; load averages: 0.24, 0.18, 0.15 up 10+03:42:10 11:15:32
   PID USERNAME  LWP PRI NICE  SIZE   RES STATE     TIME   CPU COMMAND
   891 oracle      1  59    0  120M   40M sleep     0:02  1.0% bash
   950 root       35  59    0  150M   60M sleep     0:05  0.5% java
   123 root        1  59    0   60M   20M sleep     0:02  0.2% sshd
...
 
[root@solaris ~]# prstat -a 1 3
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   891 oracle    120M   40M sleep   59    0   0:00:02 1.0% bash/1
   950 root      150M   60M sleep   59    0   0:00:05 0.5% java/35
   123 root       60M   20M sleep   59    0   0:00:02 0.2% sshd/1
...
 
Total: 45 processes, 125 lwps, load averages: 0.24, 0.18, 0.15
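If you script health checks, the load averages can be pulled out of uptime output with awk. A minimal sketch, run here against the sample line above so the result is reproducible; on older Solaris releases prefer nawk or /usr/xpg4/bin/awk, since the classic /usr/bin/awk does not accept a multi-character field separator:

```shell
# Parse the three load averages out of an uptime line; replace the echo
# with `uptime` on a live system.
echo '11:15am up 10 day(s), 3:42, 4 users, load average: 0.24, 0.18, 0.15' |
  awk -F'load average: ' '{
    split($2, la, ", ")
    printf "1min=%s 5min=%s 15min=%s\n", la[1], la[2], la[3]
  }'
# → 1min=0.24 5min=0.18 15min=0.15
```

The 1-minute value tells you what is happening now; comparing it against the 15-minute value shows whether load is rising or falling.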

2. Memory usage: prstat sorted by RSS and ::memstat

Use prstat -a -s rss to see which processes consume most memory, and ::memstat in mdb -k for a kernel-level view.

[root@solaris ~]# prstat -a -s rss 1 3
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
   950 root      800M  600M sleep   59    0   0:00:30 2.0% java/35
  1200 oracle    500M  300M sleep   59    0   0:00:20 1.5% oracle/50
   891 oracle    120M   40M sleep   59    0   0:00:02 0.5% bash/1
...
 
Total: 45 processes, 125 lwps, load averages: 0.30, 0.22, 0.18
 
[root@solaris ~]# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     150000              1171   35%
ZFS File Data              100000               781   23%
Anon                       120000               937   28%
Exec and libs               20000               156    5%
Page cache                  15000               117    4%
Free (unallocated)          15000               117    4%

Total                      420000              3280  100%
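The MB column in ::memstat is simply pages multiplied by the page size. A quick sanity check of the arithmetic, assuming the 8 KB pages of a SPARC system as in the output above (run pagesize(1) to confirm; x86 systems typically use 4 KB pages):

```shell
# 150000 pages * 8 KB each, divided by 1024 KB/MB; %d truncates the
# fraction, matching the whole-MB figures memstat prints.
awk 'BEGIN { pages = 150000; printf "%d MB\n", pages * 8 / 1024 }'
# → 1171 MB
```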

3. CPU usage by user: prstat -a -u user and prstat -t

Filter prstat output for a specific user, or aggregate usage per user with prstat -t.

[root@solaris ~]# prstat -a -u oracle 1 3
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  1200 oracle    500M  300M sleep   59    0   0:00:20 2.0% oracle/50
   891 oracle    120M   40M sleep   59    0   0:00:02 0.5% bash/1
...
 
Total: 10 processes, 60 lwps, load averages: 0.40, 0.30, 0.20
 
[root@solaris ~]# prstat -t 1 3
 NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU
    10 oracle    700M  340M    10%   0:00:25 2.5%
    30 root      600M  300M   9.1%   0:00:20 1.5%
     5 webuser   200M  100M   3.0%   0:00:05 0.5%
 
Total: 45 processes, 125 lwps, load averages: 0.45, 0.35, 0.25
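prstat -t does the per-user aggregation natively; the same roll-up can be approximated anywhere by summing ps output with awk. A sketch fed a captured sample so the numbers are reproducible — on a live system, pipe `ps -eo user,pcpu` into the awk stage instead:

```shell
# Sum the %CPU column per user, skipping the header line; sort makes
# the per-user output order deterministic.
printf 'USER %%CPU\noracle 2.0\noracle 0.5\nroot 1.5\n' |
  awk 'NR > 1 { cpu[$1] += $2 }
       END { for (u in cpu) printf "%s %.1f\n", u, cpu[u] }' |
  sort
# → oracle 2.5
# → root 1.5
```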

4. Disk I/O: iostat -en and iostat -xnte

Use iostat to see device errors, throughput and extended statistics for disks and controllers.

[root@solaris ~]# iostat -en
  ---- errors ---
  s/w h/w trn tot device
    0   0   0   0 cmdk0
    0   0   0   0 cmdk1
 
[root@solaris ~]# iostat -xnte 1 3
   tty                            extended device statistics                  ---- errors ----
 tin tout    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
   0   47    5.0    2.0   40.0   20.0  0.0  0.1    2.0    8.0   0  10   0   0   0   0 cmdk0
             1.0    4.0   10.0   30.0  0.0  0.1    3.0   12.0   0  12   0   0   0   0 cmdk1
...
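In scripts, the %b (percent busy) column is the usual trigger for an alert. A sketch fed a simplified captured sample so it is reproducible; the 80% threshold is an illustrative choice, and the column position depends on which iostat flags you use, so verify it against your own output:

```shell
# Print devices whose %b exceeds 80 in a simplified iostat-style table;
# the header line is skipped with NR > 1.
printf 'device r/s w/s %%b\ncmdk0 5.0 2.0 10\ncmdk1 1.0 4.0 85\n' |
  awk 'NR > 1 && $4 > 80 { print $1, "busy", $4 "%" }'
# → cmdk1 busy 85%
```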

5. Filesystem usage: du and df

Check filesystem and directory usage to find where space is being used.

[root@solaris ~]# df -h
Filesystem           Size  Used  Available  Capacity  Mounted on
rpool/ROOT/solaris    40G   10G        30G       25%  /
rpool/export/home     60G   25G        35G       42%  /export/home
rpool/data            80G   55G        25G       69%  /data
 
[root@solaris ~]# cd /data
[root@solaris /data]# du -sh
55G .
 
[root@solaris /data]# du -sh *
10G oracle
20G backups
5G logs
20G appdata
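The df check is easy to automate: strip the % sign from the Capacity column and compare it against a threshold. A sketch run against the sample table above — with live df -h output, column positions can shift when filesystem names are long, so check them on your system; the 60% threshold is an illustrative choice:

```shell
# Report filesystems above 60% capacity from df -h style output.
printf '%s\n' \
  'Filesystem Size Used Available Capacity Mounted on' \
  'rpool/ROOT/solaris 40G 10G 30G 25% /' \
  'rpool/data 80G 55G 25G 69% /data' |
  awk 'NR > 1 { sub("%", "", $5); if ($5 + 0 > 60) print $6, $5 "%" }'
# → /data 69%
```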

6. System logs: quick checks for issues

Use tail and grep to scan important logs when you notice high load or errors.

[root@solaris ~]# tail -50 /var/adm/messages
Jan 11 11:10:02 sol11 sshd[1234]: [ID 800047 auth.info] Accepted publickey for oracle from 192.168.1.10 port 53218 ssh2
Jan 11 11:11:45 sol11 nfs: [ID 702911 kern.warning] WARNING: NFS server not responding
Jan 11 11:11:50 sol11 nfs: [ID 702911 kern.notice] NOTICE: NFS server ok
 
[root@solaris ~]# grep -i "error" /var/adm/messages | tail -10
Jan 11 10:55:21 sol11 someapp[987]: [ID 702911 user.error] Failed to connect to database
 
[root@solaris ~]# tail -f /var/adm/messages
# Follow the log live while reproducing an issue...
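Quick triage often starts with counting how many warning or error lines a log contains. A reproducible sketch on sample lines — point grep at /var/adm/messages on a real system, and note that -E requires /usr/xpg4/bin/grep or egrep on older Solaris releases:

```shell
# Count lines matching "warning" or "error", case-insensitively.
printf '%s\n' \
  'Jan 11 11:11:45 sol11 nfs: WARNING: NFS server not responding' \
  'Jan 11 10:55:21 sol11 someapp[987]: user.error Failed to connect to database' \
  'Jan 11 11:11:50 sol11 nfs: NOTICE: NFS server ok' |
  grep -ciE 'warning|error'
# → 2
```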

Typical monitoring patterns

When users say “system is slow”

  • Check uptime to see load averages and how long the system has been up.
  • Use prstat -a to see which processes are consuming CPU.
  • Check memory pressure with prstat -a -s rss and ::memstat.
  • Run iostat -xnte to see if disks are saturated or busy.
  • Review recent errors in /var/adm/messages.

When disk space is filling up

  • Use df -h to find which filesystem is close to 100% usage.
  • cd into that filesystem and run du -sh and du -sh * to see which directories are heavy.
  • Look for large log or backup directories and discuss cleanup or rotation with application owners.
  • Consider ZFS snapshots, compression or moving rarely-used data to a different pool.

Practice task – build a quick health-check routine

  • Create your own 5–10 command sequence (using uptime, prstat, iostat, df, du and logs) that you will always run when investigating a performance issue.
  • Run a stress script (CPU or disk heavy) in the background and observe how the metrics change in real time.
  • Note down what “normal” looks like for your lab VM so you can quickly recognise unusual patterns later on bigger systems.
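One possible shape for such a routine, as a sketch only: the healthcheck.sh name and the exact command choices are placeholders to adapt, and the Solaris-only step is guarded so the script also degrades gracefully on other systems:

```shell
# healthcheck.sh - first-pass look at load, processes, disk and logs.
echo "== load =="
uptime
echo "== top processes =="
if command -v prstat >/dev/null 2>&1; then
  prstat -a 1 1                          # Solaris: one snapshot
else
  ps -eo user,pcpu,rss,comm | head -10   # fallback on other systems
fi
echo "== disk space =="
df -h
echo "== recent log errors =="
if [ -r /var/adm/messages ]; then
  grep -i error /var/adm/messages | tail -5
else
  echo "(no readable /var/adm/messages on this system)"
fi
```

Save it once, run it at the start of every investigation, and diff the output against a run captured when the system was healthy.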

In future lessons on ZFS and patching, these monitoring tools will help you verify the impact of your changes.