Introduction to Linux Log Analysis: Tools for System Fault Diagnosis

I. What are Linux Logs? Why Are They Important?

In a Linux system, logs act as a “system diary” that records events during system operation. From system boot and service execution to user logins and errors, critical operations and anomalies are “documented” in log files. When the system fails (e.g., services won’t start, network issues, data loss), logs serve as a “clue library” for troubleshooting.

Example: If your website suddenly becomes unavailable, analyzing the Web server logs may reveal “404 errors,” “permission denied,” or “database connection failures,” enabling quick root cause identification.

II. Common Linux Log Files and Their Roles

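Linux logs are managed by dedicated services (e.g., rsyslogd or systemd-journald), and text logs are stored by default in the /var/log directory. File names vary by distribution: RHEL-family systems typically use /var/log/messages and /var/log/secure, while Debian and Ubuntu use /var/log/syslog and /var/log/auth.log. Here are the core log files beginners should know: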
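Because the main and authentication logs go by different names on different distributions (/var/log/messages and /var/log/secure on RHEL-family systems, /var/log/syslog and /var/log/auth.log on Debian/Ubuntu), a small helper can report which one a given machine actually has. This is a minimal sketch; the function name `first_existing_log` is hypothetical, and the candidate paths are only the conventional defaults.

```shell
# Print the first path from the argument list that exists.
# Returns non-zero when none of the candidates is present.
first_existing_log() {
    for candidate in "$@"; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Main system log: RHEL-family name first, then the Debian/Ubuntu name.
first_existing_log /var/log/messages /var/log/syslog \
    || echo "no main system log file found"

# Authentication log: Debian/Ubuntu name first, then the RHEL-family name.
first_existing_log /var/log/auth.log /var/log/secure \
    || echo "no authentication log file found"
```

On systems that log only to the systemd journal, neither text file may exist, in which case both fallback messages are printed.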

1. `/var/log/messages` (System Main Log)

Purpose: Records most routine system events, including boot processes, service start/stop, kernel messages, and software errors.
Typical Scenarios:
- Kernel hardware initialization messages during system boot;
- Error messages from software installations or startup failures;
- Key status updates for services (e.g., Apache, MySQL).
Example:

  Sep 10 12:34:56 server kernel: [   10.23] EXT4-fs error (device sda1): ext4_mb_read_super: Bad magic number in super-block while trying to open /dev/sda1

This log indicates a “corrupted superblock on /dev/sda1,” potentially causing disk mount failure.

2. `/var/log/auth.log` (Authentication & Security Log)

Purpose: Logs user authentication events, including login attempts, password verification, and privilege escalation. On RHEL-family systems the equivalent file is /var/log/secure.
Typical Scenarios:
- Successful/failed login attempts (e.g., “Failed password”);
- Authentication during su user switching;
- sudo operations (including commands such as firewall rule changes when they are run through sudo).
Example:

  Sep 10 14:20:15 server sshd[1234]: Failed password for root from 192.168.1.100 port 54321 ssh2

This entry shows that an SSH password login for root from 192.168.1.100 failed, typically because the password was wrong.
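Repeated failures from one address are easier to spot when the entries are counted per source IP. The sketch below assumes the sshd line format shown above, where the IP is the word following "from"; the function name `failed_logins_by_ip` and the sample entries are made up for illustration.

```shell
# Count "Failed password" entries per source IP, most frequent first.
# $1 is the log file; the IP is taken as the word after "from".
failed_logins_by_ip() {
    grep 'Failed password' "$1" |
        awk '{ for (i = 1; i < NF; i++) if ($i == "from") print $(i + 1) }' |
        sort | uniq -c | sort -rn
}

# Sample data in the same format as the entry above (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Sep 10 14:20:15 server sshd[1234]: Failed password for root from 192.168.1.100 port 54321 ssh2
Sep 10 14:20:18 server sshd[1234]: Failed password for root from 192.168.1.100 port 54321 ssh2
Sep 10 14:21:02 server sshd[1240]: Failed password for invalid user admin from 10.0.0.7 port 40022 ssh2
Sep 10 14:22:30 server sshd[1251]: Accepted password for alice from 192.168.1.50 port 50110 ssh2
EOF

# Each output line is a count followed by an IP address.
failed_logins_by_ip "$sample"
rm -f "$sample"
```

On a real system the argument would be /var/log/auth.log (or /var/log/secure), which usually requires root privileges to read.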

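3. `/var/log/dmesg` (Kernel Messages Log)

Purpose: Records kernel messages from system boot, such as hardware initialization info and kernel errors (e.g., driver load failures, hardware issues). Not every distribution writes this file; the dmesg command prints the live kernel ring buffer.
Typical Scenarios:
- Checking hardware status at boot (e.g., network card, disk recognition);
- Stack traces from kernel errors (a full panic often halts the system before anything reaches the disk);
- Error messages from failed hardware driver loading.
Example: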

  Sep 10 09:00:00 server kernel: [    5.67] usb 1-2: device not found

This log indicates “USB device not found,” possibly due to a hardware connection issue or a driver that is not loaded.

4. Application-Specific Logs

Different services have dedicated logs:
- Apache/Nginx: /var/log/apache2/error.log or /var/log/nginx/error.log (web service errors);
- MySQL/MariaDB: /var/log/mysql/error.log (database startup/connection errors);
- System Login: /var/log/btmp (records failed login IPs/users; it is a binary file, so view it with lastb rather than cat).

III. Common Commands for Viewing Logs

Linux offers tools to inspect logs; focus on these basics:

1. `tail`: Real-Time Log Tailing

- Real-time monitoring: tail -f /var/log/messages (-f follows the file and prints new lines as they are appended, ideal for watching service startup or live events; press Ctrl+C to stop);
- Last 10 lines: tail -n 10 /var/log/auth.log (-n specifies the line count).
Example:

  # Real-time system log monitoring
  $ tail -f /var/log/messages
  Sep 10 15:00:00 server CRON[5678]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
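When only recent activity matters, tail can be combined with grep so the search covers just the last N lines instead of the whole file. A minimal sketch follows; the function name `recent_matches` and the sample lines are hypothetical.

```shell
# Search only the last N lines of a log for a keyword (case-insensitive).
# Usage: recent_matches FILE N PATTERN
recent_matches() {
    tail -n "$2" "$1" | grep -i -- "$3"
}

# Sample log (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Sep 10 14:00:00 server app[100]: ERROR old failure, already fixed
Sep 10 14:30:00 server app[100]: started
Sep 10 15:00:00 server CRON[5678]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 10 15:00:05 server app[100]: error: cannot open config file
Sep 10 15:00:06 server app[100]: retrying
EOF

# Only the 15:00:05 entry falls inside the last three lines.
recent_matches "$sample" 3 error || echo "no recent matches"
rm -f "$sample"

# For live monitoring the same idea applies to tail -f; with GNU grep,
# --line-buffered prints each match immediately:
#   tail -f /var/log/messages | grep --line-buffered -i error
```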

2. `grep`: Filter Logs by Keyword

- Find “error” entries: grep "error" /var/log/messages (matches substrings; case-sensitive by default);
- Case-insensitive search: grep -i "error" /var/log/messages (-i ignores case);
- Exclude keywords: grep -v "info" /var/log/messages (-v excludes lines containing “info”).
Example:

  # Locate MySQL errors quickly
  $ grep "MySQL" /var/log/messages | grep -i "error"
  Sep 10 16:00:00 server mysqld[1234]: [ERROR] Can't connect to local MySQL server through socket
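Beyond single keywords, grep can count matches, combine several keywords in one pattern, and show surrounding lines. The sketch below uses the standard -c, -E, -n, and -B options; the function names and the sample entries are made up for illustration.

```shell
# Count lines containing any of several problem keywords
# (-c count, -i ignore case, -E extended regex with alternation).
count_problem_lines() {
    grep -ciE 'error|fail|denied' "$1"
}

# Show each match with its line number (-n) and one line before it (-B 1),
# which often reveals what the service was doing when the error occurred.
# Usage: show_with_context FILE PATTERN
show_with_context() {
    grep -n -i -B 1 -- "$2" "$1"
}

# Sample log (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Sep 10 16:00:00 server app[900]: starting database client
Sep 10 16:00:01 server mysqld[1234]: [ERROR] Can't connect to local MySQL server through socket
Sep 10 16:00:02 server app[900]: startup failed
Sep 10 16:00:05 server CRON[5678]: (root) CMD (run-parts /etc/cron.hourly)
EOF

count_problem_lines "$sample"
show_with_context "$sample" error
rm -f "$sample"
```

In the context output, matching lines are marked with a colon after the line number and context lines with a hyphen.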

3. `cat`: View Small Log Files

Suitable for quick inspection of small files:

$ cat /var/log/dmesg | head  # View first 10 lines of kernel log

4. `less`: Page Through Large Logs

Ideal for large files, supports pagination and search:

$ less /var/log/syslog  # PageUp/PageDown to navigate, /keyword to search, q to exit

IV. Troubleshooting Practice: Extracting Clues from Logs

The core troubleshooting approach is: Phenomenon → Locate Logs → Identify Keywords → Analyze Root Cause. Here are 3 common scenarios:

Scenario 1: User Cannot Log In

Phenomenon: System returns “Login incorrect” or “Permission denied” even though the password is believed to be correct.
Log File: /var/log/auth.log;
Keywords: Failed password (incorrect password), Permission denied (access refused), invalid user (account does not exist);
Solutions:
- Incorrect password: Verify the password (reset it with passwd);
- Permission issues: Check the account entry in /etc/passwd (e.g., login shell) and restrictions in sshd_config (e.g., AllowUsers, PermitRootLogin);
- Unauthorized access: Review failed attempts with lastb (which reads /var/log/btmp) and block malicious IPs in the firewall.
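To decide which of the three causes applies, it helps to tally how often each keyword appears. A minimal sketch, with a hypothetical function name `auth_keyword_summary` and made-up sample entries; matching is case-insensitive because sshd writes both "Invalid user" and "invalid user".

```shell
# Print how many lines of the auth log ($1) contain each keyword.
# A line such as "Failed password for invalid user ..." is counted
# under both "Failed password" and "invalid user".
auth_keyword_summary() {
    for keyword in 'Failed password' 'Permission denied' 'invalid user'; do
        # grep -c prints 0 but exits non-zero when nothing matches.
        count=$(grep -ci -- "$keyword" "$1" || true)
        printf '%s: %s\n' "$keyword" "$count"
    done
}

# Sample data (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Sep 10 14:20:15 server sshd[1234]: Failed password for root from 192.168.1.100 port 54321 ssh2
Sep 10 14:21:01 server sshd[1240]: Invalid user admin from 10.0.0.7 port 40022
Sep 10 14:21:02 server sshd[1240]: Failed password for invalid user admin from 10.0.0.7 port 40022 ssh2
EOF

auth_keyword_summary "$sample"
rm -f "$sample"
```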

Scenario 2: Web Service (e.g., Apache) Fails to Start

Phenomenon: systemctl start apache2 returns “failed” status.
Log File: /var/log/apache2/error.log (or service-specific error log);
Keywords: Address already in use (port conflict), Cannot load modules (module loading failure), Permission denied (insufficient permissions);
Solutions:
- Port conflict: Use ss -tulnp (or netstat -tulnp) to find the process holding the port, then stop it (e.g., kill 1234, where 1234 is the PID; use kill -9 only if the process does not exit);
- Module issues: Check /etc/apache2/mods-enabled for broken or missing module configuration files.
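For the port-conflict case, the conflicting port number can be pulled straight out of the error text. The sketch below assumes lines ending in "could not bind to address ADDRESS:PORT", modeled on the message Apache httpd typically prints; the exact wording can vary by version, and the function name `conflicting_ports` and the sample lines are hypothetical.

```shell
# List the distinct ports mentioned in "Address already in use" bind errors.
# $1 is a file containing the error output.
conflicting_ports() {
    grep 'Address already in use' "$1" |
        sed -n 's/.*could not bind to address .*:\([0-9][0-9]*\)$/\1/p' |
        sort -un
}

# Sample error output (illustrative only; assumed message format).
sample=$(mktemp)
cat > "$sample" <<'EOF'
(98)Address already in use: AH00072: make_sock: could not bind to address [::]:80
(98)Address already in use: AH00072: make_sock: could not bind to address 0.0.0.0:80
EOF

# Prints each conflicting port once.
conflicting_ports "$sample"
rm -f "$sample"

# Next step on a real system (usually needs root to show process names):
#   ss -tlnp | grep ':80 '
```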

Scenario 3: System Stalls or Crashes

Phenomenon: System unresponsive or commands hang indefinitely.
Log Files: /var/log/messages (system events) and /var/log/dmesg (kernel info);
Keywords: out of memory (OOM), kernel panic (kernel crash), I/O error (disk/IO failure); search case-insensitively, since the kernel capitalizes some of these messages;
Solutions:
- Memory issues: Use free -m to check memory usage and top/htop to find and terminate high-memory processes;
- Disk problems: Check dmesg for “I/O error” entries, which point to a failing disk or connection.
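All of these keywords can be checked in one pass with a case-insensitive extended pattern. This is a minimal sketch: the function name `crash_clues` is hypothetical, the sample lines are only modeled on typical kernel messages, and the pattern accepts both "IO error" and the more common kernel spelling "I/O error".

```shell
# Print numbered lines from file $1 that mention memory exhaustion,
# a kernel panic, or a disk I/O error.
crash_clues() {
    grep -n -iE 'out of memory|kernel panic|I/?O error' "$1"
}

# Sample log (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Sep 10 17:00:01 server kernel: Out of memory: Killed process 4321 (java)
Sep 10 17:00:05 server kernel: blk_update_request: I/O error, dev sda, sector 123456
Sep 10 17:00:09 server CRON[5678]: (root) CMD (run-parts /etc/cron.hourly)
EOF

# Reports the first two lines and skips the routine cron entry.
crash_clues "$sample" || echo "no crash-related entries found"
rm -f "$sample"
```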

V. Summary and Advanced Tips

Linux log analysis is a “basic skill” for system management that helps resolve a large share of common issues. Key takeaways:
1. Locate the right log file: Choose messages, auth.log, or service-specific logs based on the problem;
2. Leverage keyword filtering: grep + tail -f is a powerful combination;
3. Pay attention to timestamps: Timestamps help pinpoint exact failure times and narrow down scope.
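Timestamps can also be used directly as a filter. The sketch below selects entries from one day within a time window, assuming the traditional syslog prefix "Mon DD HH:MM:SS" used in the examples above; the function name `lines_between` and the sample entries are hypothetical.

```shell
# Print lines from file $1 logged on month $2, day $3, whose HH:MM:SS
# field lies between $4 and $5 (inclusive).
# Fixed-width times compare correctly as plain strings.
lines_between() {
    awk -v mon="$2" -v day="$3" -v start="$4" -v end="$5" \
        '$1 == mon && $2 + 0 == day + 0 && $3 >= start && $3 <= end' "$1"
}

# Sample log (illustrative only).
sample=$(mktemp)
cat > "$sample" <<'EOF'
Sep 10 13:59:58 server sshd[1200]: Accepted password for alice from 192.168.1.50 port 50110 ssh2
Sep 10 14:20:15 server sshd[1234]: Failed password for root from 192.168.1.100 port 54321 ssh2
Sep 10 14:45:00 server CRON[5678]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 10 15:00:01 server sshd[1300]: Failed password for root from 192.168.1.100 port 54400 ssh2
EOF

# Entries between 14:00:00 and 14:59:59 on Sep 10 (the two middle lines).
lines_between "$sample" Sep 10 14:00:00 14:59:59
rm -f "$sample"
```

Logs written in other timestamp formats (for example ISO 8601) would need a different field layout.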

Advanced Tools:
- journalctl: Systemd’s log tool, e.g., journalctl -u sshd (SSH service logs; the unit is named ssh on Debian/Ubuntu);
- Log Aggregation: ELK Stack (Elasticsearch+Logstash+Kibana) for enterprise-level centralized analysis.

Practice Suggestion: Set up a test server, simulate failures (e.g., service startup errors, login issues), and use the above commands to analyze logs.

Note: Logs may contain sensitive data (usernames, IP addresses, occasionally passwords). Restrict who can read them, and redact or encrypt any copies you export for troubleshooting to prevent leaks.
