Commit 041d4d8

Merge pull request #134 from nickanderson/CFE-4653/master
Added S.M.A.R.T. inventory
2 parents 27c5a57 + 9e838c5 commit 041d4d8

4 files changed: 366 additions and 0 deletions

cfbs.json

Lines changed: 10 additions & 0 deletions
@@ -196,6 +196,16 @@
         "bundles inventory_fde:main"
       ]
     },
+    "inventory-smartctl": {
+      "description": "Inventory SMART drive health, temperature, and wear data.",
+      "tags": ["inventory", "monitoring", "hardware", "storage"],
+      "subdirectory": "inventory/inventory-smartctl",
+      "steps": [
+        "copy policy.cf services/cfbs/modules/inventory-smartctl/policy.cf",
+        "policy_files services/cfbs/modules/inventory-smartctl/policy.cf",
+        "bundles inventory_smartctl:main"
+      ]
+    },
     "library-for-promise-types-in-bash": {
       "description": "Library enabling promise types implemented in bash.",
       "subdirectory": "libraries/bash",
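Review note: the three build steps in the new `inventory-smartctl` entry have to agree on where `policy.cf` lands, since the `copy` destination is the same path that `policy_files` then registers. A minimal sketch of that consistency check follows, written in Python purely for illustration. `check_steps` is a hypothetical helper (it is not part of cfbs), and it assumes each step is a space-separated string whose first word names the operation, as in the entry above.

```python
import json

# The entry added by this commit, copied from the cfbs.json hunk.
ENTRY = json.loads("""
{
  "description": "Inventory SMART drive health, temperature, and wear data.",
  "tags": ["inventory", "monitoring", "hardware", "storage"],
  "subdirectory": "inventory/inventory-smartctl",
  "steps": [
    "copy policy.cf services/cfbs/modules/inventory-smartctl/policy.cf",
    "policy_files services/cfbs/modules/inventory-smartctl/policy.cf",
    "bundles inventory_smartctl:main"
  ]
}
""")


def check_steps(entry):
    """Return a list of problems found in the entry's build steps.

    Hypothetical helper: it only checks that every path named by a
    `policy_files` step is also the destination of some `copy` step.
    """
    copied = set()
    registered = []
    for step in entry["steps"]:
        operation, *args = step.split()
        if operation == "copy" and len(args) == 2:
            copied.add(args[1])  # second argument is the destination
        elif operation == "policy_files":
            registered.extend(args)
    return [f"{path} is registered but never copied"
            for path in registered if path not in copied]


problems = check_steps(ENTRY)
print(problems or "steps are consistent")
```

The check is deliberately narrow: it says nothing about whether the bundle named in the `bundles` step exists in the copied file.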
Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
+Inventory module for collecting SMART drive health, temperature, and wear data via smartctl.
+
+## Description
+
+This module collects S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) data from storage devices and exposes it as inventory attributes in CFEngine Mission Portal. It monitors drive health status, temperature, power-on hours, and NVMe-specific metrics.
+
+SMART data helps predict drive failures before they occur and provides visibility into storage device health across your infrastructure.
+
+## Requirements
+
+- **Platform:** Linux only (currently)
+- **Binary:** `smartctl` from smartmontools package (version 7.0+ for JSON support)
+- **Permissions:** Requires root to read SMART data from devices
+
+### Installation
+
+Add to your policy via cfbs:
+
+```bash
+cfbs add inventory-smartctl
+cfbs install
+```
+
+Or include directly in your policy:
+
+```cfengine
+bundle agent main
+{
+  methods:
+    "smartctl" usebundle => inventory_smartctl:main;
+}
+```
+
+## Inventory Attributes
+
+The following attributes are exposed in Mission Portal:
+
+### Universal Attributes (all drive types)
+
+- **SMART drive health** - Per-drive health status
+  - Values: `PASSED`, `FAILED`, `SMARTCTL_MISSING`
+  - Example: `/dev/sda: PASSED`, `/dev/nvme0: FAILED`
+  - `SMARTCTL_MISSING`: Indicates smartctl is not installed on the system
+  - Critical: A FAILED status indicates the drive is predicting imminent failure
+
+- **SMART drive model** - Drive model identifier
+  - Example: `/dev/sda: Samsung SSD 870 EVO`
+
+- **SMART drive temperatures (C)** - Current temperature in Celsius
+  - Example: `/dev/sda: 35 C`
+  - Note: Not available for virtual disks
+
+- **SMART drive power-on hours** - Cumulative runtime in hours
+  - Example: `/dev/sda: 8742 h`
+  - Useful for tracking drive age and warranty coverage
+
+### NVMe-Specific Attributes
+
+- **SMART NVMe available spare** - Remaining spare blocks (%)
+  - Example: `/dev/nvme0: 100%`
+  - Low values (<10%) indicate wear approaching end of life
+
+- **SMART NVMe percentage used** - Drive life consumed (%)
+  - Example: `/dev/nvme0: 5%`
+  - Based on manufacturer's endurance rating
+
+- **SMART NVMe media errors** - Uncorrectable media errors count
+  - Example: `/dev/nvme0: 0`
+  - Any non-zero value indicates data integrity issues
+
+### Alert Attributes
+
+- **SMART failed drives** - List of drives with FAILED health status
+  - Only present when one or more drives are failing
+  - Use for alerting and automated response
+
+## Troubleshooting
+
+### SMARTCTL_MISSING appears in inventory
+
+The module reports `SMARTCTL_MISSING` when smartctl is not installed. To resolve:
+
+**Install smartmontools package:**
+
+```sh
+# Debian/Ubuntu
+apt-get install smartmontools
+
+# RHEL/CentOS/Fedora
+yum install smartmontools
+
+# SUSE
+zypper install smartmontools
+```
+
+**Verify installation:**
+
+```sh
+command -v smartctl
+smartctl --version
+```
+
+### No inventory data appears
+
+If smartctl is installed but no data appears:
+
+**Check if drives are detected:**
+
+```sh
+smartctl --scan
+```
+
+**Check cache files:**
+
+```sh
+ls -lh /var/cfengine/state/inventory_smartctl_*.json
+```
+
+**Run with verbose mode:**
+
+```sh
+cf-agent -Kvf ./policy.cf
+```
+
+## See Also
+
+- [CFEngine inventory tutorial](https://docs.cfengine.com/docs/lts/examples/tutorials/custom_inventory/)
+- [CFEngine Masterfiles inventory policy](https://docs.cfengine.com/docs/lts/reference/masterfiles-policy-framework/inventory/)
+- [smartmontools documentation](https://www.smartmontools.org/)
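Review note: the README names three warning signs (a `FAILED` health status, NVMe available spare below 10%, and any non-zero media error count), and every per-drive attribute uses the `device: value` entry format. A minimal sketch of turning such entries into alerts follows, in Python for illustration. `parse_entry` and `flag_drives` are hypothetical names, and the sample values are made up to match the formats in the README examples; nothing here is part of the module.

```python
def parse_entry(entry):
    """Split an inventory entry such as '/dev/nvme0: 100%' into (device, value)."""
    device, _, value = entry.partition(": ")
    return device, value


def flag_drives(health, spare, media_errors):
    """Return {device: [reasons]} for drives that match the README's warning signs."""
    flagged = {}
    for entry in health:
        device, value = parse_entry(entry)
        if value == "FAILED":
            flagged.setdefault(device, []).append("health FAILED")
    for entry in spare:
        device, value = parse_entry(entry)
        if int(value.rstrip("%")) < 10:  # README: low values (<10%) indicate wear
            flagged.setdefault(device, []).append(f"available spare {value}")
    for entry in media_errors:
        device, value = parse_entry(entry)
        if int(value) != 0:  # README: any non-zero value is a concern
            flagged.setdefault(device, []).append(f"media errors {value}")
    return flagged


# Made-up sample values in the formats shown in the README examples.
report = flag_drives(
    health=["/dev/sda: PASSED", "/dev/nvme0: FAILED"],
    spare=["/dev/nvme0: 8%"],
    media_errors=["/dev/nvme0: 3"],
)
print(report)
```

A host that reports the bare value `SMARTCTL_MISSING` has no `device: ` prefix, so this sketch does not flag it; a real consumer would probably want to treat that value as its own condition.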
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+{
+  "name": "inventory-smartctl",
+  "description": "Inventory SMART drive health, temperature, and wear data",
+  "tags": ["inventory", "monitoring", "hardware", "storage", "smartctl"],
+  "version": "0.1.0",
+  "steps": [
+    "copy ./policy.cf services/inventory/smartctl.cf"
+  ],
+  "dependencies": [],
+  "subdirectory": "inventory/inventory-smartctl"
+}
Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
+body file control
+{
+  namespace => "inventory_smartctl";
+}
+
+bundle agent main
+# @brief Inventory SMART drive health, temperature, and wear data via smartctl JSON
+#
+# Requires smartmontools >= 7.0 (for JSON output support).
+# Runs on Linux only; silently no-ops on other platforms.
+#
+# Simplified version: reads JSON directly in main bundle, no sub-bundle needed.
+#
+# Attributes exposed in Mission Portal:
+# @inventory SMART drive health - Per-drive PASSED/FAILED
+# @inventory SMART drive model - Drive model per device
+# @inventory SMART drive temperatures - Current temperature in Celsius
+# @inventory SMART drive power-on hours - Cumulative runtime in hours
+# @inventory SMART NVMe available spare - Remaining spare blocks (%), NVMe only
+# @inventory SMART NVMe percentage used - Drive life consumed (%), NVMe only
+# @inventory SMART NVMe media errors - Uncorrectable media errors, NVMe only
+# @inventory SMART failed drives - Only present on hosts with a failing drive
+{
+  vars:
+    linux::
+      "_smartctl" string => ifelse(
+        fileexists("/usr/sbin/smartctl"), "/usr/sbin/smartctl",
+        fileexists("/sbin/smartctl"), "/sbin/smartctl",
+        "/usr/sbin/smartctl" # default fallback
+      );
+      "_sdir" string => "$(sys.statedir)";
+      "_cache_ttl" string => "3600"; # 1 hour
+
+      # Enumerate drives - extract first field from each line of smartctl --scan
+      "_scan_lines"
+        slist => splitstring(
+          execresult("$(_smartctl) --scan 2>/dev/null", "useshell"),
+          "\n", 32);
+
+      "_drives"
+        slist => maplist(regex_replace("$(this)", "^(\S+).*", "\1", ""), "_scan_lines");
+
+      "_id[${_drives}]" string => canonify("${_drives}");
+      "_cache[${_drives}]" string => "$(_sdir)/inventory_smartctl_${_id[${_drives}]}.json";
+
+  classes:
+    linux::
+      "_have_smartctl" expression => isexecutable("$(_smartctl)");
+
+      # Cache file is missing - needs refresh
+      "_cache_missing_${_id[${_drives}]}"
+        not => fileexists("${_cache[${_drives}]}");
+
+      # Cache file is stale - needs refresh
+      "_cache_stale_${_id[${_drives}]}"
+        expression => isgreaterthan(
+          eval("$(sys.systime) - $(filestat(${_cache[${_drives}]}, mtime))"),
+          "$(_cache_ttl)"),
+        if => fileexists("${_cache[${_drives}]}");
+
+      # Refresh if missing or stale
+      "_refresh_${_id[${_drives}]}"
+        or => {
+          "_cache_missing_${_id[${_drives}]}",
+          "_cache_stale_${_id[${_drives}]}"
+        };
+
+  files:
+    linux._have_smartctl::
+      "${_cache[${_drives}]}"
+        content => execresult("$(_smartctl) -j -a ${_drives}", "noshell", "stdout"),
+        if => "_refresh_${_id[${_drives}]}";
+
+  methods:
+    linux._have_smartctl::
+      # Call parsing bundle for each drive (only when cache exists)
+      "parse_${_id[${_drives}]}"
+        usebundle => parse("${_drives}", "${_cache[${_drives}]}"),
+        useresult => "_d_${_id[${_drives}]}",
+        if => fileexists("${_cache[${_drives}]}");
+
+  vars:
+    linux._have_smartctl::
+      # Collect results from sub-bundles into formatted entries
+      "_health_entries[${_drives}]"
+        string => "${_drives}: ${_d_${_id[${_drives}]}[health]}",
+        if => isvariable("_d_${_id[${_drives}]}[health]");
+
+      "_model_entries[${_drives}]"
+        string => "${_drives}: ${_d_${_id[${_drives}]}[model]}",
+        if => isvariable("_d_${_id[${_drives}]}[model]");
+
+      "_temp_entries[${_drives}]"
+        string => "${_drives}: ${_d_${_id[${_drives}]}[temp]} C",
+        if => isvariable("_d_${_id[${_drives}]}[temp]");
+
+      "_hours_entries[${_drives}]"
+        string => "${_drives}: ${_d_${_id[${_drives}]}[hours]} h",
+        if => isvariable("_d_${_id[${_drives}]}[hours]");
+
+      "_nvme_spare_entries[${_drives}]"
+        string => "${_drives}: ${_d_${_id[${_drives}]}[nvme_spare]}%",
+        if => isvariable("_d_${_id[${_drives}]}[nvme_spare]");
+
+      "_nvme_pct_used_entries[${_drives}]"
+        string => "${_drives}: ${_d_${_id[${_drives}]}[nvme_pct_used]}%",
+        if => isvariable("_d_${_id[${_drives}]}[nvme_pct_used]");
+
+      "_nvme_media_errors_entries[${_drives}]"
+        string => "${_drives}: ${_d_${_id[${_drives}]}[nvme_media_errors]}",
+        if => isvariable("_d_${_id[${_drives}]}[nvme_media_errors]");
+
+      "_failed_entries[${_drives}]"
+        string => "${_drives}",
+        if => strcmp("${_d_${_id[${_drives}]}[health]}", "FAILED");
+
+      # Inventory attributes (visible in Mission Portal)
+      "drive_health"
+        slist => getvalues(_health_entries),
+        meta => { "inventory", "attribute_name=SMART drive health" };
+
+      "drive_model"
+        slist => getvalues(_model_entries),
+        meta => { "inventory", "attribute_name=SMART drive model" };
+
+      "drive_temperatures"
+        slist => getvalues(_temp_entries),
+        meta => { "inventory", "attribute_name=SMART drive temperatures (C)" };
+
+      "drive_power_on_hours"
+        slist => getvalues(_hours_entries),
+        meta => { "inventory", "attribute_name=SMART drive power-on hours" };
+
+      "nvme_available_spare"
+        slist => getvalues(_nvme_spare_entries),
+        meta => { "inventory", "attribute_name=SMART NVMe available spare" };
+
+      "nvme_percentage_used"
+        slist => getvalues(_nvme_pct_used_entries),
+        meta => { "inventory", "attribute_name=SMART NVMe percentage used" };
+
+      "nvme_media_errors"
+        slist => getvalues(_nvme_media_errors_entries),
+        meta => { "inventory", "attribute_name=SMART NVMe media errors" };
+
+      "failed_drives"
+        slist => getvalues(_failed_entries),
+        meta => { "inventory", "attribute_name=SMART failed drives" };
+
+    linux.!_have_smartctl::
+      "drive_health"
+        string => "SMARTCTL_MISSING",
+        meta => { "inventory", "attribute_name=SMART drive health" };
+
+  reports:
+    linux._have_smartctl.verbose_mode::
+      "inventory_smartctl: monitoring ${_drives}";
+      "inventory_smartctl: ${_drives} health=${_d_${_id[${_drives}]}[health]}"
+        if => isvariable("_d_${_id[${_drives}]}[health]");
+
+    !linux.verbose_mode::
+      "$(this.promise_filename): inventory_smartctl is Linux-only.";
+}
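Review note: the `main` bundle does two things before it reads any SMART data. It takes the first whitespace-delimited field of each `smartctl --scan` line as the drive name, and it rewrites a drive's cache file only when that file is missing or its mtime is more than `_cache_ttl` (3600) seconds old. A minimal Python sketch of those two decisions follows, for illustration only. The function names are hypothetical, the sample scan text is made up, and the sketch skips blank lines, which the policy's regex does not do explicitly.

```python
import os
import time

CACHE_TTL = 3600  # seconds, matching _cache_ttl in the policy


def drives_from_scan(scan_output):
    """Return the first whitespace-delimited field of each non-empty line."""
    return [line.split()[0] for line in scan_output.splitlines() if line.strip()]


def needs_refresh(cache_path, now=None, ttl=CACHE_TTL):
    """True when the cache file is missing or its mtime is more than ttl seconds old."""
    if not os.path.exists(cache_path):
        return True  # corresponds to the _cache_missing_* class
    now = time.time() if now is None else now
    # Strictly greater than, like isgreaterthan() in the _cache_stale_* class.
    return now - os.path.getmtime(cache_path) > ttl


# Made-up text in the general shape of `smartctl --scan` output.
sample = (
    "/dev/sda -d scsi # /dev/sda, SCSI device\n"
    "/dev/nvme0 -d nvme # /dev/nvme0, NVMe device\n"
)
print(drives_from_scan(sample))
```

Passing `now` explicitly makes the staleness rule easy to exercise without waiting an hour or touching file timestamps.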
+
+bundle agent parse(drive, cache_file)
+# @brief Parse smartctl JSON and return key metrics via bundle_return_value_index
+{
+  vars:
+      "_json" data => readjson("$(cache_file)");
+
+      # Extract metrics directly from JSON
+      "_health"
+        string => ifelse(strcmp("${_json[smart_status][passed]}", "true"), "PASSED", "FAILED"),
+        if => isvariable("_json[smart_status][passed]");
+
+      "_model"
+        string => "${_json[model_name]}",
+        if => isvariable("_json[model_name]");
+
+      "_temp"
+        string => "${_json[temperature][current]}",
+        if => isvariable("_json[temperature][current]");
+
+      "_hours"
+        string => "${_json[power_on_time][hours]}",
+        if => isvariable("_json[power_on_time][hours]");
+
+      "_nvme_spare"
+        string => "${_json[nvme_smart_health_information_log][available_spare]}",
+        if => isvariable("_json[nvme_smart_health_information_log][available_spare]");
+
+      "_nvme_pct_used"
+        string => "${_json[nvme_smart_health_information_log][percentage_used]}",
+        if => isvariable("_json[nvme_smart_health_information_log][percentage_used]");
+
+      "_nvme_media_errors"
+        string => "${_json[nvme_smart_health_information_log][media_errors]}",
+        if => isvariable("_json[nvme_smart_health_information_log][media_errors]");
+
+  reports:
+      "$(_health)" bundle_return_value_index => "health";
+      "$(_model)" bundle_return_value_index => "model";
+      "$(_temp)" bundle_return_value_index => "temp";
+      "$(_hours)" bundle_return_value_index => "hours";
+      "$(_nvme_spare)" bundle_return_value_index => "nvme_spare";
+      "$(_nvme_pct_used)" bundle_return_value_index => "nvme_pct_used";
+      "$(_nvme_media_errors)" bundle_return_value_index => "nvme_media_errors";
+}
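Review note: the `parse` bundle reads seven values out of the cached `smartctl -j -a` document and returns only those whose keys are present. The same extraction can be sketched in Python to make the key paths and the "absent key means no attribute" rule easy to see. This is an illustration under assumptions, not the module's implementation: `dig` and `parse_smart` are hypothetical names, and the sample document is made up and contains only keys the policy reads.

```python
import json


def dig(doc, *keys):
    """Follow nested keys, returning None as soon as one is absent."""
    for key in keys:
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def parse_smart(doc):
    """Extract the metrics the parse bundle returns; absent keys are omitted."""
    nvme = "nvme_smart_health_information_log"
    metrics = {
        "model": dig(doc, "model_name"),
        "temp": dig(doc, "temperature", "current"),
        "hours": dig(doc, "power_on_time", "hours"),
        "nvme_spare": dig(doc, nvme, "available_spare"),
        "nvme_pct_used": dig(doc, nvme, "percentage_used"),
        "nvme_media_errors": dig(doc, nvme, "media_errors"),
    }
    passed = dig(doc, "smart_status", "passed")
    if passed is not None:
        # Anything other than JSON true maps to FAILED, as in the policy's ifelse().
        metrics["health"] = "PASSED" if passed is True else "FAILED"
    return {name: value for name, value in metrics.items() if value is not None}


# Made-up sample document using only keys the policy reads.
sample = json.loads("""
{
  "model_name": "Samsung SSD 870 EVO",
  "smart_status": {"passed": true},
  "temperature": {"current": 35},
  "power_on_time": {"hours": 8742}
}
""")
print(parse_smart(sample))
```

Note that a media error count of `0` is kept, since only a missing key drops a metric; this matches the README example `/dev/nvme0: 0`.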
+
+body file control { namespace => "default"; }
+
+bundle agent __main__
+{
+  methods:
+      "inventory_smartctl:main";
+}