What you need to know to manage Linux disks

Managing disks and the file systems that reside on them is something of an art – from initial setup to monitoring performance

Credit: Les Chatfield

How much do you need to know about disks to successfully manage a Linux system? What commands do what? How do you make good decisions about partitioning? What kind of troubleshooting tools are available? What kind of problems might you run into? This article covers a lot of territory – from looking into the basics of a Linux file systems to sampling some very useful commands.

Disk technology background

In the beginning days of Unix and later Linux, disks were physically large, but very small in terms of storage capacity. A 300 megabyte disk in the mid-90’s was the size of a shoebox. Today, you can get multi-terrabyte disks that are the size of a slice of toast.

Traditionally, files resided within file systems that resided in disk partitions that were themselves simply slices of disks. This organization still dominates today, though servers in large data centers often take on an entirely different structure.

                         /  \
                        /    \
                       / file \
                      /        \
                    /            \
                   / file system  \
                  /                \
                /   disk partition   \
              /          disk          \

This simplistic view still works for many systems, but these days there are lot of complexities that make disk management harder in some ways and easier in others. A file system might be virtual – no longer residing on a single disk and more complex to manage, but far easier to resize as needed. In fact, the entire system could be virtual. And what we might manage as if it were a single disk could actually be some portion of a very large disk array.

Disk management tasks

Sysadmins generally have to deal with many issues when it comes to managing disks. These include:

  • Partitioning disks
  • Creating file systems
  • Mounting file systems
  • Sharing file systems
  • Monitoring free space within file systems
  • Backing up (and sometimes restoring) file systems

The reasons to partition a disk include:

  • protecting some file systems from running out of space (e.g., you may want the OS partition to be separated from home directories or applications to keep it from being affected if users’ files begin to take up far an excessive amount of disk space)
  • improving performance
  • allocating swap space
  • facilitating maintenance and backups (e.g., you might be able to unmount /apps if it’s not part of / and you might want to back up /home more frequently than /usr)
  • more efficient (and targeted) fsck
  • maintaining (particularly on test systems) multiple operating systems
  • reserving enough disk space for file system expansion
  • sharing select file systems with other systems

Partitioning commands

For most Linux servers, partitioning is done before the servers are deployed. On the other hand, you might add disks at some later time or hold back some significant amount of free disk space for future use.

To make changes or verify partitions, enter a command such as fdisk /dev/sda to start fdisk interactively and then type m to see a list of the things that you can do with the fdisk command.

$ sudo fdisk /dev/sda

Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

As you can see, the fdisk command provides a lot of functionality. The partitions that you set up may look something like this configuration in which four partitions have been set up on a single disk – /dev/sda.

|   /  40G   |     /home  80G         |    /apps  70G      | swap |
 sda1         sda2                     sda3                 sda4

Examining disk space and disk partitions

There are a number of excellent commands for examining disk partitions. The df command is one of the most commonly used commands for reporting on disk space usage. With the -h option, the df command displays the measurements in the most "human-friendly" format and that is, in fact, what the “h” is meant to imply. As you can see in the example below, the measurements are displayed in kilobytes, megabytes or gigabytes depending on the sizes rather than all using the same scale.

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            969M  4.0K  969M   1% /dev
tmpfs           196M  1.1M  195M   1% /run
/dev/sda1        37G  4.5G   31G  13% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            980M  152K  979M   1% /run/shm
none            100M   36K  100M   1% /run/user
/dev/sda3        28G   44M   26G   1% /apps

The pydf command (think "python df" as it's really a python script) also provides a very useful disk usage display showing mount points and cute little illustrations for how full each partition is.

$ pydf
Filesystem Size  Used Avail Use%                 Mounted on
/dev/sda1   37G 4534M   30G 12.1 [##...........] /
/dev/sda3   27G   44M   26G  0.2 [.............] /apps

The parted command displays partition information in a different format:

$ sudo parted -l
Model: ATA WDC WD800AAJS-60 (scsi)
Disk /dev/sda: 80.0GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start   End     Size    Type     File system     Flags
 1      1049kB  40.0GB  40.0GB  primary  ext4            boot
 2      40.0GB  50.0GB  10.0GB  primary  linux-swap(v1)
 3      50.0GB  80.0GB  30.0GB  primary  ext4

The lsblk (list block devices) command illustrates the relationship between disks and their partitions graphically and also supplies the major and minor device numbers and mount points.

$ lsblk
sda      8:0    0  74.5G  0 disk
├─sda1   8:1    0  37.3G  0 part /
├─sda2   8:2    0   9.3G  0 part [SWAP]
└─sda3   8:3    0    28G  0 part /apps

The fdisk command reports more details on disk partitions and uses very different numbers. You can also use fdisk to create or delete partitions, list unpartitioned space, change a partition type, or verify the partition table.

$ sudo fdisk -l

Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders, total 156301488 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000f114b

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *        2048    78125055    39061504   83  Linux
/dev/sda2        78125056    97656831     9765888   82  Linux swap / Solaris
/dev/sda3        97656832   156301311    29322240   83  Linux

The sfdisk command is similar to fdisk, but makes some partition manipulation activities easier to perform.

$ sudo sfdisk -l -uM

Disk /dev/sda: 9729 cylinders, 255 heads, 63 sectors/track
Units = mebibytes of 1048576 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start   End    MiB    #blocks   Id  System
/dev/sda1   *     1  38146  38146   39061504   83  Linux
/dev/sda2     38147  47683   9537    9765888   82  Linux swap / Solaris
/dev/sda3     47684  76318  28635   29322240   83  Linux
/dev/sda4         0      -      0          0    0  Empty

NOTE: A mebibyte (MiB) = 220 bytes or 1,048,576 bytes.

The cfdisk command can also be used to display or manipulate disk partitions.

$ sudo cfdisk

                           cfdisk (util-linux 2.20.1)

                              Disk Drive: /dev/sda
                        Size: 80026361856 bytes, 80.0 GB
              Heads: 255   Sectors per Track: 63   Cylinders: 9729

    Name      Flags      Part Type  FS Type          [Label]      Size (MB)
                          Pri/Log   Free Space                         1.05*
    sda1      Boot        Primary   ext4                           39998.99*
    sda2                  Primary   swap                           10000.27*
    sda3                  Primary   ext4                           30025.98*
                          Pri/Log   Free Space                         0.10*

     [   Help   ]  [   New    ]  [  Print   ]  [   Quit   ]  [  Units   ]
     [  Write   ]

                      Create new partition from free space

Monitoring disk performance

The iostat command can display statistics that illustrate how disks are performing, including how heavily they are being used. It also displays important measurements that show how busy the CPU is and how much of its resources are used for types of work. The system described below is idle more then 95% of the time. More importantly for our focus on disks, the %iowait (CPU waiting on disk IO) is very low. This would not be true if the disk were unusually busy and disk IO were a bottleneck.

$ iostat -x 60
Linux 3.13.0-129-generic (stinkbug)     08/31/2017      _x86_64_        (2 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.93    1.15    0.35    1.86    0.00   95.73

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               8.37     3.26   13.41    2.79   341.14   191.82    65.79     0.61   37.60   30.40   72.14   2.52   4.08

Probably one of the most informative commands for looking at disk health is smartctl (part of smartmontools). While the command generates a lot of output, it provides valuable measurements that might help you pinpoint disk problems, particularly once you get used to working with its extensive output.

$ sudo smartctl -a /dev/sda1
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.13.0-129-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD800AAJS-60M0A0
Serial Number:    WD-WMAV37134378
LU WWN Device Id: 5 0014ee 0015c85ef
Firmware Version: 02.03E02
User Capacity:    80,026,361,856 bytes [80.0 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 3.0 Gb/s
Local Time is:    Thu Aug 31 15:30:19 2017 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 2700) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  36) minutes.
SCT capabilities:              (0x303f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x002f 200  200   051    Pre-fail Always      -       0
  3 Spin_Up_Time            0x0027 143  140   021    Pre-fail Always      -       3841
  4 Start_Stop_Count        0x0032 100  100   000    Old_age  Always      -       178
  5 Reallocated_Sector_Ct   0x0033 200  200   140    Pre-fail Always      -       0
  7 Seek_Error_Rate         0x002f 100  253   051    Pre-fail Always      -       0
  9 Power_On_Hours          0x0032 058  058   000    Old_age  Always      -       31203
 10 Spin_Retry_Count        0x0033 100  100   051    Pre-fail Always      -       0
 11 Calibration_Retry_Count 0x0032 100  100   000    Old_age  Always      -       0
 12 Power_Cycle_Count       0x0032 100  100   000    Old_age  Always      -       175
184 End-to-End_Error        0x0033 100  100   097    Pre-fail Always      -       0
187 Reported_Uncorrect      0x0032 100  100   000    Old_age  Always      -       0
188 Command_Timeout         0x0032 100  100   000    Old_age  Always      -       0
190 Airflow_Temperature_Cel 0x0022 066  062   040    Old_age  Always      -       34
192 Power-Off_Retract_Count 0x0032 200  200   000    Old_age  Always      -       103
193 Load_Cycle_Count        0x0032 200  200   000    Old_age  Always      -       178
196 Reallocated_Event_Count 0x0032 200  200   000    Old_age  Always      -       0
197 Current_Pending_Sector  0x0032 200  200   000    Old_age  Always      -       0
198 Offline_Uncorrectable   0x0030 200  200   000    Old_age  Offline     -       0
199 UDMA_CRC_Error_Count    0x0032 200  200   000    Old_age  Always      -       0
200 Multi_Zone_Error_Rate   0x0008 200  200   000    Old_age  Offline     -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status                 Remaining LifeTime(hours) LBA_of_first_error
# 1  Short offline     Completed without error      00%    30349        -
# 2  Extended offline  Aborted by host              80%        0        -

SMART Selective self-test log data structure revision number 1
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

There are numerous other commands for examining disks and file systems. Those described here are some of the most useful and informative. Using them periodically has advantages as the easiest way to spot problems is becoming so used to the output of commands such as these that you easily spot the kind of differences that might indicate problems.

Join the newsletter!

Error: Please check your email address.

More about ----ACSCountLinuxSeekSPANSunTestWDWestern Digital

Show Comments

Market Place