7.3.3 Filesystem Components

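I. The Virtual Filesystem (VFS) Abstraction Layer

1. The Common File Model

A complete Linux system typically consists of anywhere from several thousand to several million files, which store programs, data, and other information. A hierarchical directory structure organizes and groups these files.

Filesystem types generally fall into three categories:

Disk-based filesystems
Virtual filesystems
Network filesystems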

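a. inode

The inode is the most important data structure in the Linux kernel's filesystem layer. Each inode represents one file, and the inode structure stores the file's metadata, such as its size, its block size, and its timestamps. Every file has exactly one inode.

The kernel's struct inode is listed in section 2.c below.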
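From user space, much of the metadata an inode holds can be inspected with the POSIX stat() call. The following is a minimal sketch rather than kernel code; the helper name print_inode_info is made up for this illustration.

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Print a few inode-derived fields for `path`.
 * Returns 0 on success and -1 if stat() fails.
 * (print_inode_info is a hypothetical helper, not a library function.)
 */
int print_inode_info(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }

    printf("inode number   : %llu\n", (unsigned long long)st.st_ino);
    printf("size in bytes  : %lld\n", (long long)st.st_size);
    printf("I/O block size : %ld\n", (long)st.st_blksize);
    printf("512-byte blocks: %lld\n", (long long)st.st_blocks);
    printf("hard links     : %lu\n", (unsigned long)st.st_nlink);
    return 0;
}
```

Calling print_inode_info("/etc/passwd"), for example, prints the inode number and sizes for that file; the exact values depend on the system.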

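b. Links

Linux supports link files, which allow a file to be shared under more than one path. There are two kinds: hard links and soft links, the latter also called symbolic links.

[Hard link] A hard link gives an existing file an additional name. Several file names then refer to the same inode, and the inode member i_nlink holds the hard-link count.

[Soft link] A soft link is a file whose data is the path of another file. A soft link can point to either a file or a directory.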
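The difference between the two link types can be observed from user space. The sketch below assumes the caller passes three paths on the same filesystem; compare_links is a hypothetical helper name. It creates both kinds of link and checks that the hard link shares the target's inode while the symbolic link has an inode of its own.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a hard link `hard` and a symbolic link `soft` to `target`.
 * Returns 1 if the hard link shares the target's inode and the soft
 * link is a separate inode of type "symlink", 0 if not, -1 on error.
 */
int compare_links(const char *target, const char *hard, const char *soft)
{
    struct stat st_target, st_hard, st_soft;

    if (link(target, hard) != 0 || symlink(target, soft) != 0)
        return -1;

    /* stat() follows links; lstat() describes the symbolic link itself. */
    if (stat(target, &st_target) != 0 || stat(hard, &st_hard) != 0 ||
        lstat(soft, &st_soft) != 0)
        return -1;

    printf("target: ino=%llu nlink=%lu\n",
           (unsigned long long)st_target.st_ino,
           (unsigned long)st_target.st_nlink);
    printf("hard  : ino=%llu\n", (unsigned long long)st_hard.st_ino);
    printf("soft  : ino=%llu\n", (unsigned long long)st_soft.st_ino);

    return st_hard.st_ino == st_target.st_ino &&
           S_ISLNK(st_soft.st_mode) &&
           st_soft.st_ino != st_target.st_ino;
}
```

After a successful call, the link count reported for the target is one higher than before, which corresponds to the i_nlink member mentioned above.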

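2. VFS Structure

The VFS interface is implemented with a large number of data structures. The VFS has two components that must be managed and abstracted: files and filesystems.

a. Representing files

The inode is the mechanism the Linux kernel uses to represent a file's contents and associated metadata. To abstract access to the underlying filesystem, the kernel does not call fixed functions; it calls through function pointers. These pointers are collected in two structures:

(1) inode operations: creating links, renaming files, creating new files in a directory, and deleting files.

(2) File operations: these act on a file's data. They include the obvious operations such as read and write, as well as operations such as setting the file position and creating memory mappings.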
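The function-pointer technique described above can be illustrated in plain user-space C. This is a simplified sketch, not the kernel's struct file_operations: every name in it (demo_file, demo_file_ops, mem_ops, null_ops, demo_read, demo_write) is invented for the example. Two toy "filesystems" supply their own operation tables, and the generic layer calls through whichever table the file carries.

```c
#include <stddef.h>
#include <string.h>

struct demo_file;

/* Hypothetical operation table, loosely modelled on the idea of file operations. */
struct demo_file_ops {
    long (*read)(struct demo_file *f, char *buf, size_t len);
    long (*write)(struct demo_file *f, const char *buf, size_t len);
};

struct demo_file {
    const struct demo_file_ops *ops; /* selected by the "filesystem" */
    char data[64];                   /* backing store for the in-memory variant */
    size_t size;                     /* number of valid bytes in data */
};

/* In-memory variant: writes are stored in f->data and can be read back. */
static long mem_read(struct demo_file *f, char *buf, size_t len)
{
    size_t n = len < f->size ? len : f->size;
    memcpy(buf, f->data, n);
    return (long)n;
}

static long mem_write(struct demo_file *f, const char *buf, size_t len)
{
    size_t n = len < sizeof(f->data) ? len : sizeof(f->data);
    memcpy(f->data, buf, n);
    f->size = n;
    return (long)n;
}

/* Null variant: accepts and discards writes, always reads as empty. */
static long null_read(struct demo_file *f, char *buf, size_t len)
{
    (void)f; (void)buf; (void)len;
    return 0;
}

static long null_write(struct demo_file *f, const char *buf, size_t len)
{
    (void)f; (void)buf;
    return (long)len;
}

const struct demo_file_ops mem_ops  = { .read = mem_read,  .write = mem_write };
const struct demo_file_ops null_ops = { .read = null_read, .write = null_write };

/* Generic layer: the interface is fixed, the implementation is looked up. */
long demo_read(struct demo_file *f, char *buf, size_t len)
{
    return f->ops->read ? f->ops->read(f, buf, len) : -1;
}

long demo_write(struct demo_file *f, const char *buf, size_t len)
{
    return f->ops->write ? f->ops->write(f, buf, len) : -1;
}
```

A caller only ever uses demo_read() and demo_write(); which implementation runs is decided by the ops pointer stored in the file object.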

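b. Filesystems and superblocks

Each filesystem type that VFS supports is hooked in through a special kernel object that provides a method for reading the superblock. In addition to key filesystem parameters (block size, maximum file size, and so on), the superblock holds function pointers for reading, writing, and manipulating inodes.

The kernel also maintains a linked list of the superblock instances of all active filesystems. The term "active" is used instead of "mounted" because, in some situations, a single superblock can serve several mount points.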

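c. The VFS inode structure

There are two kinds of inode: the in-memory inode and the on-disk inode stored by the filesystem. The VFS inode is the former; the inodes of Ext2/Ext3 are typical examples of the latter.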

// Commonly used fields
/*
 * Keep mostly read-only and often accessed (especially for
 * the RCU path lookup and 'stat' data) fields at the beginning
 * of the 'struct inode'
 */
struct inode {
    umode_t         i_mode; // file type and access permissions
    unsigned short      i_opflags;
    kuid_t          i_uid;  // user ID of the owner
    kgid_t          i_gid;  // group ID of the owner
    unsigned int        i_flags;

#ifdef CONFIG_FS_POSIX_ACL
    struct posix_acl    *i_acl;
    struct posix_acl    *i_default_acl;
#endif
    ......
    dev_t           i_rdev;  // device number, if the inode represents a device
    loff_t          i_size;  // file size in bytes
    struct timespec64   i_atime; // time of last access
    struct timespec64   i_mtime; // time of last modification of the contents
    struct timespec64   i_ctime; // time of last change to the inode itself (not the creation time)
    spinlock_t      i_lock; /* i_blocks, i_bytes, maybe i_size */
    unsigned short          i_bytes;
    u8          i_blkbits;   // block size expressed as a number of bits (log2 of the block size)
    u8          i_write_hint;
    blkcnt_t        i_blocks; // number of blocks used, counted in 512-byte units
    ......
};
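d. inode operations

The kernel provides many functions that operate on inodes. They are abstracted behind a set of function pointers, because the actual data is manipulated by the specific filesystem implementation. The calling interface always stays the same, while the real work is done by implementation-specific functions.

e. The directory entry cache

Because block devices are slow, finding the inode associated with a file name can take a long time. Even when the device data is already in the page cache, repeating the complete lookup every time is wasteful. Linux therefore uses the directory entry cache (dentry cache for short) to give fast access to the results of earlier lookups. The cache is built around struct dentry, which has already been mentioned several times.

Once VFS, together with the filesystem implementation, has read the data for a directory entry (a directory or a file), it creates a dentry instance to cache what it found.

A1. The dentry structure is defined as follows: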

struct dentry {
    /* RCU lookup touched fields */
    unsigned int d_flags;       /* protected by d_lock; dentry flags */
    seqcount_t d_seq;       /* per dentry seqlock */
    struct hlist_bl_node d_hash;    /* lookup hash list; links the dentry into its hash bucket */
    struct dentry *d_parent;    /* dentry of the parent directory */
    struct qstr d_name;         // name of this directory entry
    struct inode *d_inode;      /* Where the name belongs to - NULL is negative; the inode associated with this name */
    unsigned char d_iname[DNAME_INLINE_LEN];    /* small names */

    /* Ref lookup also touches following */
    struct lockref d_lockref;   /* per-dentry lock and refcount */
    const struct dentry_operations *d_op;
    struct super_block *d_sb;   /* The root of the dentry tree; the superblock this dentry belongs to */
    unsigned long d_time;       /* used by d_revalidate */
    void *d_fsdata;         /* fs-specific data */

    union {
        struct list_head d_lru;     /* LRU list */
        wait_queue_head_t *d_wait;  /* in-lookup ones only */
    };
    struct list_head d_child;   /* child of parent list; links this dentry into its parent's d_subdirs */
    struct list_head d_subdirs; /* our children; for a directory, the list of child dentries */
    /*
     * d_alias and d_rcu can share memory
     */
    union {
        struct hlist_node d_alias;  /* inode alias list; dentries that refer to the same inode */
        struct hlist_bl_node d_in_lookup_hash;  /* only for in-lookup ones */
        struct rcu_head d_rcu;
    } d_u;
} __randomize_layout;

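A2. Cache organization

The dentry structure not only makes filesystems easier to work with; it is also critical to system performance. Dentry objects are organized in memory using two structures:

(1) A hash table (dentry_hashtable) that contains all dentry objects.

(2) An LRU (least recently used) list. Objects that are no longer in use are placed on it and given a final grace period; only after that period expires are they removed from memory.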
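The combination of a hash table for lookup and least-recently-used eviction can be sketched in a few dozen lines of user-space C. This is a toy model under loud assumptions, not the kernel's dcache: it has no reference counts, no locking or RCU, a fixed number of slots, and it evicts the least recently used entry outright instead of keeping a separate list of unused objects. All names (struct cache, cache_lookup, cache_insert, and so on) are invented for the illustration, and inode number 0 is assumed to mean "not cached".

```c
#include <string.h>

#define NBUCKETS 8   /* number of hash chains */
#define NSLOTS   4   /* deliberately tiny so that eviction is easy to trigger */
#define NAMELEN  32

struct entry {
    char name[NAMELEN];
    unsigned long ino;
    unsigned long last_used;   /* logical timestamp; 0 means the slot is free */
    int next;                  /* next slot in the same hash chain, or -1 */
};

struct cache {
    int bucket[NBUCKETS];      /* first slot of each hash chain, or -1 */
    struct entry slot[NSLOTS];
    unsigned long clock;       /* incremented on every access */
};

static unsigned hash_name(const char *s)
{
    unsigned h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

void cache_init(struct cache *c)
{
    memset(c, 0, sizeof(*c));
    for (int i = 0; i < NBUCKETS; i++)
        c->bucket[i] = -1;
}

/* Return the cached inode number for `name`, or 0 on a miss. */
unsigned long cache_lookup(struct cache *c, const char *name)
{
    for (int i = c->bucket[hash_name(name)]; i != -1; i = c->slot[i].next) {
        if (strcmp(c->slot[i].name, name) == 0) {
            c->slot[i].last_used = ++c->clock;   /* mark as recently used */
            return c->slot[i].ino;
        }
    }
    return 0;
}

/* Unlink slot i from the hash chain it is currently on. */
static void unhash(struct cache *c, int i)
{
    int *p = &c->bucket[hash_name(c->slot[i].name)];
    while (*p != i)
        p = &c->slot[*p].next;
    *p = c->slot[i].next;
}

/* Cache name -> ino. The caller is expected to have looked the name up first. */
void cache_insert(struct cache *c, const char *name, unsigned long ino)
{
    int victim = 0;

    /* Pick the slot with the smallest timestamp; free slots (0) always win. */
    for (int i = 1; i < NSLOTS; i++)
        if (c->slot[i].last_used < c->slot[victim].last_used)
            victim = i;
    if (c->slot[victim].last_used != 0)
        unhash(c, victim);                       /* evict the LRU entry */

    struct entry *e = &c->slot[victim];
    strncpy(e->name, name, NAMELEN - 1);
    e->name[NAMELEN - 1] = '\0';
    e->ino = ino;
    e->last_used = ++c->clock;

    unsigned h = hash_name(e->name);
    e->next = c->bucket[h];                      /* push onto the hash chain */
    c->bucket[h] = victim;
}
```

With four slots filled and one of them touched by a lookup, inserting a fifth name evicts whichever entry was used longest ago, while the touched entry survives.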

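A3. dentry operations

The dentry_operations structure holds function pointers to the filesystem-specific operations that can be performed on dentry objects. It is defined as follows: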

struct dentry_operations {
    int (*d_revalidate)(struct dentry *, unsigned int);
    int (*d_weak_revalidate)(struct dentry *, unsigned int);
    int (*d_hash)(const struct dentry *, struct qstr *);
    int (*d_compare)(const struct dentry *,
            unsigned int, const char *, const struct qstr *);
    int (*d_delete)(const struct dentry *);
    int (*d_init)(struct dentry *);
    void (*d_release)(struct dentry *);
    void (*d_prune)(struct dentry *);
    void (*d_iput)(struct dentry *, struct inode *);
    char *(*d_dname)(struct dentry *, char *, int);
    struct vfsmount *(*d_automount)(struct path *);
    int (*d_manage)(const struct path *, bool);
    struct dentry *(*d_real)(struct dentry *, const struct inode *);
} ____cacheline_aligned;

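II. The proc Filesystem

1. The proc filesystem

The proc filesystem is a virtual filesystem: its contents are not read from a block device but are generated dynamically, only when a file is read. Through proc you can obtain information about the kernel's subsystems (memory usage, attached peripherals, and so on), and you can change kernel behavior without recompiling the kernel source or rebooting the system.

The proc filesystem provides an interface to every option exported through this mechanism. Parameters can be modified directly without writing a dedicated program; a shell and the standard cat and echo programs are enough.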
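Because proc entries behave like ordinary files, a program can do what cat and echo do with nothing more than standard file I/O. The helpers below are a minimal sketch; read_proc_file and write_proc_file are hypothetical names, and writing normally requires root privileges and a writable entry.

```c
#include <stdio.h>

/*
 * Read up to size-1 bytes of `path` into buf and NUL-terminate it.
 * Returns the number of bytes read, or -1 on error.
 */
long read_proc_file(const char *path, char *buf, size_t size)
{
    FILE *fp;
    size_t n;

    if (size == 0)
        return -1;
    fp = fopen(path, "r");
    if (!fp)
        return -1;
    n = fread(buf, 1, size - 1, fp);
    buf[n] = '\0';
    fclose(fp);
    return (long)n;
}

/*
 * Write the string `value` to `path`, as `echo value > path` would.
 * Returns 0 on success, -1 on error.
 */
int write_proc_file(const char *path, const char *value)
{
    int rc;
    FILE *fp = fopen(path, "w");

    if (!fp)
        return -1;
    rc = (fputs(value, fp) == EOF) ? -1 : 0;
    if (fclose(fp) != 0)   /* the kernel may reject the value at flush time */
        rc = -1;
    return rc;
}
```

For instance, read_proc_file("/proc/version", buf, sizeof buf) fills buf with the kernel version string on a Linux system where /proc is mounted.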

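Information in /proc

Although the size of the proc filesystem varies from system to system, it always contains many deeply nested directories, files, and links. The information falls into the following broad categories:

Memory management
Characteristic data of system processes
Filesystems
Device drivers
System buses
Power management
Terminals
System control parameters

The /proc directory on a Linux system is the mount point of the proc filesystem, which is a pseudo filesystem (that is, a virtual filesystem). Its most commonly used files are described in the next section.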

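2. Common proc files

buddyinfo: records the state of the system's memory resources; useful as a reference when analyzing memory fragmentation.
cmdline: the parameters passed to the kernel at boot, usually supplied by a boot loader such as lilo or grub.
cpuinfo: information about the processors.
crypto: the cryptographic algorithms available to the running kernel, with details for each algorithm.
devices: information about all block and character devices the system has loaded.
diskstats: I/O statistics for each disk device.
filesystems: the filesystem types currently supported by the kernel; entries marked nodev do not require a block device.
interrupts: per-IRQ interrupt counts on x86 and x86_64 systems.
iomem: how the memory (RAM or ROM) of each physical device is mapped into the system's memory address space.
ioports: the I/O port ranges that are currently registered and in use for communicating with physical devices.
kallsyms: symbol definitions exported by the kernel, used by module management tools to dynamically link or bind loadable modules.
locks: files currently locked by the kernel, including kernel-internal debugging data; each lock occupies one line and has a unique number.
meminfo: current memory usage; commonly read by the free command.
mounts: before kernel 2.4.19 this file listed all filesystems currently mounted on the system; on later kernels it is a symbolic link to /proc/self/mounts.
modules: the names of all modules currently loaded into the kernel; used by lsmod and can also be read directly.
partitions: major and minor device numbers, among other details, for each partition of each block device.
stat: various statistics accumulated since the system was last booted.
swaps: the swap areas currently in use and how much of each is used.
uptime: how long the system has been running since the last boot.
version: the version of the running kernel.
vmstat: various virtual memory statistics.
zoneinfo: detailed information about each memory zone.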
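Most of these files are plain text made of "key: value" lines, so they are easy to parse. As an illustration, the sketch below extracts one value from /proc/meminfo; meminfo_kb is a hypothetical helper, and it assumes the usual "Key:   12345 kB" line layout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the numeric value (in kB) of `key` from /proc/meminfo,
 * for example key = "MemTotal". Returns -1 if the file cannot be
 * opened or the key is not present.
 */
long meminfo_kb(const char *key)
{
    char line[256];
    long value = -1;
    size_t klen = strlen(key);
    FILE *fp = fopen("/proc/meminfo", "r");

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        /* Match "key:" exactly so that "MemTotal" does not match "MemTotalX". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            if (sscanf(line + klen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(fp);
    return value;
}
```

This is roughly the kind of parsing a tool such as free has to perform, although the real tool is more thorough.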

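3. proc data structures

Core proc data structures

The code that implements the proc filesystem is built closely around these structures. proc makes heavy use of VFS data structures because, as a filesystem, it must be integrated into the kernel's VFS abstraction layer.

There are also data structures specific to proc, which organize the information the kernel provides. proc must additionally offer an interface to the kernel's subsystems so that the kernel can extract information from its own data structures and make it available to user space through /proc. Every entry in the proc filesystem is described by an instance of proc_dir_entry, defined as follows: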

// fs/proc/internal.h
struct proc_dir_entry {
    /*
     * number of callers into module in progress;
     * negative -> it's going away RSN
     */
    atomic_t in_use;
    refcount_t refcnt;
    struct list_head pde_openers;   /* who did ->open, but not ->release */
    /* protects ->pde_openers and all struct pde_opener instances */
    spinlock_t pde_unload_lock;
    struct completion *pde_unload_completion;
    const struct inode_operations *proc_iops;
    union {
        const struct proc_ops *proc_ops;
        const struct file_operations *proc_dir_ops;
    };
    const struct dentry_operations *proc_dops;
    union {
        const struct seq_operations *seq_ops;
        int (*single_show)(struct seq_file *, void *);
    };
    proc_write_t write;
    void *data;
    unsigned int state_size;
    unsigned int low_ino;
    nlink_t nlink;
    kuid_t uid;
    kgid_t gid;
    loff_t size;
    struct proc_dir_entry *parent;
    struct rb_root subdir;
    struct rb_node subdir_node;
    char *name;
    umode_t mode;
    u8 namelen;
    char inline_name[];
} __randomize_layout;

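Mounting the proc filesystem

Once the kernel-internal data that describes the structure and contents of the proc filesystem has been initialized, the next step is to mount the filesystem into the directory tree. When the kernel adds a new filesystem, it scans a linked list for the file_system_type instance that belongs to that filesystem. For proc, the instance is: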

static struct file_system_type proc_fs_type = {
    .name           = "proc",
    .init_fs_context    = proc_init_fs_context,
    .parameters     = proc_fs_parameters,
    .kill_sb        = proc_kill_sb,
    .fs_flags       = FS_USERNS_MOUNT | FS_DISALLOW_NOTIFY_PERM,
};

proc_sops holds the superblock operations, collecting the functions the kernel needs to manage the proc filesystem:

const struct super_operations proc_sops = {
    .alloc_inode    = proc_alloc_inode,
    .free_inode = proc_free_inode,
    .drop_inode = generic_delete_inode,
    .evict_inode    = proc_evict_inode,
    .statfs     = simple_statfs,
    .show_options   = proc_show_options,
};

The static proc_dir_entry instance:

/*
 * This is the root "inode" in the /proc tree..
 */
struct proc_dir_entry proc_root = {
    .low_ino    = PROC_ROOT_INO, 
    .namelen    = 5, 
    .mode       = S_IFDIR | S_IRUGO | S_IXUGO, 
    .nlink      = 2, 
    .refcnt     = REFCOUNT_INIT(1),
    .proc_iops  = &proc_root_inode_operations, 
    .proc_dir_ops   = &proc_root_operations,
    .parent     = &proc_root,
    .subdir     = RB_ROOT,
    .name       = "/proc",
};

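III. super_block Basics

super_block: different filesystem types have different on-disk layouts, but the virtual filesystem describes them all with one uniform set of data structures. Each filesystem has one superblock, represented by struct super_block, which describes the filesystem as a whole. When a filesystem is mounted, a copy of its superblock is created in memory.

In the Linux kernel, struct super_block is declared in include/linux/fs.h (in the 5.x kernels that the listings in this section match).

To obtain the superblock of a given block device, kernel code can call get_super(). This is a kernel-internal function, not a system call.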
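User-space programs cannot read struct super_block directly, but the POSIX statvfs() call reports several of the filesystem-wide parameters that a superblock describes. The following is a small sketch; print_fs_info is a name invented for this example.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/*
 * Print filesystem-wide parameters for the filesystem containing `path`.
 * Returns 0 on success, -1 if statvfs() fails.
 */
int print_fs_info(const char *path)
{
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0) {
        perror("statvfs");
        return -1;
    }

    printf("block size       : %lu\n", (unsigned long)vfs.f_bsize);
    printf("fragment size    : %lu\n", (unsigned long)vfs.f_frsize);
    /* f_blocks and f_bfree are counted in units of f_frsize. */
    printf("total blocks     : %llu\n", (unsigned long long)vfs.f_blocks);
    printf("free blocks      : %llu\n", (unsigned long long)vfs.f_bfree);
    printf("total inodes     : %llu\n", (unsigned long long)vfs.f_files);
    printf("max name length  : %lu\n", (unsigned long)vfs.f_namemax);
    return 0;
}
```

Calling print_fs_info("/") reports the parameters of the root filesystem; pseudo filesystems such as proc typically report zero for the block and inode counts.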

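IV. Mounting Filesystems

A. Mounting basics

A1. The mount descriptor

On Linux, a process can access a filesystem only after the filesystem has been mounted on a directory in the in-memory directory tree. Each time a filesystem is mounted, the virtual filesystem creates a mount descriptor (struct mount). A mount descriptor describes one mounted instance of a filesystem; the filesystem on a single storage device can be mounted several times, each time on a different directory. The kernel defines the mount descriptor as follows: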

struct mount {
    struct hlist_node mnt_hash;    // links the mount into a hash table for fast lookup
    struct mount *mnt_parent;      // pointer to the parent mount
    struct dentry *mnt_mountpoint; // dentry of the directory this mount is attached to
    struct vfsmount mnt;
    union {
        struct rcu_head mnt_rcu;
        struct llist_node mnt_llist;
    };
#ifdef CONFIG_SMP
    struct mnt_pcp __percpu *mnt_pcp;
#else
    int mnt_count;   // reference count: how many references are held on this mount
    int mnt_writers;
#endif
    struct list_head mnt_mounts;    /* list of children, anchored here */
    struct list_head mnt_child; /* and going through their mnt_child */
    struct list_head mnt_instance;  /* mount instance on sb->s_mounts */
    const char *mnt_devname;    /* Name of device e.g. /dev/dsk/hda1 */
    struct list_head mnt_list;
    struct list_head mnt_expire;    /* link in fs-specific expiry list */
    struct list_head mnt_share; /* circular list of shared mounts */
    struct list_head mnt_slave_list;/* list of slave mounts */
    struct list_head mnt_slave; /* slave list entry */
    struct mount *mnt_master;   /* slave is on master->mnt_slave_list; for a slave mount, points to its master mount */
    struct mnt_namespace *mnt_ns;   /* containing namespace */
    struct mountpoint *mnt_mp;  /* where is it mounted */
    union {
        struct hlist_node mnt_mp_list;  /* list mounts with the same mountpoint */
        struct hlist_node mnt_umount;  
    };
    struct list_head mnt_umounting; /* list entry for umount propagation */
#ifdef CONFIG_FSNOTIFY
    struct fsnotify_mark_connector __rcu *mnt_fsnotify_marks;
    __u32 mnt_fsnotify_mask;
#endif
    int mnt_id;         /* mount identifier */
    int mnt_group_id;       /* peer group identifier */
    int mnt_expiry_mark;        /* true if marked for expiry */
    struct hlist_head mnt_pins;
    struct hlist_head mnt_stuck_children;
} __randomize_layout;

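A2. Filesystem types

Because every filesystem has its own superblock format, each filesystem must register a file_system_type with the virtual filesystem and implement a mount method that reads and parses the superblock. The proc_fs_type instance shown earlier is one example of such a registration; it supplies init_fs_context in place of a mount method.

Run cat /proc/filesystems to list the registered filesystem types. The file is normally world-readable, so administrator privileges are not required.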
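The same check can be made from a program. The sketch below scans /proc/filesystems for a given type and reports whether it is flagged nodev; fs_registered is a hypothetical helper, and the parsing assumes the usual layout of an optional "nodev" flag followed by the type name on each line.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up `fstype` in /proc/filesystems.
 * Returns 1 if it is registered, 0 if not, -1 if the file cannot be read.
 * If `nodev` is non-NULL and the type is found, *nodev is set to 1 when
 * the entry carries the "nodev" flag and to 0 otherwise.
 */
int fs_registered(const char *fstype, int *nodev)
{
    char line[128], first[32], second[64];
    int found = 0;
    FILE *fp = fopen("/proc/filesystems", "r");

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        /* Two fields: "nodev <name>". One field: "<name>" only. */
        int n = sscanf(line, "%31s %63s", first, second);
        const char *name;

        if (n < 1)
            continue;
        name = (n == 2) ? second : first;
        if (strcmp(name, fstype) == 0) {
            if (nodev)
                *nodev = (n == 2 && strcmp(first, "nodev") == 0);
            found = 1;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

For example, fs_registered("proc", &flag) is expected to return 1 with flag set to 1 on a system where /proc is mounted, since proc needs no block device.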

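A3. Mounting a filesystem

The virtual filesystem organizes directories in memory as a tree. A process can access a filesystem only once it has been mounted on a directory in that tree.

With administrator privileges, run: mount -t fstype [-o options] device dir. This mounts the filesystem of type fstype that resides on the storage device device onto the directory dir.

The glibc library provides one wrapper function for mounting a filesystem, mount, and two for unmounting, umount and umount2.

B. The mount system call

The mount system call mounts a filesystem. glibc declares its wrapper in <sys/mount.h>.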
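The glibc wrapper has the prototype int mount(const char *source, const char *target, const char *filesystemtype, unsigned long mountflags, const void *data). As a hedged illustration of how it is called, the sketch below mounts a small tmpfs; the helper names mount_tmpfs and unmount_fs, the flag choice, and the "size=1m" option are assumptions made for this example, and the calls succeed only with sufficient privileges (normally root).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Mount a 1 MiB tmpfs on the existing directory `target`.
 * Returns 0 on success, -1 on failure (errno is set by mount()).
 */
int mount_tmpfs(const char *target)
{
    /* For a filesystem without a backing device, `source` is only a label. */
    if (mount("tmpfs", target, "tmpfs", MS_NOSUID | MS_NODEV, "size=1m") != 0) {
        fprintf(stderr, "mount %s: %s\n", target, strerror(errno));
        return -1;
    }
    return 0;
}

/* Unmount whatever is mounted on `target`. Returns 0 on success, -1 on failure. */
int unmount_fs(const char *target)
{
    if (umount2(target, 0) != 0) {
        fprintf(stderr, "umount %s: %s\n", target, strerror(errno));
        return -1;
    }
    return 0;
}
```

This corresponds to running mount -t tmpfs -o size=1m tmpfs dir followed by umount dir from a shell.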

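C. Bind mounts

A bind mount attaches a subtree of the directory tree at another location. To create one, run: mount --bind olddir newdir.

This mounts the subtree rooted at olddir onto the directory newdir; afterwards the same contents are visible through both newdir and olddir.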

Reference: https://github.com/0voice
