7/12/2011

File management and debugging

 After a short discussion about the cache system of the ISO9660 filesystem layer, my mentor (Chris Johns) and I decided that the cache wasn't the main thing to focus on for now. The last week was therefore dedicated to implementing basic file management and access routines.

 With the first step (directory structure management) done, my time went into implementing the file-related file-system handlers. As with the directory handlers, the core of this functionality resides in a few POSIX call handlers : open / read / close / stat. The write call is not implemented since iso9660 volumes are read-only.

 Additionally, I wrote a few helper routines to help me debug the code during the next steps of development. All the "function error breaks" have been documented : the functions now fail silently in production mode and become verbose when DEBUG is defined (when RTEMS is compiled with debug support), which will be very, very useful. :-)
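
 A minimal sketch of such a helper, assuming a macro named ISO9660_TRACE and a check function iso9660_check_sector_size (both names are made up for this example and are not the ones used in the project) :

```c
#include <stdio.h>

/*
 * Hypothetical trace helper: verbose when DEBUG is defined, compiled out
 * otherwise. ISO9660_TRACE is an illustrative name only.
 */
#ifdef DEBUG
#define ISO9660_TRACE(...) \
  do { \
    fprintf(stderr, "iso9660: %s:%d: ", __func__, __LINE__); \
    fprintf(stderr, __VA_ARGS__); \
    fputc('\n', stderr); \
  } while (0)
#else
#define ISO9660_TRACE(...) do { } while (0)
#endif

/* Example "error break": explain the failure in debug builds, then fail. */
int iso9660_check_sector_size(unsigned int size)
{
  if (size != 2048) {
    ISO9660_TRACE("unsupported sector size %u", size);
    return -1;
  }
  return 0;
}
```

 Without DEBUG the macro expands to nothing, so the function just returns -1; with DEBUG the same call also reports where and why it failed.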

 I began a shell command dedicated to debugging a mounted ISO9660 volume (just like Chris Johns did for the RFS file-system) : debugiso9660 can call various subroutines that perform debug operations on the file-system. It's cleaner than writing hard-coded debugging code within the functions and it could be more efficient than tracing with GDB. Currently the only existing subroutine is "volume", which prints volume-related information in the shell console.
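
 The sub-command dispatch could look roughly like the sketch below. It is a standalone mock : the table layout, debug_volume and debugiso9660_main are assumptions made for the example, and the real command is registered through the RTEMS shell rather than called directly.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler type for one debug sub-command. */
typedef int (*debug_subcommand_h)(int argc, char *argv[]);

typedef struct {
  const char        *name;
  debug_subcommand_h handler;
} debug_subcommand;

/* Placeholder: the real "volume" sub-command prints volume information. */
static int debug_volume(int argc, char *argv[])
{
  (void) argc;
  (void) argv;
  printf("volume: (volume information would be printed here)\n");
  return 0;
}

static const debug_subcommand subcommands[] = {
  { "volume", debug_volume }
};

/* Look up argv[1] in the table and run the matching handler. */
int debugiso9660_main(int argc, char *argv[])
{
  size_t i;

  if (argc < 2) {
    printf("usage: debugiso9660 <subcommand>\n");
    return 1;
  }
  for (i = 0; i < sizeof(subcommands) / sizeof(subcommands[0]); ++i) {
    if (strcmp(argv[1], subcommands[i].name) == 0)
      return subcommands[i].handler(argc - 1, argv + 1);
  }
  printf("debugiso9660: unknown subcommand '%s'\n", argv[1]);
  return 1;
}
```

 Adding a new debug operation then only means writing one handler and adding one line to the table.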

7/04/2011

Working directories

 I had a strange bug today which trashed the shell command prompt when changing to a sub-directory of the ISO volume. I fixed the faulty eval_path() handler, which did not properly detect a file-system change in the path. For example, if the iso9660 volume is mounted on /cdrom, an access to the path /cdrom/sub_dir/../../ results in a file-system change since / resides on another file-system.
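
 The idea behind the check can be sketched on plain strings, assuming the path is given relative to the mount point. This is only an illustration : path_leaves_mount is a hypothetical helper, and the real eval_path() handler works on RTEMS location structures, not on a depth counter.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if walking `path` (relative to the root of the mounted volume)
 * steps above the mount point at some stage, 0 otherwise.
 */
int path_leaves_mount(const char *path)
{
  int depth = 0; /* number of directories below the mount point */

  while (*path != '\0') {
    size_t len;

    while (*path == '/')
      ++path;                      /* skip separators */
    len = strcspn(path, "/");      /* length of the next component */
    if (len == 0)
      break;                       /* trailing separator */

    if (len == 2 && path[0] == '.' && path[1] == '.') {
      if (depth == 0)
        return 1;                  /* ".." at the volume root: other filesystem */
      --depth;
    } else if (!(len == 1 && path[0] == '.')) {
      ++depth;                     /* regular component */
    }
    path += len;
  }
  return 0;
}
```

 With the volume mounted on /cdrom, the remainder "sub_dir/../../" goes down one level and up two, so the second ".." is the point where evaluation has to be handed back to the parent file-system.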


  This screenshot shows the basics of directory management in an iso9660 volume : the application is able to navigate through the folder structure. Note that the source ISO disk virtually inserted in the /dev/hdb drive has been created with mkisofs with all iso9660 compliance options enabled : all file and directory names use a limited character set (i.e. caps only).
  The ECMA-119 standard defines three levels of implementation :

  • Level 1 : 
    • Each file consists of only one file section (a single contiguous extent)
    • File names follow the "8.3" format (at most 8 characters for the name and 3 for the extension)
    • A directory name should not contain more than eight characters
    • The character set is limited to "d-characters" (digits, capital letters and underscore)
  • Level 2 :
    • Each file consists of only one file section
  • Level 3 :
    • No restrictions (a file may consist of several file sections)
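
  As an illustration of the Level 1 rules, here is a small sketch that checks a file name against the "8.3" format and the d-character set. The function name is_level1_file_name is hypothetical, and the ";version" suffix that follows the name in an on-disk identifier is assumed to have been removed already.

```c
#include <stddef.h>
#include <string.h>

/* d-characters: digits, capital letters and underscore. */
static int is_d_character(char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '_';
}

static int all_d_characters(const char *s, size_t len)
{
  size_t i;

  for (i = 0; i < len; ++i) {
    if (!is_d_character(s[i]))
      return 0;
  }
  return 1;
}

/* Returns 1 if `name` is a valid Level 1 file name (without version). */
int is_level1_file_name(const char *name)
{
  const char *dot = strchr(name, '.');
  size_t base_len = dot != NULL ? (size_t) (dot - name) : strlen(name);
  size_t ext_len = dot != NULL ? strlen(dot + 1) : 0;

  if (base_len > 8 || ext_len > 3)
    return 0;
  if (base_len + ext_len == 0)
    return 0; /* name and extension cannot both be empty */
  return all_d_characters(name, base_len) &&
         (ext_len == 0 || all_d_characters(dot + 1, ext_len));
}
```

  A name such as README.TXT passes, while readme.txt (lower case) or FILE.HTML (four-character extension) is rejected.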


  With the directory handlers working I can start on the file handlers. Currently only the "fstat" system call is implemented for files, and I will enhance its implementation with data found in "extended attribute records". These records are optional and give some information about the permissions of a given file.
  Those permissions will only be really usable after the integration of the Rock Ridge extension, which embeds POSIX permission and ownership data in an ISO9660 volume, among other things.

7/03/2011

ISO9660 records and directory management

Each file or directory in an ISO9660 volume is described by a record. This record contains various pieces of information related to the file or directory :

  • The length of the record
  • The location of the file/directory (sector number)
  • The size of the file/directory
  • The length of the optional extended attribute record
  • Some flags describing the record
  • Identifier (name) length and content of the record.
In an iso9660 volume, a directory is in fact a list of these records. A file or directory always occupies an integer number of sectors : the unused remaining bytes are filled with zeros. The records in a directory's record list cannot straddle two sectors : if there is not enough space left in a sector for a particular record, it is stored in the following sector.
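
The "no straddling" rule makes walking a directory simple : a zero length byte means the rest of the sector is padding. The sketch below assumes the whole directory extent has been read into memory and uses 2048-byte sectors; dir_find_record and dir_count_records are hypothetical names, not functions from iso9660_record.c.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 2048u

/*
 * Return the offset of the first record located at or after `offset`,
 * or dir_size when the end of the directory is reached.
 */
size_t dir_find_record(const uint8_t *dir, size_t dir_size, size_t offset)
{
  while (offset < dir_size) {
    if (dir[offset] != 0)
      return offset; /* non-zero length byte: a record starts here */
    /* Zero length byte: skip the padding up to the next sector. */
    offset = (offset / SECTOR_SIZE + 1) * SECTOR_SIZE;
  }
  return dir_size;
}

/* Count the records stored in a directory extent. */
size_t dir_count_records(const uint8_t *dir, size_t dir_size)
{
  size_t count = 0;
  size_t offset = dir_find_record(dir, dir_size, 0);

  while (offset < dir_size) {
    ++count;
    /* The first byte of a record is its length. */
    offset = dir_find_record(dir, dir_size, offset + dir[offset]);
  }
  return count;
}
```

A real implementation would also check that each record length stays within its sector before trusting it.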

Each directory contains at least two records : 
  1. The first one describes the current directory (location, length, etc.) : this is the record of the commonly known "." special file, although in an iso9660 volume the identifier of such a record is "\0" (a single zero byte).
  2. The second one describes the ".." special file and specifies the location and length of the parent directory, except for the root directory of the volume, where this record also acts as a current directory record.
Since these records describe almost all the data we need for a particular node of the filesystem (location on the volume, name, length, ...), they are used for node management in the ISO9660 implementation.
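
To make the record layout concrete, here is a sketch of a decoder using the byte offsets from ECMA-119 (location at offset 2, data length at offset 10, flags at offset 25, identifier length at offset 32). The record_info structure and record_parse are hypothetical and are not the ones in iso9660_record.c; the ".." identifier being the single byte 0x01 comes from the standard.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RECORD_HEADER_SIZE    33u   /* fixed part before the identifier */
#define RECORD_FLAG_DIRECTORY 0x02u

/* Hypothetical in-memory view of a directory record. */
typedef struct {
  uint8_t  length;
  uint8_t  ext_attr_length;
  uint32_t location;     /* first sector of the file/directory */
  uint32_t data_length;  /* size in bytes */
  uint8_t  flags;
  uint8_t  name_length;
  char     name[256];
} record_info;

/* Numeric fields are stored twice; read the little-endian copy. */
static uint32_t read_le32(const uint8_t *p)
{
  return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
         ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
}

/* Decode the record at `buf`; returns 0 on success, -1 if malformed. */
int record_parse(const uint8_t *buf, size_t avail, record_info *out)
{
  if (avail < RECORD_HEADER_SIZE + 1u ||
      buf[0] < RECORD_HEADER_SIZE + 1u || buf[0] > avail)
    return -1;
  if (RECORD_HEADER_SIZE + buf[32] > buf[0])
    return -1; /* identifier would overflow the record */

  out->length = buf[0];
  out->ext_attr_length = buf[1];
  out->location = read_le32(buf + 2);
  out->data_length = read_le32(buf + 10);
  out->flags = buf[25];
  out->name_length = buf[32];
  memcpy(out->name, buf + RECORD_HEADER_SIZE, out->name_length);
  out->name[out->name_length] = '\0';

  /* Map the two special identifiers to their usual names. */
  if (out->name_length == 1 && buf[33] == 0x00)
    strcpy(out->name, ".");
  else if (out->name_length == 1 && buf[33] == 0x01)
    strcpy(out->name, "..");
  return 0;
}
```

The decoded structure carries everything a node needs : where the data lives, how long it is, and whether the flags mark it as a directory.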

After a first proof of concept where I was able to dump various directory information and navigate an iso9660 volume in the shell through ls, I did a big refactoring of the record management layer and wrote dedicated functions for this purpose (see iso9660_record.c). This refactoring makes the code cleaner and more efficient. I spent hours debugging a little mistake in the iso9660_record_create function, which allocates memory and fills the various elements of the structures according to the directory record passed as a parameter ; RTEMS was simply halting the CPU because of heap allocation problems. After a long hunt for this bug I identified the offending line : a malloc(sizeof(name_len) * sizeof(char)). The first sizeof has no business being there : it yields the size of the name_len variable itself instead of the length of the name, so the allocation did not match the data copied into it. Eclipse auto-completion killed me.
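
For the record, a sketch of what the allocation should look like. The helper record_name_dup is hypothetical, and both the unsigned char type of name_len and the extra byte for a terminating NUL are assumptions of this example.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copy a record identifier of name_len bytes into a freshly allocated,
 * NUL-terminated string. The caller must free() the result.
 */
char *record_name_dup(const unsigned char *identifier, unsigned char name_len)
{
  /*
   * Buggy form: malloc(sizeof(name_len) * sizeof(char)).
   * sizeof(name_len) is the size of the variable's type (1 here),
   * whatever value the variable holds.
   */
  char *name = malloc((size_t) name_len + 1);

  if (name == NULL)
    return NULL;
  memcpy(name, identifier, name_len);
  name[name_len] = '\0';
  return name;
}
```

With the buggy form, any identifier longer than the size of the variable's type overflows the buffer and corrupts the heap, which would be consistent with the halt described above.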

When creating a node (in the evalpath handler for example), we have to define a set of handlers for its particular file type. In an ISO9660 volume (at least, without extensions), we have only two types of node : files and directories.

I preferred to deal with directory management first because if I can't see the volume structure, I can't guess where the files are.

The next batch of work will concern file management and an enhanced cache manager : currently the cache is allocated on a per-node basis (i.e. a 2048 byte cache for each opened directory node), which is fairly inefficient and not very handy for file access management.

Basics of the filesystem layer in RTEMS

First of all, I have to thank the developers of the other filesystems supported by RTEMS (FAT, RFS, NFS, IMFS), whose source helped me a lot while developing ISO9660.

The test application

To let the operating system know that the new file-system exists, we have to fill the file-system table. The file-system table is an array of rtems_filesystem_table_t listing the supported file-systems, each with an entry point to initialize a new mount point. As far as I know, there are two ways to fill this table :
  1. Using an RTEMS supported file-system and the associated configure macros (CONFIGURE_FILESYSTEM_XXX)
  2. Calling rtems_filesystem_register() (see this link)
I chose the first solution, which is for me the best one since the file-system is meant to be eventually integrated in trunk, and thus I didn't investigate the second one.
To let the magic happen I created a local version of confdefs.h included in my test application and I added ISO9660 related defines to it. That way I can develop the simple test application like a classic one, except that I include the hacked confdefs.h instead of the RTEMS header.
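
To picture what the table does, here is a standalone mock of a type-to-handler lookup. All the mock_ names are invented for this sketch and the types are simplified stand-ins : the real table uses rtems_filesystem_table_t and is generated by confdefs.h.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the mount entry point of a file-system. */
typedef int (*mock_fsmount_me_t)(void *mt_entry, const void *data);

/* Simplified stand-in for one file-system table entry. */
typedef struct {
  const char       *type;    /* name given when mounting */
  mock_fsmount_me_t mount_h; /* initializes a new mount point */
} mock_filesystem_table_t;

/* Placeholder for the ISO9660 initialization handler. */
static int mock_iso9660_initialize(void *mt_entry, const void *data)
{
  (void) mt_entry;
  (void) data;
  return 0;
}

/* Table terminated by an entry with a NULL type. */
static const mock_filesystem_table_t mock_filesystem_table[] = {
  { "iso9660", mock_iso9660_initialize },
  { NULL, NULL }
};

/* Find the mount handler registered for a file-system type. */
mock_fsmount_me_t mock_mount_handler_by_type(const char *type)
{
  const mock_filesystem_table_t *entry;

  for (entry = mock_filesystem_table; entry->type != NULL; ++entry) {
    if (strcmp(entry->type, type) == 0)
      return entry->mount_h;
  }
  return NULL;
}
```

Mounting a volume then boils down to looking up the type name and calling the handler found for it.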

I can now write down my application, which should be able to :
  • Embed a generated ISO to test a file mount
  • Mount the filesystem from the QEMU emulated drive /dev/hdb or the previous ISO file untarred in the IMFS root
  • Launch a shell to debug the new filesystem
The ISO is embedded in a TAR file which is extracted in the root directory when the system boots. This part of the test application, as well as the shell launch, was taken from an RFS test application developed by my GSoC mentor Chris Johns.

I put all the ISO9660 file-system related code in an iso9660 subdirectory which can easily be moved for future integration (see SVN repository in the links).

File-system interface
The RTEMS virtual file-system API consists of a set of function pointers pointing to handlers for various operations on the file-system. The main handler set is stored in a rtems_filesystem_operations_table. If some entries are not defined by a file-system implementation, they can point to default handlers provided by the VFS, which do nothing except return the correct error code (ENOSYS in general, to indicate that the function is not implemented).
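
The pattern can be sketched with a two-entry mock table. The structure, the handler signatures and every mock_ name below are assumptions made for the example; the real rtems_filesystem_operations_table has many more entries and different prototypes.

```c
#include <errno.h>
#include <stddef.h>

/* Mock operations table reduced to two handlers. */
typedef struct {
  int (*evalpath_h)(const char *path);
  int (*unlink_h)(const char *path);
} mock_fs_operations_table;

/* Default handler: the operation is not implemented. */
int mock_default_unlink(const char *path)
{
  (void) path;
  errno = ENOSYS;
  return -1;
}

/* Placeholder for a handler the file-system does implement. */
int mock_iso9660_evalpath(const char *path)
{
  return (path != NULL && path[0] != '\0') ? 0 : -1;
}

/* A read-only file-system keeps the default for unlink. */
const mock_fs_operations_table mock_iso9660_ops = {
  mock_iso9660_evalpath,
  mock_default_unlink
};
```

The caller never needs to test a pointer against NULL : every slot is callable, and unsupported operations fail in a uniform way.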

While filling the file-system table I specified one of these handlers : the one called when mounting a volume with this file-system in order to initialize it (known as fsmount_me_h in the filesystem operation table).

In the ISO9660 implementation I put together all these handlers in iso9660_init.c .

This initialization handler receives two parameters : the mount table entry for the volume we're mounting, and unspecified data made available to the user. The interesting one is the mount table entry : it is in this structure that we can store various data related to the filesystem, by storing a pointer to whatever we want.

For ISO9660 the file-system information is stored in an iso9660_fs_info_t structure : type of mount (is the actual iso9660 volume located on a block device or in a file ?), device descriptor, media block size, iso9660 volume information (from the primary volume descriptor). Since this pointer will be accessible from almost all the other handlers, it's important to store all the data describing a particular mount in it.

The iso volume information is processed by iso9660_init_volume_info(), which determines which access layer to use to access the data on the volume. Since ISO9660 volumes can be initialized from either a file (e.g. a .iso file located on another filesystem) or a block device (e.g. a CD-ROM drive), I developed two access layers in the same manner as RFS :
  • The block device layer, which relies on libblock and the ATA layer to access the data. This layer has to deal with the media block size : each read operation on the device is performed at the block level, so we have to read blocks of 512 bytes.
  • The device I/O layer, which operates directly at the byte level through open/read/close system calls ; the block management is done by the filesystem containing the file we're accessing.
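
The difference between the two layers comes down to address arithmetic. The sketch below shows how a 2048-byte sector number could be translated for each of them; the function names and the block_range structure are hypothetical, and the media block size is assumed to be non-zero and to divide 2048 evenly.

```c
#include <stdint.h>

#define ISO9660_SECTOR_SIZE 2048u

/* Range of media blocks covering one iso9660 sector. */
typedef struct {
  uint32_t first_block;
  uint32_t block_count;
} block_range;

/* Block device layer: a sector spans several media blocks. */
block_range sector_to_blocks(uint32_t sector, uint32_t media_block_size)
{
  block_range range;
  uint32_t blocks_per_sector = ISO9660_SECTOR_SIZE / media_block_size;

  range.first_block = sector * blocks_per_sector;
  range.block_count = blocks_per_sector;
  return range;
}

/* File layer: a sector is simply a byte offset in the image file. */
uint64_t sector_to_file_offset(uint32_t sector)
{
  return (uint64_t) sector * ISO9660_SECTOR_SIZE;
}
```

With 512-byte blocks, sector 16 covers blocks 64 to 67 on the device, and starts at byte offset 32768 in an image file.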
The file-system interface defines handlers for various other operations. At the moment the ISO9660 implementation defines the following handlers :
  • evalpath : The handler has to parse a path and define the node associated with this path. The node can contain any information needed and is used by the various type handlers.
  • node_type : in charge of defining the standard RTEMS node type for a particular node (e.g : file, directory, device, link, ...).
  • freenode : Called when a node is destroyed to free all the structures allocated by a filesystem implementation when defining a node.
For ISO9660 each node contains the record information associated with a given directory or file.

7/02/2011

A month later : a heads up on the project

My original blog about the project was hosted on my personal webserver, which had the good idea to go down while I'm 700 kilometers away. This blog is the alternative, at the cost of my previous entries.


The mid-term evaluation (July 15th) is now approaching. Even though I started the project very late because of my studies, it's now going well and I'm on schedule after a lot of work.

I spent the last few weeks digging into RTEMS CVS to see how file-systems are implemented, analyzing the ISO9660 standard (also known as ECMA-119), and finally starting the project.

ISO9660
My primary concern was to understand the file-system structure and to be able to map it onto the RTEMS file-system layer. It was not a big deal since I'm used to reading standards about file structures.
Since ISO9660 is a pre-mastered file-system, it's very well organized : the directory tree is analyzed before creating the volume, which results in a clean and smooth structure, unlike read-write file-systems (EXT, NTFS, FAT32, ...) which tend to degrade as inodes are added and removed.

ISO9660 volumes consist of a number of 2048-byte sectors (in fact other sizes are allowed by the standard, but they are apparently rarely used). The first oddity of this file-system is that the first 16 sectors of the volume are unused : their content is not specified by the standard and they are considered to be filled with zeros (although an application can use them as it wants). It means that ISO volumes are at least 32 KiB (16 × 2048 bytes) long.

The first used sectors of the volume are dedicated to descriptors which ... describe various details of the disk. There are five different descriptors, each of which occupies exactly one sector :

  • The boot record descriptor is almost unspecified and leaves a lot of free space in the sector for bootable disks. Although it's not specified in the original standard, another specification (El Torito) describes how a bootable volume should be organized.
  • The primary volume descriptor contains various data about the whole volume, like its identifier, the creation date, path tables locations (see below), the root directory record (see below), the logical sector size, etc.
  • The supplementary volume descriptor contains almost the same fields as the primary volume descriptor and could be used as backup.
  • The partition descriptor can describe partitions across the volume, although it's not detailed in the standard.
  • The volume set terminator descriptor doesn't contain any useful data ; its sole purpose is to indicate the end of the descriptor set.
The first task the file-system has to accomplish is to look into these descriptors and determine how it should handle the volume.
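
As a sketch of that first task, the function below scans the descriptors of an image held in memory and returns the sector of the primary volume descriptor. It relies on two facts from ECMA-119 : the first byte of a descriptor is its type (0 boot record, 1 primary, 2 supplementary, 3 partition, 255 terminator) and the next five bytes are the string "CD001". The function name and the in-memory image are assumptions of the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE             2048u
#define FIRST_DESCRIPTOR_SECTOR 16u /* the first 16 sectors are unused */

enum {
  VD_BOOT_RECORD   = 0,
  VD_PRIMARY       = 1,
  VD_SUPPLEMENTARY = 2,
  VD_PARTITION     = 3,
  VD_TERMINATOR    = 255
};

/*
 * Return the sector number of the primary volume descriptor,
 * or -1 if the image holds none.
 */
long find_primary_descriptor(const uint8_t *image, size_t image_size)
{
  size_t sector;

  for (sector = FIRST_DESCRIPTOR_SECTOR;
       (sector + 1) * SECTOR_SIZE <= image_size;
       ++sector) {
    const uint8_t *vd = image + sector * SECTOR_SIZE;

    if (memcmp(vd + 1, "CD001", 5) != 0)
      return -1;            /* not an iso9660 descriptor */
    if (vd[0] == VD_PRIMARY)
      return (long) sector;
    if (vd[0] == VD_TERMINATOR)
      return -1;            /* end of the descriptor set */
  }
  return -1;
}
```

Once the primary descriptor is located, the root directory record it contains gives the starting point for every path evaluation.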