Chapter 8 DISTRIBUTED FILE SYSTEMS
8.1 Introduction
8.2 File service architecture
8.3 Sun Network File System (NFS)
8.4 Andrew File System (personal study)
8.5 Recent advances
8.6 Summary
Learning objectives
Understand the requirements that affect the design of distributed services
NFS: understand how a relatively simple, widely used service is designed
Obtain a knowledge of file systems, both local and networked
Caching as an essential design technique
Remote interfaces are not the same as APIs
Security requires special consideration
Recent advances: appreciate the ongoing research that often leads to major advances
8.1 Introduction
In distributed systems there is a fundamental need to share information.
Sharing stored information is the most important aspect of sharing information.
The need to share within local networks and intranets led to the need for services that support:
persistent storage of data and programs of all types
consistent distribution of up-to-date data
The purpose of this chapter is to describe the design and implementation of these basic file systems.
File systems were originally developed for centralized computer systems, and later for desktop computers, as a facility of the operating system.
The development of network computing brought the need for file systems that could work over a network.
Such a file system enables programs to store and access remote files exactly as they do local ones.
With the advent of distributed object-oriented programming, a need arose for the persistent storage and distribution of shared objects.
Figure 8.1 shows the different types of storage system.
What is a file system?
Persistent stored data sets
Hierarchic name space visible to all processes
API with the following characteristics:
access and update operations on persistently stored data sets
sequential access model (with additional random-access facilities)
Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, Pearson Education 2005
Sharing of data between users, with access control
Concurrent access: certainly for read-only access, but what about updates?
Other features: mountable file stores; more?
Storage systems and their properties
In the first generation of distributed systems (1974-95), file systems (e.g. NFS) were the only networked storage systems.
With the advent of distributed object systems (CORBA, Java) and the web, the picture has become more complex.
Figure 8.1 Storage systems and their properties
Columns in the figure: Sharing, Persistence, Distributed cache/replicas, Consistency maintenance, Example

Storage system                 Consistency   Example
Main memory                    1             RAM
File system                    1             UNIX file system
Distributed file system                      Sun NFS
Web                                          Web server
Distributed shared memory                    Ivy (DSM, Ch. 18)
Remote objects (RMI/ORB)       1             CORBA
Persistent object store        1             CORBA Persistent Object Service
Peer-to-peer storage system    2             OceanStore (Ch. 10)

Types of consistency: 1: strict one-copy. 3: slightly weaker guarantees. 2: considerably weaker guarantees.
Remarks on the previous figure:
The consistency column indicates whether mechanisms exist to maintain consistency between multiple copies of data when updates occur.
Caching is used by all of these systems to optimize the performance of programs.
Specialized mechanisms need to be in place to guarantee consistency in the distributed case.
Let us now return to the main subject of this chapter: the design of basic distributed file systems.
8.1.1 Characteristics of file systems
File systems are responsible for the organization, storage, retrieval, naming, sharing and protection of files.
They provide a programming interface that characterizes the file abstraction.
Files are stored on disk or other non-volatile media. Files contain both data and attributes.
A typical attribute structure is illustrated next.
Figure 8.3 File attribute record structure
Updated by system (shaded in the figure): File length, Creation timestamp, Read timestamp, Write timestamp, Attribute timestamp, Reference count
Updated by owner: Owner, File type, Access control list (e.g. for UNIX: rw-rw-r--)
The shaded attributes are managed by the file system and are not normally updatable by user programs.
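Most of the attribute record can be inspected on a local UNIX-like system. The sketch below is only an illustration: it uses Python's standard `os.stat` call, and the dictionary keys and the `describe` helper are names chosen here, not part of the chapter. The mapping of `st_ctime` to the attribute timestamp holds on UNIX systems.

```python
import os
import stat
import tempfile

def describe(path):
    """Return a few entries of the file attribute record for path."""
    st = os.stat(path)
    return {
        "length": st.st_size,                # file length in bytes
        "read_timestamp": st.st_atime,       # last access to the data
        "write_timestamp": st.st_mtime,      # last modification of the data
        "attribute_timestamp": st.st_ctime,  # last attribute change (UNIX)
        "reference_count": st.st_nlink,      # number of names (hard links)
        "owner": st.st_uid,
        "is_directory": stat.S_ISDIR(st.st_mode),   # file type
        "permissions": stat.filemode(st.st_mode),   # access mode string
    }

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "example.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    attrs = describe(path)
    print(attrs["length"], attrs["reference_count"], attrs["permissions"])
```

User programs change the length and timestamps only indirectly, by reading and writing the file, which matches the "updated by system" group above.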
File systems must be able to manage a large number of files.
They provide operations for creating, naming and deleting files. The naming of files uses directories.
A directory is a file that provides a mapping from text names to internal file identifiers.
Example: the UNIX file system.
Figure 8.2 shows a typical layered module structure for the implementation of a non-distributed file system in a conventional operating system.
Figure 8.2 File system modules
Directory module: relates file names to file IDs
File module: relates file IDs to particular files
Access control module: checks permission for operation requested
File access module: reads or writes file data or attributes
Block module: accesses and allocates disk blocks
Device module: disk I/O and buffering
File system operations
Figure 8.4 lists the main file operations available to applications in UNIX systems.
These are system calls implemented by the kernel.
The UNIX operations are based on a programming model in which some file state information is stored by the file system for each running program.
The file system is responsible for applying access control.
In local file systems such as UNIX, this is done when each file is opened: the rights of the user are checked against the mode of access requested.
Figure 8.4 UNIX file system operations
filedes = open(name, mode): Opens an existing file with the given name.
filedes = creat(name, mode): Creates a new file with the given name.
Both operations deliver a file descriptor referencing the open file. The mode is read, write or both.
status = close(filedes): Closes the open file filedes.
count = read(filedes, buffer, n): Transfers n bytes from the file referenced by filedes to buffer.
count = write(filedes, buffer, n): Transfers n bytes to the file referenced by filedes from buffer.
Both operations deliver the number of bytes actually transferred and advance the read-write pointer.
pos = lseek(filedes, offset, whence): Moves the read-write pointer to offset (relative or absolute, depending on whence).
status = unlink(name): Removes the file name from the directory structure. If the file has no other names, it is deleted.
status = link(name1, name2): Adds a new name (name2) for a file (name1).
status = stat(name, buffer): Gets the file attributes for file name into buffer.
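The behaviour of the read-write pointer can be observed directly. The following is a small sketch using Python's `os` module, which wraps the same system calls; the file name `demo.dat` and the variable names are illustrative only, and `os.open` with `O_CREAT` stands in for `creat`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    name = os.path.join(d, "demo.dat")

    # creat-like call: create the file and obtain a descriptor for writing
    fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = os.write(fd, b"abcdefghij")   # count = write(filedes, buffer, n)
    os.close(fd)

    fd = os.open(name, os.O_RDONLY)         # filedes = open(name, mode)
    first = os.read(fd, 4)                  # pointer moves from 0 to 4
    second = os.read(fd, 4)                 # continues from 4, pointer now 8
    pos = os.lseek(fd, 2, os.SEEK_SET)      # absolute move of the pointer
    third = os.read(fd, 3)                  # reads from position 2
    os.lseek(fd, 0, os.SEEK_END)            # move to the end of the file
    at_eof = os.read(fd, 4)                 # nothing left: zero bytes returned
    os.close(fd)

    os.link(name, name + ".alias")          # add a second name for the file
    links = os.stat(name).st_nlink          # reference count is now 2
    os.unlink(name)                         # remove one name only
    still_there = os.path.exists(name + ".alias")
```

Each `read` depends on where the previous call left the pointer, which is the per-program state that the kernel keeps for an open file.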
Class Exercise A
Write a simple C program to copy a file using the UNIX file system operations shown in Figure 8.4.
copyfile(char *oldfile, *newfile)
{ <you write this part, using open(), creat(), read(), write()> }
Note: remember that read() returns 0 when you attempt to read beyond the end of the file.
8.1.2 Distributed file systems requirements
The requirements and potential pitfalls in the design of distributed services were first discovered in the development of distributed file systems.
Early distributed file systems offered access transparency and location transparency.
Other requirements, such as performance, scalability, concurrency control, fault tolerance and security, emerged and were met in later phases of development.
Here are some details on these aspects.
Transparency
File services are the most heavily used services in an intranet.
Most of the transparency requirements described earlier for distributed systems apply to file services as well.
They are recalled here:
Transparency for distributed systems
Access transparency:
Enables local and remote resources to be accessed using identical operations.
Location transparency:
Enables resources to be accessed without knowledge of their physical location.
Concurrency transparency:
Enables several processes to operate concurrently using shared resources without interference between them.
Replication transparency:
Enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.
Failure transparency:
Enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.
Mobility transparency:
Allows the movement of resources and clients within a system without affecting the operation of users or programs.
Performance transparency:
Allows the system to be reconfigured to improve performance as loads vary.
Scaling transparency:
Allows the system and applications to expand in scale without change to the system structure or the application algorithms.
Transparency aspects for distributed file systems
Access transparency
Client programs should be unaware of the distribution of files. A single set of operations is provided for access to local and remote files. Programs written to operate on local files can access remote files without modification.
Location transparency
Client programs should see a uniform file name space. Files or groups of files may be relocated without changing their pathnames, and user programs see the same name space wherever they are executed.
Mobility transparency
Neither client programs nor system administration tables in client nodes need to be changed when files are moved. This allows files to be moved either by system administrators or automatically.
Performance transparency
Client programs should continue to perform satisfactorily while the load on the file service varies within a specified range.
Scaling transparency
The service can be expanded to deal with growth without major disturbance to client programs.
Concurrent file updates
Changes to a file by one client should not interfere with the operation of other clients that are simultaneously accessing or changing the same file. This is the classic concurrency control problem, which we will revisit.
File-level or record-level locking is the mechanism used in practice (e.g. in UNIX).
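As a local illustration of file-level locking, the sketch below uses the advisory `flock` call available on UNIX-like systems through Python's `fcntl` module. It assumes a local file system and models two clients as two independent opens of the same file; the names `writer_a` and `writer_b` are hypothetical. It is not a distributed locking protocol.

```python
import fcntl
import os
import tempfile

def try_exclusive_lock(fd):
    """Try to take an exclusive whole-file lock without blocking."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "shared.dat")
    writer_a = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    writer_b = os.open(path, os.O_RDWR)   # second, independent open

    a_locked = try_exclusive_lock(writer_a)                # granted
    b_locked_while_a_holds = try_exclusive_lock(writer_b)  # refused

    fcntl.flock(writer_a, fcntl.LOCK_UN)                   # A releases
    b_locked_after_release = try_exclusive_lock(writer_b)  # now granted

    os.close(writer_a)
    os.close(writer_b)
```

Record-level locking follows the same pattern but locks a byte range instead of the whole file (for example with `fcntl.lockf`).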
File replication
If the file service supports replication, a file may be represented by several copies of its contents at different locations.
This allows the load to be shared between servers, and it improves scalability and fault tolerance.
Few file services support full replication (e.g. Coda, http:www.cs.vu.nlastbooksds110.pdf).
Caching of files, or portions of files, is the usual approach.
Hardware and operating system heterogeneity
The service interfaces should be defined so that client and server software can be implemented for different operating systems and computers.
Fault tolerance
The file service must continue to operate in the face of client and server failures.
At-most-once invocation semantics can be used to cope with transient communication failures; alternatively, at-least-once semantics can be used if the operations are idempotent.
The server can be stateless, so that it can be restarted after a failure without recovering previous state.
Consistency
Conventional file systems offer one-copy update semantics. This means that the file contents seen by all of the processes accessing or updating a given file are those that they would see if only a single copy of the file contents existed.
In a distributed system, replication or caching, together with delays in propagating updates, can cause some deviation from one-copy semantics.
Security
All file systems provide access control mechanisms based on the use of access control lists. In distributed file systems there is also a need for authentication. In some cases digital signatures and encryption are used as well.
Efficiency
A distributed file service should offer a level of performance comparable to that of conventional file systems.
The size of the files to be handled is growing rapidly, which will generate other types of problems.
Let us review these file service requirements.
File service requirements
Transparency, Concurrency, Replication, Heterogeneity, Fault tolerance, Consistency, Security, Efficiency
Transparencies
Access: Same operations
Location: Same name space after relocation of files or processes
Mobility: Automatic relocation of files is possible
Performance: Satisfactory performance across a specified range of system loads
Scaling: Service can be expanded to meet additional loads
Concurrency properties
Isolation
File-level or record-level locking
Other forms of concurrency control to minimise contention
Replication properties
File service maintains multiple identical copies of files
Load-sharing between servers makes service more scalable
Local access has better response (lower latency)
Fault tolerance
Full replication is difficult to implement.
Caching (of all or part of a file) gives most of the benefits (except fault tolerance)
Heterogeneity properties
Service can be accessed by clients running on (almost) any OS or hardware platform.
Design must be compatible with the file systems of different OSes
Service interfaces must be open: precise specifications of APIs are published.
Fault tolerance
Service must continue to operate even when clients make errors or crash.
at-most-once semantics
at-least-once semantics requires idempotent operations
Service must resume after a server machine crashes.
If the service is replicated, it can continue to operate even during a server crash.
Consistency
Unix offers one-copy update semantics for operations on local files: caching is completely transparent.
Difficult to achieve the same for distributed file systems while maintaining good performance and scalability.
Security
Must maintain access control and privacy as for local files.
based on identity of user making request
identities of remote users must be authenticated
privacy requires secure communication
Service interfaces are open to all processes not excluded by a firewall.
vulnerable to impersonation and other attacks
Efficiency
Goal for distributed file systems is usually performance comparable to local file system.
8.1.3 Case studies
Introductory remarks
A simplified abstract model, the file service architecture, is used to separate the implementation concerns.
Two systems are considered: NFS (the Sun Network File System) and the Andrew File System.
The abstract architectural model is based on a division of responsibilities between three modules.
We will return to it after briefly introducing the two file systems.
Sun NFS
Sun NFS was introduced in 1985.
NFS provides transparent access to remote files for client programs running on UNIX and other systems.
The client-server relationship is symmetrical: each computer can act as both a client and a server.
An important aspect of NFS is that it provides a high level of support for hardware and OS heterogeneity.
The design is OS-independent: implementations exist for Windows, Mac OS, Linux and every version of UNIX.
Andrew File System
This file system was developed at Carnegie Mellon University (CMU).
One of the aims of this system was to support information sharing on a large scale by minimizing client-server communication.
Whole files are transferred and cached at clients.
It runs on many operating systems, including Windows, Mac OS and Linux; a public version is available for Linux.
8.2 File service architecture
Introduction
The file service is structured as three modules:
a flat file service
a directory service
a client module
This approach is used to separate the main concerns in the design of such a system.
The next figure shows these modules and their relationships.
Figure 8.5 File service architecture
Client computer: application programs, client module
Server computer: directory service, flat file service
The flat file service and the directory service each export an interface for use by client programs.
The client module provides a single programming interface with operations on files similar to those found in conventional file systems.
The division of responsibilities between the modules is as follows.
Flat file service
The flat file service is concerned with operations on the contents of files. Unique file identifiers (UFIDs) are used to refer to files in all requests.
The division of responsibilities between the flat file service and the directory service is based on the use of UFIDs.
When the flat file service receives a request to create a file, it generates a new UFID for it and returns the UFID to the requester.
Directory service
The directory service provides a mapping between text names for files and their UFIDs.
It provides the functions needed to create directories and to add new file names to them. It is a client of the flat file service.
When a hierarchic file-naming scheme is used, directories hold references to other directories.
Figure 8.7 Directory service operations
Lookup(Dir, Name) -> FileId, throws NotFound
Locates the text name in the directory and returns the relevant UFID. If Name is not in the directory, throws an exception.
AddName(Dir, Name, FileId), throws NameDuplicate
If Name is not in the directory, adds (Name, File) to the directory and updates the file's attribute record.
If Name is already in the directory, throws an exception.
UnName(Dir, Name), throws NotFound
If Name is in the directory, the entry containing Name is removed from the directory.
If Name is not in the directory, throws an exception.
GetNames(Dir, Pattern) -> NameSeq
Returns all the text names in the directory that match the regular expression Pattern.
Client module
This module runs on each client computer, integrating the functions of the two other modules (the flat file service and the directory service) under a single application programming interface.
On UNIX hosts, the client module emulates the full set of UNIX file operations.
The client module holds information about the network locations of the flat file server and directory server processes.
Caching of recently used file blocks can also be implemented at the client to improve performance.
Flat file service interface
Figure 8.6 shows a definition of the interface to a flat file service.
Figure 8.6 Flat file service operations
Read(FileId, i, n) -> Data, throws BadPosition
If 1 <= i <= Length(File): Reads a sequence of up to n items from a file starting at item i and returns it in Data.
Write(FileId, i, Data), throws BadPosition
If 1 <= i <= Length(File)+1: Writes a sequence of Data to a file, starting at item i, extending the file if necessary.
Create() -> FileId
Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId)
Removes the file from the file store.
GetAttributes(FileId) -> Attr
Returns the file attributes for the file.
SetAttributes(FileId, Attr)
Sets the file attributes (only those attributes that are not shaded in Figure 8.3).
Comparison with UNIX:
The flat file service has no open and close operations; a file is accessed simply by quoting its UFID.
The Read and Write operations specify a starting point within the file for each transfer. In UNIX this is not the case.
In UNIX there is a read-write pointer, and each read or write starts at the position of that pointer and advances it. Such operations are not idempotent.
The interface differs from UNIX in two aspects that support fault tolerance:
1. Repeatable operations. Except for Create, the operations are idempotent. This allows the use of at-least-once RPC semantics.
2. Stateless servers. Stateless servers can be restarted after a failure and resume operation without the need for the server or its clients to restore any state.
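The difference between the two styles can be made concrete with a toy model. The sketch below is an in-memory stand-in written for illustration only: `FlatFileService` follows the Read/Write/Create signatures of Figure 8.6 (positions start at 1), while `UnixStyleFile` is a hypothetical class that keeps a read-write pointer. A duplicated request, as can happen under at-least-once RPC semantics, is harmless in the first case and changes the file in the second.

```python
import itertools

class BadPosition(Exception):
    pass

class FlatFileService:
    """In-memory stand-in for the flat file service of Figure 8.6."""

    def __init__(self):
        self._files = {}                       # UFID -> file contents
        self._next_ufid = itertools.count(1)   # toy UFID generator

    def create(self):
        ufid = next(self._next_ufid)
        self._files[ufid] = bytearray()
        return ufid

    def read(self, ufid, i, n):
        data = self._files[ufid]
        if not 1 <= i <= len(data):
            raise BadPosition(i)
        return bytes(data[i - 1:i - 1 + n])

    def write(self, ufid, i, new_data):
        data = self._files[ufid]
        if not 1 <= i <= len(data) + 1:
            raise BadPosition(i)
        # Overwrites from position i, extending the file if necessary.
        data[i - 1:i - 1 + len(new_data)] = new_data

class UnixStyleFile:
    """Keeps a read-write pointer, so a repeated write has a new effect."""

    def __init__(self):
        self.data = bytearray()
        self.pointer = 0

    def write(self, new_data):
        self.data[self.pointer:self.pointer + len(new_data)] = new_data
        self.pointer += len(new_data)

service = FlatFileService()
ufid = service.create()
service.write(ufid, 1, b"hello")
service.write(ufid, 1, b"hello")      # retransmitted request: same result
flat_result = service.read(ufid, 1, 100)

unix_file = UnixStyleFile()
unix_file.write(b"hello")
unix_file.write(b"hello")             # duplicate lands after the first copy
unix_result = bytes(unix_file.data)
```

Create is the exception noted above: calling `create` twice yields two different UFIDs, so it is not idempotent.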
Model file service architecture (Figure 8.5)
The directory service exports Lookup, AddName, UnName and GetNames.
The flat file service exports Read, Write, Create, Delete, GetAttributes and SetAttributes.
Server operations for the model file service (Figures 8.6 and 8.7)
In Read(FileId, i, n) and Write(FileId, i, Data), i is the position of the first byte.
Class Exercise B
Show how each file operation of the program that you wrote in Class Exercise A would be executed using the operations of the Model File Service in Figures 8.6 and 8.7.
FileId
A unique identifier for files anywhere in the network. Similar to the remote object references described in Section 4.3.3.
Pathname lookup
Pathnames such as /usr/bin/tar are resolved by iterative calls to Lookup(), one call for each component of the path, starting with the ID of the root directory, which is known in every client.
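Iterative pathname resolution can be sketched in a few lines. Everything below is a hypothetical stand-in: the `DIRECTORIES` table, the UFID values and the `resolve` helper are invented for the example, and `lookup` only imitates the Lookup(Dir, Name) operation of Figure 8.7.

```python
class NotFound(Exception):
    pass

ROOT = 1  # hypothetical UFID of the root directory, known to every client

# Hypothetical directory contents: directory UFID -> {name: UFID}
DIRECTORIES = {
    1: {"usr": 2, "etc": 3},
    2: {"bin": 4},
    3: {},
    4: {"tar": 5},
}

def lookup(dir_ufid, name):
    """Stand-in for the Lookup(Dir, Name) operation of Figure 8.7."""
    try:
        return DIRECTORIES[dir_ufid][name]
    except KeyError:
        raise NotFound(name) from None

def resolve(pathname):
    """Resolve an absolute pathname with one Lookup call per component."""
    ufid = ROOT
    calls = 0
    for component in pathname.strip("/").split("/"):
        if component:
            ufid = lookup(ufid, component)
            calls += 1
    return ufid, calls

ufid, calls = resolve("/usr/bin/tar")
```

In a real client module each `lookup` would be a remote call to the directory service, so the number of calls is the number of round trips needed to resolve the name.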
Access control
In the UNIX file system, the user's access rights are checked against the access mode (read or write) requested in the open call (see Figure 8.4).
In distributed implementations, access rights checks have to be performed at the server.
A user identity has to be passed with each request. Forged identities are a possibility in this context.
If the result of an access rights check were retained at the server and used for future accesses, the server would no longer be stateless.
Two approaches are possible:
1. An access check is made whenever a file name is converted to a UFID, and the result is encoded in the form of a capability, which is returned to the client for future use.
2. A user identity is submitted with every client request, and access checks are performed by the server for every file operation.
Both methods allow a stateless server implementation. AFS and NFS use the second approach.
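The first approach can be illustrated with a capability that the server is able to verify without remembering anything about earlier requests. The chapter does not say how capabilities are encoded; the sketch below assumes, purely for illustration, a keyed hash (HMAC) over the UFID and the granted rights, with a made-up secret and made-up helper names.

```python
import hashlib
import hmac

# Illustrative secret; in this sketch only the server is assumed to know it.
SERVER_SECRET = b"demo-secret-known-only-to-the-server"

def issue_capability(ufid, rights):
    """Encode the result of an access check as a token the client holds."""
    message = f"{ufid}:{rights}".encode()
    tag = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    return (ufid, rights, tag)

def check_capability(capability, requested):
    """Verify a capability presented with a later request (no server state)."""
    ufid, rights, tag = capability
    message = f"{ufid}:{rights}".encode()
    expected = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected) and requested in rights

cap = issue_capability(42, "r")     # access check granted read only
forged = (42, "rw", cap[2])         # client edits the rights field
read_ok = check_capability(cap, "r")
write_ok = check_capability(cap, "w")
forged_ok = check_capability(forged, "w")
```

Because the check recomputes the tag from the request itself, the server stays stateless, which is the property both approaches are meant to preserve.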
Directory service interface
Figure 8.7 contains a definition of the RPC interface to a directory service.
As already mentioned, the main purpose of the directory service is to translate text names to UFIDs.
Each directory is stored as a conventional file with a UFID.
For each operation on a directory, the UFID of that directory is required (the Dir parameter).
The Lookup operation performs a single translation: given a text name, it returns the corresponding UFID.
There are two operations for altering directories: AddName and UnName.
AddName adds an entry to a directory and increments the reference count field in the file's attribute record.
UnName removes an entry from a directory and decrements the reference count.
If the reference count reaches zero, the file is removed.
GetNames lets clients perform pattern matching on the names in a directory, so that a file can be identified from an incomplete name.
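The interplay between directory entries and the reference count can be sketched as follows. This is a toy in-memory model, not the service itself: the class and method names are chosen here, reference counts are kept in a dictionary rather than in the file attribute record, and the `deleted` list stands in for a call to Delete on the flat file service.

```python
import re

class NotFound(Exception):
    pass

class NameDuplicate(Exception):
    pass

class DirectoryService:
    """Toy directory service with per-file reference counts."""

    def __init__(self):
        self.directories = {}      # directory UFID -> {name: file UFID}
        self.reference_count = {}  # file UFID -> number of names
        self.deleted = []          # files whose count reached zero

    def make_directory(self, dir_ufid):
        self.directories[dir_ufid] = {}

    def lookup(self, dir_ufid, name):
        try:
            return self.directories[dir_ufid][name]
        except KeyError:
            raise NotFound(name) from None

    def add_name(self, dir_ufid, name, file_ufid):
        entries = self.directories[dir_ufid]
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = file_ufid
        count = self.reference_count.get(file_ufid, 0)
        self.reference_count[file_ufid] = count + 1

    def un_name(self, dir_ufid, name):
        entries = self.directories[dir_ufid]
        if name not in entries:
            raise NotFound(name)
        file_ufid = entries.pop(name)
        self.reference_count[file_ufid] -= 1
        if self.reference_count[file_ufid] == 0:
            del self.reference_count[file_ufid]
            self.deleted.append(file_ufid)   # stand-in for Delete(FileId)

    def get_names(self, dir_ufid, pattern):
        names = self.directories[dir_ufid]
        return sorted(n for n in names if re.fullmatch(pattern, n))

ds = DirectoryService()
ds.make_directory(1)
ds.add_name(1, "notes.txt", 7)
ds.add_name(1, "notes-copy.txt", 7)   # second name for the same file
ds.add_name(1, "data.bin", 8)
matches = ds.get_names(1, r"notes.*")
```

Removing only one of the two names leaves file 7 in place; it is removed only when the last name goes.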
Hierarchic file system
UNIX is an example of such a system.
Directories are organized in a tree structure.
Any file or directory can be referenced using a pathname.
In UNIX, files can have several names, and these can be in the same or different directories.
A UNIX-like file system can be implemented by the client module using the flat file and directory services.
In a hierarchic directory service, the file attributes associated with files should include a type field that distinguishes between ordinary files and directories.
File groups
A file group is a collection of files located on a given server. A file cannot change the group to which it belongs.
This notion is useful for moving collections of files between servers.
In a distributed system, file groups support the allocation of files to file servers in larger logical units and enable the service to be implemented with files stored on several servers.
In such a case the UFID must include a file group identifier. File group identifiers must be unique throughout the distributed system.
File group identifier: IP address (32 bits) followed by date (16 bits)
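The 48-bit layout above can be packed and unpacked with ordinary integer arithmetic. The sketch below is an illustration under stated assumptions: the source gives only the field widths, so the date encoding (days since 1 January 1970, truncated to 16 bits), the helper names and the example address are all choices made here.

```python
import datetime
import ipaddress

def make_file_group_id(ip, created):
    """Pack a 32-bit IPv4 address and a 16-bit date into a 48-bit integer."""
    ip_bits = int(ipaddress.IPv4Address(ip))
    # Assumed date encoding: days since 1 January 1970, kept to 16 bits.
    days = (created - datetime.date(1970, 1, 1)).days & 0xFFFF
    return (ip_bits << 16) | days

def split_file_group_id(group_id):
    """Recover the IP address and the 16-bit date field."""
    ip = str(ipaddress.IPv4Address(group_id >> 16))
    days = group_id & 0xFFFF
    return ip, days

gid = make_file_group_id("192.168.1.10", datetime.date(2005, 1, 1))
ip, days = split_file_group_id(gid)
```

Combining the address of the creating host with a date is what makes identifiers generated on different servers, or on the same server at different times, distinct.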
8.3 Case study: Sun Network File System
This file system, developed by Sun Microsystems, follows the abstract model described in the previous section.
Let us look at its characteristics in more detail.
Case Study: Sun NFS
An industry standard for file sharing on local networks since the 1980s
An open standard with clear and simple interfaces
Closely follows the abstract file service model defined above
Supports many of the design requirements already mentioned:
transparency
heterogeneity
efficiency
fault tolerance
Limited achievement of:
concurrency
replication
consistency
security
Figure 8.8 NFS architecture
Client computer: application programs issue UNIX system calls to the UNIX kernel. The virtual file system passes operations on local files to the UNIX file system (or another file system) and operations on remote files to the NFS client.
Server computer: the NFS server sits below the virtual file system in the UNIX kernel and accesses the UNIX file system.
The NFS client and the NFS server communicate using the NFS protocol (remote operations).
NFS architecture: does the implementation have to be in the system kernel?
No: there are examples of NFS clients and servers that run at application level as libraries or processes (e.g. early Windows and MacOS implementations, current PocketPC, etc.)
But for a UNIX implementation there are advantages:
Binary code compatible: no need to recompile applications
Standard system calls that access remote files can be routed through the NFS client module by the kernel
Shared cache of recently used blocks at client
Kernel-level server can access i-nodes and file blocks directly, but a privileged (root) application program could do almost the same
Security of the encryption key used for authentication
Figure 8.9 NFS server operations (simplified)
read(fh, offset, count) -> attr, data
write(fh, offset, count, data) -> attr
create(dirfh, name, attr) -> newfh, attr
remove(dirfh, name) -> status
getattr(fh) -> attr
setattr(fh, attr) -> attr
lookup(dirfh, name) -> fh, attr
rename(dirfh, name, todirfh, toname)
link(newdirfh, newname, dirfh, name)
readdir(dirfh, cookie, count) -> entries
symlink(newdirfh, newname, string) -> status
readlink(fh) -> string
mkdir(dirfh, name, attr) -> newfh, attr
rmdir(dirfh, name) -> status
statfs(fh) -> fsstats
fh = file handle: filesystem identifier, i-node number, i-node generation
For comparison, the model flat file service: Read(FileId, i, n) -> Data, Write(FileId, i, Data), Create() -> FileId, Delete(FileId), GetAttributes(FileId) -> Attr, SetAttributes(FileId, Attr)
For comparison, the model directory service: Lookup(Dir, Name) -> FileId, AddName(Dir, Name, File), UnName(Dir, Name), GetNames(Dir, Pattern) -> NameSeq
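The three fields of the file handle can be modelled as a small record. The sketch below is a toy illustration: the `InodeTable` class and the `StaleFileHandle` exception are hypothetical names, and the example assumes that the generation number is incremented each time an i-node number is reused, so that a handle for a removed file can be told apart from a handle for the new file that took over its i-node.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FileHandle:
    filesystem_id: int
    inode_number: int
    inode_generation: int

class StaleFileHandle(Exception):
    pass

class InodeTable:
    """Toy model of one exported filesystem on the server."""

    def __init__(self, filesystem_id):
        self.filesystem_id = filesystem_id
        self.generation = {}   # i-node number -> current generation
        self.in_use = set()

    def allocate(self, inode_number):
        # Reusing an i-node number bumps its generation number.
        current = self.generation.get(inode_number, 0) + 1
        self.generation[inode_number] = current
        self.in_use.add(inode_number)
        return FileHandle(self.filesystem_id, inode_number, current)

    def free(self, inode_number):
        self.in_use.discard(inode_number)

    def validate(self, fh):
        if (fh.filesystem_id != self.filesystem_id
                or fh.inode_number not in self.in_use
                or self.generation.get(fh.inode_number) != fh.inode_generation):
            raise StaleFileHandle(fh)
        return True

table = InodeTable(filesystem_id=1)
old = table.allocate(100)   # handle given to a client
table.free(100)             # the file is removed
new = table.allocate(100)   # i-node 100 is reused for another file
```

A client still holding `old` is rejected by `validate`, while `new` is accepted, even though both name i-node 100.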
NFS access control and authentication
The server is stateless, so the user's identity and access rights must be checked by the server on each request.
In the local file system they are checked only on open().
Every client request is accompanied by the userID and groupID
(not shown in Figure 8.9 because they are inserted by the RPC system).
The server is exposed to impostor attacks unless the userID and groupID are protected by encryption.
Kerberos has been integrated with NFS to provide a stronger and more comprehensive security solution.
Kerberos is described in Chapter 7. Integration of NFS with Kerberos is covered later in this chapter.
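Because the server keeps no record of earlier opens, every request has to be checked against the file's attributes. The sketch below illustrates one way such a per-request check could look, using UNIX-style permission bits. The names cred_sketch, attr_sketch and access_allowed are hypothetical, and the check is deliberately simplified (no superuser case, no supplementary groups, no encryption of the credentials).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Credentials attached to every request by the RPC layer (hypothetical layout). */
struct cred_sketch { uint32_t uid; uint32_t gid; };

/* The subset of file attributes needed for the check (hypothetical layout). */
struct attr_sketch { uint32_t owner_uid; uint32_t owner_gid; uint32_t mode; };

enum { WANT_READ = 4, WANT_WRITE = 2, WANT_EXEC = 1 };

/* UNIX-style check: select the owner, group or other permission bits,
   then test whether all requested rights are granted. */
bool access_allowed(const struct cred_sketch *c, const struct attr_sketch *a, uint32_t want)
{
    uint32_t bits;
    assert(c != NULL && a != NULL);
    if (c->uid == a->owner_uid)
        bits = (a->mode >> 6) & 7u;
    else if (c->gid == a->owner_gid)
        bits = (a->mode >> 3) & 7u;
    else
        bits = a->mode & 7u;
    return (bits & want) == want;
}
```

In a stateless design this function would run for every read and write, not only when the file is opened.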
Mount service
Mount operation:
mount(remotehost, remotedirectory, localdirectory)
The server maintains a table of clients that have mounted filesystems at that server.
Each client maintains a table of mounted file systems holding:
<IP address, port number, file handle>
Hard versus soft mounts
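The client-side table of mounted file systems can be sketched as an array searched by path prefix. Everything named here (mount_entry_sketch, find_mount, the numeric root_fh field) is an assumption for illustration; a real client keeps this information inside the kernel's virtual file system layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of a hypothetical client mount table. */
struct mount_entry_sketch {
    const char *local_dir;   /* local directory where the remote subtree appears */
    const char *server_ip;   /* IP address of the server */
    unsigned short port;     /* port number of the NFS service */
    unsigned long root_fh;   /* stand-in for the file handle of the remote directory */
};

/* Return the entry whose local_dir is the longest prefix of path that ends
   on a directory boundary, or NULL if the path is not under any mount. */
const struct mount_entry_sketch *find_mount(const struct mount_entry_sketch *tab,
                                            size_t n, const char *path)
{
    const struct mount_entry_sketch *best = NULL;
    size_t best_len = 0;
    assert(tab != NULL && path != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(tab[i].local_dir);
        if (strncmp(path, tab[i].local_dir, len) == 0 &&
            (path[len] == '/' || path[len] == '\0') &&
            len > best_len) {
            best = &tab[i];
            best_len = len;
        }
    }
    return best;
}
```

A path that matches no entry is handled by the local file system; a path that matches is resolved by lookup requests sent to the server recorded in the entry.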
Figure 8.10: Local and remote file systems accessible on an NFS client
[Diagram: three directory trees. Server 1 (root) holds export/people, containing big, jon, bob, . . . The Client (root) holds vmunix and usr, with usr containing students, x and staff; students and staff are remote mount points. Server 2 (root) holds nfs/users, containing jim, ann, jane, joe.]
Note: The file system mounted at /usr/students in the client is actually the subtree located at /export/people in Server 1; the file system mounted at /usr/staff in the client is actually the subtree located at /nfs/users in Server 2.
NFS optimization: server caching
Similar to UNIX file caching for local files:
pages (blocks) from disk are held in a main memory buffer cache until the space is required for newer pages. Read-ahead and delayed-write optimizations are applied.
For local files, writes are deferred to the next sync event (at 30-second intervals).
This works well in the local context, where files are always accessed through the local cache, but in the remote case it doesn't offer the necessary synchronization guarantees to clients.
NFS v3 servers offer two strategies for updating the disk:
write-through: altered pages are written to disk as soon as they are received at the server. When a write() RPC returns, the NFS client knows that the page is on the disk.
delayed commit: pages are held only in the cache until a commit() call is received for the relevant file. This is the default mode used by NFS v3 clients. A commit() is issued by the client whenever a file is closed.
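The difference between the two strategies can be shown with a toy model in which the disk is just a counter. This is a sketch under stated assumptions: file_cache_sketch, server_write and server_commit are invented names, and a real server tracks dirty pages in its buffer cache rather than in a fixed array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

enum write_mode { WRITE_THROUGH, DELAYED_COMMIT };

/* Hypothetical per-file server state. */
struct file_cache_sketch {
    bool dirty[NPAGES];   /* pages held in the cache but not yet on disk */
    int  disk_writes;     /* number of simulated disk writes so far */
};

/* Handle a write request for one page. */
void server_write(struct file_cache_sketch *f, int page, enum write_mode mode)
{
    assert(f != NULL && page >= 0 && page < NPAGES);
    if (mode == WRITE_THROUGH) {
        f->disk_writes++;         /* page is on disk before the reply is sent */
        f->dirty[page] = false;
    } else {
        f->dirty[page] = true;    /* page waits in the cache for a commit */
    }
}

/* Handle a commit request: flush every dirty page. Returns the number flushed. */
int server_commit(struct file_cache_sketch *f)
{
    int flushed = 0;
    assert(f != NULL);
    for (int i = 0; i < NPAGES; i++) {
        if (f->dirty[i]) {
            f->dirty[i] = false;
            f->disk_writes++;
            flushed++;
        }
    }
    return flushed;
}
```

With delayed commit, several writes to a file cost no disk traffic until the client closes the file and issues the commit.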
NFS optimization: client caching
Server caching does nothing to reduce RPC traffic between client and server
further optimization is essential to reduce server load in large networks
NFS client module caches the results of read, write, getattr, lookup and readdir operations
synchronization of file contents (one-copy semantics) is not guaranteed when two or more clients are sharing the same file.
Timestamp-based validity check
reduces inconsistency, but doesn't eliminate it
validity condition for cache entries at the client:
$(T - T_c < t) \lor (T_{m,\mathrm{client}} = T_{m,\mathrm{server}})$
t is configurable (per file) but is typically set to 3 seconds for files and 30 seconds for directories
it remains difficult to write distributed applications that share files with NFS
where:
t = freshness guarantee
Tc = time when cache entry was last validated
Tm = time when block was last updated at server
T = current time
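The validity condition translates directly into code. In the sketch below the names cache_entry_sketch and entry_is_valid are hypothetical, and the server's Tm is passed in as a parameter; in a real client, obtaining it would cost a getattr call, which is exactly what the first half of the condition tries to avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical client-side cache entry. */
struct cache_entry_sketch {
    time_t tc;         /* Tc: time when the entry was last validated */
    time_t tm_client;  /* Tm as recorded at the client */
};

/* Evaluate (T - Tc < t) or (Tm_client == Tm_server).
   tm_server is only examined when the freshness interval t has expired. */
bool entry_is_valid(struct cache_entry_sketch *e, time_t now, time_t t, time_t tm_server)
{
    assert(e != NULL);
    if (now - e->tc < t)
        return true;              /* still within the freshness interval */
    if (e->tm_client == tm_server) {
        e->tc = now;              /* revalidated: restart the interval */
        return true;
    }
    return false;                 /* file changed at the server: refetch */
}
```

Note that an entry can be reported valid for up to t seconds after the file has changed at the server, which is why the check reduces inconsistency without eliminating it.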
Other NFS optimizations
Sun RPC runs over UDP by default (can use TCP if required)
Uses the UNIX BSD Fast File System with 8-kbyte blocks
reads and writes can be of any size (negotiated between client and server)
the guaranteed freshness interval t is set adaptively for individual files to reduce the getattr() calls needed to update Tm
file attribute information (including Tm) is piggybacked in replies to all file requests
NFS summary (1)
An excellent example of a simple, robust, high-performance distributed service.
Achievement of transparencies (see Section 1.4.7):
Access: Excellent; the API is the UNIX system call interface for both local and remote files.
Location: Not guaranteed but normally achieved; naming of filesystems is controlled by client mount operations, but transparency can be ensured by an appropriate system configuration.
Concurrency: Limited but adequate for most purposes; when read-write files are shared concurrently between clients, consistency is not perfect.
Replication: Limited to read-only file systems; for writable files, the Sun Network Information Service (NIS) runs over NFS and is used to replicate essential system files (see Chapter 14).
NFS summary (2)
Achievement of transparencies (continued):
Failure: Limited but effective; service is suspended if a server fails. Recovery from failures is aided by the simple stateless design.
Mobility: Hardly achieved; relocation of files is not possible, relocation of filesystems is possible, but requires updates to client configurations.
Performance: Good; multiprocessor servers achieve very high performance, but for a single filesystem it's not possible to go beyond the throughput of a multiprocessor server.
Scaling: Good; filesystems (file groups) may be subdivided and allocated to separate servers. Ultimately, the performance limit is determined by the load on the server holding the most heavily used filesystem (file group).
8.4 Case study: The Andrew File System
Like NFS, AFS provides transparent access to remote shared files for UNIX programs running on workstations.
Access to AFS files is via the normal UNIX file primitives: existing UNIX programs that access AFS files need not be modified or recompiled.
AFS is compatible with NFS: AFS servers hold local UNIX files, but the filing system in the servers is NFS-based.
AFS differs greatly from NFS in its design.
8.4 Case study: The Andrew File System . . .
AFS differs greatly from NFS
Scalability has been chosen as the most important design goal. AFS is designed to perform well with a larger number of users.
The key strategy for achieving scalability is caching: whole files are cached at client nodes.
AFS has two unusual aspects in its design:
Whole-file serving: The entire contents of directories and files are transmitted to client computers by AFS servers.
Whole-file caching: Once a copy of a file or a chunk has been transferred to a client computer, it is stored in a cache on the local disk.
Local copies are used to satisfy clients' open requests whenever possible.
8.4 Case study: The Andrew File System . . .
A simple scenario
1. A user process on a client computer issues an open system call for a file in the shared file space. This is the first such request, so there is no current copy of the file in the local cache on the local disk.
2. The server holding the file is located and a request for a copy of the file is sent. The copy is stored in the local UNIX file system on the client computer; it is then opened and the resulting UNIX file descriptor is returned to the client.
3. Subsequent read, write and other operations on the file by processes in the client computer are applied to the local copy.
8.4 Case study: The Andrew File System . . .
A simple scenario . . .
4. When the process in the client issues a close system call, if the local copy has been updated, its contents are sent back to the server. The server updates the contents of the file and its timestamp. The local copy on the client computer is retained in case it is needed again by a user-level process on the same workstation.
8.4 Case study: The Andrew File System . . .
A brief discussion on performance of AFS
1. For files that are rarely updated (such as libraries) or files that are used by a single user (such as those in a home directory), the cached version will remain valid for long periods. These classes of file account for most accesses.
2. The local cache can be allocated sufficient disk space at each workstation. This is normally sufficient to hold the files used by one user.
8.4 Case study: The Andrew File System . . .
A brief discussion on performance of AFS
3. The design strategy is based on some assumptions about the average and maximum size of files and the locality of reference to files in UNIX systems. These assumptions are based on observation:
Files are small: most are less than 10 kbytes.
Read operations are much more common than writes.
Sequential access is common; random access is rare.
Most files are read and written by one user.
Files are referenced in bursts: if a file has been used, there is a high probability that it will be used again soon.
8.4 Case study: The Andrew File System . . .
A brief discussion on performance of AFS
4. AFS works best with the classes of file described above. One important class is missing: databases. The designers of AFS excluded the provision of storage facilities for databases from their design.
The following questions cannot be answered easily from the previous scenario.
8.4 Case study: The Andrew File System . . .
Some questions to answer
How does AFS gain control when an open or close system call referring to a file in the shared file space is issued by a client?
How is the server holding the required file located?
What space is allocated for the cache in workstations?
How does AFS ensure that cached copies of files are up to date when files may be updated by many clients?
8.4.1 Implementation
AFS is implemented as two software components that exist as UNIX processes: Vice and Venus.
Vice
Vice is the name given to the server software, which runs as a user-level UNIX process in each server.
Venus
Venus is a user-level process that runs in each client computer and corresponds to the client module in our abstract model.
8.4.1 Implementation
The files available to user processes on workstations are either local or shared.
Local
Local files are handled as normal UNIX files. They are stored on the workstation's local disk and are available only to local user processes.
Shared
Shared files are stored on servers. Copies of them are cached on the local disks of workstations.
The name space seen by the user is a conventional UNIX directory hierarchy with a specific subtree, cmu.
cmu is the subtree containing all the shared files.
8.4.1 Implementation . . .
Shared . . .
Local files are used only for temporary files and for processes that are essential for workstation startup.
Other standard UNIX files are implemented as symbolic links from local directories to files held in the shared space.
User directories are in the shared space. This allows users to access their files from any workstation.
More on implementation to come: what about open and close?
8.4.1 Implementation . . .
Kernel
The kernel is a modified version of BSD UNIX. The modifications intercept the open and close system calls and other file system calls.
Partition
Each local disk contains a partition that is used as the cache. Venus manages the cache.
8.4.1 Implementation . . .
File partition
One of the file partitions at each workstation is used for the cache. Venus manages the cache, removing old files to make room for new ones.
Volume
Files are grouped into volumes to facilitate location and movement. For example, each user's personal files are usually located in a separate volume.
Each file and directory in the shared file space is identified by a unique 96-bit file identifier (fid). The Venus processes translate the pathnames issued by clients into fids.
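A 96-bit fid fits naturally into three 32-bit fields. The sketch below assumes the layout usually described for AFS (a volume number, a file number within the volume, and a uniquifier that prevents reuse of an old fid); the slide itself states only the total width, so the field names and the helper fid_equal are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of a 96-bit fid as three 32-bit fields. */
struct afs_fid_sketch {
    uint32_t volume;      /* which volume holds the file */
    uint32_t vnode;       /* which file within the volume */
    uint32_t uniquifier;  /* distinguishes successive uses of the same vnode */
};

/* Two fids name the same file only if all three fields match. */
bool fid_equal(const struct afs_fid_sketch *a, const struct afs_fid_sketch *b)
{
    assert(a != NULL && b != NULL);
    return a->volume == b->volume &&
           a->vnode == b->vnode &&
           a->uniquifier == b->uniquifier;
}
```

Because the volume is part of the fid, a client can locate the custodian server from the fid alone, without embedding a server address in the identifier.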
8.4.1 Implementation . . .
The next figure describes the actions taken by Vice, Venus and the UNIX kernel when a user process issues each of the system calls mentioned earlier.
The callback promise is a mechanism to ensure that cached copies of files are updated when another client closes the same file after updating it.
Figure 8.14
Implementation of file system calls in AFS
open(FileName, mode)
UNIX kernel: If FileName refers to a file in shared file space, pass the request to Venus.
Venus: Check list of files in local cache. If not present or there is no valid callback promise, send a request for the file to the Vice server that is custodian of the volume containing the file.
Vice: Transfer a copy of the file and a callback promise to the workstation. Log the callback promise.
Venus: Place the copy of the file in the local file system, enter its local name in the local cache list and return the local name to UNIX.
UNIX kernel: Open the local file and return the file descriptor to the application.
read(FileDescriptor, Buffer, length)
UNIX kernel: Perform a normal UNIX read operation on the local copy.
write(FileDescriptor, Buffer, length)
UNIX kernel: Perform a normal UNIX write operation on the local copy.
close(FileDescriptor)
UNIX kernel: Close the local copy and notify Venus that the file has been closed.
Venus: If the local copy has been changed, send a copy to the Vice server that is the custodian of the file.
Vice: Replace the file contents and send a callback to all other clients holding callback promises on the file.
8.4.2 Cache consistency
When Vice supplies a copy of a file to a Venus process it also provides a token, the callback promise.
This token is issued by the Vice server that is custodian of the file and guarantees that the server will notify the Venus process when any other client modifies the file. The token has two states: valid or cancelled.
When a server performs a request to update a file, it notifies all of the Venus processes to which it has issued callback promises.
A callback is a remote procedure call from the server to a Venus process. When the Venus process receives a callback, it sets the callback promise token of the relevant file to cancelled.
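The two-state token can be modelled as a small state machine on the client side. This is a sketch only: cached_file_sketch and the venus_* function names are invented for illustration and do not correspond to the actual Venus source code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum promise_state { PROMISE_VALID, PROMISE_CANCELLED };

/* Hypothetical record kept by Venus for one file in its cache. */
struct cached_file_sketch {
    bool present;                 /* is there a copy on the local disk? */
    enum promise_state promise;   /* state of the callback promise token */
};

/* Called when a callback RPC arrives from Vice for this file. */
void venus_on_callback(struct cached_file_sketch *f)
{
    assert(f != NULL);
    f->promise = PROMISE_CANCELLED;
}

/* At open time: must a fresh copy be fetched from the server? */
bool venus_must_fetch(const struct cached_file_sketch *f)
{
    assert(f != NULL);
    return !f->present || f->promise != PROMISE_VALID;
}

/* Called after a copy and a new callback promise have been received. */
void venus_store_fetched(struct cached_file_sketch *f)
{
    assert(f != NULL);
    f->present = true;
    f->promise = PROMISE_VALID;
}
```

The point of the design is visible in venus_must_fetch: while the promise is valid, an open is satisfied locally with no message to the server at all.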
Recent advances in file services
NFS enhancements
WebNFS: the NFS server implements a web-like service on a well-known port. Requests use a public file handle and a pathname-capable variant of lookup(). Enables applications to access NFS servers directly, e.g. to read a portion of a large file.
One-copy update semantics (Spritely NFS, NQNFS): include an open() operation and maintain tables of open files at servers, which are used to prevent multiple writers and to generate callbacks to clients notifying them of updates. Performance was improved by a reduction in getattr() traffic.
Improvements in disk storage organisation
RAID: improves performance and reliability by striping data redundantly across several disk drives.
Log-structured file storage: updated pages are stored contiguously in memory and committed to disk in large contiguous blocks (about 1 Mbyte). File maps are modified whenever an update occurs. Garbage collection is needed to recover disk space.
New design approaches (1)
Distribute file data across several servers
Exploits high-speed networks (ATM, Gigabit Ethernet)
Layered approach, lowest level is like a distributed virtual disk
Achieves scalability even for a single heavily used file
Serverless architecture
Exploits processing and disk resources in all available network nodes
Service is distributed at the level of individual files
Examples:
xFS (Section 8.5): Experimental implementation demonstrated a substantial performance gain over NFS and AFS
Frangipani (Section 8.5): Performance similar to local UNIX file access
Tiger Video File System (see Chapter 15)
Peer-to-peer systems: Napster, OceanStore (UCB), Farsite (MSR), Publius (AT&T Research); see the web for documentation on these very recent systems
New design approaches (2)
Replicated read-write files
High availability
Disconnected working
reintegration after disconnection is a major problem if conflicting updates have occurred
Examples:
Bayou system (Section 14.4.2)
Coda system (Section 14.4.3)
Summary
Sun NFS is an excellent example of a distributed service designed to meet many important design requirements
Effective client caching can produce file service performance equal to or better than local file systems
Consistency versus update semantics versus fault tolerance remains an issue
Most client and server failures can be masked
Superior scalability can be achieved with whole-file serving (Andrew FS) or the distributed virtual disk approach
Future requirements:
support for mobile users, disconnected operation, automatic reintegration (cf. Coda file system, Chapter 14)
support for data streaming and quality of service (cf. Tiger file system, Chapter 15)
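Exercise A solution
Write a simple C program to copy a file using the UNIX file system operations shown in Figure 8.4.
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUFSIZE 1024
#define READ 0          /* open for reading only; 0 is O_RDONLY on typical UNIX systems */
#define FILEMODE 0644

void copyfile(char *oldfile, char *newfile)
{
    char buf[BUFSIZE];
    int n, fdold, fdnew;

    if ((fdold = open(oldfile, READ)) >= 0) {
        fdnew = creat(newfile, FILEMODE);
        /* copy one buffer at a time until end of file (n == 0) or a read error (n < 0) */
        while ((n = read(fdold, buf, BUFSIZE)) > 0)
            if (write(fdnew, buf, n) < 0) break;
        close(fdold); close(fdnew);
    } else
        printf("Copyfile: couldn't open file: %s\n", oldfile);
}

int main(int argc, char **argv)
{
    if (argc == 3) copyfile(argv[1], argv[2]);
    return 0;
}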
Exercise B solution
Show how each file operation of the program that you wrote in Class Exercise A would be executed using the operations of the Model File Service in Figures 8.6 and 8.7.
Server operations for: copyfile("/usr/include/glob.h", "/foo")
fdold = open("/usr/include/glob.h", READ)
Client module actions:
FileId = Lookup(Root, "usr") (remote invocation)
FileId = Lookup(FileId, "include") (remote invocation)
FileId = Lookup(FileId, "glob.h") (remote invocation)
The client module makes an entry in an open files table with file = FileId, mode = READ, and RWpointer = 0. It returns the table row number as the value for fdold.
fdnew = creat("/foo", FILEMODE)
Client module actions:
FileId = Create() (remote invocation)
AddName(Root, "foo", FileId) (remote invocation)
SetAttributes(FileId, attributes) (remote invocation)
The client module makes an entry in its openfiles table with file = FileId, mode = WRITE, and RWpointer = 0. It returns the table row number as the value for fdnew.
n = read(fdold, buf, BUFSIZE)
Client module actions:
Read(openfiles[fdold].file, openfiles[fdold].RWpointer, BUFSIZE) (remote invocation)
Increment the RWpointer in the openfiles table by BUFSIZE and assign the resulting array of data to buf.
Figure 8.8: NFS architecture
[Diagram: on the client computer, application programs issue UNIX system calls to the UNIX kernel. The virtual file system in the kernel routes local requests to the UNIX file system (or another file system) and remote requests to the NFS client. The NFS client communicates via the NFS protocol with the NFS server in the UNIX kernel of the server computer, which reaches the server's UNIX file system through its own virtual file system.]
Figure 8.9
NFS server operations (simplified) (1)
lookup(dirfh, name) -> fh, attr
Returns file handle and attributes for the file name in the directory dirfh.
create(dirfh, name, attr) -> newfh, attr
Creates a new file name in directory dirfh with attributes attr and returns the new file handle and attributes.
remove(dirfh, name) -> status
Removes file name from directory dirfh.
getattr(fh) -> attr
Returns file attributes of file fh. (Similar to the UNIX stat system call.)
setattr(fh, attr) -> attr
Sets the attributes (mode, user id, group id, size, access time and modify time) of a file. Setting the size to 0 truncates the file.
read(fh, offset, count) -> attr, data
Returns up to count bytes of data from a file starting at offset. Also returns the latest attributes of the file.
write(fh, offset, count, data) -> attr
Writes count bytes of data to a file starting at offset. Returns the attributes of the file after the write has taken place.
rename(dirfh, name, todirfh, toname) -> status
Changes the name of file name in directory dirfh to toname in directory todirfh.
link(newdirfh, newname, dirfh, name) -> status
Creates an entry newname in the directory newdirfh which refers to file name in the directory dirfh.
Figure 8.9
NFS server operations (simplified) (2)
symlink(newdirfh, newname, string) -> status
Creates an entry newname in the directory newdirfh of type symbolic link with the value string. The server does not interpret the string but makes a symbolic link file to hold it.
readlink(fh) -> string
Returns the string that is associated with the symbolic link file identified by fh.
mkdir(dirfh, name, attr) -> newfh, attr
Creates a new directory name with attributes attr and returns the new file handle and attributes.
rmdir(dirfh, name) -> status
Removes the empty directory name from the parent directory dirfh. Fails if the directory is not empty.
readdir(dirfh, cookie, count) -> entries
Returns up to count bytes of directory entries from the directory dirfh. Each entry contains a file name, a file handle, and an opaque pointer to the next directory entry, called a cookie. The cookie is used in subsequent readdir calls to start reading from the following entry. If the value of cookie is 0, reads from the first entry in the directory.
statfs(fh) -> fsstats
Returns file system information (such as block size, number of free blocks and so on) for the file system containing a file fh.
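The cookie mechanism of readdir lets a stateless server hand out a directory in batches: the client simply returns the cookie of the last entry it received. The sketch below simulates this with an in-memory array. All names (dirent_sketch, readdir_sketch, list_directory) are hypothetical, the cookie is assumed to be a plain index, and the limit is counted in entries rather than bytes to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* One directory entry as returned to the client (file handle omitted). */
struct dirent_sketch {
    const char *name;
    size_t cookie;   /* opaque to the client; here, the index of the next entry */
};

/* Simulated server call: copy up to max entries starting at cookie
   (0 means the first entry). Returns the number of entries copied. */
size_t readdir_sketch(const char *const *dir, size_t dir_len, size_t cookie,
                      struct dirent_sketch *out, size_t max)
{
    size_t n = 0;
    assert(dir != NULL && out != NULL);
    while (n < max && cookie + n < dir_len) {
        out[n].name = dir[cookie + n];
        out[n].cookie = cookie + n + 1;
        n++;
    }
    return n;
}

/* Client loop: read a whole directory in batches. Returns the total
   number of entries seen. */
size_t list_directory(const char *const *dir, size_t dir_len, size_t batch)
{
    struct dirent_sketch buf[16];
    size_t cookie = 0, total = 0, n;
    assert(batch > 0);
    if (batch > 16)
        batch = 16;
    while ((n = readdir_sketch(dir, dir_len, cookie, buf, batch)) > 0) {
        total += n;
        cookie = buf[n - 1].cookie;   /* resume after the last entry returned */
    }
    return total;
}
```

The server keeps no per-client position: everything needed to continue the listing travels in the cookie.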
Figure 8.10
Local and remote file systems accessible on an NFS client
[Diagram repeated from the Mount service slides: the client mounts /export/people from Server 1 at /usr/students and /nfs/users from Server 2 at /usr/staff.]
Figure 8.11
Distribution of processes in the Andrew File System
[Diagram: workstations, each running user programs and a Venus process above the UNIX kernel, are connected through the network to servers, each running a Vice process above the UNIX kernel.]
Figure 8.12
File name space seen by clients of AFS
[Diagram: the local name space under the root (/) contains tmp, bin, . . . , vmunix and cmu. The cmu subtree is the shared name space and contains its own bin. Symbolic links lead from local directories into the shared space.]
Figure 8.13
System call interception in AFS
[Diagram: within a workstation, user programs issue UNIX file system calls to the UNIX kernel. The kernel passes non-local file operations to Venus and handles local ones through the UNIX file system on the local disk.]
Figure 8.14
Implementation of file system calls in AFS
[Figure repeated from Section 8.4.1: the actions of the UNIX kernel, Venus and Vice for the open, read, write and close calls issued by a user process.]
Figure 8.15
The main components of the Vice service interface
Fetch(fid) -> attr, data
Returns the attributes (status) and, optionally, the contents of the file identified by the fid and records a callback promise on it.
Store(fid, attr, data)
Updates the attributes and (optionally) the contents of a specified file.
Create() -> fid
Creates a new file and records a callback promise on it.
Remove(fid)
Deletes the specified file.
SetLock(fid, mode)
Sets a lock on the specified file or directory. The mode of the lock may be shared or exclusive. Locks that are not removed expire after 30 minutes.
ReleaseLock(fid)
Unlocks the specified file or directory.
RemoveCallback(fid)
Informs server that a Venus process has flushed a file from its cache.
BreakCallback(fid)
This call is made by a Vice server to a Venus process. It cancels the callback promise on the relevant file.