PostgreSQL File Manager
Relations as Files
File Descriptor Pool
File Manager
COMP9315 21T1 PG File Manager [0/15]
PostgreSQL File Manager
PostgreSQL uses the following file organisation
PostgreSQL File Manager (cont)
Components of storage subsystem:
mapping from relations to files (RelFileNode)
abstraction for open relation pool (storage/smgr)
functions for managing files (storage/smgr/md.c)
file-descriptor pool (storage/file)
PostgreSQL has two basic kinds of files:
heap files containing data (tuples)
index files containing index entries
Note: smgr designed for many storage devices; only disk handler provided
Relations as Files
PostgreSQL identifies relation files via their OIDs.
The core data structure for this is RelFileNode:
typedef struct RelFileNode {
   Oid spcNode;   // tablespace
   Oid dbNode;    // database
   Oid relNode;   // relation
} RelFileNode;
Global (shared) tables (e.g. pg_database) have
spcNode == GLOBALTABLESPACE_OID
dbNode == 0
Relations as Files (cont)
The relpath function maps RelFileNode to file:
char *relpath(RelFileNode r)   // simplified
{
   char *path = malloc(ENOUGH_SPACE);
   if (r.spcNode == GLOBALTABLESPACE_OID) {
      /* Shared system relations live in PGDATA/global */
      Assert(r.dbNode == 0);
      sprintf(path, "%s/global/%u",
              DataDir, r.relNode);
   }
   else if (r.spcNode == DEFAULTTABLESPACE_OID) {
      /* The default tablespace is PGDATA/base */
      sprintf(path, "%s/base/%u/%u",
              DataDir, r.dbNode, r.relNode);
   }
   else {
      /* All other tablespaces accessed via symlinks */
      sprintf(path, "%s/pg_tblspc/%u/%u/%u", DataDir,
              r.spcNode, r.dbNode, r.relNode);
   }
   return path;
}
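To see the three path shapes concretely, here is a minimal compilable sketch of the same mapping. The numeric values chosen for GLOBALTABLESPACE_OID and DEFAULTTABLESPACE_OID, the DataDir string, and the sample OIDs are assumptions made for illustration; the slides only name the constants.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;

typedef struct RelFileNode {
    Oid spcNode;   /* tablespace */
    Oid dbNode;    /* database */
    Oid relNode;   /* relation */
} RelFileNode;

/* Assumed values for this sketch; the slides only name these constants. */
#define GLOBALTABLESPACE_OID  1664
#define DEFAULTTABLESPACE_OID 1663
#define ENOUGH_SPACE          1024

/* Hypothetical PGDATA location. */
static const char *DataDir = "/usr/local/pgsql/data";

/* Returns a malloc'd path (caller frees), or NULL if allocation fails. */
char *relpath(RelFileNode r)
{
    char *path = malloc(ENOUGH_SPACE);
    if (path == NULL)
        return NULL;
    if (r.spcNode == GLOBALTABLESPACE_OID) {
        /* Shared relations belong to no particular database. */
        assert(r.dbNode == 0);
        snprintf(path, ENOUGH_SPACE, "%s/global/%u", DataDir, r.relNode);
    } else if (r.spcNode == DEFAULTTABLESPACE_OID) {
        snprintf(path, ENOUGH_SPACE, "%s/base/%u/%u",
                 DataDir, r.dbNode, r.relNode);
    } else {
        snprintf(path, ENOUGH_SPACE, "%s/pg_tblspc/%u/%u/%u",
                 DataDir, r.spcNode, r.dbNode, r.relNode);
    }
    return path;
}
```

With these assumed values, a relation {1663, 16384, 16385} maps to a path ending in base/16384/16385, while a shared relation {1664, 0, 1262} maps to one ending in global/1262. Using snprintf rather than sprintf keeps the write inside the allocated buffer.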
File Descriptor Pool
Unix has limits on the number of concurrently open files.
PostgreSQL maintains a pool of open file descriptors:
to hide this limitation from higher level functions
to minimise expensive open() operations
File names are simply strings: typedef char *FileName
Open files are referenced via: typedef int File
A File is an index into a table of virtual file descriptors.
File Descriptor Pool (cont)
Interface to file descriptor (pool):
File FileNameOpenFile(FileName fileName,
int fileFlags, int fileMode);
// open a file in the database directory ($PGDATA/base/)
File OpenTemporaryFile(bool interXact);
// open temp file; flag: close at end of transaction?
void FileClose(File file);
void FileUnlink(File file);
int  FileRead(File file, char *buffer, int amount);
int  FileWrite(File file, char *buffer, int amount);
int  FileSync(File file);
long FileSeek(File file, long offset, int whence);
int  FileTruncate(File file, long offset);
Analogous to Unix syscalls open(), close(), read(), write(), lseek(), ...
File Descriptor Pool (cont)
Virtual file descriptors (Vfd)
physically stored in dynamically-allocated array
also arranged into list by recency-of-use
VfdCache[0] holds list head/tail pointers.
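One way to picture the recency list is as a ring threaded through the array by index, with slot 0 acting as the anchor. The following is a toy model only, not the PostgreSQL source: ToyVfd, cache, POOL_SIZE and the lru_* functions are hypothetical names, and the entries carry nothing except their two links.

```c
#include <assert.h>

#define POOL_SIZE 5   /* slot 0 is the list anchor; slots 1..4 are usable */

typedef int File;     /* index into the table, as in the slides */

typedef struct {
    File lruMoreRecently;   /* neighbour that was used more recently */
    File lruLessRecently;   /* neighbour that was used less recently */
} ToyVfd;

static ToyVfd cache[POOL_SIZE];

/* Empty ring: the anchor points at itself in both directions. */
void lru_init(void)
{
    cache[0].lruMoreRecently = 0;
    cache[0].lruLessRecently = 0;
}

/* Unlink slot f from the ring by joining its two neighbours. */
void lru_delete(File f)
{
    assert(f > 0 && f < POOL_SIZE);
    cache[cache[f].lruLessRecently].lruMoreRecently = cache[f].lruMoreRecently;
    cache[cache[f].lruMoreRecently].lruLessRecently = cache[f].lruLessRecently;
}

/* Link slot f in as the most recently used entry (just before the anchor). */
void lru_insert(File f)
{
    assert(f > 0 && f < POOL_SIZE);
    cache[f].lruMoreRecently = 0;
    cache[f].lruLessRecently = cache[0].lruLessRecently;
    cache[0].lruLessRecently = f;
    cache[cache[f].lruLessRecently].lruMoreRecently = f;
}

/* Record a use of slot f: move it to the most-recent end. */
void lru_touch(File f)
{
    lru_delete(f);
    lru_insert(f);
}

/* Least recently used slot, or 0 if the ring is empty. */
File lru_victim(void)
{
    return cache[0].lruMoreRecently;
}
```

In this model, when the pool runs out of real descriptors the victim returned by lru_victim() is the entry whose operating-system descriptor would be closed first; its slot stays valid so the file can be reopened later. Because the links are array indices rather than pointers, the array can be reallocated without rewriting the list.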
File Descriptor Pool (cont)
Virtual file descriptor records (simplified):
typedef struct vfd
{
   s_short fd;             // current FD, or VFD_CLOSED if none
   u_short fdstate;        // bitflags for VFD's state
   File    nextFree;       // link to next free VFD, if in freelist
   File    lruMoreRecently; // doubly linked recency-of-use list
   File    lruLessRecently;
   long    seekPos;        // current logical file position
   char   *fileName;       // name of file, or NULL for unused VFD
   // NB: fileName is malloc'd, and must be freed when closing the VFD
   int     fileFlags;      // open(2) flags for (re)opening the file
   int     fileMode;       // mode to pass to open(2)
} Vfd;
File Manager
Reminder: PostgreSQL file organisation
File Manager (cont)
PostgreSQL stores each table
in the directory PGDATA/base/pg_database.oid (default tablespace; the directory is named by the database's OID)
often in multiple files (aka forks)
File Manager (cont)
Data files (Oid, Oid.1, ...):
sequence of fixed-size blocks/pages (typically 8KB)
each page contains tuple data and admin data (see later)
max size of data files 1GB (Unix limitation)
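The arithmetic implied by these figures can be sketched as follows, assuming that Oid, Oid.1, ... are consecutive 1GB slices of one relation and that pages are 8KB. The names BlockLocation, locate_block and segment_name are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BLOCKSIZE       8192u                       /* typical page size */
#define MAXFILESIZE     (1024u * 1024u * 1024u)     /* 1GB per data file */
#define BLOCKS_PER_FILE (MAXFILESIZE / BLOCKSIZE)   /* 131072 pages per file */

typedef unsigned int BlockNumber;

typedef struct {
    unsigned int segment;   /* 0 means file "Oid", 1 means "Oid.1", ... */
    unsigned int offset;    /* byte offset of the page inside that file */
} BlockLocation;

/* Map a relation-wide page number to (file segment, byte offset). */
BlockLocation locate_block(BlockNumber blockNum)
{
    BlockLocation loc;
    loc.segment = blockNum / BLOCKS_PER_FILE;
    loc.offset  = (blockNum % BLOCKS_PER_FILE) * BLOCKSIZE;
    return loc;
}

/* Write the file name for a segment of relation `oid` into buf. */
void segment_name(char *buf, size_t len, unsigned int oid, unsigned int segment)
{
    assert(buf != NULL && len > 0);
    if (segment == 0)
        snprintf(buf, len, "%u", oid);
    else
        snprintf(buf, len, "%u.%u", oid, segment);
}
```

Under these assumptions each file holds 131072 pages, so page 131071 is the last page of file Oid and page 131072 is the first page of Oid.1.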
File Manager (cont)
Free space map (Oid_fsm):
indicates where free space is in data pages
free space is only free after VACUUM
(DELETE simply marks tuples as no longer in use, by setting xmax)
Visibility map (Oid_vm):
indicates pages where all tuples are visible
(visible = accessible to all currently active transactions)
such pages can be ignored by VACUUM
File Manager (cont)
The magnetic disk storage manager (storage/smgr/md.c)
manages its own pool of open file descriptors (Vfds)
may use several Vfds to access data, if several forks
manages mapping from PageID to file+offset.
PostgreSQL PageID values are structured:
typedef struct
{
   RelFileNode rnode;     // which relation/file
   ForkNumber  forkNum;   // which fork (of reln)
   BlockNumber blockNum;  // which page/block
} BufferTag;
File Manager (cont)
Access to a block of data proceeds (roughly) as follows:
// pageID set from pg_catalog tables
// buffer obtained from Buffer pool
getBlock(BufferTag pageID, Buffer buf)
{
   Vfd vf;  off_t offset;
   (vf, offset) = findBlock(pageID)
   lseek(vf.fd, offset, SEEK_SET)
   vf.seekPos = offset;
   nread = read(vf.fd, buf, BLOCKSIZE)
   if (nread < BLOCKSIZE) … we have a problem
}
BLOCKSIZE is a global configurable constant (default: 8192)
File Manager (cont)
findBlock(BufferTag pageID) returns (Vfd, off_t)
{
   offset = pageID.blockNum * BLOCKSIZE
   fileName = relpath(pageID.rnode)
   if (pageID.forkNum > 0)
      fileName = fileName+"."+pageID.forkNum
   if (fileName is not in Vfd pool)
      fd = allocate new Vfd for fileName
   else
      fd = use Vfd from pool
   if (pageID.forkNum > 0) {
      offset = offset - (pageID.forkNum*MAXFILESIZE)
   }
   return (fd, offset)
}
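The seek-then-read step of getBlock can be tried directly with POSIX calls. The sketch below assumes an ordinary file descriptor in place of a Vfd and a single file in place of a multi-file relation; read_block is a hypothetical helper, not a PostgreSQL function.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCKSIZE 8192

/*
 * Read page blockNum of an already-open file into buf, which must have
 * room for BLOCKSIZE bytes. Returns 0 on success, or -1 if the seek
 * fails, the read fails, or fewer than BLOCKSIZE bytes are available.
 */
int read_block(int fd, unsigned int blockNum, char *buf)
{
    assert(buf != NULL);
    off_t offset = (off_t) blockNum * BLOCKSIZE;
    if (lseek(fd, offset, SEEK_SET) == (off_t) -1)
        return -1;
    ssize_t nread = read(fd, buf, BLOCKSIZE);
    return (nread == BLOCKSIZE) ? 0 : -1;
}
```

Asking for a page beyond the end of the file yields a short read, which this sketch reports as -1; that is the "we have a problem" case in the pseudocode.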
Produced: 28 Feb 2021