[SOLVED] CS >>

$25

File Name: CS_>>.zip
File Size: 47.1 KB

5/5 - (1 vote)

>>
Sorting

The Sort Operation
Two-way Merge Sort
Comparison for Sorting
Cost of Two-way Merge Sort
n-Way Merge Sort
Cost of n-Way Merge Sort
Sorting in PostgreSQL
COMP9315 21T1 Sorting [0/14]
>>
The Sort Operation

Sorting is explicit in queries only in the order by clause

select * from Students order by name;
Sorting is used internally in other operations:
eliminating duplicate tuples for projection
ordering files to enhance select efficiency
implementing various styles of join
forming tuple groups in group by
Sort methods such as quicksort are designed for in-memory data.
For large data on disks, need external sorts such as merge sort.
COMP9315 21T1 Sorting [1/14]
<< >>
Two-way Merge Sort

Example:

COMP9315 21T1 Sorting [2/14]
<< >>
Two-way Merge Sort (cont)

Requires at least three in-memory buffers:

Assumption: cost of Merge operation on two in-memory buffers 0.
COMP9315 21T1 Sorting [3/14]
<< >>
Comparison for Sorting

Above assumes that we have a function to compare tuples.
Needs to understand ordering on different data types.
Need a function tupCompare(r1,r2,f) (cf. Cs strcmp)

int tupCompare(r1,r2,f)
{
if (r1.f < r2.f) return -1; if (r1.f > r2.f) return 1;
return 0;
}
Assume =, <, > are available for all attribute types.
COMP9315 21T1 Sorting [4/14]
<< >>
Comparison for Sorting (cont)

In reality, need to sort on multiple attributes and ASC/DESC, e.g.

example multi-attribute sort
select * from Students
order by age desc, year_enrolled
Sketch of multi-attribute sorting function

int tupCompare(r1,r2,criteria)
{
foreach (f,ord) in criteria {
if (ord == ASC) {
if (r1.f < r2.f) return -1; if (r1.f > r2.f) return 1;
}
else {
if (r1.f > r2.f) return -1;
if (r1.f < r2.f) return 1;} } return 0;}COMP9315 21T1 Sorting [5/14]<< >>
Cost of Two-way Merge Sort

For a file containing b data pages:
require ceil(log2b) passes to sort,
each pass requires b page reads, b page writes
Gives total cost: 2.b.ceil(log2b)
Example: Relation with r=105 and c=50 b=2000 pages.
Number of passes for sort: ceil(log22000) = 11
Reads/writes entire file 11 times! Can we do better?
COMP9315 21T1 Sorting [6/14]
<< >>
n-Way Merge Sort

Initial pass uses: B total buffers

Reads B pages at a time, sorts in memory, writes out in order
COMP9315 21T1 Sorting [7/14]
<< >>
n-Way Merge Sort (cont)

Merge passes use: n = B-1 input buffers, 1 output buffer

COMP9315 21T1 Sorting [8/14]
<< >>
n-Way Merge Sort (cont)

Method:

// Produce B-page-long runs
for each group of B pages in Rel {
read B pages into memory buffers
sort group in memory
write B pages out to Temp
}
// Merge runs until everything sorted
numberOfRuns = ceil(b/B)
while (numberOfRuns > 1) {
// n-way merge, where n=B-1
for each group of n runs in Temp {
merge into a single run via input buffers
write run to newTemp via output buffer
}
numberOfRuns = ceil(numberOfRuns/n)
Temp = newTemp // swap input/output files
}

COMP9315 21T1 Sorting [9/14]
<< >>
Cost of n-Way Merge Sort

Consider file where b = 4096, B = 16 total buffers:
pass 0 produces 256 16-page sorted runs
pass 1
performs 15-way merge of groups of 16-page sorted runs
produces 18 240-page sorted runs (17 full runs, 1 short run)
pass 2
performs 15-way merge of groups of 240-page sorted runs
produces 2 3600-page sorted runs (1 full run, 1 short run)
pass 3
performs 15-way merge of groups of 3600-page sorted runs
produces 1 4096-page sorted runs
(cf. two-way merge sort which needs 11 passes)
COMP9315 21T1 Sorting [10/14]
<< >>
Cost of n-Way Merge Sort (cont)

Generalising from previous example
For b data pages and B buffers
first pass: read/writes b pages, gives b0 = ceil(b/B) runs
then need ceil(lognb0) passes until sorted, where n = B-1
each pass reads and writes b pages (i.e. 2.b page accesses)

Cost = 2.b.(1 + ceil(lognb0)), where b0 and n are defined above
COMP9315 21T1 Sorting [11/14]
<< >>
Sorting in PostgreSQL

Sort uses a merge-sort (from Knuth) similar to above:
backend/utils/sort/tuplesort.c
include/utils/sortsupport.h
Tuples are mapped to SortTuple structs for sorting:
containing pointer to tuple and sort key
no need to reference actual Tuples during sort
unless multiple attributes used in sort
If all data fits into memory, sort using qsort().
If memory fills while reading, form runs and do disk-based sort.
COMP9315 21T1 Sorting [12/14]
<< >>
Sorting in PostgreSQL (cont)

Disk-based sort has phases:
divide input into sorted runs using HeapSort
merge using N buffers, one output buffer
N = as many buffers as workMem allows
Described in terms of tapes (tape sorted run)
Implementation of tapes: backend/utils/sort/logtape.c
COMP9315 21T1 Sorting [13/14]
<< Sorting in PostgreSQL (cont)Sorting comparison operators are obtained via catalog (in Type.o): // gets pointer to function via pg_operatorstruct Tuplesortstate { … SortTupleComparator … };// returns negative, zero, positiveApplySortComparator(Datum datum1, bool isnull1,Datum datum2, bool isnull2,SortSupport sort_helper);Flags in SortSupport indicate: ascending/descending, nulls-first/last. ApplySortComparator() is PostgreSQL’s version of tupCompare() COMP9315 21T1 Sorting [14/14]Produced: 1 Mar 2021

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CS >>
$25