RESEARCH CONTRIBUTIONS

Algorithms and Data Structures
Daniel Sleator, Editor

Practical In-Place Merging

BING-CHAO HUANG and MICHAEL A. LANGSTON
ABSTRACT: We present a novel, yet straightforward linear-time algorithm for merging two sorted lists in a fixed amount of additional space. Constant of proportionality estimates and empirical testing reveal that this procedure is reasonably competitive with merge routines free to squander unbounded additional memory, making it particularly attractive whenever space is a critical resource.

1. INTRODUCTION

Merging is a fundamental operation in computer programming. Given two sorted lists whose lengths sum to n, the obvious methods for merging them in O(n) steps require a linear amount of extra memory as well. On the other hand, it is easy to merge in place using only a constant amount of additional space by heap-sorting, but at a cost of O(n log n) time.

Happily, an algorithm by Kronrod [8] solves this dilemma by merging in both linear time and constant extra space. Kronrod's seminal notions of block rearrangements and internal buffering, which employ O(√n) blocks each of size O(√n), have spawned continued study of this problem, including the work of Horvath [3], Mannila and Ukkonen [9] and Pardo [10]. Unfortunately, all of these schemes have been fairly complicated. More importantly, none has been shown to be very effective in practice, primarily because of their large linear-time constants of proportionality.

Herein we present a fast and surprisingly simple algorithm which merges in linear time and constant extra space. We also use O(√n) blocks, each of size O(√n). In general, this approach allows the user to employ one block as an internal buffer to aid in rearranging or otherwise manipulating the other blocks in constant extra space. Since only the contents of the buffer and the relative order of the blocks need ever be out of sequence, linear time is sufficient to sort both the buffer and the blocks (each sort involves O(√n) keys) and thereby complete the merge. As we shall show, two major factors make our scheme much more straightforward and practical than previously reported attempts to solve this problem: 1) we rearrange blocks before we initiate a merging phase, and 2) we efficiently pass the internal buffer across the list so as to minimize unnecessary record movement.

Given today's low cost of computer memory components, the greatest practical significance of our result may well lie in the processing of large external files. For example, notice that in-place merging can be viewed as an efficient means for increasing the effective size of internal memory when merging or merge-sorting external files. This may result in a substantial reduction in the overall elapsed time for file processing whenever the extra space can be utilized for more buffers to increase parallelism or larger buffers to reduce the number of I/O transfers needed.

In the next section, we present an overview of the main ideas behind this new O(n) time and O(1) space merging technique, with simplifying assumptions made on the input to facilitate discussion. Section 3, which may be skipped on a first reading, contains the O(√n) time and O(1) space implementation details necessary to handle arbitrary inputs. In Section 4, we formally and empirically evaluate this new merge. We demonstrate that, for large n, its average run time exceeds that of a standard merge, free to exploit O(n) temporary extra memory cells, by less than a factor of two. The final section of this presentation consists of a collection of observations and remarks pertinent to this topic.

This research is supported in part by the National Science Foundation under grants ECS-8403859 and MIP-8603879. A preliminary version of a paper describing the work reported here was presented at the ACM-IEEE/CS Fall Joint Computer Conference held at Dallas, Texas, in October 1987.
© 1988 ACM 0001-0782/88/0300-0348 $1.50
2. AN OVERVIEW OF THE MAIN ALGORITHM
Let L denote a list of n records containing two sublists to be merged, each in nondecreasing order. For the sake of discussion, let us suppose that n is a perfect square. Figure 1a illustrates such a list with n = 25. Only record keys are listed, denoted by capital letters. Subscripts are added to keep track of duplicate keys as the algorithm progresses. Let us also assume that we have been able to permute the elements of L so that the √n largest-keyed records are at the front of the list (their relative order there is immaterial), followed by the remainders of the two sublists, each of which we assume, for simplicity, now contains an integral multiple of √n records in nondecreasing order. Figure 1b shows our example list in this format.
We now view L as a series of √n blocks, each of size √n. We will use the leading block as an internal buffer to aid in the merge. Our first step is to sort the √n − 1 rightmost blocks so that their tails (rightmost elements) form a nondecreasing key sequence. (A good choice for the sort is the straight selection method as described in Knuth [7]. This technique is simple and will in this setting require only O(n) key comparisons and record exchanges.) Records within a block retain their original relative order. See Figure 1c.

FIGURE 1. Initial Rearrangement of Blocks. a) Example list L, with n = 25. b) Internal buffer is extracted. c) Blocks are sorted. [Figure not reproduced.]
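To make the block sort concrete, here is a minimal Python sketch of our own devising (the paper's experiments used Pascal, and the function name and layout conventions below are ours, not the authors'). It assumes L is a Python list whose first s records form the buffer and whose remaining length is a multiple of s.

    def sort_blocks_by_tails(L, s):
        # Straight selection sort of the s-sized blocks of L[s:] so that
        # their tails (rightmost elements) form a nondecreasing sequence.
        # Records keep their relative order inside each block; blocks are
        # exchanged whole, record by record, for O(n) total work.
        starts = list(range(s, len(L), s))   # start index of each non-buffer block
        for a in range(len(starts) - 1):
            m = a                            # slot holding the smallest tail so far
            for b in range(a + 1, len(starts)):
                if L[starts[b] + s - 1] < L[starts[m] + s - 1]:
                    m = b
            if m != a:
                for k in range(s):           # swap the two blocks in place
                    L[starts[a] + k], L[starts[m] + k] = L[starts[m] + k], L[starts[a] + k]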
Next, we locate two series of elements to be merged. The first series begins with the head (leftmost element) of block 2 and terminates with the tail of block i, i ≥ 2, where block i is the first block such that tail(i) > head(i + 1). The second series consists solely of the elements of block i + 1. See Figure 2a, where i = 3. We now use the buffer to merge these two series. That is, we repeatedly compare the leftmost unmerged element in the first series to the leftmost unmerged element in the second, swapping the smaller with the leftmost buffer element. Ties are broken in favor of the first series. In general, the buffer may be broken into two pieces as we merge. See Figure 2b. We halt this process when the tail of block i has been moved to its final position. At this point, the buffer must be in one piece, although not on a block boundary. Block i + 1 must contain one or more unmerged elements (as a result of the first step, in which blocks were sorted by their tails). See Figure 2c.

FIGURE 2. Merging the First Two Series of Elements. a) Locating the first two series of elements, i = 3. b) Merging is partially completed. c) First two series are merged. [Figure not reproduced.]

We now locate the next two series of elements to be merged. This time, the first begins with the leftmost unmerged element of block i + 1 and terminates as before for some j ≥ i. The second consists solely of the elements of block j + 1. See Figure 3a, where j = 4. We resume the merge until the tail of block j has been moved. See Figure 3b.

We continue this process of locating series of elements and merging them until we reach a point where only one such series exists, which we merely shift left, leaving the buffer in the last block. See Figure 3c. A sort of the buffer completes the merge of L. See Figure 3d.

FIGURE 3. Completing the Merge of L. a) Locating the next two series of elements, j = 4. b) Series is merged. c) Last series is shifted. d) Buffer is sorted, completing the merge. [Figure not reproduced.]
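The series-merging step lends itself to an equally short routine. The sketch below (again ours, in Python rather than the authors' Pascal) merges the current first series with the single block forming the second series, swapping each winner with the leftmost buffer record and halting, exactly as described above, once the tail of the first series has been placed. It assumes the buffer occupies L[buf:first], immediately to the left of the first series, and that the second series is no longer than the buffer.

    def merge_step(L, buf, first, second, end):
        # Merge series 1 (L[first:second]) with series 2 (L[second:end]).
        # Winners are swapped with buffer records, so nonbuffer records
        # move only once; ties are broken in favor of series 1.  The
        # buffer may split into two pieces mid-merge, but it is contiguous
        # again on return, starting at the returned index.
        i, j, out = first, second, buf
        while i < second and j < end:
            if L[i] <= L[j]:
                L[out], L[i] = L[i], L[out]
                i += 1
            else:
                L[out], L[j] = L[j], L[out]
                j += 1
            out += 1
        while i < second:                  # drain what is left of series 1
            L[out], L[i] = L[i], L[out]
            i += 1
            out += 1
        return out                         # buffer is now L[out:out + (first - buf)]

Any records of series 2 left unmerged when the routine returns sit just to the right of the restored buffer and become the head of the next first series.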
O(1) space suffices for this procedure, since the buffer was internal to the list, and since only a handful of additional pointers and counters are necessary. O(n) time suffices as well, since the block sorting, the series merging and the buffer sorting each require at most linear time.
This completes our overview of the central features of this fast merging strategy, illustrating the method we use to rearrange L by blocks and to merge the elements of these blocks with the aid of the internal buffer. In the next section, we provide the interested reader with O(√n) time implementation details for dealing with lists and sublists of arbitrary sizes and for efficiently handling the initial extraction of the internal buffer.
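Before turning to those details, the overview can be summarized in code. The following Python driver is our own reconstruction, not the authors' program: it assumes the simplified input of this section (n a perfect square, the s = √n largest records already sitting in L[0:s] as the buffer, and the rest of L consisting of two sorted runs whose lengths are multiples of s), and it reuses sort_blocks_by_tails and merge_step from the sketches above.

    from math import isqrt

    def in_place_merge_simplified(L):
        n = len(L)
        s = isqrt(n)                         # block and buffer size
        sort_blocks_by_tails(L, s)           # step 1: order blocks by their tails
        out, f = 0, s                        # buffer in L[out:out+s]; f heads series 1
        while True:
            # Series 1 ends at the first block boundary whose tail exceeds
            # the next block's head; series 2 is the block that follows.
            b = (f // s + 1) * s
            while b < n and L[b - 1] <= L[b]:
                b += s
            if b >= n:                       # only one series left: shift it left,
                for k in range(n - f):       # leaving the buffer in the last block
                    L[out + k], L[f + k] = L[f + k], L[out + k]
                break
            out = merge_step(L, out, f, b, b + s)
            f = out + s                      # leftover of series 2 heads the next series
        for a in range(n - s, n - 1):        # selection-sort the buffer: O(s·s) = O(n)
            m = min(range(a, n), key=L.__getitem__)
            L[a], L[m] = L[m], L[a]

A streamlined implementation would avoid some of the repeated index arithmetic, but this form mirrors the narrative above step for step.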
3. IMPLEMENTATION DETAILS
For the special case in which one of the sublists to be merged has l < √n elements, it is a simple matter to perform the merge efficiently without the general procedure outlined in the last section. For example, supposing the shorter sublist begins in location 1, we could employ the block merge forward scheme suggested in [10]. We first search the longer sublist for an insertion point for record 1. That is, we find p such that key(p) < key(1) ≤ key(p + 1). Then elements indexed from 1 through p can be circularly shifted p − l places to the right, so that the rightmost element of the shorter sublist occupies position p. (Such a shift is often termed a rotation, and can be efficiently achieved with three sublist reversals.) The first element of the shorter sublist and all elements to its left are now in their final positions. We complete the merge by iterating this operation over the remainder of the list until one of the sublists is exhausted. Linear time is sufficient for this method since we have ensured that elements of the shorter sublist are moved no more than √n times, and elements of the longer sublist are moved only once. (If the larger sublist occurs first, we can, of course, use an analogous block merge backward scheme.)
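A minimal Python sketch of this special case follows (our rendering of the block merge forward idea of [10], with hypothetical helper names). Here rotate uses the three-reversal trick mentioned above, and we use binary search for each insertion point; a linear scan would also keep the overall time linear.

    from bisect import bisect_left

    def reverse(L, i, j):
        # Reverse L[i:j] in place.
        j -= 1
        while i < j:
            L[i], L[j] = L[j], L[i]
            i += 1
            j -= 1

    def rotate(L, i, m, j):
        # Circularly shift L[i:j] so that L[m:j] comes before L[i:m],
        # via three reversals and O(1) extra space.
        reverse(L, i, m)
        reverse(L, m, j)
        reverse(L, i, j)

    def block_merge_forward(L, split):
        # Merge sorted L[0:split] (the shorter run) with sorted L[split:]
        # in place; linear time whenever split <= sqrt(len(L)).
        i, m, n = 0, split, len(L)
        while i < m and m < n:
            p = bisect_left(L, L[i], m, n)   # insertion point for the head record
            rotate(L, i, m, p)               # smaller long-run records slide left
            i += p - m + 1                   # the head record is now final as well
            m = p
        # whichever run remains is already in its final position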
In what follows, therefore, we assume that each sublist has at least s = ⌊√n⌋ elements. Note that (s + 1)² > n guarantees that the total number of s-sized blocks is no more than s + 2. We begin by comparing the right ends of the sublists, identifying the s largest-keyed records, which are to become the internal buffer. Let A and B, with respective sizes s1 and s2, denote these two portions of L, where s1 + s2 = s. Let C denote the s − s1 = s2 elements immediately to the left of A. Let D denote the shortest (possibly empty) collection of records to the immediate left of B which, if D and B were deleted from the second sublist, would leave that sublist with an integral multiple of s records. Thus we can view L as a list of s-sized blocks, except possibly for the first block of size t1, where 0 < t1 ≤ s, and for the last block of size t2, where 0 < t2 < 2s. See Figure 4a.

We now show how to handle the rightmost block, which may have an unusual size. We swap B and C, bringing the buffer into one block. Then we use the buffer to merge C and D, giving block E of size t2. See Figure 4b. If both C and D were nonempty, then the tail of E comes from either C or D, and is as large as any nonbuffer element in L. Thus E is in its correct position and does not need to be considered later when nonbuffer blocks are sorted by their tails. If C or D (or both) were empty, then although E's tail may be smaller than one or more nonbuffer elements, it is easy to see that E still need not be moved before the main algorithm commences. Therefore, in any event, E remains where it is and its size can cause no difficulty for the merging which begins after these blocks are sorted.

Next, we take care of the leftmost block, but only if it has an unusual size (that is, if t1 < s). Let F denote this block. Let G denote the first block of the second sublist. See Figure 4c. We use the rightmost t1 buffer elements to merge F and G, giving H and I. See Figure 4d. By virtue of this merge of the smallest blocks in L, we can now move H to its final position by swapping it with the t1 buffer elements there. This also restores the buffer. See Figure 4e.

We now swap the buffer with J, the first s-sized block of L, which is either the first block of L or the one following H. See Figure 4f. We are thus ready to perform the main algorithm as previously described. (Notice that blocks I, J and E might have to be moved to provide the two sorted sublists that we assumed at this point, for the ease of discussion, in Section 2. Naturally, we do not actually move them, since E is to remain where it is and since I and J are next sorted by their tails anyway.)

FIGURE 4. Implementation Details. a) Identifying list segments A, B, C, and D. b) Handling the rightmost block. c) Identifying blocks F and G. d) Merging F and G. e) Finishing the leftmost block. f) Ready for the main algorithm. [Figure not reproduced.]

Thus we have outlined an efficient method for handling arbitrary list and sublist sizes and for extracting the buffer. These implementation details are of time complexity O(√n), thereby dominated by the linear time required for the main algorithm.

4. ANALYSIS AND COMPUTATIONAL EXPERIENCE

4.1 Constant of Proportionality Estimates

Key comparisons and record exchanges are generally regarded as by far the most time-consuming merging operations used. Both require storage-to-storage instructions on most architectures. Moreover, it is possible to count them independently from the code of any particular implementation. Therefore, their total gives a useful estimate of the size of the linear-time constant of proportionality for the merge routine we have described. We now proceed to derive a worst-case upper bound on their sum.
(We focus only on the main algorithm, since extracting the buffer and other algorithmic details described in Section 3 need only O(√n) time. Similarly, the final sort of the buffer can be accomplished in place with a heap sort in O(√n log n) time.) Recall that s = ⌊√n⌋ and that there are no more than s + 2 blocks. Using a selection sort to sort no more than s + 1 blocks of size s requires at most s(s + 1)/2 comparisons and at most s² exchanges [7], summing to 1.5s² + .5s ≤ 1.5n + .5s. When merging series of elements, the number of comparisons is bounded above by the number of exchanges. Exchanges occur only between buffer and nonbuffer elements; nonbuffer elements are moved only once. Thus the comparison plus exchange total for the merging phase is at most 2(n − s).

Hence, we are guaranteed a worst-case grand total of something less than 3.5n. This is a very reasonable quantity indeed, especially since any merge must, in the worst case, use at least n − 1 key comparisons and n record moves. (Since L is not in a completely random form when the blocks are sorted, we can take advantage of its partial order to speed up this step, reducing our figure from 3.5n down to 3.125n. We omit the details from this presentation since they are cumbersome and not particularly interesting in their own right. Curious readers are welcome to request additional information from the authors.)

4.2 Observed Behavior

To provide a more accurate estimate of the practical potential of our new merge procedure, we next describe the results of a series of experiments aimed at comparing its average performance to that of a common, heavily-used merge free to take advantage of unbounded temporary extra storage. Suppose L contains two sublists, the first of length l1, the second of length l2. This fast merge works as follows. If l1 ≤ l2, then it moves the first sublist to temporary storage and merges (forward) into L's original position. Otherwise, it moves the second sublist to temporary storage and merges (backward) into L's original position.

There are several reasons for conducting an empirical comparison of these two merge algorithms. Optimistically, we observe that, on the average, our merge should perform even better than the worst-case key comparison plus record exchange total derived in the last section would suggest. Pessimistically, however, we note that the lower-order (nonlinear) time complexity terms may be of significance, especially for small n. Furthermore, the space-squandering procedure can accomplish the merge without true record exchanges, since it only needs to move records one at a time, not swap pairs of them, as it executes.

We wrote both merge routines in Pascal for an IBM 3090-200, and timed them over large collections of lists, with relative sublist sizes and key values uniformly chosen at random. Table I summarizes the behavior we observed. Let A denote our in-place strategy. Let B denote the fast procedure we have just described, which requires O(n) extra space.
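For reference, procedure B is entirely standard; a Python sketch (ours, not the paper's Pascal) is given below. Copying the shorter sublist out and choosing the merge direction accordingly guarantees that unread records are never overwritten.

    def standard_merge(L, split):
        # Procedure B: O(n) extra space.  Move the shorter sublist to
        # temporary storage, then merge back into L's original positions,
        # forward if the first sublist is shorter, backward otherwise.
        n = len(L)
        if split <= n - split:
            tmp = L[:split]                  # first sublist to temporary storage
            i, j, out = 0, split, 0
            while i < len(tmp):              # forward merge
                if j >= n or tmp[i] <= L[j]:
                    L[out] = tmp[i]
                    i += 1
                else:
                    L[out] = L[j]
                    j += 1
                out += 1
        else:
            tmp = L[split:]                  # second sublist to temporary storage
            i, j, out = split - 1, len(tmp) - 1, n - 1
            while j >= 0:                    # backward merge
                if i < 0 or tmp[j] >= L[i]:
                    L[out] = tmp[j]
                    j -= 1
                else:
                    L[out] = L[i]
                    i -= 1
                out -= 1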
Average run-times reported (AVG_A and AVG_B) reflect, for each list size n, the mean execution time each merge incurred during one hundred experiments. (These figures, of course, account for the time spent merging only; they do not include the great amount of time devoted to overhead operations such as generating and sorting sublists.)

TABLE I. Observed Behavior of In-Place Strategy, A, Relative to Memory-Dependent Procedure, B

    List size, n    AVG_A (ms)    AVG_B (ms)    AVG_A/AVG_B
    50                  0.1991        0.0602        3.307
    100                 0.3620        0.1205        3.004
    500                 1.563         0.6259        2.529
    1,000               2.606         1.301         2.157
    5,000              13.44          6.734         1.996
    10,000             26.32         13.35          1.972
    50,000            127.6          66.6           1.913
    100,000           236.6         131.4           1.617
    500,000          1152           651.2           1.769
    1,000,000        2267          1309             1.747

Quite naturally, other factors such as record length, computer architecture, percentage of duplicates in a list and so on can affect algorithmic behavior. Therefore, the ratios we compute can only be viewed as estimates. We conclude that the expected penalty for merging large lists in place is less than a doubling of program execution times. This penalty can be insignificant, especially in applications such as external file processing, in which overall run-times are dictated by the ways in which main memory can be used to assist in the effective minimization and concurrency of I/O operations. We emphasize that we have implemented algorithm A just as we have described it herein, refraining from the temptation to fine-tune it to make it faster. We suggest that more streamlined implementations can merge at a considerably accelerated pace.

5. REMARKS

We have presented a relatively straightforward and efficient merging procedure which simultaneously optimizes both time and space (to within a constant factor). For the sake of speed and simplicity, our algorithm is not stable. That is, records with equal keys are not assured of retaining their original relative order (refer back to Figure 3d). We have, however, been able to employ similar methods to devise a stable in-place merge for this problem [4]. Unfortunately, this algorithm is more complicated and runs somewhat slower. Even so, it is much faster than the fastest previously published stable in-place optimal time and space merge [10]. We have also been able to develop a stable in-place O(n log n) sort [4], which bypasses the obvious merge-sort implementation, and is several times faster than any reported to date.

This general line of research has the potential to change the way practitioners think about performing everyday file processing. For example, we have found that the merge routine we have presented here is straightforward enough to be implemented by students in an upper-division undergraduate computer algorithms class (at Washington State University) with only a preliminary version of this paper to use as a guide. In fact, we believe that this would be the case for our optimal time and space duplicate-key extraction, selection, set and multiset operations as well [5, 6].

We observe that obvious, traditional methods for merging in linear time require that n/2 additional memory cells be available. Even though internal storage is often rather cavalierly viewed as an inexpensive resource, it may well be that in many environments our algorithm's modest run-time premium is more than offset by its avoidance of a 50 percent storage overhead penalty. In particular, this may be the case when managing critical resources such as cache memory or other relatively small, high-speed memory components. Not only is there the immediate space savings, but the strategy we employ to pass the buffer across the list keeps active blocks near one another, thereby preserving locality of reference and potentially reducing the likelihood of paging problems. (Of course, virtual memory behavior depends on several other factors as well, and can sometimes be surprising. See, for example, Alanko, Erkio and Haikala [1].)

Finally, we note that other researchers are beginning to report some initial progress related to this general topic (see, for example, Dvorak and Durian [2]). As block rearrangement strategies become better known, we believe further investigations will help to determine their practical potential for sorting, merging and related file-processing operations.

Acknowledgements. The authors wish to thank Donald Knuth for suggesting this general line of investigation, and Michael Morford for his assistance in performing the run-time experiments whose results we have reported. We also thank the anonymous referees whose comments helped to improve the readability of this paper.

REFERENCES

1. Alanko, T.O., Erkio, H.H., and Haikala, I.J. Virtual memory behavior of some sorting algorithms. IEEE Trans. on Software Engr. 10 (1984), 422-431.
2. Dvorak, S., and Durian, B. Towards an efficient merging. Lecture Notes in Computer Science 233 (1986), 290-298.
3. Horvath, E.C. Stable sorting in asymptotically optimal time and extra space. J. of the ACM 25 (1978), 177-199.
4. Huang, B.C., and Langston, M.A. Fast stable merging and sorting in constant extra space. Computer Science Department Technical Report CS-87-170, Washington State University, 1987.
5. Huang, B.C., and Langston, M.A. Stable duplicate-key extraction with optimal time and space bounds. Acta Informatica, to appear.
6. Huang, B.C., and Langston, M.A. Stable set and multiset operations in optimal time and space. Computer Science Department Technical Report CS-87-166, Washington State University, 1987.
7. Knuth, D.E. The Art of Computer Programming, Vol. 3: Sorting and Searching. Addison-Wesley, Reading, Mass., 1973.
8. Kronrod, M.A. An optimal ordering algorithm without a field of operation (in Russian). Dokl. Akad. Nauk SSSR 186 (1969), 1256-1258.
9. Mannila, H., and Ukkonen, E. A simple linear-time algorithm for in situ merging. Information Processing Letters 18 (1984), 203-208.
10. Pardo, L.T. Stable sorting and merging with optimal space and time bounds. SIAM J. Comput. 6 (1977), 351-372.

CR Categories and Subject Descriptors: D.4.3 [Operating Systems]: File Systems Management - Maintenance; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems - Sorting and Searching; H.2.4 [Database Management]: Systems - Transaction Processing

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Block rearrangements, file processing, internal buffering, in-place algorithms, optimal time and space merging

Received 8/86; revised 5/87; accepted 6/87

Authors' Present Address: Department of Computer Science, Washington State University, Pullman, WA 99164-1210

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.