[SOLVED] 程序代写代做代考 Excel database chain cache Microsoft PowerPoint – 16- Indexing-p2

30 $

File Name: 程序代写代做代考_Excel_database_chain_cache_Microsoft_PowerPoint_–_16-_Indexing-p2.zip
File Size: 866.64 KB

SKU: 8380044889 Category: Tags: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,

Or Upload Your Assignment Here:


Microsoft PowerPoint – 16- Indexing-p2

© 2018 A. Alawini & A. Parameswaran

Indexing:
B+ Tree &

Hash Tables
Doris Xin, Abdu Alawini

University of Illinois at Urbana-Champaign

CS411: Database Systems

October 24, 2018

1

© 2018 A. Alawini & A. Parameswaran

Announcements

• HW 3: Due by Friday 10/26 (23:59)
•Sign up for PT1 midterm demos: Due by Friday 10/26

(23:59)
https://wiki.illinois.edu/wiki/display/cs411sfa18/Project+Track+1
+Midterm+Demo+Signup

• Midterm review session: Friday 10/26 (4:00-4:50) SC 1404
• To suggest topicsto discuss in the review session, please fill this

form: https://goo.gl/forms/5fDcm80cDjmtMJ0H3

•Please fill the early course feedback form:
https://goo.gl/forms/SC4BYcrDy8dai8PE2

•Midterm: 10/29 in class 11-12:15 pm
2

© 2018 A. Alawini & A. Parameswaran

Today’s lecture

•Indexing
• Continue with B+ Trees
• Hash Tables

3

© 2018 A. Alawini & A. Parameswaran

What is the best part of the class so far?

4

© 2018 A. Alawini & A. Parameswaran

What is the least useful part of the
course so far?

5

© 2018 A. Alawini & A. Parameswaran

What would you change to make the
course even better?

6

© 2018 A. Alawini & A. Parameswaran

B+ Trees (Recap)

•Multi-level index on a specific search key
•Nodes contain keys and pointers

• Internal nodes: point to other nodes
• Leaf nodes: point to data records; last pointer to the next leaf node

7

20 60

10

10 15 18 20 30 50

15 18 20 30 50

© 2018 A. Alawini & A. Parameswaran

B+ Trees (Recap con’t)

•Each node can contain up to n keys;
• All nodes have the same capacity (n max keys)
• Degree d = n/2  the minimum # of keys per node

(assume n is even for simplicity)
• Each node has between [d, 2d] keys at all times
• Except root node, which can have only one key.

• In practice, each node has its own file block
• For a 4KB block, we can accommodate up to 340 keys (d=170).
• In practice, d = 100, 66.5% fill-in factor  133 keys
• Visiting one node = one disk read (latency ~ 105-107 ns)
• First 3 levels of can be cached in main memory (latency ~ 100 ns) to

reduce disk reads
8

an artificial
requirement to
make B+ Trees
“balanced”

© 2018 A. Alawini & A. Parameswaran

B+ Trees (Recap con’t)

•Do B+ Trees always help?
• No. e.g., an array of sorted integers.

•Types of queries to answer with a B+ Tree:
• Exact key value, e.g., SELECT name FROM people WHERE age=20
• Range queries, e.g., SELECT name FROM people WHERE age>=20

and age<=709© 2018 A. Alawini & A. Parameswaran 1 0id = 1, name = “Kyle”, age = 15id = 3, name = “Bob”, age = 20id = 10, name = “Ivan”, age = 22id = 12, name = “Diana”, age = 19id = 13, name = “Ed”, age = 20id = 15, name = “Lucy”, age = 19id = 27, name = “George”, age = 21id = 30, name = “Helen”, age = 23id = 33, name = “Cathy”, age = 18id = 35, name = “Jennifer”, age = 22id = 38, name = “Alice”, age = 21id = 70, name = “Fred”, age = 181310121315273033353870dense (entry for every record) d = 2n = 4B+ Tree search key = id© 2018 A. Alawini & A. Parameswaran 1 1id = 1, name = “Alice”, age = 15id = 3, name = “Bob”, age = 20id = 10, name = “Carl”, age = 22id = 12, name = “Diana”, age = 19id = 13, name = “Ed”, age = 20id = 15, name = “Fred”, age = 19id = 27, name = “Ginger”, age = 21id = 30, name = “Helen”, age = 23id = 33, name = “Ivan”, age = 18id = 35, name = “Jennifer”, age = 22id = 38, name = “Kyle”, age = 21id = 70, name = “Louise”, age = 181310121315273033353870123080828635dense (entry for every record) d = 2n = 4B+ Tree search key = id© 2018 A. Alawini & A. Parameswaran 1 2id = 1, name = “Alice”, age = 15id = 3, name = “Bob”, age = 20id = 10, name = “Carl”, age = 22id = 12, name = “Diana”, age = 19id = 13, name = “Ed”, age = 20id = 15, name = “Fred”, age = 19id = 27, name = “Ginger”, age = 21id = 30, name = “Helen”, age = 23id = 33, name = “Ivan”, age = 18id = 35, name = “Jennifer”, age = 22id = 38, name = “Kyle”, age = 21id = 70, name = “Louise”, age = 18151818191920202121222223dense (entry for every record) d = 2n = 4B+ Tree search key = age1921242522© 2018 A. Alawini & A. ParameswaranThink-Pair Share Exercise13Min capacity of a data entry page is 1 entry; Min capacity of an index page is 2 pointers/1 key value. Leaf nodes (data entry pages) point to data pages; each pointer represents a record ID (RID).Write the key value index entries for the root and the intermediate nodes in the tree above.© 2018 A. Alawini & A. ParameswaranHandling data changes in B+ Trees14© 2018 A. Alawini & A. Parameswaran 15Insertion in a B+ Tree8020 60 100 120 14010 15 18 20 30 40 50 60 65 80 85 9010 15 18 20 30 40 50 60 65 80 85 90Insert K=19Assume d=2.DATA BLOCKS© 2018 A. Alawini & A. ParameswaranInsertion in a B+ Tree168020 60 100 120 14010 15 18 19 20 30 40 50 60 65 80 85 9010 15 18 20 30 40 50 60 65 80 85 9019After insertion© 2018 A. Alawini & A. ParameswaranInsertion in a B+ Tree178020 60 100 120 14010 15 18 19 20 30 40 50 60 65 80 85 9010 15 18 20 30 40 50 60 65 80 85 9019Now insert 25© 2018 A. Alawini & A. ParameswaranInsertion in a B+ Tree188020 60 100 120 14010 15 18 19 20 25 30 40 50 60 65 80 85 9010 15 18 20 25 30 40 60 65 80 85 9019After insertion50© 2018 A. Alawini & A. ParameswaranInsertion in a B+ Tree198020 60 100 120 14010 15 18 19 20 25 30 40 50 60 65 80 85 9010 15 18 20 25 30 40 60 65 80 85 9019But now have to split !50© 2018 A. Alawini & A. ParameswaranInsertion in a B+ Tree208020 30 60 100 120 14010 15 18 19 20 25 60 65 80 85 9010 15 18 20 25 30 40 60 65 80 85 9019After the split5030 40 50© 2018 A. Alawini & A. ParameswaranDeletion from a B+ Tree218020 30 60 100 120 14010 15 18 19 20 25 60 65 80 85 9010 15 18 20 25 30 40 60 65 80 85 9019Delete 305030 40 50© 2018 A. Alawini & A. ParameswaranDeletion from a B+ Tree228020 30 60 100 120 14010 15 18 19 20 25 60 65 80 85 9010 15 18 20 25 40 60 65 80 85 9019After deleting 305040 50May change to 40, or not© 2018 A. Alawini & A. ParameswaranDeletion from a B+ Tree238020 30 60 100 120 14010 15 18 19 20 25 60 65 80 85 9010 15 18 20 25 40 60 65 80 85 9019Now delete 255040 50© 2018 A. Alawini & A. ParameswaranDeletion from a B+ Tree248020 30 60 100 120 14010 15 18 19 20 60 65 80 85 9010 15 18 20 40 60 65 80 85 9019After deleting 25Need to rebalanceRotate5040 50Rotation in general can involve either sibling, but here only the left sibling can help© 2018 A. Alawini & A. Parameswaran19 20Deletion from a B+ Tree258020 30 60 100 120 14010 15 18 60 65 80 85 9010 15 18 20 40 60 65 80 85 9019 5040 50© 2018 A. Alawini & A. ParameswaranDeletion from a B+ Tree268019 30 60 100 120 14010 15 18 19 20 60 65 80 85 9010 15 18 20 40 60 65 80 85 9019 5040 50© 2018 A. Alawini & A. ParameswaranDeletion from a B+ Tree278019 30 60 100 120 14010 15 18 19 20 60 65 80 85 9010 15 18 20 40 60 65 80 85 9019Now delete 405040 50© 2018 A. Alawini & A. ParameswaranDeletion from a B+ Tree288019 30 60 100 120 14010 15 18 19 20 60 65 80 85 9010 15 18 20 60 65 80 85 9019After deleting 40Rotation not possible5050Need to merge nodes© 2018 A. Alawini & A. ParameswaranDeletion from a B+ Tree298019 60 100 120 14010 15 18 19 20 50 60 65 80 85 9010 15 18 20 60 65 80 85 9019Final tree50© 2018 A. Alawini & A. ParameswaranAdvantages of B+Trees•Balanced  Uniform space utilization• Predictable organization• Predictable time (logarithmic); unbalanced can be linear in worst case •Good for range queries30Can we do better?© 2018 A. Alawini & A. ParameswaranOutline•IndexingB+ Trees• Hash Tables31© 2018 A. Alawini & A. Parameswaran32Hash Tables© 2018 A. Alawini & A. Parameswaran33Hash Tables•Secondary storage hash tables are much like main memory ones•Recall basics:•There are n buckets•A hash function f(k) maps a key k to {0, 1, …, n-1}•Store in bucket f(k) a pointer to record with key k•Secondary storage: bucket = block•Store in bucket f(k) any record with key k•use overflow blocks when needed© 2018 A. Alawini & A. Parameswaran34•Assume 1 bucket (block) stores 2 records•h(e)=0•h(b)=h(f)=1•h(g)=2•h(a)=h(c)=3Hash Table Exampleebfgac0123© 2018 A. Alawini & A. Parameswaran35•Search for a:•Compute h(a)=3•Read bucket (block) 3•1 disk accessSearching in a Hash Tableebfgac0123Main memory may have an array of pointers (to buckets) accessible by bucket number. © 2018 A. Alawini & A. Parameswaran36•Place in right bucket (block), if space•E.g. h(d)=2Insertion in Hash Tableebfgdac0123© 2018 A. Alawini & A. Parameswaran37•Create overflow block, if no space•E.g. h(k)=1•More over-flow blocksmay be neededInsertion in Hash Tableebfgdac0123k© 2018 A. Alawini & A. Parameswaran38Hash Table Performance•Fixed number of buckets•Excellent, if no overflow blocks•Degrades considerably when there are many overflow blocks. •Might need to go through a chain of overflow blocksCan fix this by allowing the number of buckets to grow© 2018 A. Alawini & A. Parameswaran39Extensible Hash Table• Array of pointers to blocks instead of array of blocks• Size of array is allowed to grow. 2x size when it grows• Don’t need a block per bucket. Sparse buckets share a block• Hash function returns k-bit integers (e.g., k=32)• Only use the first i << k bits to determine bucket• Number of buckets = 2i• Bit counter on each block indicates how much are used for that block0(010)1(011)i = 11101BUCKETSDATA BLOCKSBit counter© 2018 A. Alawini & A. Parameswaran40Insertion in Extensible Hash Table•Insert 1110 0(010)1(011)1(110)i=1 1101BUCKETS DATA BLOCKS© 2018 A. Alawini & A. Parameswaran41Insertion in Extensible Hash Table• Now insert 1010• Need to split block and extend bucket array• i becomes 2: done in two steps0(010)1(011)1(110), 1(010)i=1 1101BUCKETS DATA BLOCKS© 2018 A. Alawini & A. Parameswaran42Insertion in Extensible Hash TableStep 1: Extend the buckets0(010)1(011)1(110)i=1 1101BUCKETS DATA BLOCKS0(010)10(11)11(10)i=2 1100011011BUCKETS DATA BLOCKS© 2018 A. Alawini & A. Parameswaran43Insertion in Extensible Hash TableStep 2: Now try to insert 10100(010)10(11)10(10)i=2 120001101111(10) 2BUCKETS DATA BLOCKS© 2018 A. Alawini & A. Parameswaran44Insertion in Extensible Hash TableDone0(010)10(11)10(10)i=2 120001101111(10) 2BUCKETS DATA BLOCKS© 2018 A. Alawini & A. Parameswaran45Insertion in Extensible Hash Table• Now insert 0000: where would it go? Then 0101?• Need to split block, but not bucket array0(010)10(11)10(10)i=2 120001101111(10) 2BUCKETS DATA BLOCKS© 2018 A. Alawini & A. Parameswaran46Insertion in Extensible Hash Table• Now insert 0000: where would it go? Then 0101?• Need to split block, but not bucket array0(010)0(000), 0(101)10(11)10(10)i=2 120001101111(10) 2BUCKETS DATA BLOCKS© 2018 A. Alawini & A. Parameswaran47Insertion in Extensible Hash Table• Now insert 0000: where would it go? Then 0101?• Need to split block, but not bucket array10(11)10(10)i=220001101111(10) 2BUCKETS DATA BLOCKS00(10)00(00)201(01) 2© 2018 A. Alawini & A. Parameswaran48Performance: Extensible Hash Table• No overflow blocks: access always one read for distinct keys• BUT:• Extensions can be costly and disruptive• After an extension bucket table may no longer fit in memory• Imagine three records whose keys share the first 20 bits. These three records cannot be in same block (assume two records per block). But a block split would require setting i = 20, i.e., accommodating for 2^20 = 1 million buckets, even though there may be only a few hundred records. © 2018 A. Alawini & A. Parameswaran49Linear Hash Table• Idea 1: add only one bucket at a timeProblem: n = no longer a power of 2• Let i be # bits necessary to address n buckets.• i = ceil(log2 n)• After computing h(k), use last i bits:• If last i bits represent a number >= n, change msb from 1 to 0 (get a number

< n)• Idea 2: allow overflow blocks (not expensive to overflow)• Convention: Read from the right (as opposed to the left)© 2018 A. Alawini & A. Parameswaran50Linear Hash Table Example• N=3 <= 22 = 4• Therefore, only buckets until 10• When inserting 0111, 11 is flipped => 01

(01)00

(10)10

i = 2

n = 3

r = 4

00
01
10

(01)01

(01)11 BIT FLIP

© 2018 A. Alawini & A. Parameswaran
51

Linear Hash Table Example

• Insert 1001:

(01)00

(10)10

00
01
10

(01)01

(01)11

i = 2

n = 3

r = 4

© 2018 A. Alawini & A. Parameswaran
52

Linear Hash Table Example

• Insert 1001: overflow blocks…

(01)00

(10)10

00
01
10

(01)01

(10)01

(01)11

i = 2

n = 3

r = 5

© 2018 A. Alawini & A. Parameswaran
53

Linear Hash Tables

• Extend n  n+1 when average number of records per bucket exceeds
(say) 80% of total number of records per block

• e.g., r/n <= 0.8 * 2 = 1.6 (for block size = 2)• Until then, use overflow blocks (cheaper than adding buckets)r/n = 5/3 = 1.67 > 1.6
Time to add a bucket

(01)00

(10)10

00
01
10

(01)01

(10)01

(01)11

i = 2

n = 3

r = 5

© 2018 A. Alawini & A. Parameswaran
54

Linear Hash Table Extension

• From n=3 to n=4

• Only need to touch
one block (which one ?)

(01)00

(10)10

00
01
10

(01)01

(10)01

(01)11

00
01
10
11

i = 2

n = 3

r = 5

i = 2

n = 4

r = 5

(01)11

(01)00

(10)10

(01)01

(10)01

(01)11

© 2018 A. Alawini & A. Parameswaran
55

Linear Hash Table Extension

• From n=3 to n=4 finished

(01)01

(10)01

(01)11

00
01
10 (10)10

(01)00

11

i = 2

n = 4

r = 5

r/n = 5/4 = 1.25 < 1.6✔© 2018 A. Alawini & A. Parameswaran56Summary• B+ Trees (search, insertion, deletion)• Good for point and range queries• Log time lookup, insertion and deletion because of balanced tree• Hash Tables (search, insertion)• Static hash tables: one I/O lookup, unless long chain of overflow• Extensible hash tables: one I/O lookup, extension can take long• Linear hash tables: ~ one I/O lookup, cheaper extension•No panacea; dependent on data and use case© 2018 A. Alawini & A. Parameswaran57Index 2.0• Learn the best index from the data and queries logs• Machine Learning to the rescue!• Recall, an index is a function• Machine learning are good at learning functions from data• What’s your cool idea for a better index?

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] 程序代写代做代考 Excel database chain cache Microsoft PowerPoint – 16- Indexing-p2
30 $