[SOLVED] CSI 3030 Project Guidelines

$25

File Name: CSI_3030_Project_Guidelines.zip
File Size: 254.34 KB

5/5 - (1 vote)

CSI 3030 Project Guidelines

Objective
Implement a new symmetric hash join query operator replacing the current hash join implementation.
Modifications to be done in Optimizer and Executor component

What you need to know
Understand what a hash join is.
How is it implemented in POSTGRESQL.
Need to know what all files to be modified.
Understand POSTGRESQL backend architecture.

What you are to implement Symmetric Hash Join

A Tour of PostgreSQL Internals https://www.postgresql.org/files/developer/tour.pdf

More reading to do
Understand how Optimizer and Executor works
Refer to src/backend/optimizer/README Refer to src/backend/executor/README

Relevant Files Src/backend/executor
nodeHashJoin.c: This file implements the actual processing of the hash join operator.
nodeHash.c: This file is responsible for creating and maintaining a hash table.
*You will be given a list of methods to be modified for this Project (slide 19 onwards ).

Relevant Files src/backend/optimizer/plan/
createplan.c: This file contains the code that creates a hash join node in the query plan.
src/include/nodes/
execnodes.h: This file contains the structure
HashJoinState that maintains the state of the hash join during execution.

Deliverables
All of the relevant files listed in the two previous slides to zipped and submitted
Send the zipped file to any one of the TAs emails. Copying the other two lab TAs (emails available here https://github.com/ferna11i/3130-DBMS)
Insert comments in the parts you have modified. Also add in any necessary explanations.
Need a test query which would execute your code as per the requirement. Test Query is available here (schema.txt, query.txt, output.txt) (https://github.com/ferna11i/3130- DBMS/tree/master/Project)
All comments in code to be preceded by CSI3130:
Project Deadline: Monday December 9 at 6:00 pm at the latest.

Additional Help
http://doxygen.postgresql.org/: Source code browser
ELOG in POSTGRESQL

Hash Join & Symmetric Hash Join explained

What is a JOIN Between tables or self
Query accessing multiple rows of the same or different tables
Order of execution : Sequential

Hash Join
Loads candidate records from one side of the join into a hash table.
Probe for each record from other side of the join
Purpose of the Hash Table Indexing

Hash Join

Facts about hash joins
Hash joins do not need indexes on predicates
Reduce the hash table size to improve the performance
Cannot perform joins that have range conditions in the join predicates

Drawbacks with Hash Join Bottleneck in query execution pipeline
Sufficient memory required to store inner relation.
Hybrid Hash Join!!!

Symmetric Hash Join
Maintains two hash tables Two Hash functions

Other requirements to be considered
Should run until it gives one output tuple each time
State should be saved should include detail on which tuple is running using state node.

Further Project Details

Disabling use of Multiple Batches
Present implementation
Hybrid hash join
Tuples from inner / outer are considered in batches

Changes in nodeHash.c
Modify ExecHash
Disable Batches
Implement ExecScanHashBucket for both Inner and Outer

ExecHash
Remove the present error message
It should have the same code as MultiExecHash

Where to disable?
nodeHash.c
ExecHashTableCreate

Changes to scanhashbucket Present implementation
Two functions : one for probing inner and other for outer
Returns HeapTuple

Changes to scanhashbucket
Need only one method for scannning both hash tables
Return HashTuple instead of HeapTuple
Check for type of hash table in the method : whether its inner or outer.

Sample Implementation Code

Deliverables
All of the relevant files listed to be submitted
Send in the Zipped file to Any of the TAs emails. Copying the other TAs (emails available here https://github.com/ferna11i/3130-DBMS)
Insert comments in the parts you have modified. Also add in any necessary explanations.
Need a test query which would execute your code as per the requirement. Test Query is available here (schema.txt, query.txt, output.txt) (https://github.com/ferna11i/3130- DBMS/tree/master/Project)
All comments in code to be preceded by CSI3130:
Project Deadline: Monday December 9 at 6:00 pm at the latest.

Reviews

There are no reviews yet.

Only logged in customers who have purchased this product may leave a review.

Shopping Cart
[SOLVED] CSI 3030 Project Guidelines
$25