MONASH
INFORMATION TECHNOLOGY
Where To?
Exam Preparation
FIT2094-FIT3171
Operational Database the units focus
2
Application Development
Web based front ends
Wide range of approaches: PHP, ASP.NET, etc
Very Rudimentary (requires VPN)
PL/SQL
backend development
Triggers, functions, procedures and packages
Procedure to change employee departments: move_employee
(empno, new dept)
3
FIT2104 Web database interface
4
Usage of database
Example of a supermarket Decision making
Operational level
How often do we need to re-stock X-item?
Strategic and tactical level
Is there any branch that performs worse than the state average?
What is the total sales made by each state each year and across a number of years?
5
Operational Data vs. Decision Support Data
Operational data
Mostly stored in relational database
Optimized to support transactions representing daily operations
Example:
How many students enrolled in FIT2094?
Decision support data differs from operational data in three main areas:
Time span
Granularity
Dimensionality
Example:
What is the total number of students in the foundation units in each year (subtotal of the two semesters numbers) and the total across years, across a single unit.
6
7
8
Decision Support Database Requirements
Specialized DBMS tailored to provide fast answers to complex queries Three main requirements
Database schema
Data extraction and loading
Database size
Database schema
Complex data representations
Aggregated and summarized data
Queries extract multidimensional time slices
Data extraction and filtering
Supports different data sources
Flat files
Hierarchical, network, and relational databases
Multiple vendors
Checking for inconsistent data
9
The Data Warehouse
Database size
In 2013, eBay had around 90 Petabytes of data in its
data warehouses (90,000 Terabytes)
DBMS must support very large databases (VLDBs)
Integrated, subject-oriented, time-variant, and nonvolatile collection of data
Provides support for decision making
Usually a read-only database optimized for data analysis and query processing
Requires time, money, and considerable managerial effort to create
10
11
FIT3003 Business intelligence and data warehousing
12
IOT the explosion Data, Data, Data ..
13
Data Growth
Source: https://www.domo.com/learn/data-never-sleeps-7#/
14
Data Growth
Source: https://www.seagate.com/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
15
Big Data Characteristics
Volume
The quantity of data to be stored
Velocity
The speed at which data enters the system and must be processed
Variety
Variations in the structure of the data to be stored
16
Big Data Characteristics: Volume
Scaling up: keeping the same number of systems but migrating each one to a larger system
Scaling out: when the workload exceeds server capacity, it is spread out across a number of servers
17
Scaling
How do we scale current relational systems? SQL designed for database as a single physical entity
Purchase bigger boxes: costly and has real limits
Increase the number of processors, yielding parallel computation/database with complex issues to handle
Distribute database challenges to maintain ACID transaction principles and issues of availability/consistency
The rise of OO programming in the 80s also highlighted a problem known as the Impedance Mismatch
The program treats items as objects, but they need to be mapped to relational tables (de aggregating the object)
Also issues about private vs public (relational about need, OO absolute characteristic of data)
18
Scaling continued
Big players, notably Google and Amazon chose a different path
Lots and lots of smaller boxes (commodity servers)
Non relational structure
Google: Bigtable
https://research.google/pubs/pub27898/
https://cloud.google.com/bigtable/docs/overview
Used for wide range of apps Gmail, Google Earth, YouTube
Amazon: Dyanmo
http://www.read.seas.harvard.edu/~kohler/class/cs239-w08/decandia07dynamo.pdf
Based on Dynamo: https://aws.amazon.com/dynamodb/
19
Scaling continued
Term NoSQL coined by John Oskarsson in 2009 after calling a free meetup about open source, distributed, non relational databases or NOSQL for short
http://blog.oskarsson.nu/post/22996139456/nosql-meetup
Characteristics
Non relational,
mostly open source,
distributed (cluster friendly),
schema-less (no fixed storage schema)
20
Big Data Characteristics: Velocity
Stream processing: focuses on input processing and requires analysis of data stream as it enters the system
CERN Large Hadron Collider 600TB per second 1 GB per second Feedback loop processing: analysis of data to produce
actionable results
21
Fast Data Processing
Computer systems
Parallel computer: A single machine with massive number of CPUs.
Cluster of computers: Multiple machines connected via network; Commodity computer.
Database structure
Non-relational database (NoSQL)
No update, append only. Optimised for a main operation
Examples: MongoDB, Cassandra Distributed File Systems
HDFS (Hadoop File Systems) / Parquee File Systems Parallel data processing
Hadoop / Spark In Memory database
22
Big Data Characteristics: Variety
Structured data: fits into a predefined data model
Relational databases
Incoming data decomposed under normalisation rules to fit the data model
Unstructured data: does not fit into a predefined model
Big Data requires that the data is captured in its natural format as generated without imposing a data model on it
Semi structured data: combines elements of both
23
FIT3176 Advanced database design
24
Data Processing Ecosystem
http://www.clearpeaks.com/blog/big-data/big-data-ecosystem-spark-and-tableau
25
Horses for Courses
Conventional RDBMS will continue play an important and significant role in OLTP (Online Transactions Processing)
Increasingly now a range of database products are available, need to select appropriate product/model for task at hand.
26
FIT2094-FIT3171 Exam
27
2021 Semester Summer B Exam Format
Timed: 2 hours 10 minutes (reading time included)
e-exam platform: https://eassessment.monash.edu/
Close book (no cheat sheets), e-invigilated
Learn more here:
https://www.monash.edu/exams/electronic-exams/about
read:
Supervised eExams using Monash eVigilation
eExams requiring handwritten answers
For the modelling question makes use of a hybrid question
write answer on paper (mark answered on e-exam)
photograph with phone
upload via QR code (after exam has been completed) important that
you practice this process
and Try out a general knowledge practice exam
Note that the exam is a time-pressured test
manage your time wisely
28
2021 Summer B Exam Format
100 marks 50% of your final mark in FIT2094-FIT3171. Minimum to pass the unit overall:
40% in-semester, 40% exam and 50% overall
Assignment 2 marking will not be finalised before the exam
Questions:
6 parts cover theory and application
Timing is crucial 100 marks, 120 mins 1 mark/minute target
Part D SQL Case Study will be released 9am the day before the exam day.
No tables/data provided on FITUGDB, no access to SQL Developer, LucidChart and other softwares
Exam when?
your responsibility
29
2021 SSB Mock Exam
Link is provided on Exam tab Moodle, self enrolled.
timed (2 hrs 10 mins), unlimited attempts
do not open or attempt this mock exam until such time as you are able to make a full 2 hrs and 10 mins attempt, rather than open and browse the paper.
Available from Thursday 11th February at 12PM, sample solution will be available Monday 15th February from 3 PM
Serves to provide a general overview of the general structure of the exam only.
To protect the integrity of the exam: NO ACTUAL EXAM QUESTIONS are included; and the COMPOSITION OF THE QUESTIONS IN EACH PART are SUBJECT TO CHANGE.
All content specified by the Unit Guide is examinable, including but not limited to
Pre-reading (weekly Coronel & Morris chapters)
Workshop Slides and Videos
Tutorial Notes, and
all other Moodle Materials (except where explicitly stated NOT EXAMINABLE).
30
Workshop Session 2 and 5 Data Modelling INCLUDING BUT NOT LIMITED TO THESE TOPICS
Conceptual vs Logical Level
Entity
Strong vs weak
Associative entity
Types of attributes
Relationship
Type : one-to-one, one-to-many, many-to-many
Cardinality and Participation Identifying vs Non-identifying.
Mapping from Conceptual to Logical E.g. Mapping many-to-many
FIT3171 UML
31
Workshop Session 3 Relational Model INCLUDING BUT NOT LIMITED TO THESE TOPICS
Relational model properties.
Keys
Superkey, Candidate Key, Primary Key Foreign Key
Data Integrity Entity integrity Referential Integrity
Relational Algebra Understanding of efficiency
32
Workshop Session 4 Normalisation INCLUDING BUT NOT LIMITED TO THESE TOPICS
UNF to 3 NF Mapping form to UNF
UNF to 1 NF remove repeating group. 1NF to 2 NF remove partial dependency. 2NF to 3NF remove transitive dependency.
Dependency diagrams
Use the general definition
Partial in 1NF, Transitive in 2NF, Full in 3NF
use this notation: cust_id cust_name, cust_address
Be careful in choosing the PK!
Mapping a set of 3NF relations to a logical model
33
Workshop Sessions 6 and 8 DDL and DML
INCLUDING BUT NOT LIMITED TO THESE TOPICS
DDL
CREATE TABLE statements
Primary key definition Foreign key definition Other Constraints
ALTER
DROP DML
INSERT
Adherence to referential integrity constraints and the order of insertion Oracle Sequence
UPDATE (DML) DELETE (DML) COMMIT
34
Workshop Session 7, 9 and 10 SQL INCLUDING BUT NOT LIMITED TO THESE TOPICS
Single table retrieval with predicate
Join
Natural join
Outer join Aggregate functions Set Operators Subquery
Oracle functions
TO_CHAR, TO_DATE, NVL, UPPER, LOWER, ROUND
35
Workshop Session 9 PLSQL and Workshop Session 11 DB Connectivity, Web Technology
INCLUDING BUT NOT LIMITED TO THESE TOPICS
Web database connectivity
Understanding of the principles and ALL core concepts:
Database middleware
Web to database middleware
Using PHP to communicate with databases
must understand php code which relates to database Database design frameworks
modern frameworks
ORM
Security SQL Injection
FIT3171 TRIGGER
36
Workshop Session 8 Transaction Management INCLUDING BUT NOT LIMITED TO THESE TOPICS
Transaction.
ACID properties.
Transaction problems.
Transaction management with locks.
Wait For Graphs
Restart and Recovery using Transaction Log.
37
Workshop Session 12
The content of Session 12s workshop Database Trends
Future directions
Is NOT examinable (questions relate to this sessions new content will not appear on the exam)
38
Consultations for Final Exam
From Tuesday 16th February 2021, one online consultation session per day will be provided.
Details posted on Moodle
Please dont come to consultations in a hope to squeeze some useful information about final exam
Session intended to clear up any issues YOU find as you prepare for the exam
39
40
Reviews
There are no reviews yet.