[SOLVED] C++ C data structure Java SQL XML database graph README first:

README first:
This is an individual programming assignment, to be completed under the Consultation model of collaboration as per the Computing Science Department Course Policies.
You must not upload binary files or large text files (e.g., XML files out of Osmosis) of any kind to your GitHub repository.
Your solution will be graded and must run on any of the machines in the instructional labs, without the installation of libraries or tools not already there.
Submission
Your answers must be in a GitHub repository created by following the link on eClass. That repository already has the folder structure for the assignment and the configuration file for Travis-CI. DO NOT MODIFY the directory structure or the .travis.yml file.
You must also submit the URL of your GitHub repository on eClass before the deadline.
Learning Objectives
Understanding the OpenStreetMap (OSM) data model.
Extracting OSM data from a binary map file into an XML representation.
Writing one or more programs to parse the XML representation and extract data of interest.
Further practicing your knowledge of SQL.
Becoming acquainted with embedded SQL programming with SQLite3 and its C/C++ programming API.
Part I: Loading OSM data into SQLite
The OSM data model has four elements:
A node represents a unique point on the Earth's surface; it has an id and a pair of lat/lon coordinates. A node can be used to represent a park bench, a water fountain, a phone booth, etc.
A way is an ordered list of nodes, defining a path (called polyline in the OSM documentation) on the map.
A path that starts and ends with the same node is said to be closed and represents an area on the map (e.g., a lake, a farm, or a building).
A relation is a multi-purpose data structure that groups together several elements (e.g., multiple nodes and ways).
A tag is a key-value pair, used to describe an element. OSM has a predefined list of accepted tags, but users can add their own.
The minimum bounding rectangle (MBR) of a set S of node elements is the smallest rectangle such that no element of S falls outside that rectangle.
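The MBR definition above can be sketched in C as a single pass over a list of points. This is a minimal sketch, not a prescribed method: the type names point_t and mbr_t and the function compute_mbr are hypothetical, the field names simply mirror the bottom/left/top/right parameters of the Osmosis bounding box, and the code assumes the points do not straddle the 180th meridian (which holds for Edmonton).

```c
#include <stddef.h>

/* A point in decimal degrees (hypothetical type for this sketch). */
typedef struct { double lat, lon; } point_t;

/* A rectangle; fields are named after the Osmosis bounding-box parameters. */
typedef struct { double bottom, left, top, right; } mbr_t;

/* Computes the MBR of pts[0..n-1].
 * Returns 0 on success, -1 if there are no points or an argument is NULL. */
int compute_mbr(const point_t *pts, size_t n, mbr_t *out)
{
    if (pts == NULL || out == NULL || n == 0) return -1;

    /* Start with a degenerate rectangle around the first point. */
    out->bottom = out->top = pts[0].lat;
    out->left = out->right = pts[0].lon;

    /* Grow the rectangle so that every remaining point falls inside it. */
    for (size_t i = 1; i < n; i++) {
        if (pts[i].lat < out->bottom) out->bottom = pts[i].lat;
        if (pts[i].lat > out->top)    out->top    = pts[i].lat;
        if (pts[i].lon < out->left)   out->left   = pts[i].lon;
        if (pts[i].lon > out->right)  out->right  = pts[i].lon;
    }
    return 0;
}
```

Applied to the boundary points of the city, the four resulting values would be the ones passed to Osmosis as the bounding box.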
Q1 (2 marks)
In a sentence, your task is to write and document scripts and/or programs, in any language (as long as they can run on the lab machines) to create a SQLite database with all nodes, paths and areas (closed paths) within the City of Edmonton, together with any tags associated with them, from an OSM map of Alberta.
You need to use the Osmosis command-line Java tool to process the binary OSM data file and produce XML files which are both human and machine readable. You will use map data encompassing the city of Edmonton. For this, you can download a recent map of Alberta from the Canadian section of the Geofabrik.de download site. You must extract all node and all way elements within the MBR containing all of the City of Edmonton. To find the coordinates of such MBR, use the official boundaries of the City of Edmonton, described by approximately 8 thousand points (lat/lon coordinates).
Extracting Map Data With Osmosis
Download and install Osmosis on your laptop or in your home directory on a lab machine. Run Osmosis to extract an XML dataset for the city of Edmonton, providing Osmosis the MBR of the city as a command line parameter. Call your output file edmonton.osm (note that an XML file need not have xml as its extension).
Read this to familiarize yourself with the command line parameters. You want to use the following ones (all in one line and in the order below):
--read-pbf --bounding-box bottom= left= top= right= --write-xml edmonton.osm
Nodes
The XML file you created should have several thousand elements like this one:

Such nodes should be stored in a table
node (id integer, lat float, lon float)
Some of the node elements in the XML file span multiple lines and have tag elements nested within them (which you need to capture as well; see below):

NOTE: in XML terminology, the single-line node elements are empty and hence can be closed succinctly in the same line (with />). The multi-line ones, instead, are not empty and need to be closed explicitly with a </node> tag.
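As an illustration of how one line holding a node element could be turned into a row for the node table, here is a minimal sketch in C. The helper names get_attr and parse_node_line are hypothetical. The sketch assumes that each element opens on a single line and that attribute values are enclosed in double quotes; neither assumption is safe for arbitrary XML.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies the value of attribute `name` from one line of XML into buf.
 * Returns 0 on success, -1 if the attribute is absent or does not fit. */
int get_attr(const char *line, const char *name, char *buf, size_t buflen)
{
    char pat[64];
    /* The leading space keeps "id" from matching inside "uid". */
    int n = snprintf(pat, sizeof pat, " %s=\"", name);
    if (n < 0 || (size_t)n >= sizeof pat) return -1;

    const char *start = strstr(line, pat);
    if (start == NULL) return -1;
    start += n;

    const char *end = strchr(start, '"');
    if (end == NULL || (size_t)(end - start) >= buflen) return -1;

    memcpy(buf, start, (size_t)(end - start));
    buf[end - start] = '\0';
    return 0;
}

/* Extracts id, lat and lon from a line that opens a node element.
 * Returns 0 on success, -1 if the line is not a node or lacks an attribute. */
int parse_node_line(const char *line, long long *id, double *lat, double *lon)
{
    char v[64];
    if (strstr(line, "<node ") == NULL) return -1;
    if (get_attr(line, "id", v, sizeof v) != 0) return -1;
    *id = strtoll(v, NULL, 10);
    if (get_attr(line, "lat", v, sizeof v) != 0) return -1;
    *lat = strtod(v, NULL);
    if (get_attr(line, "lon", v, sizeof v) != 0) return -1;
    *lon = strtod(v, NULL);
    return 0;
}
```

The same get_attr helper could be reused for the k and v attributes of nested tag elements, whose values would still need their XML escapes decoded.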
Paths and Areas
Paths and areas are encoded as way elements. For example, Athabasca Hall looks like this:

Each nd element refers to a node in the XML file identified by the ref attribute. Also, these elements are logically ordered in the same way they appear in the file. Paths (open or closed) must be represented in the database inside two tables:
way (id integer, closed boolean)
waypoint (wayid integer, ordinal integer, nodeid integer)
Tags
Finally, tag elements should be stored in two separate tables:
nodetag (id integer, k text, v text)
waytag (id integer, k text, v text)
Constraints
Your database must enforce the following constraints:
Primary keys: node(id) and way(id)
Foreign Keys:
waypoint(wayid) is a foreign key referencing way(id)
waypoint(nodeid) is a foreign key referencing node(id)
nodetag(id) is a foreign key referencing node(id)
waytag(id) is a foreign key referencing way(id).
Other constraints
closed in way should be true if and only if the path is closed.
There can be no loops (i.e., the same node cannot appear more than once) in a way for which closed = false.
The ordinal values in waypoint form a dense ordering; that is, each node is assigned a unique ordinal between 1 and the number of node elements in the path. The ordinals should initially correspond to the order in which the nodes appear in the XML file.
Your solution must recompute the ordering of the nodes and the value of the closed attribute after each valid update, as appropriate, so that the constraints are not violated.
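The logic behind the closed and no-loop constraints can be sketched in C as two checks over the ordered node ids of one way. This is only an illustration of the conditions, for instance for a loader that precomputes closed; it does not replace enforcing the constraints inside the database. The function names are hypothetical, and treating a path with fewer than two nodes as open is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* A path is closed when it starts and ends with the same node.
 * Assumption: a path with fewer than two nodes is treated as open. */
bool way_is_closed(const long long *nodes, size_t n)
{
    return n >= 2 && nodes[0] == nodes[n - 1];
}

/* Returns true if any node id appears more than once in the path.
 * For a way with closed = false, a true result means the no-loop
 * constraint is violated. Quadratic, which is fine for short paths. */
bool has_repeated_node(const long long *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (nodes[i] == nodes[j]) return true;
    return false;
}
```

With the nodes stored in file order, the ordinal of nodes[i] would be i + 1, which gives the dense ordering described above.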
Loading the XML Data into a SQLite Database
The final product of this part of the assignment is a database with the schema described above. Below we give a suggestion of how to do that. You are free to choose a different approach, and of course, you can discuss it with your TA beforehand. Regardless of the design you choose, you are free to either write programs that produce CSV or TSV files as output and load them by hand into SQLite, or write programs that connect to SQLite and populate the tables with Embedded SQL.
NOTE: due to memory restrictions, you are NOT allowed to load the entire XML file in memory using DOM. Instead, we suggest you scan the file element by element.
The proper way to do this is to use the Simple API for XML (SAX). However, the XML file you are dealing with here is so simple that a less elegant alternative can work, which is to parse the file line by line and decide what to do after each line. You should never do this for arbitrary XML files where you do not know a priori how many levels of nesting are used.
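When scanning line by line, attribute values arrive still XML-escaped (a SAX parser would decode them for you). A minimal sketch of decoding the five predefined XML entities in place follows; the function name xml_unescape is hypothetical, and numeric character references such as &#233; are deliberately left untouched in this sketch.

```c
#include <string.h>

/* Replaces the five predefined XML entities in s, in place.
 * The text only ever shrinks, so writing into the same buffer is safe.
 * Numeric character references (&#...;) are not handled here. */
void xml_unescape(char *s)
{
    static const struct { const char *ent; char ch; } map[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'},
        {"&quot;", '"'}, {"&apos;", '\''},
    };
    const size_t nmap = sizeof map / sizeof map[0];
    char *out = s;

    while (*s != '\0') {
        size_t k;
        for (k = 0; k < nmap; k++) {
            size_t len = strlen(map[k].ent);
            if (strncmp(s, map[k].ent, len) == 0) {
                *out++ = map[k].ch;   /* emit the decoded character */
                s += len;             /* skip the whole entity */
                break;
            }
        }
        if (k == nmap) *out++ = *s++; /* ordinary character: copy it */
    }
    *out = '\0';
}
```

Because the input is consumed in a single pass, a value such as &amp;lt; decodes to &lt; and is not decoded a second time.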
Naming convention
The test scripts for the other questions assume that there will be a SQLite database in a file called edmonton.db inside the q1 folder. If you do not follow this convention, you need to change the test scripts in every question.
What will be graded?
Your TA will grade the README.md file inside the q1 folder, and the .gitignore file in your repository. README.md should have clear instructions to build the database, while .gitignore must have the path to all XML or SQLite files mentioned in your solution.
Rubric for Q1
Percentage: Description
100: TA is able to build the SQLite database with the instructions provided, **on a lab machine**, without errors or warnings, starting from a binary OSM file (e.g., the Alberta map); instructions work both on an OSM file created by the TA as well as on the one created by the students; the solution handles escaped characters properly; database has **all constraints properly defined**; all data is extracted as specified.
75: Solution works with instructions provided but only on the OSM file created by following the instructions; OR TA must fix/guess steps from the instructions provided; database has at least one trigger-based constraint properly defined; OR all trigger-based constraints are partially defined.
50: Instructions provided are insufficient; TA must guess/fix several steps; OR not all data is extracted and stored as specified; OR only the PK/FK and CHECK constraints are implemented.
25: TA is able to extract and load at least two kinds of data (e.g., nodes and tags but not ways) from your instructions; OR only the primary key and foreign key constraints are implemented.
0: Repository contains any binary file or any temporary XML file(s) created during testing; OR the solution is in a repository created by the student; OR the XML processing is done using DOM; OR the solution requires libraries or tools not in the instructional lab machines.
Part II: Embedded SQL
Required reading:
Read all of An Introduction To The SQLite C/C++ Interface before writing any code.
Further to that, read the following carefully before you write any code:
Database Connection Handle
Prepared Statement Object, paying special attention to the part explaining the life-cycle of a prepared statement object
Result Values From a Query
Binding Values To Prepared Statements
Create or Redefine SQL Functions
Specifications
For each question, you are asked to write a C program that follows the life-cycle described here to accomplish the following tasks. (If you want to use C++, talk to the instructor.) Your programs should work on any SQLite database with the schema as in Part I. Your TA will test your code on multiple databases conforming to that schema.
Input/Output
All input to the programs must be done via command line arguments in the order specified in the question, or in text files. A tag given as a command line argument will always be a single string in the form key=value (i.e., a single entry in the argument list). For example, a tag could be wheelchair_accessible=yes.
The output of the program must be as specified and match the unit test code provided.
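A tag argument of the form key=value can be split with a few lines of C. This is a minimal sketch: the function name split_tag is hypothetical, and splitting at the first '=' (so that any later '=' stays in the value) is an assumption, since the text does not say how such values should be treated.

```c
#include <string.h>

/* Splits arg at its first '=' in place; on success *key and *value point
 * into arg. Returns 0 on success, -1 if '=' is missing or the key is empty. */
int split_tag(char *arg, char **key, char **value)
{
    char *eq = strchr(arg, '=');
    if (eq == NULL || eq == arg) return -1;
    *eq = '\0';        /* terminate the key where the '=' was */
    *key = arg;
    *value = eq + 1;   /* may be the empty string */
    return 0;
}
```

Since the function writes into its argument, it can be applied directly to the entries of argv, which a C program is allowed to modify.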
Q2 (1 mark)
Write a C program, in a file called q2/src/solution.c, that takes as input the database file and two node identifiers and prints to STDOUT their geographical distance in meters, computed by a suitable function from here and links therein.
The distance must be computed by a new user-defined SQL function, called from a single query in your code. In your README.md file for this assignment, include an explanation of why the function you chose is suitable for computing distances within the city of Edmonton. Your explanation will count for 1/2 mark towards your grade in this question.
Output: your program must print to STDOUT the geographical distance (i.e., a single number) or the word error (in a single line) if the parameters are incorrect (e.g., the database file is missing or the number of node ids is incorrect, or some id is missing from the database).
Q3 (0.5 mark)
Write a C program, in a file called q3/src/solution.c, that takes as input the database file and a list of strings of the form key=value; finds every node in the database having at least one tag matching a key/value combination from the input list; and prints to STDOUT the number of such node elements, as well as the largest pairwise distance among those nodes.
The maximum distance must be computed by a single SQL query that uses the function you created for Q2.
Output: your program must print to STDOUT two numbers separated by a space or a tab, or the word error (in a single line) if the parameters are incorrect.
Q4 (0.5 mark)
Write a C program, in a file called q4/src/solution.c, that takes as input the database file and a list of strings of the form key=value; finds every way in the database having at least one tag matching a key/value combination from the input list; and prints to STDOUT the number of such paths, and the length of the longest such path, computed by a single SQL query as in Q3.
All lengths must be computed in SQL, as in Q2/Q3.
Output: your program must print to STDOUT two numbers separated by a space or a tab, or the word error (in a single line) if the parameters are incorrect.
Q5 (0.5 mark)
Write a C program, in a file called q5/src/solution.c, that takes as input a database file and a tsv file containing zero or more lines, each describing a node to be inserted into the database.
In that file, each node must be described by at least three columns: (0) the node id, (1) the latitude, (2) the longitude. Subsequent columns in the file correspond to strings of the form key=value providing tags for the node, and must be inserted accordingly.
Output: your program must print to STDOUT the word success on a single line if the execution was successful, or error on a single line followed by the SQLite error message on subsequent lines, in case the input is invalid or some constraint is violated.
All or nothing: your program must leave the database unchanged if there is an error on any of the nodes described in the input file.
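Reading the TSV input comes down to splitting each line on tab characters. The sketch below does this in place and, unlike strtok, preserves empty columns; the function name split_tsv is hypothetical.

```c
#include <string.h>

/* Splits line on tabs in place and stores up to max column pointers in
 * cols. The line terminator is removed first. Columns beyond max are
 * ignored. Returns the number of columns stored; a blank line yields one
 * empty column, which the caller should check for. */
size_t split_tsv(char *line, char **cols, size_t max)
{
    size_t n = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';   /* drop the line terminator */

    while (n < max) {
        cols[n++] = p;
        char *tab = strchr(p, '\t');
        if (tab == NULL) break;           /* last column reached */
        *tab = '\0';                      /* terminate this column */
        p = tab + 1;
    }
    return n;
}
```

For a node line, columns 0 to 2 would hold the id, latitude and longitude, and every further column a key=value tag. Fewer than three columns would count as invalid input, in which case nothing from the file should remain in the database.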
Q6 (0.5 mark)
Write a C program q6/src/solution.c that takes as input a database file and a tsv file containing zero or more lines describing zero or more way elements, which are to be inserted into the database. The format of the input TSV is as follows:
Each way element is described by two consecutive non-blank lines (i.e., blank lines are used to separate ways)
The first such line has the id of the way in column 0, followed by zero or more strings of the form key=value in subsequent columns, with tags for the way.
The second line has all the nodes (identifiers) in the way, with the column number corresponding to the order of the node.
Output: your program must print to STDOUT the word success on a single line if the execution was successful, or error on a single line followed by the SQLite error message on subsequent lines, in case the input is invalid or some constraint is violated.
All or nothing: your program must leave the database unchanged if there is an error on any of the ways or any of the nodes described in the input file.
Rubric for all other questions
Percentage: Description
100: The code is correct, readable, and well documented, so that the TA can clearly understand what it does. Also, the TA is able to compile and execute the code on all test cases without errors or warnings, including test cases with invalid input, by following the instructions provided.
75: The code is correct and is understandable with some effort. The TA is able to compile and execute the code on all test cases, including test cases with invalid input, by guessing or fixing steps.
50: The code works only on (all) test cases with valid input; OR the TA is able to execute the code from the instructions but there are parts of the code that are not understandable with reasonable effort; OR the code does not follow the life-cycle mentioned in the Tasks section above.
25: The code works on one or two test cases with valid input; OR the code is able to read all inputs but does not compute the correct value nor modifies the database as required; OR the code is incomprehensible.
0: **Travis-CI raises errors or warnings for the question**; OR, the provided Travis-CI configuration file has been modified; OR the provided folder structure has been modified; OR the code requires libraries or tools not in the instructional lab machines.