COS60009 Data Management For The Big Data Age

The aim of this assignment is to develop skills to use XML and JSON to design and query semi-structured data, and to evaluate the difference between document- based NoSQL databases and SQL databases. It also requires students to research the availability of on-line tools, and to use these effectively.

Task 1: XML And DTD

Data structured according to the relational model is most obviously managed using a relational database management system, and maintained using SQL, which is used both for querying the data and for managing the evolution of the data. Such data can also be represented, textually, by using XML, which allows for the expression of both the structure of the data and of constraints in the data. The data for the management of projects undertaken by a company is described below, in relational terms.

The database is part of a company information system, and contains four tables; Department describing the departments of the company, Project describing the projects within the departments, Employee containing information about employees within the company, and Assign describing employees’ involvement in projects. The schema for each of the tables is shown below, with the primary key attributes underlined, the foreign key attributes in italic font, and some more information about each table.

Save your time - order a paper!

Get your paper written from scratch within the tight deadline. Our service is a reliable solution to all your troubles. Place an order on any task and we will take care of it. You won’t have to worry about the quality and deadlines

Order Paper Now
Department (DID, DNAME, LOCATION)
  The company has several departments identified by
  DID. Each department has a name (DNAME) and a
  location (LOCATION)
Project (PID, PNAME, BUDGET, DID)
  Each project has an identifier PID, a name (PNAME), a
  budget (BUDGET) and belongs to a department which
  is in the Department table.
Employee (EID, ENAME, OFFICE, BIRTHDATE, SALARY,
  DID)
  Each employee has an identifier EID, a name
  (ENAME), an office room (OFFICE), a birth date
  (BIRTHDATE), a salary (SALARY) and belongs to a
  department which is in the department table.
© Swinburne University of Technology 24/09/19

Assign (PID, EID, HOURS)

Each employee may be assigned to one or more projects that belong to the same department as the employee. Each project has one or more employees. The number of hours the employee has worked on the project is recorded.

The domains for each of the attributes are:

Task 2:

You are required to do the following.

2.1 Design a proper DTD called company.dtd to express the structure of the above relational database schema. The valid XML documents under this DTD must have a tree structure with as much nesting as possible. The DTD must also capture all the primary key and foreign key constraints.

2.2 Populate an XML document called company.xml with data, by referring company.dtd. It has at least 2 departments. Each department has at least 2 projects and 3 employees. Each project has at least 2 employees in the same department working on it, and each employee works for at least 1 project. You are required to use company.dtd to validate company.xml externally, using the Apache Xerces validate tool. The domains for all attributes described above should be used for the values of attributes. You should run the validator, and capture the output for submission.

Task 3: XPath And JSON

In an SQL database, the Structured Query Language is used to interrogate the data, allowing for the development of complex queries. One of the tools available to interrogate data presented in an XML format is XPath. (You might like to investigate the W3Schools Tutorial on XPath: https://www.w3schools.com/xml/xpath_intro.asp to strengthen your knowledge of XPath.)

In this question, you are required to write XPath expressions that will extract data from an XML database according to some queries. You should develop the queries, and, using an on-line XPath tool (which you should discover yourself via internet search), you should capture the results of applying the XPath expressions you write to the provided data. In your answer, you should identify the tool used (ie, provide the url), and you should include a capture of the output of running each query.

3.1 For the XML document units.xml shown in 3.3 below, which specifies details of units of study in a college, write the XPath expressions for the following queries.

  1. Find the titles of all units.
  2. Find the unitID of all units with “James” as an instructor.
  3. Find titles for those units with “yes” in attribute reqof credits.
  4. Find titles for those units with more than one instructor.

3.2 XML and JSON can both be used to represent structured data in text format. It is generally possible to translate between these formats using software tools. In this exercise you should use the XML document units.xml shown below to illustrate this by converting to JSON format.