Assignment title: Information
Programming Assignment 3
Requirement: In this assignment, you are required to parse, store, manage, and analyze the
service data retrieved from ProgrammableWeb. The service data and the required tasks are
described as follows:
1. Service data.
The service data was retrieved through ProgrammableWeb APIs (http://apiportal.anypoint.mulesoft.com/programmable-web/api/programmable-web-api) in summer 2014.
The service data is stored in several .txt files: api.txt, mashup.txt, and members.txt:
• API data: The information about Web APIs is stored in api.txt. In the file, each line
corresponds to an API; fields are separated by the delimiter $#$, and multiple values of a
field are separated by ###. The format of each API record is:
id$#$title$#$summary$#$rating$#$name$#$label$#$author$#$description$#$type$#$
downloads$#$useCount$#$sampleUrl$#$downloadUrl$#$dateModified$#$remoteFeed$#$
numComments$#$commentsUrl$#$tag1###tag2###tag3$#$category$#$protocols$#$
serviceEndpoint$#$version$#$wsdl$#$dataFormats$#$apigroups$#$example$#$
clientInstall$#$authentication$#$ssl$#$readonly$#$VendorApiKits$#$
CommunityApiKits$#$blog$#$forum$#$support$#$accountReq$#$commercial$#$provider$#$
managedBy$#$nonCommercial$#$dataLicensing$#$fees$#$limits$#$terms$#$company$#$
updated
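The record format above can be parsed roughly as follows. This is only a minimal sketch: the function name `parse_api_line`, the derived `updatedYear` key, and the choice of which fields to treat as multi-valued lists are assumptions made for illustration, not part of the assignment.

```python
# Field names copied from the record format above, in order.
API_FIELDS = (
    "id title summary rating name label author description type downloads "
    "useCount sampleUrl downloadUrl dateModified remoteFeed numComments "
    "commentsUrl tags category protocols serviceEndpoint version wsdl "
    "dataFormats apigroups example clientInstall authentication ssl readonly "
    "VendorApiKits CommunityApiKits blog forum support accountReq commercial "
    "provider managedBy nonCommercial dataLicensing fees limits terms company "
    "updated"
).split()

FIELD_SEP = "$#$"  # separates fields
VALUE_SEP = "###"  # separates multiple values inside one field

# Assumption: only these fields are split into lists of values.
MULTI_VALUED = {"tags", "protocols", "dataFormats"}


def parse_api_line(line):
    """Turn one line of api.txt into a dict; empty fields become None."""
    values = line.rstrip("\n").split(FIELD_SEP)
    if len(values) != len(API_FIELDS):
        raise ValueError(f"expected {len(API_FIELDS)} fields, got {len(values)}")
    record = {}
    for name, raw in zip(API_FIELDS, values):
        raw = raw.strip()
        if name in MULTI_VALUED:
            record[name] = [v.strip() for v in raw.split(VALUE_SEP) if v.strip()]
        else:
            record[name] = raw or None
    # Typed helpers for rating and year queries (illustrative additions).
    record["rating"] = float(record["rating"]) if record["rating"] else None
    record["updatedYear"] = int(record["updated"][:4]) if record["updated"] else None
    return record
```

Storing the rating as a number and the tags as a list is one possible design; it tends to make comparison and tag queries simpler than keeping every field as a raw string.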
For example, the first API record in the file is:
http://www.programmableweb.com/api/the-global-proteome-machine$#$ The Global
Proteome Machine$#$Proteome data for biomedical research$#$4.4$#$ The Global
Proteome Machine$#$ The Global Proteome Machine$#$$#$The Global Proteome
Machine is an attempt to create knowledge from proteomics data and reuse it to solve
biomedical research problems. The Global Proteome Machine Database was built to
use GPM data to help validate peptide MS/MS spectra and protein coverage patterns.
The Global Proteome Machine Database API provides RESTful access to commonly
required information based on data from the GPM Database. Responses are JSON
formatted. $#$1$#$$#$$#$http://wiki.thegpm.org/wiki/GPMDB_REST$#$$#$
2012-12-17T09:51:40Z$#$$#$$#$
http://api.programmableweb.com/apis/the-globalproteomemachine/comments$#$
database###science$#$Science$#$REST$#$http://gpmdb.thegpm.org/$#$$#$$#$JSON
$#$$#$$#$$#$$#$$#$$#$$#$$#$$#$$#$$#$No$#$$#$http://www.thegpm.org/
$#$$#$$#$$#$$#$$#$$#$$#$2012-12-17T09:51:40Z
Therefore, we have the following information:
id http://www.programmableweb.com/api/the-global-proteome-machine
title The Global Proteome Machine
summary Proteome data for biomedical research
rating 4.4
name The Global Proteome Machine
label The Global Proteome Machine
author (null)
description The Global Proteome Machine is an attempt to create
knowledge from proteomics data and reuse it to solve
biomedical research problems. The Global Proteome Machine
Database was built to use GPM data to help validate peptide
MS/MS spectra and protein coverage patterns. The Global
Proteome Machine Database API provides RESTful access to
commonly required information based on data from the GPM
Database. Responses are JSON formatted.
type 1
downloads (null)
useCount (null)
sampleUrl http://wiki.thegpm.org/wiki/GPMDB_REST
downloadUrl (null)
dateModified 2012-12-17T09:51:40Z
remoteFeed (null)
numComments (null)
commentsUrl http://api.programmableweb.com/apis/the-globalproteomemachine/comments
tags database (tag1); science (tag2)
category Science
protocols REST
serviceEndpoint http://gpmdb.thegpm.org/
version (null)
wsdl (null)
dataFormats JSON
apigroups (null)
example (null)
clientInstall (null)
authentication (null)
ssl (null)
readonly (null)
VendorApiKits (null)
CommunityApiKits (null)
blog (null)
forum (null)
support (null)
accountReq No
commercial (null)
provider http://www.thegpm.org/
managedBy (null)
nonCommercial (null)
dataLicensing (null)
fees (null)
limits (null)
terms (null)
company (null)
updated 2012-12-17T09:51:40Z
• Mashup data: A mashup is a software application that uses at least one Web API. It
can also be a Web API itself. The information about such mashups is stored in mashup.txt.
As with the API data, each line corresponds to a mashup record, and the same delimiters
separate the fields and values. In addition, $$$ separates a component API from
its URL. The format of a mashup record is:
id$#$title$#$summary$#$rating$#$name$#$label$#$author$#$description$#$type$#$
downloads$#$useCount$#$sampleUrl$#$dateModified$#$numComments$#$
commentsUrl$#$tag1###tag2###tag3$#$api1$$$url1###api2$$$url2###api3$$$url3
…$#$updated
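A mashup line can be split in a similar way, with one extra step for the component APIs. The sketch below is illustrative only: the function name `parse_mashup_line` and the key `apis` (with `name` and `url` sub-keys) are hypothetical names chosen here, since the format above only shows the `api1$$$url1###api2$$$url2` pattern.

```python
# Field names follow the mashup record format above; "apis" is a
# hypothetical name for the api1$$$url1###api2$$$url2 field.
MASHUP_FIELDS = (
    "id title summary rating name label author description type downloads "
    "useCount sampleUrl dateModified numComments commentsUrl tags apis updated"
).split()


def parse_mashup_line(line):
    """Turn one line of mashup.txt into a dict; empty fields become None."""
    values = line.rstrip("\n").split("$#$")
    if len(values) != len(MASHUP_FIELDS):
        raise ValueError(f"expected {len(MASHUP_FIELDS)} fields, got {len(values)}")
    record = {name: (raw.strip() or None) for name, raw in zip(MASHUP_FIELDS, values)}

    # Tags: multiple values separated by ###.
    record["tags"] = [t.strip() for t in (record["tags"] or "").split("###") if t.strip()]

    # Component APIs: each entry is "<api name>$$$<api url>".
    apis = []
    for item in (record["apis"] or "").split("###"):
        if not item.strip():
            continue
        name, _, url = item.partition("$$$")
        apis.append({"name": name.strip(), "url": url.strip() or None})
    record["apis"] = apis
    return record
```

Keeping each component API as a small name/URL pair is one way to support the "used APIs" query criterion later; a flat list of API names would also work.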
For example, a mashup record in the file is:
http://www.programmableweb.com/mashup/compare-prices.info$#$ Compare-Prices.info$#$This site is a demo to show the functionality of the Shopzilla.com API.
Supports the US and UK API versions. $#$4.2$#$ Compare-Prices.info$#$ Compare-Prices.info$#$Unknown$#$This site is a demo to show the functionality of the
Shopzilla.com API. Supports the US and UK API versions.
$#$$#$0$#$2170$#$http://www.compare-prices.info/$#$2009-02-10T00:35:01Z$#$2$#$
http://api.programmableweb.com/mashups/compareprices.info/comments$#$
affiliate###eBay###money###Program###shopping###Shopzilla$#$
Shopzilla$$$http://www.programmableweb.com/api/shopzilla$#$2009-02-10T00:35:01Z
Therefore, we have the following information:
id http://www.programmableweb.com/mashup/compare-prices.info
title Compare-Prices.info
summary This site is a demo to show the functionality of the
Shopzilla.com API. Supports the US and UK API versions.
rating 4.2
name Compare-Prices.info
label Compare-Prices.info
author Unknown
description This site is a demo to show the functionality of the
Shopzilla.com API. Supports the US and UK API versions.
type (null)
downloads 0
useCount 2170
sampleUrl http://www.compare-prices.info/
dateModified 2009-02-10T00:35:01Z
numComments 2
commentsUrl http://api.programmableweb.com/mashups/compareprices.info/comments
tags affiliate; eBay; money; Program; shopping; Shopzilla
APIs Shopzilla(http://www.programmableweb.com/api/shopzilla)
updated 2009-02-10T00:35:01Z
2. Task:
Use a NoSQL database, which suits the features of the service data (for example,
download MongoDB from https://www.mongodb.org/ and install it).
Design the data structure for the database, parse the text files, and load the data into the
database. Using this database as the backend, develop a web-based query system that
supports the following queries:
1. Return the names of APIs based on different criteria, including updated year,
protocols, category, rating (such as higher than, equal to, or lower than a given
rating), and tags.
2. Return the names of Mashups based on different criteria, including updated year,
used APIs, and tags.
3. Given a set of keywords, return the names of APIs if all the keywords can be found
in the title, summary, and the description of the APIs.
4. Given a set of keywords, return the names of Mashups if all the keywords can be
found in the title, summary, and the description of the Mashups.
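If MongoDB is the chosen backend, the four queries above could be expressed as filter documents along the following lines. This is a sketch under several assumptions that the assignment does not prescribe: `rating` is stored as a number, `updated` is kept as an ISO date string, `tags` and `protocols` are arrays, component APIs are stored as `apis` entries with a `name` key, and a keyword counts as "found" when it appears in at least one of the three text fields. All function names are hypothetical.

```python
import re

# Maps the comparison named in query 1 to a MongoDB operator.
RATING_OPS = {"higher": "$gt", "equal": "$eq", "lower": "$lt"}


def api_criteria_filter(year=None, protocol=None, category=None,
                        rating=None, rating_op="higher", tags=None):
    """Build a filter for query 1; criteria left as None are ignored."""
    query = {}
    if year is not None:
        # Assumes `updated` is an ISO string such as 2012-12-17T09:51:40Z.
        query["updated"] = {"$regex": f"^{int(year)}-"}
    if protocol:
        query["protocols"] = protocol  # matches arrays containing the value
    if category:
        query["category"] = category
    if rating is not None:
        query["rating"] = {RATING_OPS[rating_op]: float(rating)}
    if tags:
        query["tags"] = {"$all": list(tags)}
    return query


def mashup_criteria_filter(year=None, used_apis=None, tags=None):
    """Build a filter for query 2 (updated year, used APIs, tags)."""
    query = {}
    if year is not None:
        query["updated"] = {"$regex": f"^{int(year)}-"}
    if used_apis:
        query["apis.name"] = {"$all": list(used_apis)}
    if tags:
        query["tags"] = {"$all": list(tags)}
    return query


def keyword_filter(keywords, fields=("title", "summary", "description")):
    """Build a filter for queries 3 and 4: every keyword must match
    (case-insensitively) in at least one of the given fields."""
    clauses = [
        {"$or": [{f: {"$regex": re.escape(kw), "$options": "i"}} for f in fields]}
        for kw in keywords
    ]
    return {"$and": clauses} if clauses else {}
```

With PyMongo, such a filter could be passed to a collection's `find` method together with a projection on `name`, for example `db.apis.find(keyword_filter(["proteome", "REST"]), {"name": 1})`, where `db.apis` is a hypothetical collection name.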
Bonus points: You can earn up to 5 points for the following task:
Compare your query platform with the query support provided by
ProgrammableWeb. Design, implement, and evaluate a strategy that can improve the accuracy of
keyword-based service/mashup discovery.
Required deliverables:
1. A document that describes the design of the data structure and the query system. It
should contain the screenshots of the data in the database and testing scenarios of the
query system.
2. A readme file that describes how to install and test the query system.
3. The entire project package for the system, including the data storage and data query.
Include all related files, such as source code files, script files, data files, and
configuration files. You can use any language you like. You can also develop
two projects: one for parsing and loading data into MongoDB, and one for the web-based query
system. In this case, include all related files for both projects.
Submission Information:
1. Generate one zip file and name the file as $lastname_$firstname_PA3.zip. No .rar
file will be accepted.
2. Submit the zip file to the corresponding dropbox folder by April 21.
Note:
1. You are allowed to discuss and learn in groups. However, you must design, develop, and
submit the entire assignment by yourself. If two submissions have exactly the
same design, which is unlikely when working independently, the students will be asked
to revise the design and resubmit the work.
2. Plagiarism checking will be performed on all submissions in this course.