Assignment title: Information


Programming Assignment 3

Requirement: In this assignment, you are required to parse, store, manage, and analyze the service data retrieved from ProgrammableWeb. The service data and the required tasks are described as follows:

1. Service data. The service data was retrieved through the ProgrammableWeb APIs (http://apiportal.anypoint.mulesoft.com/programmable-web/api/programmable-web-api) in summer 2014. The service data is stored in several .txt files: api.txt, mashup.txt, and members.txt.

- API data: The information about Web APIs is stored in api.txt. In the file, each line corresponds to an API; fields are separated by the delimiter $#$, and multiple values within a field are separated by ###. The format of each API record is:

id$#$title$#$summary$#$rating$#$name$#$label$#$author$#$description$#$type$#$downloads$#$useCount$#$sampleUrl$#$downloadUrl$#$dateModified$#$remoteFeed$#$numComments$#$commentsUrl$#$tag1###tag2###tag3$#$category$#$protocols$#$serviceEndpoint$#$version$#$wsdl$#$dataFormats$#$apigroups$#$example$#$clientInstall$#$authentication$#$ssl$#$readonly$#$VendorApiKits$#$CommunityApiKits$#$blog$#$forum$#$support$#$accountReq$#$commercial$#$provider$#$managedBy$#$nonCommercial$#$dataLicensing$#$fees$#$limits$#$terms$#$company$#$updated

For example, the first API record in the file is:

http://www.programmableweb.com/api/the-global-proteome-machine$#$The Global Proteome Machine$#$Proteome data for biomedical research$#$4.4$#$The Global Proteome Machine$#$The Global Proteome Machine$#$$#$The Global Proteome Machine is an attempt to create knowledge from proteomics data and reuse it to solve biomedical research problems. The Global Proteome Machine Database was built to use GPM data to help validate peptide MS/MS spectra and protein coverage patterns. The Global Proteome Machine Database API provides RESTful access to commonly required information based on data from the GPM Database. Responses are JSON formatted.$#$1$#$$#$$#$http://wiki.thegpm.org/wiki/GPMDB_REST$#$$#$2012-12-17T09:51:40Z$#$$#$$#$http://api.programmableweb.com/apis/the-globalproteomemachine/comments$#$database###science$#$Science$#$REST$#$http://gpmdb.thegpm.org/$#$$#$$#$JSON$#$$#$$#$$#$$#$$#$$#$$#$$#$$#$$#$$#$No$#$$#$http://www.thegpm.org/$#$$#$$#$$#$$#$$#$$#$$#$2012-12-17T09:51:40Z

Therefore, we have the following information:

id                http://www.programmableweb.com/api/the-global-proteome-machine
title             The Global Proteome Machine
summary           Proteome data for biomedical research
rating            4.4
name              The Global Proteome Machine
label             The Global Proteome Machine
author            (null)
description       The Global Proteome Machine is an attempt to create knowledge from proteomics data and reuse it to solve biomedical research problems. The Global Proteome Machine Database was built to use GPM data to help validate peptide MS/MS spectra and protein coverage patterns. The Global Proteome Machine Database API provides RESTful access to commonly required information based on data from the GPM Database. Responses are JSON formatted.
type              1
downloads         (null)
useCount          (null)
sampleUrl         http://wiki.thegpm.org/wiki/GPMDB_REST
downloadUrl       (null)
dateModified      2012-12-17T09:51:40Z
remoteFeed        (null)
numComments       (null)
commentsUrl       http://api.programmableweb.com/apis/the-globalproteomemachine/comments
tags              database (tag1); science (tag2)
category          Science
protocols         REST
serviceEndpoint   http://gpmdb.thegpm.org/
version           (null)
wsdl              (null)
dataFormats       JSON
apigroups         (null)
example           (null)
clientInstall     (null)
authentication    (null)
ssl               (null)
readonly          (null)
VendorApiKits     (null)
CommunityApiKits  (null)
blog              (null)
forum             (null)
support           (null)
accountReq        No
commercial        (null)
provider          http://www.thegpm.org/
managedBy         (null)
nonCommercial     (null)
dataLicensing     (null)
fees              (null)
limits            (null)
terms             (null)
company           (null)
updated           2012-12-17T09:51:40Z

- Mashup data: A mashup is a software application that uses at least one Web API.
It can also be a Web API itself. The information about such mashups is stored in mashup.txt. As with the API data, each line corresponds to a mashup record, and the same delimiters are used to separate fields and values. In addition, $$$ is used to separate a component API and its URL. The format of a mashup record is:

id$#$title$#$summary$#$rating$#$name$#$label$#$author$#$description$#$type$#$downloads$#$useCount$#$sampleUrl$#$dateModified$#$numComments$#$commentsUrl$#$tag1###tag2###tag3$#$api1$$$url1###api2$$$url2###api3$$$url3…$#$updated

For example, a mashup record in the file is:

http://www.programmableweb.com/mashup/compare-prices.info$#$Compare-Prices.info$#$This site is a demo to show the functionality of the Shopzilla.com API. Supports the US and UK API versions.$#$4.2$#$Compare-Prices.info$#$Compare-Prices.info$#$Unknown$#$This site is a demo to show the functionality of the Shopzilla.com API. Supports the US and UK API versions.$#$$#$0$#$2170$#$http://www.compare-prices.info/$#$2009-02-10T00:35:01Z$#$2$#$http://api.programmableweb.com/mashups/compareprices.info/comments$#$affiliate###eBay###money###Program###shopping###Shopzilla$#$Shopzilla$$$http://www.programmableweb.com/api/shopzilla$#$2009-02-10T00:35:01Z

Therefore, we have the following information:

id            http://www.programmableweb.com/mashup/compare-prices.info
title         Compare-Prices.info
summary       This site is a demo to show the functionality of the Shopzilla.com API. Supports the US and UK API versions.
rating        4.2
name          Compare-Prices.info
label         Compare-Prices.info
author        Unknown
description   This site is a demo to show the functionality of the Shopzilla.com API. Supports the US and UK API versions.
type          (null)
downloads     0
useCount      2170
sampleUrl     http://www.compare-prices.info/
dateModified  2009-02-10T00:35:01Z
numComments   2
commentsUrl   http://api.programmableweb.com/mashups/compareprices.info/comments
tags          affiliate; eBay; money; Program; shopping; Shopzilla
APIs          Shopzilla (http://www.programmableweb.com/api/shopzilla)
updated       2009-02-10T00:35:01Z

2. Task: Use a NoSQL database, given the features of the service data. (For example, download MongoDB from its website, https://www.mongodb.org/, and install it.) Design the data structure for the database, parse the text files, and load the data into the database. Using this database as the backend, develop a web-based query system that supports the following queries:

   1. Return the names of APIs based on different criteria, including updated year, protocols, category, rating (higher than, equal to, or lower than a given rating), and tags.
   2. Return the names of mashups based on different criteria, including updated year, used APIs, and tags.
   3. Given a set of keywords, return the names of APIs if all the keywords can be found in the title, summary, and description of the APIs.
   4. Given a set of keywords, return the names of mashups if all the keywords can be found in the title, summary, and description of the mashups.

Bonus point: You can get up to 5 points for the following task: Compare your query platform with the query support provided by ProgrammableWeb. Design, implement, and evaluate a strategy that can improve the accuracy of keyword-based service/mashup discovery.

Required deliverables:
1. A document that describes the design of the data structure and the query system. It should contain screenshots of the data in the database and testing scenarios of the query system.
2. A readme file that describes how to install and test the query system.
3. The entire project packages for the system, including the data storage and data query.
   Include all related files, such as source code files, script files, data files, and configuration files. You can use any language you like. You can also develop two projects: one for parsing and loading data into MongoDB, and one for the web-based query system. In this case, include all the related files for both projects.

Submission Information:
1. Generate one zip file and name it $lastname_$firstname_PA3.zip. No .rar file will be accepted.
2. Submit the zip file to the corresponding dropbox folder by April 21.

Note:
1. You are allowed to discuss and learn in groups. However, you must design, develop, and submit the entire assignment by yourself. If two submissions have exactly the same design, which is unlikely when working independently, the students will be asked to revise the design and resubmit the work.
2. Plagiarism checking will be performed on all submissions in this course.
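
As an illustration of the record formats described in the service data section, the following Python sketch splits one line into a dictionary and checks the "all keywords" condition from queries 3 and 4. It is only one possible approach, not a reference solution. The names parse_record, matches_all_keywords, API_FIELDS, and MASHUP_FIELDS are hypothetical, as are the key names "tags" and "apis" for the multi-valued fields. It assumes that every record sits on a single line, that whitespace around a value can be trimmed, that only the tag and component-API fields are split on ###, and that a keyword counts as found when it appears, ignoring case, anywhere in the combined title, summary, and description.

```python
FIELD_SEP = "$#$"    # separates fields
VALUE_SEP = "###"    # separates multiple values inside one field
API_URL_SEP = "$$$"  # separates a component API name from its URL

# Field names in the order given by the record formats in this handout.
API_FIELDS = [
    "id", "title", "summary", "rating", "name", "label", "author",
    "description", "type", "downloads", "useCount", "sampleUrl",
    "downloadUrl", "dateModified", "remoteFeed", "numComments",
    "commentsUrl", "tags", "category", "protocols", "serviceEndpoint",
    "version", "wsdl", "dataFormats", "apigroups", "example",
    "clientInstall", "authentication", "ssl", "readonly", "VendorApiKits",
    "CommunityApiKits", "blog", "forum", "support", "accountReq",
    "commercial", "provider", "managedBy", "nonCommercial",
    "dataLicensing", "fees", "limits", "terms", "company", "updated",
]
MASHUP_FIELDS = [
    "id", "title", "summary", "rating", "name", "label", "author",
    "description", "type", "downloads", "useCount", "sampleUrl",
    "dateModified", "numComments", "commentsUrl", "tags", "apis", "updated",
]


def parse_record(line, fields):
    """Turn one $#$-delimited line into a dict keyed by the given field names."""
    values = [v.strip() for v in line.rstrip("\n").split(FIELD_SEP)]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    record = {}
    for name, value in zip(fields, values):
        if name == "tags":
            record[name] = [t.strip() for t in value.split(VALUE_SEP) if t.strip()]
        elif name == "apis":
            apis = []
            for item in value.split(VALUE_SEP):
                if item.strip():
                    api_name, _, url = item.partition(API_URL_SEP)
                    apis.append({"name": api_name.strip(), "url": url.strip()})
            record[name] = apis
        else:
            record[name] = value or None  # empty field becomes None (null)
    return record


def matches_all_keywords(record, keywords):
    """True if every keyword occurs in title + summary + description (case-insensitive)."""
    parts = (record.get(f) or "" for f in ("title", "summary", "description"))
    text = " ".join(parts).lower()
    return all(k.lower() in text for k in keywords)


# Shortened version of the sample mashup record shown earlier.
sample_line = FIELD_SEP.join([
    "http://www.programmableweb.com/mashup/compare-prices.info",
    "Compare-Prices.info",
    "This site is a demo to show the functionality of the Shopzilla.com API.",
    "4.2", "Compare-Prices.info", "Compare-Prices.info", "Unknown",
    "Supports the US and UK API versions.",
    "", "0", "2170", "http://www.compare-prices.info/",
    "2009-02-10T00:35:01Z", "2",
    "http://api.programmableweb.com/mashups/compareprices.info/comments",
    "affiliate###eBay###money###Program###shopping###Shopzilla",
    "Shopzilla$$$http://www.programmableweb.com/api/shopzilla",
    "2009-02-10T00:35:01Z",
])
mashup = parse_record(sample_line, MASHUP_FIELDS)
print(mashup["tags"], mashup["apis"])
print(matches_all_keywords(mashup, ["shopzilla", "demo"]))
```

All values stay as strings in this sketch; a real loader would likely convert rating to a number and the date fields to dates before storing them, so that the rating and updated-year queries can compare values directly. The resulting dictionaries have a document-like shape and could then be handed to a database driver, for instance PyMongo's insert_many if MongoDB is chosen.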