MIDDLE EAST TECHNICAL UNIVERSITY
ENGINEERING FACULTY
DEPARTMENT OF COMPUTER ENGINEERING

SOFTWARE REQUIREMENTS SPECIFICATIONS

GROUP MALLORN
Merve Bozo
Yaşar Berk Arı
Sertaç Kağan Aydın
Mustafa Orkun Acar

Team Leader: Itır Önal
Supervisor: Asst. Prof. Dr. Pınar Karagöz

TABLE OF CONTENTS

TABLE OF FIGURES
TABLE OF TABLES
TABLE OF SCREENSHOTS
1. Introduction
  1.1 Problem Definition
  1.2 System Overview
  1.3 Definitions, acronyms, and abbreviations
  1.4 Assumptions and Dependencies
2. Overall description
  2.1 Product functions
    2.1.1 Use-Case Model Survey
    2.1.2 Actor survey
  2.2 Interfaces
    2.2.1 User Interfaces
    2.2.2 Hardware Interfaces
    2.2.3 Software Interfaces
    2.2.4 Communications Interfaces
  2.3 Constraints
  2.4 Assumptions and Dependencies
3. Specific requirements
  3.1 Functional Requirements
    3.1.1 Functional requirement 1 - Login
    3.1.2 Functional requirement 2 - Connect Database
    3.1.3 Functional requirement 3 - Detect anomalies
    3.1.4 Functional requirement 4 - Correct anomalies
    3.1.5 Functional requirement 5 - Change settings
    3.1.6 Functional requirement 6 - Complete Selected Column
    3.1.7 Functional requirement 7 - Write results to DB
    3.1.8 Functional requirement 8 - Create Task
    3.1.9 Functional requirement 9 - Optimize a feature set of selected column(s)
    3.1.10 Functional requirement 10 - Export Model
    3.1.11 Functional requirement 11 - Import Model
    3.1.12 Functional requirement 12 - Select column(s) for the table
    3.1.13 Functional requirement 13 - Set Periodic Job
    3.1.14 Functional requirement 14 - View Activities
    3.1.15 Functional requirement 15 - View Databases
    3.1.16 Functional requirement 16 - View Models
    3.1.17 Functional requirement 17 - View Reports
    3.1.18 Functional requirement 18 - Visualise Table
    3.1.19 Functional requirement 19 - Log out
  3.2 Nonfunctional Requirements
    3.2.1 Usability
    3.2.2 Reliability
    3.2.3 Performance
    3.2.4 Supportability
    3.2.5 Security
4. Data Model and Description
  4.1 Data Object
    4.1.2 Data dictionary
  4.2 Entity Relationship Model
    4.2.1 Member Entity
    4.2.2 Member Login Log Entity
    4.2.3 Databases Entity
    4.2.4 Tables Entity
    4.2.5 Columns Entity
    4.2.6 Statistics Entity
    4.2.7 Notification Entity
    4.2.8 User Notification Entity
5. References

TABLE OF FIGURES

Figure 1: Component Diagram
Figure 2: General Use Case Diagram
Figure 3: Login Use Case
Figure 4: Connect to Databases Use Case
Figure 5: Detect Anomaly Use Case
Figure 6: Correct Anomaly Use Case
Figure 7: Change Settings Use Case
Figure 8: Complete Columns Use Case
Figure 9: Write Results to DB Use Case
Figure 10: Create Task Use Case
Figure 11: Optimize Feature Set Use Case
Figure 12: Export Model Use Case
Figure 13: Import Model Use Case
Figure 14: Select Column Use Case
Figure 15: Set Periodic Job Use Case
Figure 16: View Activities Use Case
Figure 17: View Databases Use Case
Figure 18: View Models Use Case
Figure 19: View Reports Use Case
Figure 20: Visualize Table Use Case
Figure 21: Logout Use Case
Figure 22: Class Diagram of Vitriol
Figure 23: Entity Relationship Diagram

TABLE OF TABLES

Table 1: Description of Definitions, Acronyms and Abbreviations
Table 2: Overview of the Use-Case
Table 3: Description of Login Use Case
Table 4: Description of Connect to Databases
Table 5: Description of Detect Anomaly Use Case
Table 6: Description of Correct Anomaly Use Case
Table 7: Description of Change Settings Use Case
Table 8: Description of Complete Columns Use Case
Table 9: Description of Create Task Use Case
Table 10: Description of Optimize Feature Set Use Case
Table 11: Description of Export Model
Table 12: Description of Import Model Use Case
Table 13: Description of Select Columns Use Case
Table 14: Description of Set Periodic Job Use Case
Table 15: Description of View Activities Use Case
Table 16: Description of View Databases Use Case
Table 17: Description of View Models Use Case
Table 18: Description of View Reports Use Case
Table 19: Description of Visualize Table Use Case
Table 20: Description of Logout Use Case
Table 21: Description of Actor Survey
Table 22: Description of the Data Model
Table 23: Description of the methods

TABLE OF SCREENSHOTS

Screenshot 1: Home Page of Vitriol
Screenshot 2: Login Page of Vitriol
Screenshot 3: Choose Column Page of Vitriol
Screenshot 4: Pie Chart of Results
Screenshot 5: Statistical Analyze of Results

1. INTRODUCTION

This Software Requirements Specification provides a complete description of all the functions and specifications of the Vitriol project, a generic machine learning tool targeting any domain and any user. This document covers the purpose, features and interfaces of the application. The intended audience of this document includes the prospective developers of the system.

The software system to be produced is a generic, automated machine learning tool. The system mainly targets software companies that lack know-how in big data. It may be used by any company from any domain, with no expertise required. Any user who knows the basics of computer usage will be able to use the system without any training.

This document was prepared following ISO/IEC/IEEE 29148:2011. External interfaces, functional requirements, use case diagrams, sequence diagrams, the ER diagram and class diagrams are organized in the following sections.

1.1 PROBLEM DEFINITION

Big data is one of the most trending topics in computer science nowadays. Moreover, in the modern world, information is the most valuable currency. The job of a data scientist is mainly to retrieve information from existing data. However, the field is new and the sector lacks qualified data scientists. Vitriol aims to solve that problem by eliminating the human factor from data science in the long term. Vitriol is the first and only generic, automated machine learning tool that enables its users to run certain machine learning algorithms without any machine learning knowledge. The system is developed in such a manner that any person with basic computer skills can use it without any extra information or training.

1.2 SYSTEM OVERVIEW

The system consists of 2 servers, 1 web application, 2 databases, 1 memcached server and 1 machine learning machine.
One of the servers is used for the main application of the web service; the other is used for the notification mechanism, which is implemented via web socket technology. The web application is implemented mainly in HTML, JavaScript and CSS. The two databases are used for information storage: the first is for the web application and the second is for the users' data. The memcached server is used for session storage. It is designed as a completely separate machine so that session objects can be shared between servers if more than one server is used to balance the load. The machine learning machine is the core of the system, where all the algorithms and functionalities are applied.

Figure 1: Component Diagram

The figure above is the component diagram for Vitriol. It shows the interactions between the system user, the Web Interface, the Main System Server, the Database Server and the Computation Server. The main actions of the user are creating a new task on the system and viewing the reports for previously started tasks. The user interacts with the system through the provided Web Interface. The Web Interface is responsible for transmitting user requests to the server and for displaying the returned results to the user. All requests arriving at the server side pass through the Load Balancer, which minimizes the response time and avoids overloading a single system server. The main server handles the requests sent by the user by starting the desired operation and, once it is completed, sending the results and the obtained model to the user. To execute an operation, the main server communicates with the computation server by sending it the information about the operation. It can also send requests to the Database Server for tasks such as authentication. The computation server is the fundamental hardware on which the algorithms are run.
It handles the requests from the main system server and generates models or performs other types of computations. To achieve this, it requests the required data from the database server. Upon completion, the obtained models and results are returned to the main server. Lastly, the database server is responsible for handling data requests by providing the required set of data to the Main Server or the Computation Server.

1.3 DEFINITIONS, ACRONYMS, AND ABBREVIATIONS

SRS: Software Requirements Specification
ML: Machine Learning
npm: Node package manager
IEEE: Institute of Electrical and Electronics Engineers
DBMS: Database management system
ER Diagram: Entity Relationship Diagram
Member User: A user that is registered to the system
MySQL: An open source database management system
TCP/IP: A communication protocol for the internet and similar networks
Notification: Instantaneous information that is shown to the users
PM2: Process management system for Node.js
DB: Database

Table 1: Description of Definitions, Acronyms and Abbreviations

1.4 ASSUMPTIONS AND DEPENDENCIES

Both web servers are implemented assuming that a Linux-based operating system is running on the server machines (CentOS [6] or Linux is preferable). The design also assumes that the server machines already have Java, Scala, Node.js and Spark installed. If any of these assumptions changes, the design of the project and the content of this document shall be changed accordingly.

The development team may assume that the users of the system have SQL databases (PostgreSQL, MySQL, MSSQL, Oracle, etc.) and keep their data in a single table. If a join or limit operation is needed, the user of the system is assumed to provide the table in an already joined or limited state.

2. OVERALL DESCRIPTION

2.1 PRODUCT FUNCTIONS

The Vitriol project is a generic, automated machine learning tool that serves any user with data from any domain.
It is designed as a cloud service; hence the major parts of the project are the web server and the machine learning engine. A user who wants to perform certain operations on his/her data (such as anomaly detection or feature derivation) connects to the system via the web interface, introduces his/her database by means of its IP (Internet Protocol) address, and then chooses what to do with the data (e.g. complete). After that, the web service creates the necessary records in the database and sends a signal to the ML engine to work on the data. As soon as the ML engine finishes the job, it inserts the necessary information into the notification table in the database. When the data is inserted, a trigger function runs on the database and invokes the web socket server, which pushes the notification to the client side of the web system.

2.1.1 USE-CASE MODEL SURVEY

The system has only one type of user (namely User), who is able to use every aspect of the project. Different types of users for different types of actions are being considered for future development; however, this is out of the scope of this document. The use cases and their detailed explanations are presented below.
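The task and notification flow described in Section 2.1 (the ML engine inserts a row, a database trigger fires, and the web socket server pushes to the client) can be illustrated with a small in-memory sketch. This is only an illustration under stated assumptions: the names NotificationTable, PushServer and run_task are hypothetical and do not come from the Vitriol code base, the database trigger is modelled as a plain callback, and per-user lists stand in for real web socket connections.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Notification:
    user: str
    task_id: int
    message: str


class NotificationTable:
    """Hypothetical stand-in for the notification table.

    Callbacks registered with on_insert play the role of the database
    trigger that fires when a row is inserted.
    """

    def __init__(self) -> None:
        self.rows: List[Notification] = []
        self._triggers: List[Callable[[Notification], None]] = []

    def on_insert(self, callback: Callable[[Notification], None]) -> None:
        self._triggers.append(callback)

    def insert(self, row: Notification) -> None:
        self.rows.append(row)
        for callback in self._triggers:
            callback(row)


class PushServer:
    """Hypothetical stand-in for the web socket server.

    Keeps one inbox (a list of messages) per connected user instead of
    an open socket.
    """

    def __init__(self) -> None:
        self.inboxes: Dict[str, List[str]] = {}

    def connect(self, user: str) -> None:
        self.inboxes.setdefault(user, [])

    def push(self, notification: Notification) -> None:
        # Only users with an open connection receive the push.
        if notification.user in self.inboxes:
            self.inboxes[notification.user].append(notification.message)


def run_task(table: NotificationTable, user: str, task_id: int, operation: str) -> None:
    """Stand-in for the ML engine: pretends the job is done and records a notification."""
    table.insert(Notification(user, task_id, f"Task {task_id} ({operation}) finished"))


table = NotificationTable()
server = PushServer()
table.on_insert(server.push)   # wire the "trigger" to the push server
server.connect("alice")        # the client opens its connection
run_task(table, "alice", 1, "complete")
print(server.inboxes["alice"])
```

In this sketch the web service never polls for results: the insert into the notification table is the only event that causes a push, which mirrors the trigger-driven design described above.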
Figure 2: General Use Case Diagram

No | Functionality | Short Description
1 | Login | Logging into the system with membership and password
2 | Connect to Database | Connect a remote database
3 | View Databases | View all databases
4 | Create Task | Creation of a task for processing the data
5 | Select Column(s) from table | Select specific columns for starting the process
6 | Anomaly Detection | Show irrelevant parts of the data
7 | Anomaly Correction | Correct irrelevant parts of the table
8 | Create Model for selected column | Create a model on selected columns for future use
9 | Complete Selected Column | Complete empty parts of the selected column
10 | Optimize a feature set of selected column(s) | Create new features using selected column(s)
11 | Import Model | Import an external model
12 | Export Model | Export a model for future use
13 | View Models | View previously created models
14 | View Reports | View the detailed reports about a process
15 | Visualize Table | Visualize the distribution of data points in a table
16 | Set Periodic Job | Set a specific job to work periodically
17 | View Activities | Logs of previous activities
18 | Change Settings | Change user settings (password, email, username etc.)
19 | Logout | Log out from the system

Table 2: Overview of the Use-Case

The Vitriol system provides many functionalities, as seen above. All of them require the user to be logged in, which is itself a use case. Some of the use cases also include other use cases. For the sake of simplicity, each use case is explained separately below.

2.1.1.1 LOGIN USE CASE FOR USER

Figure 3: Login Use Case

Use Case ID: UC1
Use Case Name: Login
Description: This use case describes the event in which a user logs in to the system with his/her username and password.
Actors: User
Precondition: -
Trigger: The user tries to log in to the system through the web service by providing login credentials.
Basic Flow:
1 - The user enters his/her username.
2 - The user enters his/her 8-digit password.
3 - The web service checks the database for password confirmation via the server.
4 - The web service displays a correct password message.
Exception Flow: If the entered password does not match the password in the database, an error message is displayed by the web service.
Post Conditions: -

Table 3: Description of Login Use Case

2.1.1.2 CONNECT TO DATABASE USE CASE FOR USER

Figure 4: Connect to Databases Use Case

Use Case ID: UC2
Use Case Name: Connect to Database
Description: This use case describes the event in which a user connects to a database with its credentials.
Actors: User
Precondition: -
Trigger: The user tries to connect a DB to his/her Vitriol account by providing the DB credentials, such as the DB address and password.
Basic Flow:
1 - The user enters the remote database IP address.
2 - The user enters his/her database password.
3 - If the password is accepted by the remote database, the database table names are shown.
Exception Flow: If the password is refused by the remote database, an error message is displayed by the web service.
Post Conditions: -

Table 4: Description of Connect to Databases

2.1.1.3 DETECT ANOMALY USE CASE FOR USER

Figure 5: Detect Anomaly Use Case

Use Case ID: UC3
Use Case Name: Detect Anomaly
Description: This use case describes the operation in which the user starts a task to detect different kinds of anomalies present in the dataset.
Actors: Member User
Precondition: The user has to log in to the system, connect to a database and create the task after selecting some columns.
Trigger: The user creates an anomaly detection task using the interface.
Basic Flow:
1 - The anomaly detection module operates on the selected columns.
2 - Results are shown on the reports page.
Exception Flow: -
Post Conditions: Datasets are analyzed and the detected anomalies are shown.

Table 5: Description of Detect Anomaly Use Case

2.1.1.4 CORRECT ANOMALY USE CASE FOR USER

Figure 6: Correct Anomaly Use Case

Use Case ID: UC4
Use Case Name: Correct Anomaly
Description: This use case describes the operation in which inconsistent data points are cleared or updated.
Actors: Member User
Precondition: The user has to log in to the system, connect to a database and create the task after selecting the columns.
Trigger: The user creates an anomaly correction task.
Basic Flow:
1 - The anomaly correction module operates on the selected columns.
2 - Datasets are updated in the databases.
3 - Results are shown on the reports page.
Exception Flow: -
Post Conditions: Datasets are analyzed, the detected anomalies are shown, and the datasets are updated according to the correction or deletion decisions.

Table 6: Description of Correct Anomaly Use Case

2.1.1.5 CHANGE SETTINGS USE CASE FOR USER

Figure 7: Change Settings Use Case

Use Case ID: UC5
Use Case Name: Change Settings
Description: This use case describes how the user can change personal settings such as passwords, avatars, security questions etc.
Actors: Member User
Precondition: The user must be logged in to the system.
Trigger: The user clicks on the 'change settings' button.
Basic Flow:
1 - The user selects the data that he/she wants to update.
2 - The user enters the new value.
3 - The new value is checked against an expected range. If it is valid, the database entry is updated accordingly.
Exception Flow:
1 - If the newly entered value is not in the expected range, an exception occurs and an error message is shown to the user.
Post Conditions: The user information is updated in the database.

Table 7: Description of Change Settings Use Case

2.1.1.6 COMPLETE COLUMNS USE CASE FOR USER

Figure 8: Complete Columns Use Case

Use Case ID: UC6
Use Case Name: Complete Columns
Description: This use case describes the completion of missing and invalid entries in columns of the databases.
Actors: Member User
Precondition: The user must be logged in to the system and have at least one connected database.
Trigger: The user starts the data completion task from the interface.
Basic Flow:
1 - The user selects the column that the operation will be applied to.
2 - The complete columns module works on the selected column and completes the missing values.
3 - If multiple columns are selected by the user, the operation is performed on the selected columns consecutively.
Exception Flow: -
Post Conditions: The updated datasets are shown to the user. A report is generated about the task.

Table 8: Description of Complete Columns Use Case

2.1.1.7 WRITE RESULTS TO DB USE CASE FOR USER

Figure 9: Write Results to DB Use Case

Use Case ID: UC7
Use Case Name: Write Results to DB
Description: This use case describes the operation of writing the results obtained from operations to the database of the user.
Actors: Member User
Precondition: The user must be logged in to the system and have at least one connected database. The user must have created a task, and the execution of the task must be completed.
Trigger: The user clicks on the 'write to database' button on the web interface.
Basic Flow:
1 - The user clicks on the 'write to database' button on the web interface.
2 - A connection between the web server and the database of the user is established.
3 - Data is copied to the database of the user.
Exception Flow: -
Post Conditions: -

2.1.1.8 CREATE TASK USE CASE FOR USER

Figure 10: Create Task Use Case

Use Case ID: UC8
Use Case Name: Create Task
Description: This use case describes the operation that allows the user to specify the details of the desired task.
Actors: Member User
Precondition: The user must be logged in to the system.
Trigger: The user clicks on the create model button.
Basic Flow:
1 - A new window is shown to the user asking for the details of the operation that will be created.
Exception Flow:
1 - If the user did not provide the columns of a table, an exception occurs and an error message is shown to the user.
Post Conditions: A new task of the user-specified type is added to the system job queue.

Table 9: Description of Create Task Use Case

2.1.1.9 OPTIMIZE FEATURE SET USE CASE FOR USER

Figure 11: Optimize Feature Set Use Case

Use Case ID: UC9
Use Case Name: Optimize Feature Set
Description: This use case describes the operation that optimizes the feature set.
Actors: Member User
Precondition: The user must be logged in to the system and must have created the related task.
Trigger: The user selects the 'optimize the feature set' operation in the task creation phase.
Basic Flow:
1 - The related task is performed on the dataset provided by the user.
2 - The results are shown on the report page.
Exception Flow: -
Post Conditions: Feature sets are optimized.

Table 10: Description of Optimize Feature Set Use Case

2.1.1.10 EXPORT MODEL USE CASE FOR USER

Figure 12: Export Model Use Case

Use Case ID: UC10
Use Case Name: Export Model
Description: This use case describes the operation of creating an XML encoding of an existing model created by the system.
Actors: Member User
Precondition: The user must be logged in to the system.
Trigger: The user selects a model and clicks the 'export model' button.
Basic Flow:
1 - The existing model is encoded as an XML file.
2 - The encoded file is downloaded by the user.
Exception Flow: -
Post Conditions: The XML file is created.

Table 11: Description of Export Model

2.1.1.11 IMPORT MODEL USE CASE FOR USER

Figure 13: Import Model Use Case

Use Case ID: UC11
Use Case Name: Import Model
Description: This use case describes importing an XML-encoded model and adding it to the user's models.
Actors: Member User
Precondition: User must be logged in to the system.
Trigger: User selects a model and clicks the 'import model' button.
Basic Flow:
1 - User selects a model file to import.
2 - The selected file is checked for validity and, if the check succeeds, inserted into the user's models.
Exception Flow:
1 - If the file selected for import is invalid, an error message is shown to the user.
Post Conditions: An external model is imported and added to the user's models.

Table 12: Description of Import Model Use Case

2.1.1.12 SELECT COLUMN USE CASE FOR USER

Figure 14: Select Column Use Case

Use Case ID: UC12
Use Case Name: Select Columns
Description: This use case describes an event in which a user selects the columns that will be processed.
Actors: User
Precondition: The user has to be logged in to the system and connected to a database.
Trigger: The user clicks the “Select Columns” button.
Basic Flow:
1 - The user selects the desired columns by clicking on them on a table preview screen.
2 - The selection process ends when the user clicks the “OK” button.
Exception Flow: -
Post Conditions: Some columns are selected by the user for the process.

Table 13: Description of Select Columns Use Case

2.1.1.13 SET PERIODIC JOB USE CASE FOR USER

Figure 15: Set Periodic Job Use Case

Use Case ID: UC13
Use Case Name: Set Periodic Job
Description: This use case describes setting a job to be run in the future, possibly periodically.
Actors: Member User
Precondition: The user must be logged in to the system.
Trigger: User chooses the operation using the interface.
Basic Flow:
1 - User chooses the operation by clicking the button in the 'Jobs' tab.
2 - User creates a periodic job by providing the job (operations) and timing information, and confirms by clicking 'OK'.
Exception Flow:
1 - If the user clicks the 'OK' button without providing an operation while creating the periodic job, an error message is displayed.
Post Conditions: A periodic job to be run at a future time is added to the job queue.

Table 14: Description of Set Periodic Job Use Case

2.1.1.14 VIEW ACTIVITIES USE CASE FOR USER

Figure 16: View Activities Use Case

Use Case ID: UC14
Use Case Name: View Activities
Description: This use case describes viewing the past activities of the account.
Actors: Member User
Precondition: The user must be logged in to the system.
Trigger: User clicks on the 'View Activities' tab.
Basic Flow:
1 - User chooses the action.
2 - Past activities are displayed. If there are no activities, a message is displayed instead.
Exception Flow: -
Post Conditions: The past activities of the user are displayed.

Table 15: Description of View Activities Use Case

2.1.1.15 VIEW DATABASES USE CASE FOR USER

Figure 17: View Databases Use Case

Use Case ID: UC15
Use Case Name: View Databases
Description: This use case describes viewing the databases connected to the system.
Actors: Member User
Precondition: The user must be logged in to the system.
Trigger: User clicks on the 'View Databases' tab.
Basic Flow:
1 - User chooses the action.
2 - Connected databases are displayed. If there are no databases, a message is displayed instead.
Exception Flow: -
Post Conditions: The connected databases are displayed.
Table 16: Description of View Databases Use Case

2.1.1.16 VIEW MODELS USE CASE FOR USER

Figure 18: View Models Use Case

Use Case ID: UC16
Use Case Name: View Models
Description: This use case describes viewing previously created models.
Actors: Member User
Precondition: The user must be logged in to the system.
Trigger: User clicks on the 'View Models' tab.
Basic Flow:
1 - User chooses the action.
2 - Models are displayed. If there are no models, a message is displayed instead.
Exception Flow: -
Post Conditions: The models are shown.

Table 17: Description of View Models Use Case

2.1.1.17 VIEW REPORTS USE CASE FOR USER

Figure 19: View Reports Use Case

Use Case ID: UC17
Use Case Name: View Reports
Description: This use case describes viewing the reports created about the user's previous processing.
Actors: Member User
Precondition: The user must be logged in to the system.
Trigger: User clicks on the 'View Reports' tab.
Basic Flow:
1 - User chooses the action.
2 - Reports about the past actions are displayed. If there are no reports, a message is displayed instead.
Exception Flow: -
Post Conditions: The reports are shown.

Table 18: Description of View Reports Use Case

2.1.1.18 VISUALIZE USE CASE FOR USER

Figure 20: Visualize Table Use Case

Use Case ID: UC18
Use Case Name: Visualize Table
Description: This use case describes the operation in which a database table is visualized for better understanding.
Actors: Member User
Precondition: User must be logged in to the system. At least one table must be connected to the user's account.
Trigger: User clicks on the 'Visualize table' button after selecting a table.
Basic Flow:
1 - User selects a table and chooses the visualizing operation, with the options of 2D and 3D visualization.
2 - The related algorithm produces the resulting graph, which is shown to the user.
Exception Flow: -
Post Conditions: The desired visualization is obtained.

Table 19: Description of Visualize Table Use Case

2.1.1.19 LOGOUT USE CASE FOR USER

Figure 21: Logout Use Case

Use Case ID: UC19
Use Case Name: Logout
Description: This use case describes an event in which a user logs out of the system.
Actors: User
Precondition: User has to be logged in to the system.
Trigger: The user tries to log out of the system using the log out button.
Basic Flow:
1 - User clicks the log out button.
2 - The session is finished by the web service.
Exception Flow: -
Post Conditions: The user is redirected to the login screen.

Table 20: Description of Logout Use Case

2.1.2 ACTOR SURVEY

The Vitriol system has only one actor type, namely the user or member user. The details of the actor can be seen in the table below.

Actor Name: Member User
Description: A member user, or simply the user, is anyone who registers and gets a password from the Vitriol system. All the functionalities of Vitriol can be used by a member user.

Table 21: Description of Actor Survey

2.2 INTERFACES

2.2.1 USER INTERFACES

The user interface of the web application is comprehensive and easy to use. In the header, the user can see his/her notifications and manage the settings of his/her account. In the left-most part of the page there is a list of operations that the user can perform. These operations are mainly the use cases described in section 2.1.1. These two sections are fixed on every page. In the middle, beside these sections, there is a white board whose content varies with the operation that the user wants to perform. For example, if the user wants to connect a database, the left-most and top sections stay the same, but the white board shows a form for connecting a database. If he/she wants to see the reports, charts are shown instead. The details of the user interface can be observed in the figures below.
Screenshot 1: Home Page of Vitriol

Screenshot 2: Login Page of Vitriol

Screenshot 3: Choose Column Page of Vitriol

Screenshot 4: Pie Chart of Results

Screenshot 5: Statistical Analyze of Results

2.2.2 HARDWARE INTERFACES

Since it is a cloud web service, the project requires a reliable server machine (physical or virtual). SATA and SAS disk bus standards will be adopted along with RAID technology, so that better maintenance and reliability of the persistent storage can be achieved. The CPU and memory of the server shall also be as high as possible (at least 8 GB RAM and an Intel Xeon E5-2699 v3 @ 2.30 GHz) to meet the 99.99% availability constraint. If more than one server is going to be used behind a load balancer, then another machine for in-memory caching would also be beneficial in order to share session objects between the servers.

2.2.3 SOFTWARE INTERFACES

• The server machine shall have a Linux environment, namely Ubuntu version 14.04 or later, or CentOS version 6.7 or later.
• Java Runtime Environment (JRE), version 1.6 or later, is required for the ML engine, which is used on the server side of the product.
• Apache Spark version 1.6 is required by the ML engine.
• The Node.js environment and its package manager npm shall be installed on the server.
• Memcached is used for session storage. Hence it must either run on a distinct machine, or the server that uses Memcached shall have it installed.
• PostgreSQL version 9 or higher is used for the database server.
• Apache Tomcat will be used as a Java bridge server for the communication between the web server and the ML engine.

2.2.4 COMMUNICATIONS INTERFACES

The server and the Database Management System communicate using the TCP/IP protocol. The notification mechanism is designed using the WebSocket protocol.
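The minimum versions listed in section 2.2.3 lend themselves to a mechanical check at deployment time. The following is a minimal sketch, not part of the specified system: the component keys, the function names, and the assumption that versions are purely numeric dotted strings are all illustrative. Apache Spark is left out because the text requires exactly version 1.6 rather than a minimum.

```typescript
// Minimum versions taken from section 2.2.3. The keys are hypothetical names
// chosen for this sketch, not identifiers defined by the specification.
const MINIMUM_VERSIONS: Record<string, string> = {
  ubuntu: "14.04",
  centos: "6.7",
  java: "1.6",
  postgresql: "9",
};

// Compare two dotted numeric versions component by component.
// Returns -1 if a < b, 0 if equal, 1 if a > b.
// Assumes every component is a plain integer (e.g. "6.10", not "1.8.0_292").
function compareVersions(a: string, b: string): number {
  const pa = a.split(".").map(Number);
  const pb = b.split(".").map(Number);
  const n = Math.max(pa.length, pb.length);
  for (let i = 0; i < n; i++) {
    const x = pa[i] ?? 0; // missing components count as 0, so "9" equals "9.0"
    const y = pb[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// True when the installed version satisfies the minimum for the component.
function meetsMinimum(component: string, installed: string): boolean {
  const minimum = MINIMUM_VERSIONS[component];
  if (minimum === undefined) {
    throw new Error(`unknown component: ${component}`);
  }
  return compareVersions(installed, minimum) >= 0;
}
```

Comparing component by component, rather than as decimal numbers, matters for versions such as CentOS 6.10, which is newer than 6.7 even though 6.10 is smaller than 6.7 as a decimal.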
2.3 CONSTRAINTS

The one and only constraint for the design is not to work on the data in the database of the user. Since every query that is run on that database incurs a cost, the project has to replicate the data to its own database and work on the copy.

2.4 ASSUMPTIONS AND DEPENDENCIES

• The development team may assume that the data that is the object of the project will be provided as a single SQL table. If any join or limitation is to be performed on the data, it is assumed to be done by the end user.
• Another assumption is that the database of the user is open to remote access.
• The last assumption that can be made by the developers is that the user will connect an SQL database which fulfills the basic requirements of SQL tables.

3. SPECIFIC REQUIREMENTS

In this section, all of the software requirements are specified in detail, so that designers can design the system to satisfy them and testers can test the system against them.

3.1 FUNCTIONAL REQUIREMENTS

3.1.1 FUNCTIONAL REQUIREMENT 1 – LOGIN

The user shall be able to log into the system with a username and password. The password should be kept in the database in an encrypted form for security reasons. The login mechanism shall work in cooperation with the session mechanism. Whenever a user logs in, a session shall be created and kept in the in-memory cache and in a browser cookie. If another login occurs for the same username, the previous session shall be dropped. Likewise, if a logged-in user shows no activity for 15 minutes, the session shall be dropped. The login information shall be logged in the database for the sake of user reports.

3.1.2 FUNCTIONAL REQUIREMENT 2 – CONNECT DATABASE

The user shall be able to connect his/her database just by providing the credentials of the database: IP address, port number, username and password. No other information shall be asked of the user.
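The session rules of requirement 3.1.1 (one live session per username, and expiry after 15 minutes without activity) can be illustrated with a small in-memory store. This is only a sketch under stated assumptions: the class and method names are hypothetical, a plain `Map` stands in for the in-memory cache, the session id is a placeholder rather than a secure token, and the browser cookie and the database login log are not modelled.

```typescript
// 15 minutes of inactivity, as stated in requirement 3.1.1.
const IDLE_LIMIT_MS = 15 * 60 * 1000;

interface Session {
  sid: string;
  username: string;
  lastActivity: number; // epoch milliseconds of the last user action
}

// Hypothetical in-memory stand-in for the cache-backed session store.
class SessionStore {
  private bySid = new Map<string, Session>();
  private sidByUser = new Map<string, string>();
  private counter = 0;

  // The clock is injectable so that the timeout can be exercised without waiting.
  constructor(private now: () => number = Date.now) {}

  // Create a session; any previous session of the same username is dropped.
  login(username: string): string {
    const previous = this.sidByUser.get(username);
    if (previous !== undefined) {
      this.bySid.delete(previous);
    }
    const sid = `s${++this.counter}`; // placeholder id, not a secure token
    this.bySid.set(sid, { sid, username, lastActivity: this.now() });
    this.sidByUser.set(username, sid);
    return sid;
  }

  // Record a user action. Returns false if the session is unknown or has
  // been idle for the full limit, in which case it is dropped.
  touch(sid: string): boolean {
    const session = this.bySid.get(sid);
    if (session === undefined) return false;
    if (this.now() - session.lastActivity >= IDLE_LIMIT_MS) {
      this.logout(sid);
      return false;
    }
    session.lastActivity = this.now();
    return true;
  }

  // Explicitly end a session.
  logout(sid: string): void {
    const session = this.bySid.get(sid);
    if (session === undefined) return;
    this.bySid.delete(sid);
    if (this.sidByUser.get(session.username) === sid) {
      this.sidByUser.delete(session.username);
    }
  }
}
```

In this sketch the web server would call `touch` on every authenticated request and treat a `false` result as a redirect to the login screen. Expiry is checked lazily on access; a cache such as Memcached could alternatively enforce it through an expiration time on the stored entry.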
3.1.3 FUNCTIONAL REQUIREMENT 3 – DETECT ANOMALIES

The user shall be able to detect anomalies in the data from the database that he/she connected. This is the process of viewing irrelevant parts of the data.

3.1.4 FUNCTIONAL REQUIREMENT 4 – CORRECT ANOMALIES

The user shall be able to correct the anomalies that he/she detects. This process includes functional requirement 3.

3.1.5 FUNCTIONAL REQUIREMENT 5 – CHANGE SETTINGS

The user shall be able to change his/her settings, such as password or username.

3.1.6 FUNCTIONAL REQUIREMENT 6 – COMPLETE SELECTED COLUMN

The user shall be able to complete the data in the selected column if it has missing parts. This completion shall be done with at least 60% accuracy. This process includes functional requirement 12.

3.1.7 FUNCTIONAL REQUIREMENT 7 – WRITE RESULTS TO DB

The user shall be able to write the results of his/her tasks to the database.

3.1.8 FUNCTIONAL REQUIREMENT 8 – CREATE TASK

The user shall be able to create a task in order to process the data. The created task shall be remembered by the system.

3.1.9 FUNCTIONAL REQUIREMENT 9 – OPTIMIZE A FEATURE SET OF SELECTED COLUMN(S)

The user shall be able to optimize a feature set for the selected column or columns.

3.1.10 FUNCTIONAL REQUIREMENT 10 – EXPORT MODEL

The user shall be able to export the ML model that is generated for his/her data in order to use it in the system in the future.

3.1.11 FUNCTIONAL REQUIREMENT 11 – IMPORT MODEL

The user shall be able to import an ML model that was exported before or created by himself/herself.

3.1.12 FUNCTIONAL REQUIREMENT 12 – SELECT COLUMN(S) FOR THE TABLE

The user shall be able to select column(s) from a table in the database that he/she connected. The columns shall be listed for the user to choose from.

3.1.13 FUNCTIONAL REQUIREMENT 13 – SET PERIODIC JOB

The user shall be able to set periodic jobs for any task that he/she wants.
The periodicity of the task may be weekly, twice a month, monthly, or never.

3.1.14 FUNCTIONAL REQUIREMENT 14 – VIEW ACTIVITIES

The user shall be able to see his/her previous activities in order to keep track of the processing of his/her data.

3.1.15 FUNCTIONAL REQUIREMENT 15 – VIEW DATABASES

The user shall be able to see the databases that he/she has connected before. The databases shall be listed for the user to choose from.

3.1.16 FUNCTIONAL REQUIREMENT 16 – VIEW MODELS

The user shall be able to see the ML models that were created before for his/her data. In other words, the system shall remember the models that were created before.

3.1.17 FUNCTIONAL REQUIREMENT 17 – VIEW REPORTS

The user shall be able to view reports about his/her processes.

3.1.18 FUNCTIONAL REQUIREMENT 18 – VISUALISE TABLE

The user shall be able to see the distribution of the data points in his/her table. This visualization shall be done using dot plots.

3.1.19 FUNCTIONAL REQUIREMENT 19 – LOG OUT

The user shall be able to log out and end his/her session whenever he/she wants.

3.2 NONFUNCTIONAL REQUIREMENTS

3.2.1 USABILITY

Vitriol is a tool that requires no extra training. It mainly targets software companies, but belonging to one is not a requirement. Any user with only basic computer knowledge shall be able to use the system easily.

3.2.2 RELIABILITY

If any component of the system does not respond to the user, the system shall display an informative message about the error. Since the ML algorithms take a large amount of time, the front end shall not be locked during these processes. There shall be a backup system in order not to lose the information of the users of the system. The system should be available 90% of the time during a year. Moreover, there should be a beta environment for future development, so that new development does not cause unforeseen errors.
The system shall be implemented in such a way that if the system is migrated (for example, the database server is changed or a new server is added), maintainability shall not be broken.

3.2.3 PERFORMANCE

The average response time of the system shall be 200 milliseconds or less. Moreover, it shall never be more than 2 seconds, even in extreme cases. The system shall be able to work under the load of 1024 concurrent users, and the number of transactions per second shall not be less than 20. All the SQL queries shall be written with this constraint in mind.

If the system is degraded, it shall still run the web service. It shall lock the ML engine for a short period of time and show an informative message to the users who want to run certain tasks. All other parts of the system, such as displaying reports or changing settings, shall remain available for all users. The resource utilization shall be monitored by system admins frequently. An infrastructure monitoring tool shall be used for that purpose. Nagios [5] is recommended but not mandatory. The monitoring tool shall send e-mail to the admins of the system when the storage or memory is full, in the critical state (more than 75% load), or in the OK state (less than 75% load). It shall also send e-mail when the CPU is in the critical state. This monitoring tool shall be installed on the main web server, the notification server, and the database servers. The in-memory cache system does not need to be monitored since it has its own utilization management.

3.2.4 SUPPORTABILITY

The system shall be designed with supportability in mind. A process management tool shall be used to run the backend. Since the server side is coded in Node.js, pm2 (an advanced production process manager for Node.js) is highly recommended. Every component of the system shall keep meaningful logs that show the state of the system.
Log files shall include the date and time. System log files and error log files shall be kept separate if possible, for the sake of simplicity in maintenance.

3.2.5 SECURITY

The system shall be designed with security in mind. The password of the user shall not be visible on the user's screen (it should be displayed as dots instead). The passwords of the users shall be kept encrypted.

4. DATA MODEL AND DESCRIPTION

4.1 DATA OBJECT

Since the website is designed in an asynchronous pattern and does not fit basic object-oriented design concepts, no class diagram or data dictionary is provided for it in this document. For the ML engine, the related diagrams and explanations are provided in the following sections.

Figure 22: Class Diagram of Vitriol

4.1.1.1 USER

This class represents the member user entity of the system. It has the following attributes: username, password, database, report, model, job, activity. The username and password attributes are primitive data types, whereas the remaining attributes are instances of other classes. The class stores the information about the user and his/her data on the system, and it has related methods to update its fields.

4.1.1.2 DATABASE

This class performs database operations such as connecting and retrieving data. It is capable of executing SQL queries on the connected database. It has the attributes ip, port, dbname, password and username to connect to a database.

4.1.1.3 OPERATION

This class consists of Vitriol's core machine learning and data mining functionalities, such as optimizing a feature set (a database table), detecting and correcting anomalies, completing tables, and visualizing databases for better understanding of the dataset. These methods are used for generating models and, eventually, user-defined jobs that perform the desired task.

4.1.1.4 MODEL

This class represents a main construct of Vitriol, namely models.
A model consists of multiple sequential operations defining a machine learning model design. This class has one attribute, operations, which is a list of operations.

4.1.1.5 JOB

This class represents user-defined jobs. Every task that the user wants to accomplish is stored and processed as a job in the system. A user task is translated into a model, and after that a job is defined that holds the model together with other information about the task, including userID and database. This class has the following attributes: model, time, database, tableName, userID and jobID. jobID is used to identify a unique job.

4.1.1.6 REPORT

The Report class represents the reports produced after operations and their generation. The class has the following attributes: userID, data, reportPath. A call to generateReport(Job job) generates a report at an address in memory and stores it in the object's related field.

4.1.1.7 ACTIVITY

This class represents logs of users' past activities. It has the attributes userID, time and activityDetail to store a specific activity of a user on the system for future checks.

4.1.2 DATA DICTIONARY

Attribute: Definition
username (User): A string defining the unique username with which the user logs into the system.
password (User): A string defining the password with which the user logs into the system.
database: The database of the user. For detailed information please refer to section 4.1.1.2.
model: The model object of the user. For detailed information please refer to section 4.1.1.4.
job: The job object of the user. For detailed information please refer to section 4.1.1.5.
activity: The activity object of the user. For detailed information please refer to section 4.1.1.7.
userID: A string defining the unique id of the user.
time: A Datetime object defining the time of the job.
tableName: A string defining the name of the table on which the job will be executed.
ip: A string defining the IP address of the database.
port: A string defining the port number of the database.
dbname: A string defining the name of the database.
password (Database): The password credential of the database.
username (Database): The username credential of the database.
operations: A list of the operations that the model includes. For detailed information please refer to section 4.1.1.3.
userID (Activity): A string defining the id of the user that owns the activity object.
time: A Datetime object defining the execution time of the activity.
activityDetail: A string defining the details of the activity.
userID (Reports): A string defining the id of the user that owns the report object.
date (Report): A Datetime object defining the creation date of the report.
reportpath: A string defining the path of the report in storage.

Table 22: Description of the Data Model

Method: Description
changeUsername: Changes the username.
changePassword: Changes the user password.
addDatabase: Adds a new database to the user's object.
showReport: Shows the reports to the user.
addModel: Adds a new model to the models of the user's object.
removeModel: Removes the model.
showActivity: Shows the activities of the user's object.
createJob: Creates a new job.
executeQuery: Executes the given query on the given database.
createModel: Creates a new model in the job object.
generateReport: Generates a new report for the user.
createActivity: Creates a new activity.
optimizeFeatureSet: Optimizes the feature set.
detectAnomaly: Detects anomalies in the data.
correctAnomaly: Corrects anomalies in the data.
visualizeTable2D: Displays the table as a 2D dot plot.
visualizeTable3D: Displays the table as a 3D dot plot.

Table 23: Description of the methods

4.2 ENTITY RELATIONSHIP MODEL

Figure 23: Entity Relationship Diagram

For all entities except notification, primary keys are chosen as 32 bit unique uids for security reasons. The notification table will have very few records, so its primary key field is simply an integer. The database of the system is designed in such a manner that no information is kept more than once. The tables are connected to each other with foreign keys to keep track of information without replicating the data.

4.2.1 MEMBER ENTITY

The member entity defines the user of the system. It has the fields id, name, surname, password and user name. The passwords of the users are stored as MD5 hashes for the sake of security. The user name of each user is unique.

4.2.2 MEMBER LOGIN LOG ENTITY

The member login log entity is the record of the users' sessions. It keeps the information of when a user logs into the system and how long the session lasts. User_fk defines the member id, sid defines the id of the related session, and startdate is the exact time at which the session starts. Last act date is the time of the user's last action in the system, Ip address is the IP of the user, and terminate records whether the session was terminated by the server or not (e.g. because of a second login with the same username).

4.2.3 DATABASES ENTITY

The databases entity holds the records of the databases that the users have connected to Vitriol. Ip, port, database_name, user_name and password are the credentials of the database of the user.
member_fk defines the id of the member that the database belongs to, and is_active shows whether the database is active or not.

4.2.4 TABLES ENTITY

The tables entity represents the tables in each database from the databases entity. The table name field defines the name of the table, and database_fk shows which database the table belongs to.

4.2.5 COLUMNS ENTITY

The columns entity represents the columns of each table. The data_type field holds the type of the column, such as integer or varchar. Column name is the name of the column, and tables_fk defines which table the column belongs to.

4.2.6 STATISTICS ENTITY

The statistics entity stores the statistics related to the columns. Mean value is the field in which the mean value of the column is kept. Median holds the median and standard_dev holds the standard deviation. The is_sparse, is_dense, is_continious and is_discrete fields hold the sparse/dense and discrete/continuous information for the column. The number of data items in the column and the number of outliers are also kept. Finally, columns_fk is the id of the column that the statistics belong to.

4.2.7 NOTIFICATION ENTITY

The notification entity categorizes the notifications of the system. Initially, 4 types of notifications are planned. These are: "The process that was started in 'date' is finished", "You are not logged into the system since 'date'", "A new feature is added to the system, would you like to try" and "The system will be unavailable between 'date-date'".

4.2.8 USER NOTIFICATION ENTITY

The user notification entity holds the information that is pushed to the client as a notification. It references the notification table for the type of the notification and completes the missing parts with the fields startdate, enddate and task_name. The index_update_date field is for the trigger function, which keeps the last update time (insert or update) of the row.

5. REFERENCES

[1] "IEEE Guide for Software Requirements Specifications," in IEEE Std 830-1984, pp. 1-26, Feb. 10, 1984, doi: 10.1109/IEEESTD.1984.119205. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=278253&isnumber=6883&tag=1
[2] Node.js API Documentation. URL: https://nodejs.org/en/docs/
[3] PM2 Product Documentation. URL: http://pm2.keymetrics.io/docs/usage/cluster-mode/
[4] Memcached Product Documentation. URL: https://github.com/memcached/memcached/wiki
[5] Nagios Product Documentation. URL: https://www.nagios.org/about/
[6] CentOS Product Documentation. URL: https://www.centos.org/about/
[7] Apache Spark Documentation. URL: http://spark.apache.org/docs/latest/