A Formal and Tooled Framework for Managing Everything as a Service
www.occiware.org
Deliverable 5.3.1 Big Data Use Case Requirement Specifications
OCCIware is a project funded by the French FSN (Fonds national pour la Société Numérique) and supported by five clusters: Systematic, Minalogic, PICOM, Images & Réseaux, and Solutions Communicantes Sécurisées.
OCCIware
Project Title: OCCIware
Deliverable Number: 5.3.1
Deliverable Title: Big Data Use Case Requirement Specifications
Deliverable Nature: Report
Dissemination Level: Public
License: Creative Commons Attribution 3.0 License
Version: 1.0
Publication Date: 24 July 2015
Sub-Project: WP5
Editor(s): Iyad Alshabani
Author(s): Iyad Alshabani, Michael Benguigui
Reviewer(s): Pierre-Yves Gibello, Philippe Merle
Executive Summary

This document describes the Big Data as a Service (BDaaS) use case that will demonstrate the usage of the OCCIware platform. The use case is a portfolio hedging and diversification application that uses the Big Data technology Spark. This use case has both generic and specific requirements, which are explained in this document.
Keywords OCCIware, BDaaS, Scalability, Elasticity, Automated deployment, Monitoring Manager, Elasticity Manager, Roboconf, ProActive.
Change control
Version  Changes  Author(s)
0.1  Table of Content  Iyad Alshabani (AE)
0.2  Use Case Description  Iyad Alshabani, Michael Benguigui (AE)
0.3  Requirements elicitation methodology  Iyad Alshabani (AE)
0.4  More precisions in the use case description  Michael Benguigui (AE)
0.5  Review  Pierre-Yves Gibello (LNG)
1.0  Consolidation and fill missing parts  Iyad Alshabani (AE)

Document review

Review  Date  Ver.  Reviewers
Outline  04/15/2015  0.1  Iyad Alshabani
Draft  06/23/2015  0.5  Iyad Alshabani
Review  07/03/2015  0.5  Pierre-Yves Gibello
Final  07/21/2015  1.0  Philippe Merle
Glossary, Acronyms and Abbreviations

AcR: Accuracy Requirement
AvR: Availability Requirement
AwR & AdR: Awareness & Adaptability Requirement
COS: Comité d'Orientation Stratégique
DCaaS: Data Center as a Service
ER: Elasticity Requirement
FR: Functional Requirement
IR: Interoperability Requirement
OCCI: Open Cloud Computing Interface (OGF standard)
OGF: Open Grid Forum
OSS: Open Source Software
PR: Performance Requirement
RAM: Random Access Memory
RR: Reliability Requirement
ScR: Scalability Requirement
SP: Sub-project
SR: Security (& privacy and trust) Requirement
UC: Use Case
VM: Virtual Machine
Table of Contents

1. Introduction
2. Use Case Description
   Portfolio hedge and diversification
      Processing Chain
      Spark / Spark Streaming jobs
      Spark Cluster
      ProActive Scheduler and Resource Manager
   Installation procedure
3. Requirements elicitation methodology
   3.1. Requirements taxonomy
   3.2. Requirements terminology
4. General and Technical Requirements
5. Conclusion
1. Introduction

This deliverable identifies and collects general and technical requirements for the Big Data use case (see Section 4), as well as requirements directly linked to ActiveEon's use case, which will be developed in this project. Indeed, while use cases are domain-specific, they are of great help in eliciting requirements with a more global reach over the OCCIware platform as a whole. This requirements elicitation is structured around a precise methodology that defines a requirements taxonomy and terminology. Both topics are covered in Section 3. The taxonomy is used to classify general, technical and use case-related requirements into relevant categories (functional, performance, availability, accuracy, security, scalability, interoperability, awareness & adaptability and reliability) and is used across the document. These categories have been tailored to match the general purposes identified within the project. Moreover, the requirements terminology, used for the definition of each requirement in the document, helps specify clear and non-ambiguous requirements. The terminology is based on the widely used IETF RFC 2119 [1]. The Big Data use case of ActiveEon deals with the deployment and orchestration of multiple Spark jobs (clusters). The ProActive Resource Manager then dynamically allocates resources for the whole processing chain of the multiple Spark jobs.
[1] http://www.ietf.org/rfc/rfc2119.txt
2. Use Case Description
Portfolio hedge and diversification
In order to design a financial tool to hedge portfolios and diversify risks, we describe the following specific Spark jobs orchestrated by the ProActive Scheduler. Since we need to process huge amounts of asset prices, and to run streaming linear regressions and streaming k-means to keep models up to date, Spark is definitely the right technology. Firstly, MLlib offers functionality for streaming data mining. Secondly, Spark provides advanced map-reduce operations on RDDs, the distributed representation of data in Spark.
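As an aside on the streaming k-means mentioned above, its core per-batch update can be sketched in a few lines of plain Python. This is only an illustrative sketch of the running-mean update rule (comparable to what MLlib's StreamingKMeans does with a decay factor of 1); the class and function names are our own, not MLlib's:

```python
def assign(point, centers):
    """Return the index of the nearest center (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))

class StreamingKMeansSketch:
    """Running-mean k-means: each mini-batch nudges the nearest centers."""
    def __init__(self, centers):
        self.centers = [list(c) for c in centers]
        self.counts = [0] * len(centers)  # points absorbed by each center

    def update(self, batch):
        """Fold one mini-batch of points into the running centers."""
        for point in batch:
            i = assign(point, self.centers)
            self.counts[i] += 1
            eta = 1.0 / self.counts[i]    # running-mean step size
            self.centers[i] = [c + eta * (p - c)
                               for c, p in zip(self.centers[i], point)]
        return self.centers
```

Feeding successive batches to update() keeps the cluster centers current without re-reading past data, which is the property the streaming jobs below depend on.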
Processing Chain: The main Spark jobs are described as follows:
- The stock quotes are retrieved (Yahoo, ...) and possibly preprocessed in Spark.
- A first Spark Streaming job keeps up to date the multiple regressions between stock prices (stochastic gradient descent in MLlib). The resulting linear regressions can serve as a first support tool for portfolio managers.
- In parallel, a second Spark job is in charge of computing the correlation matrix. In order to keep up to date the models requiring correlations, the matrix should be estimated quite fast (using Spark Streaming).
- The second Spark Streaming job clusterizes the correlations (k-means in MLlib) to build a well-diversified portfolio. From the chosen correlated stocks (Cholesky matrix), we compute (by a Monte Carlo method) the delta of the basket option and deduce the portfolio hedge.
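The last step above (simulating correlated stocks via a Cholesky factor and estimating the basket option's delta by Monte Carlo) can be sketched as follows. This is an illustrative toy with two lognormal assets and a bump-and-revalue delta; all parameter values and function names are assumptions, not the project's actual pricing code:

```python
import math
import random

def cholesky_2x2(rho):
    """Cholesky factor L of the correlation matrix [[1, rho], [rho, 1]]."""
    return [[1.0, 0.0], [rho, math.sqrt(1.0 - rho * rho)]]

def basket_call_delta(s0, rho, k, r=0.01, sigma=0.2, t=1.0, n=20000, bump=0.01):
    """Monte Carlo delta of an equally weighted 2-asset basket call with
    respect to asset 0, by bumping the spot and reusing the same draws."""
    l = cholesky_2x2(rho)
    rng = random.Random(42)               # fixed seed for reproducibility
    base = bumped = 0.0
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
        w = [l[0][0] * z[0], l[1][0] * z[0] + l[1][1] * z[1]]  # correlated normals
        st = [s * math.exp((r - 0.5 * sigma ** 2) * t + sigma * math.sqrt(t) * wi)
              for s, wi in zip(s0, w)]
        st_b = [st[0] * (1.0 + bump)] + st[1:]  # terminal prices with bumped spot
        base += max(sum(st) / len(st) - k, 0.0)
        bumped += max(sum(st_b) / len(st_b) - k, 0.0)
    disc = math.exp(-r * t)
    return disc * (bumped - base) / (n * s0[0] * bump)

delta = basket_call_delta([100.0, 100.0], rho=0.7, k=100.0)
```

The resulting delta indicates how many units of the first stock to hold against the basket option, which is the hedge the last job deduces from the selected correlated stocks.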
The different jobs are orchestrated using ProActive Workflows and Scheduling. The ProActive workflows describe the processing chain between the Spark jobs and manage the resources needed for these jobs.

Spark / Spark Streaming jobs

Each Spark job can be deployed on a cluster of nodes called Spark nodes. For a Spark job, a number of nodes is allocated before processing begins. The different jobs are not homogeneous: some are Spark Streaming jobs and some are plain Spark jobs. The first blocks of the full streaming platform are organised as follows:
The first streaming job queries data from Yahoo or Google (user choice), cleans the data for the following jobs, and writes it to a shared directory. The batch duration is set by the user, and only successive distinct quotes are written. Running in parallel, a second streaming job is constantly fed with the formatted quotes and keeps up to date all the linear regression models for every symbol pair (GOOG-AAPL, AAPL-MSFT, ...). The streaming MLlib library provides support for streaming regression models based on the stochastic gradient descent algorithm. Similarly, a third job streams data from the output directory and computes correlation coefficients (through MLlib) before writing the resulting correlation matrix to a shared directory. Once a correlation matrix is generated, a final job builds the corresponding heatmap with JHeatChart and displays it in a JFrame.
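The regression job described above relies on MLlib's streaming regression based on stochastic gradient descent. The per-batch update underneath can be sketched in plain Python for a one-dimensional model; the function name and learning rate here are illustrative, not MLlib's API:

```python
def sgd_update(weights, batch, step=0.01):
    """One stochastic-gradient pass over a mini-batch of (x, y) pairs for
    the model y ~ w0 + w1 * x under squared-error loss."""
    w0, w1 = weights
    for x, y in batch:
        err = (w0 + w1 * x) - y
        w0 -= step * err        # gradient of 0.5 * err**2 w.r.t. w0
        w1 -= step * err * x    # gradient w.r.t. w1
    return (w0, w1)

# Each arriving micro-batch of quotes refines the same model in place,
# so the regression stays current as new prices stream in.
weights = (0.0, 0.0)
for batch in [[(x, 2.0 * x + 1.0) for x in range(1, 6)]] * 1000:
    weights = sgd_update(weights, batch)
# weights approaches (1.0, 2.0), the intercept and slope of the synthetic data
```

This incremental update is what makes the streaming approach attractive for quote data: the model never needs to revisit historical batches.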
Spark Cluster Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext object in your main program (called the driver program). Specifically, to run on a cluster, the SparkContext can connect to several types of cluster managers (either Spark’s own standalone cluster manager or Mesos/YARN), which allocate resources across applications. Once connected, Spark acquires executors on nodes in the cluster, which are processes that run computations and store data for your application. Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to the executors. Finally, SparkContext sends tasks for the executors to run.
ProActive Scheduler and Resource Manager

The ProActive Scheduler and Resource Manager are deployed so that they monitor each Spark cluster, which in turn monitors its own applications. The ProActive Scheduler is used as a meta-scheduler over the set of Spark clusters. Indeed, each Spark cluster has its own Spark scheduler; ProActive then orchestrates the set of clusters and jobs.
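The two-level scheduling described above, with ProActive acting as a meta-scheduler on top of per-cluster Spark schedulers, can be illustrated with a small sketch. The round-robin dispatch policy and all class names here are purely illustrative assumptions, not ProActive's actual behavior:

```python
from collections import deque

class ClusterQueue:
    """Toy stand-in for one Spark cluster's own scheduler: a FIFO of jobs."""
    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def submit(self, job):
        self.queue.append(job)

class MetaScheduler:
    """Toy two-level scheduler: dispatches jobs round-robin to clusters,
    each of which then schedules its own jobs internally."""
    def __init__(self, clusters):
        self.clusters = clusters
        self._next = 0

    def submit(self, job):
        cluster = self.clusters[self._next % len(self.clusters)]
        self._next += 1
        cluster.submit(job)
        return cluster.name
```

The point of the sketch is the separation of concerns: the meta-level only decides which cluster receives a job, while each cluster keeps full control of its internal scheduling.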
Installation procedure
The OCCIware platform will be able to offer the Spark jobs and the whole workflow as a service, which means automating the Spark installation as part of the whole workflow. We briefly describe here the installation procedure of Spark on YARN. Refer to http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/ to install YARN.

1. Install Hadoop (1/2): Download and unpack one of the Hadoop distributions on all nodes: http://apache.mirrors.spacedump.net/hadoop/common/stable/
2. Install Hadoop (2/2): Set up the environment variables required by Hadoop:
   export HADOOP_PREFIX="/home/alex/Programs/hadoop-2.2.0"
3. HDFS configuration: Update hadoop/hdfs-site.xml and hadoop/core-site.xml.
4. YARN configuration: Update hadoop/yarn-site.xml.
5. Format the namenode:
   hdfs namenode -format
6. Start the daemons:
   hadoop-daemon.sh start namenode
   hadoop-daemon.sh start datanode
   yarn-daemon.sh start resourcemanager
   yarn-daemon.sh start nodemanager
7. Install Spark (1/2): Download one of the Spark binary distributions. For details concerning the YARN configuration, refer to https://spark.apache.org/docs/latest/configuration.html.
8. Install Spark (2/2): Check that HADOOP_CONF_DIR or YARN_CONF_DIR points to the folder (client side) containing the configuration files.
9. Start a Spark application:
   ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
     --master yarn-cluster \
     --num-executors 3 \
     --driver-memory 4g \
     --executor-memory 2g \
     --executor-cores 1 \
     --queue thequeue \
     lib/spark-examples*.jar \
     10
For more details concerning the installation procedure, refer to:
http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/
https://spark.apache.org/docs/latest/running-on-yarn.html
3. Requirements elicitation methodology
A detailed requirements analysis was used to identify the specifications and needed functionalities for the Big Data use case described in the previous section and for the system as a whole. These requirements specifications address both functional and non-functional concerns for the assessment
of the solutions developed as part of the OCCIware project: a formal, model-driven platform to manage any cloud resource. The requirements process was devised to elicit high-level requirements from the use case using a scenario-based approach. Using a scenario as a means of requirements elicitation enables the requirements analyst to ground questions in context through the main issues described in the scenario.

3.1. Requirements taxonomy

The requirements were framed using a taxonomy generally used to improve the understanding, analysis and testing of the solution; it is therefore used during requirements elicitation to write clearer, more focused requirements. The types, along with example requirements, are presented in the table below.
Requirements Taxonomy
Functional [FR]: Something (behavior, function) that a service, application or system must support.
  Example: deployment of new infrastructure resources must be done by a cloud manager.

Performance [PR]: The performance of a service, application or system enabling a functional requirement.
  Example: maximum concurrent connections that an application can support.

Availability [AvR]: The minimum required levels of access that stakeholders have to information, services or systems.
  Example: cloud hosting infrastructure should ensure a minimum availability of 99.95%.

Accuracy [AcR]: The error rate of a service or system, calculated on the basis of the expected results.
  Example: load prediction algorithms should be very accurate.

Security (& privacy and trust) [SR]: The minimum levels of security, privacy and/or trust that a service, application or system should support.
  Example: front-end protection (application level) and flow isolation using VLANs (network level).

Scalability [ScR]: The ability of a service, application or system to process more users' requests, operations or transactions in a given time interval.
  Example: a carpooling application shall be able to deal with huge variations in the volume of requests depending on railway technical problems.

Interoperability [IR]: The ability of a service, application, or system to interact with other components or systems.
  Example: an application deployed in a multi-cloud context.

Awareness [AwR] & Adaptability [AdR]: The level of awareness and predictability that a service, application or system has over upcoming changes to its specifications or operating environments, as well as its ability to adapt to them.
  Example: a carpooling application should be aware of a potential increasing load and adapt itself to serve requests.

Reliability [RR]: The minimum levels of failure that a specific service, application or system should support.
  Example: a bank transaction should be atomic to ensure 99.99% success.

Table 1: Requirements Taxonomy
3.2. Requirements terminology The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in IETF RFC 2119: ● MUST: This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification. ● MUST NOT: This phrase, or the phrase "SHALL NOT", means that the definition is an absolute prohibition of the specification. ● SHOULD: This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course. ● SHOULD NOT: This phrase, or the phrase "NOT RECOMMENDED" mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label. ● MAY: This word, or the adjective "OPTIONAL", means that an item is truly optional. One vendor may choose to include the item because a particular marketplace requires it or because the vendor feels that it enhances the product while another vendor may omit the same item.
4. General and Technical Requirements
The term “platform”, used hereafter for simplicity and abstraction in the requirement definitions, in fact comprises the Big Data platform (ProActive Workflows and Scheduling with the Spark application), the IaaS that hosts the Spark nodes, the ProActive Scheduler and the ProActive Resource Manager. Moreover, the platform interacts with an IaaS automated deployment component (ProActive Cloud Automation) and a Roboconf-based component for automated application deployment.
In this section, a series of requirements resulting from the objectives and challenges of the project is discussed. These requirements address the aspects that are relevant for scalable and elastic applications in the Cloud. The primary focus here is to provide solutions that make web applications scalable and elastic according to the customer inflow, regardless of the underlying infrastructure and technology.
ID Name Description
Functional
FR001 Cloud-based platform The platform MUST be hosted in a private or public Cloud.
FR002 Access control for resources: The owner of the platform MUST have the access rights to deploy infrastructure resources.
FR003 Access control for security rules: The owner of the platform SHOULD have access rights to deploy rules at the firewall level.
FR004 Unique Frontend Binding interface: The platform MUST rely on one entry point to receive all customer requests.
FR005 Cluster-oriented platform: Both frontend and backend SHOULD be considered in cluster mode.
FR006 Automated Infrastructure deployment: Infrastructure instances MAY be deployed automatically through the ProActive solution, based on templates.
FR007 Automated Application deployment: Applications such as web applications or database instances MAY be deployed automatically through Roboconf.
FR008 Frontend and Backend Monitoring: Infrastructure (e.g. virtual machines) and applications (e.g. web applications, databases) MUST be monitored continuously.
FR009 Monitoring Manager Monitoring information SHOULD be centralized and managed by the Resource Manager
FR010 Elasticity Manager Elasticity information SHOULD be centralized and managed by the Resource Manager
FR011 Monitoring Agent: One Monitoring Agent MUST be related to one infrastructure component (e.g. virtual machine). A Monitoring Agent MUST monitor infrastructure resources and MAY monitor applications within the same infrastructure component; it reports monitoring information to the Monitoring Manager.
FR012 Elasticity Agent: One Elasticity Agent MUST be related to one infrastructure component (e.g. virtual machine). An Elasticity Agent is specific and MUST monitor applications within the same infrastructure component; it reports infrastructure and application behaviors to the Elasticity Manager.
FR013 Use of the “pull” paradigm for monitoring information: The monitoring system MUST rely on the “pull” paradigm for gathering information.
FR014 Use of the “push” paradigm for elasticity events: The elasticity system MUST rely on the “push” paradigm for information reporting when needed (e.g. need for an additional infrastructure component).
FR015 Platform features as Web services: The main features of the platform SHOULD be deployed as Web services and cohabit with external Web services (e.g. Elasticity Manager, Roboconf, ProActive, etc.).
FR016 Traceability The platform SHOULD log all service calls and actions.
FR017 Unique view of resources: The platform SHOULD give a unique view of all resources used for the use case.
FR018 Openness: The platform MUST be connected to the Internet to be able to receive data streams.
FR019 Data Visualization: The platform SHOULD provide a workflow for visualizing the results of the analysis in the web frontend.
Performance
PR001 Run of multiple parallel workflows among several services: The platform MUST be able to support the runtime of several parallel workflows.
PR002 Minimize processing time between jobs: The platform SHOULD deliver the last job's results as soon as the first job retrieves quotes (streaming behaviour).
Availability
AvR001 Internal service availability status: Any service running on the platform SHOULD update its status (“started”, “work in progress”, “closed”, “error”) when one of its operations is invoked, through actions sent to the platform (e.g. deployment of a new virtual machine or an additional MySQL DBN).
AvR002 External service availability status: Customers SHOULD have information about the status of the global service (web application). Three cases: Operational, Maintenance, or Degraded Service Status.
Security
SR001 External DDoS protection: The platform SHOULD be protected with a DDoS solution at the entry point of the platform and on each infrastructure component (e.g. fail2ban).
SR002 Remote access to the platform: Remote access to the platform infrastructure (e.g. SSH) SHOULD rely on an IP whitelist.
SR003 Confidentiality of data: Interactions between the platform and consumers SHALL be secured to guarantee the confidentiality of exchanged sensitive data.
SR004 Confidentiality of transactions: Payment interactions between the web application and consumers MUST be secured.
SR005 Integrity: Exchanges between components and services SHALL be secured to avoid any corruption of data.
Scalability
ScR001 Cloud infrastructure large scale support: The hosting cloud MUST be able to allow resource provisioning (e.g. computing, storage, etc.) for large scale platforms.
ScR002 Platform large scale support: The platform MUST be built at large scale in order to combine many infrastructure sources, with the goal of connecting each component and delivering the global service; the platform will thus revolve around a Cloud architecture.
ScR003 Large scale connection support: The number of simultaneous connections could be very large. As such, the platform MUST provide large-scale connection support, as illustrated by the number of consumers connected to the web application.
ScR004 Highly distributed services support: The services invoked during large-scale connection runtime are highly distributed. The platform MUST support such a context.
ScR005 Service request scalability: The platform MUST be able to deal with variations in the number of service requests over time.
ScR006 Data exchange scalability: The platform MUST be able to deal with variations in the amount of exchanged data over time.
ScR007 Spark-specific scalability: The platform SHOULD deal with variations in the number of Spark nodes over time.
Interoperability
IR001 Highly heterogeneous services support: The services invoked during workflow runtime are highly heterogeneous (of different types). The platform MUST support this heterogeneity by allowing communication between all services.
IR002 Dynamic service discovery: The platform MUST support the dynamic discovery of services at runtime.
IR003 Human interaction The platform MUST allow for human interaction in the workflows in order to apply modifications at runtime.
IR004 Runtime flexibility The coordination of all services in the system SHALL be highly flexible due to their heterogeneity.
Awareness & Adaptability
AdR001 Adaptation of the platform to the current flow: The runtime context of applications in the platform is linked to the current flow. As such, adaptation recommendations for these flows SHOULD be computed by the platform.
AdR002 Dynamic configuration substitution: The platform SHOULD provide a way to substitute a Spark job dynamically.
AdR003 Dynamic reallocation of resources: The platform SHOULD be able to provide dynamic configuration of the underlying resources.
Reliability
RR001 Message queuing: Exchanges between different components SHALL be performed in a secured way so that interactions may recover from the temporary unavailability of services.

Table 2: General and Use Case-Based Requirements

5. Conclusion

In this deliverable, we gave a description of the Big Data use case, which will be deployed as a service using the OCCIware platform. The use case is a portfolio hedging and diversification application organized as a set of Spark jobs orchestrated by ProActive Workflows and Scheduling, with resources managed by the ProActive Resource Manager. We also described the installation of the Spark framework, which will be part of the use case deployment using the OCCIware platform. Indeed, the OCCIware platform will be able to achieve the same process automatically, from modeling using the OCCI languages and tools up to deployment using Roboconf and ProActive Cloud Automation. Finally, we gave a list of generic and specific technical requirements of the use case that the platform should take into account.