Big Data Based Management for Smart Grids

Atimad El Khaouat*, Laila Benhlima*
* Mohammadia School of Engineers, Mohammed V University of Rabat, Morocco
[email protected], [email protected]

Abstract: Information and communication technologies play a crucial role in the many research efforts aimed at improving the existing electrical grid. With the emergence of the Internet of Things and the growing availability of connected devices such as smart meters and other sensors, we are facing a huge amount of data about energy consumption, energy production and so on. In this context, smart grid data management and analytics using big data tools help to manage the huge volume of data collected from smart devices installed in the grid, in order to extract knowledge, define key performance indicators, forecast demand response behavior, ... This paper presents a solution for managing big data for the smart grid and making it available to high-level applications. We propose a global solution architecture, detail each of its components, and explain the data flow and analysis using a big data process.

Keywords: Smart Grid; Data Management; Analytics; Big Data; IT Solution; Architecture

I. INTRODUCTION

A smart grid is an automation system established by integrating a large pool of sensors, smart meters, substations ... into the existing power grid in order to control and monitor it through information and communication technologies [1]. All these intelligent devices produce different and heterogeneous types of data: weather data, consumption data, energy production data... This explosion in data reflects the fact that a smart grid involves not just more detailed meter information, but a wide range of intelligent devices and data types that must be well managed in order to benefit from the smart grid: better understand customer behavior, detect outages, fraud or theft, and forecast energy demand more accurately.
It requires the establishment of complex treatments [2], due to:
• The nature of the data: time series, stream data...
• Their distributed nature and the need for treatment at different scales, since the data come from different sources.
• Real-time analytics for certain needs.

Data management design in any context should optimize outcomes in two ways. First, it should extract clean and consistent information that drives targeted benefits for the business. Second, having identified those benefits, it should minimize the cost of the infrastructure needed to obtain and process the data necessary to deliver these benefits.

This paper focuses on IT solutions in the domain of smart grid data management and analytics using big data tools. The second section presents related work in the field, namely the ACCENTURE, EDF and ITRON-TERADATA solutions. The third section presents the proposed solution as a functional architecture that explains the flow of data from the collection phase (the low level of the architecture) to the higher levels, where data management and data analytics are executed in order to make decisions, produce reports and develop future applications. We detail each component of the architecture. Finally, we conclude and give perspectives on our work.

II. RELATED WORKS

In recent decades, smart grids have attracted a lot of interest from researchers and industry. For smart grid data management, however, we find only commercial solutions in the literature. Indeed, the prominent works in smart grid data management and advanced analytics are those of ACCENTURE, ITRON-TERADATA, INFOSYS, IBM... In this paper we present three of them: the ACCENTURE solution, the ITRON-TERADATA system and the EDF solution.

A. EDF solution

The EDF IT solution is based on mature standards and a Metering Data Management System (MDMS), a software platform that acquires metering data from multiple sources and makes this data available after integration, synchronization and cleaning.
This platform offers:
• A channel for hourly data, available the next day on a web portal.
• A real-time channel for alerts.
• A feed to a data warehouse for historical data.

The EDF data management architecture is built from four layers [3]:
• Data collection layer: a set of smart meters and programmable devices installed in the grid in order to collect consumption data.
• Network layer: contains a field area network, meter control and a wide area network. This layer ensures communication and data transfer in the grid between consumption and production end points and the control center.
• Meter data management: ensures data cleaning and classification before integrating them. There are three types of data, and each one has a different treatment:
  o Events: processed by an event processor to feed the outage management application.
  o Power consumption: stored in the Meter Data Management Repository for future analysis.
  o Operational and system: processed by a management system for billing forecasts and demand response prediction, and also communicated to the consumer web portal.
• Applications layer: the EDF applications presented in the solution architecture are outage management, web portal, demand response, billing and distribution sizing.

978-1-5090-5713-9/16/$31.00 ©2016 IEEE

The EDF solution doesn't consider the big data generated by all the sensors. Moreover, it doesn't take into account weather and production data to produce relevant analytics and develop applications corresponding to future needs.

B. ACCENTURE solution

ACCENTURE proposes a system to manage five distinct smart grid data classes: operational data, non-operational data, meter usage data, event message data and metadata [4]. Each class has its own properties and should be treated and managed differently based on its source, characteristics and applicability.

The ACCENTURE data analytics architecture aims at addressing the following challenges [5]:
• Matching the data acquisition infrastructure to the required outcomes.
• Learning to apply new tools, standards and architectures to manage grid data at scale.
• Transforming processes throughout the business to take advantage of smart grid technology.
• Managing master data to enable the benefits from smart grid capabilities.

The ACCENTURE solution proposes to extract knowledge through the following components:
• Signal analytics: uses substation waveform data, line sensor waveform data... to determine key indicators such as electrical distance domain.
• Events analytics: detects, classifies and filters event data.
• State analytics: applied to stream data to create real-time information such as real-time electrical state, real-time grid topology ...
• Operational analytics: helps to define system performance, asset health and load forecast.
• Customer analytics: applied to consumption and client data in order to construct demand profiles, demand response behavior and customer segmentation.

To date, Accenture has catalogued more than 200 smart grid analytics and several classes of technical analytics such as electrical and device states (including traditional, renewable and distributed energy resources), power quality, customer behavior (especially in terms of demand response)...

The ACCENTURE solution needs to integrate more big data tools to improve not only analysis, forecasting and decision making, but the whole power system operation. In addition, the solution focuses on analytical aspects without explaining the processes of data treatment, management and storage.

C. ITRON-TERADATA solution

TERADATA, which is among the leading companies in big data infrastructure, and ITRON have developed an Active Smart Grid Analytics (ASA) solution, on which the ITRON-TERADATA architecture is based. Its Active Smart Grid Data Warehouse provides an architectural approach that supports the application of real-time analytics and enables smarter, faster decisions.
According to this architecture, the Active Smart Grid Data Warehouse must accommodate simultaneous loading of large data volumes from multiple sources (meters, sensors and control devices) and at the same time perform complex analytics such as demand response, load forecasts, customer behavior, future overloads... These analytics are called active because they analyze and correlate data from all related systems involved in the smart grid as it arrives, triggering actions and participating in workflows [6]. A key component of ASA, the Utility Logical Data Model (uLDM), is a comprehensive model for analyzing smart grid information.

The three presented systems offer solutions for smart grid management, but they are commercial solutions, and little detail, and sometimes none, is available about the components of their data management architectures. Moreover, in our solution, we aim not only at processing stream data, that is, real-time data used to detect anomalies and ensure accurate forecasting, but also at providing the storage and the models needed for managing different kinds of data and analyzing them.

III. PROPOSED SOLUTION

In this section we propose and detail the different components of our smart grid data management architecture. On the one hand, our solution is able to manage the large and varied amounts of data collected from different sources thanks to big data tools; on the other hand, it provides support for smart grid applications such as demand prediction through analytics processing. In addition, it handles both stream data that have to be processed for real-time applications and data that can be stored and then accessed, through request processing components, by applications such as billing, reporting...

A. Architecture

Figure 1 shows the proposed architecture, in which four principal levels are presented:

First level: the lowest level, responsible for collecting information from the different consumption or production points. It includes smart devices such as:
  o AMR (automatic meter reading): the technology of automatically collecting consumption, diagnostic and status data from meter devices [7].
  o Sensors: devices that respond to a physical stimulus (heat, light, sound, pressure, magnetism, motion, etc.) [8] and convert it into electrical signals. They can collect different types of information such as weather, temperature...
  o Smart meter: an electronic device that records the consumption of electric energy and communicates that information to the utility for monitoring and billing [9].
  o Substation: a component of an electrical generation, transmission and distribution system. Substations transform voltage from high to low, or the reverse, and are connected to SCADA for remote monitoring.
  o Data servers for data (weather, events, ...)

Second level: the communication support level and network layer, responsible for data transportation and circulation in the smart grid.

Third level: the most important level, because it is responsible for the various data processing operations: cleaning, storage, management and analysis. All these operations have to transform data into actionable insights for decision making.

Fourth level: represents the smart grid applications. In the above architecture, the planned applications are consumption prediction, monitoring and production forecasting.

B. Data management and analytics

The data management system supports the following tasks: storing, modeling and executing different data types; training forecasting models, which requires retrieving data and designing features; retrieving data and scoring forecasting models at runtime; and providing interfaces to client applications (e.g. consumption prediction, monitoring, production prediction...).
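As a rough illustration of the tasks listed above (storing data, training a forecasting model from retrieved data, and scoring it at runtime for a client application), the following minimal Python sketch may help. It is not the implementation described in this paper: `ReadingStore`, `train_hourly_profile` and `score` are hypothetical names, the store is a plain in-memory dictionary rather than a NoSQL database, and the "model" is simply the mean consumption per hour of day, chosen only because it is easy to follow.

```python
from collections import defaultdict
from statistics import mean


class ReadingStore:
    """Toy in-memory stand-in for the data storage task (not a NoSQL client)."""

    def __init__(self):
        self._readings = defaultdict(list)  # meter_id -> list of (hour, kwh)

    def put(self, meter_id, hour, kwh):
        self._readings[meter_id].append((hour, kwh))

    def get(self, meter_id):
        return list(self._readings[meter_id])


def train_hourly_profile(readings):
    """Training task: build a baseline model (mean kWh per hour of day)."""
    by_hour = defaultdict(list)
    for hour, kwh in readings:
        by_hour[hour % 24].append(kwh)
    return {hour: mean(values) for hour, values in by_hour.items()}


def score(model, hour, default=0.0):
    """Scoring task: forecast consumption for a given hour at runtime."""
    return model.get(hour % 24, default)


# Store a few hypothetical readings, train, then score on behalf of a client.
store = ReadingStore()
for hour, kwh in [(8, 1.0), (8, 3.0), (20, 4.0)]:
    store.put("meter-1", hour, kwh)

model = train_hourly_profile(store.get("meter-1"))
forecast = score(model, 32)  # hour 32 falls on 08:00 of the next day
```

In a real deployment the store, the feature design and the model would each be far richer; the point here is only the separation between storage, training and runtime scoring.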
Figure 1 shows that collected data are first filtered and cleaned to remove useless information. They are then sent to a NoSQL [10] database for storage (flow 2). NoSQL (Not Only SQL) databases are one of the big data technologies for storing large amounts of data in which records need not share the same structure, unlike in traditional relational databases. The NoSQL database contains the different types of data collected by the intelligent devices presented at the collection level of the architecture, such as consumption data, production data, weather data, events, client data, meter data and billing data. This data is distributed over multiple nodes in an intelligent manner so that it can be accessed rapidly. It is also replicated in order to ensure its availability.

The modeling process (flow 3) is responsible for feature selection, in order to select relevant features, and then for creating models that are stored in the model storage (flow 4) and used by smart grid applications (flow 5).

Flow number 2' represents stream data; it concerns online analytics. In this case, communication with applications takes place directly after cleaning and filtering. For example, the anomaly tracking application uses this type of data combined with the appropriate model from the model storage.

Applications such as consumption forecast use request processing: in this case, it is the application that issues requests to the request processing component while using models (flows number 6, 6', 7, 7'). All this treatment is done independently of flow number 2'.

The analytical processes presented here, applied to heterogeneous kinds of data, serve not only to create models that will be used by smart grid applications, but also to create dashboards, determine key performance indicators of the grid and forecast demand response behavior.

Our solution is based on a big data stack, which enables the processing of large data sets using the parallel computing paradigm.
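To make the data flows described above more concrete, here is a minimal Python sketch of the routing only: records are cleaned and filtered, then both stored (flow 2) and passed directly to a stream application (flow 2'), while stored data is queried separately through request processing. Everything in it is an assumption made for illustration: the `Pipeline` class, the record fields `meter` and `kwh`, and the fixed threshold standing in for a model from the model storage are hypothetical, and plain Python lists replace the NoSQL database and the big data tools.

```python
def clean(records):
    """Cleaning and filtering: drop records with missing or negative values."""
    return [r for r in records if r.get("kwh") is not None and r["kwh"] >= 0]


class Pipeline:
    """Toy routing of cleaned records to storage and to a stream application."""

    def __init__(self, anomaly_threshold):
        self.storage = []  # stand-in for NoSQL storage (flow 2)
        # Stand-in for the model storage; a real model would be trained (flows 3-4).
        self.model_store = {"anomaly_threshold": anomaly_threshold}
        self.alerts = []   # output of the anomaly tracking application

    def ingest(self, records):
        for record in clean(records):
            self.storage.append(record)   # flow 2: batch path
            self._track_anomaly(record)   # flow 2': stream path

    def _track_anomaly(self, record):
        """Stream application: combine live data with a stored model."""
        if record["kwh"] > self.model_store["anomaly_threshold"]:
            self.alerts.append(record["meter"])

    def average_consumption(self, meter):
        """Request processing: answered from storage, independent of flow 2'."""
        values = [r["kwh"] for r in self.storage if r["meter"] == meter]
        return sum(values) / len(values) if values else None


pipeline = Pipeline(anomaly_threshold=10.0)
pipeline.ingest([
    {"meter": "m1", "kwh": 2.0},
    {"meter": "m1", "kwh": None},   # removed by cleaning
    {"meter": "m2", "kwh": 50.0},   # stored, and flagged on the stream path
    {"meter": "m1", "kwh": -1.0},   # removed by cleaning
    {"meter": "m1", "kwh": 4.0},
])
```

The sketch keeps the stream path and the request path separate on purpose, mirroring the statement that request processing is done independently of flow 2'.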
We use the parallel processing potential of the big data stack for query processing, but also for data preparation such as cleaning, for data preprocessing for feature selection, and for data training in the modeling process. This allows us to provide a low response time, which is necessary for some applications such as energy prediction and system monitoring.

[Figure 1: architecture diagram. Application level: energy prediction, monitoring, anomaly tracking, energy consumption tracking. Data management level: cleaning and filtering; NoSQL storage (production data, meter data, consumption data, client data); modeling (feature selection, model training); model storage; request processing (request analyzer, request optimizer, data access). Communication level: wide area network. Numbered flows 2, 2', 3, 4, 5, 6, 6', 7, 7' connect the components.]
Figure 1. Proposed Solution Architecture

CONCLUSION

We have proposed a global functional smart grid architecture and detailed its different levels and components, focusing on the two higher levels responsible for data management and analytics using big data infrastructure. The system we are working on can be used as a service for various high-level applications. We implemented the filtering process using Flume [11] and Sqoop [12], which are big data tools. We are currently working on the NoSQL storage. Our future work focuses on energy prediction as a use case for our smart grid data management.

REFERENCES

[1] T. Popovic, M. Kezunovic, B. Krstajic, "Smart grid data analytics for digital protective relay event", June 2013.
[2] F. Fusco, V. Fischer, V. Lonij, P. Pompey, J. Fiot, B. Chen, Y. Gkoufas, M. Sin, "Data Management System for Energy Analytics and its Application to Forecasting", 2016.
[3] Marie-Luce Picard, EDF R&D, "Données massives pour les smartgrids".
[4] ACCENTURE, "Achieving High Performance in Smart Grid Data Management".
[5] ACCENTURE, "Unlocking the value of analytics", 2014.
[6] Itron White Paper, "Active Smart Grid Analytics™ Maximizing Your Smart Grid Invest".
[7] H. A. Mahmood, M. Aamir, M. Irfan, "Design and Implementation of AMR Smart Grid System", IEEE Electrical Power & Energy Conference, 2008.
[8] P S. Clara, "Sensor Devices and Sensor Network Applications for the Smart Grid/Smart Cities", CA, USA, March 2012.
[9] S. Shekara Sreenadh Reddy Depuru, L. Wang, V. Devabhaktuni, "Smart meters for power grid: Challenges, issues, advantages and status", February 2011.
[10] R. Kumar, B. Bhushan Parashar, S. Gupta, Y. Sharma, N. Gupta, "Apache Hadoop, NoSQL and NewSQL Solutions of Big Data", 2014.
[11] Flume 1.6.0 User Guide, Apache Software Foundation. https://flume.apache.org/FlumeUserGuide.html
[12] Sqoop User Guide v1.4.4, Apache Software Foundation. https://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html