© Copyright IBM Corporation, 2014
IBM System x reference architecture for Hadoop: MapR
May 2014
Beth L Hoffman and Billy Robinson (IBM)
Andy Lerner and James Sun (MapR Technologies)
Table of contents

Introduction
Business problem and business value
  Business problem
  Business value
Requirements
  Functional requirements
  Nonfunctional requirements
Architectural overview
Component model
Operational model
  Cluster nodes
  Networking
  Predefined configurations
  Number of nodes for MapR management services
  Deployment diagram
Deployment considerations
  Systems management
  Storage considerations
  Performance considerations
  Scaling considerations
  High availability considerations
  Migration considerations
Appendix 1: Bill of material
  Node
  Administration / Management network switch
  Data network switch
  Rack
  Cables
Resources
Trademarks and special notices
Introduction
This reference architecture is a predefined and optimized hardware infrastructure for MapR M7 Edition, a Hadoop distribution with value-added capabilities from MapR Technologies. The reference architecture provides a predefined hardware configuration for implementing MapR M7 Edition on IBM® System x® hardware and IBM networking products. MapR M7 Edition is a complete Hadoop distribution supporting MapReduce, HBase, and Hadoop ecosystem workloads. MapReduce is a core component of Hadoop that provides an offline, batch-oriented framework for high-throughput data access and distributed computation.
The predefined configuration provides a baseline configuration for a MapR cluster which can be modified based on the specific customer requirements, such as lower cost, improved performance, and increased reliability.
This document is intended for IT professionals, technical architects, sales engineers, and consultants who are planning, designing, and implementing the MapR solution on IBM System x. It is assumed that you have existing knowledge of Apache Hadoop components and capabilities. The “Resources” section provides links to Hadoop information.
Business problem and business value
This section describes the business problem associated with big data environments and the value that the MapR solution on IBM System x offers.
Business problem
Every day, around 2.5 quintillion bytes of data are created, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone global positioning system (GPS) signals, to name a few. This data is big data.
Big data spans three dimensions:
• Volume – Big data comes in one size: large. Enterprises are awash with data, easily amassing terabytes and even petabytes of information.
• Velocity – Often time-sensitive, big data must be used as it is streaming into the enterprise in order to maximize its value to the business.
• Variety – Big data extends beyond structured data, including unstructured data of all varieties: text, audio, video, click streams, log files, and more.
Big data is more than a challenge; it is an opportunity to find insight in new and emerging types of data to make your business more agile, and to answer questions that, in the past, were beyond reach. Until now, there was no practical way to harvest this opportunity. Today, MapR uses the latest big data technologies, such as the massive map-reduce scale-out capabilities of Hadoop to open the door to a world of possibilities.
Business value
MapR provides a Hadoop-based big data solution that is easy to manage, dependable, and fast. MapR eliminates the complexity of setting up and managing Hadoop. MapR provides alerts, alarms, and insights through an advanced graphical interface. MapR Heatmap provides a clear view of cluster health and performance, and MapR volumes simplify data security, retention, placement, and quota management.
MapR provides Direct Access NFS. This allows users to mount the entire Hadoop cluster as an NFS volume that simplifies how an application can write to and read from a Hadoop cluster.
MapR provides a high level of availability including support for rolling upgrades, self-healing and automated stateful failover. MapR also provides dependable data storage with full data protection and business continuity features including snapshots and mirroring.
MapR’s enhanced security features ensure strong user authentication and authorization. Data confidentiality and integrity are also enforced whether data is in motion or at rest in the Hadoop cluster.
Unique MapR functions improve MapReduce throughput.
MapR deployed on IBM System x servers with IBM networking components provides superior performance, reliability, and scalability. The reference architecture supports entry through high-end configurations and the ability to easily scale as the use of big data grows. A choice of infrastructure components provides flexibility in meeting varying big data analytics requirements.
Requirements
The functional and nonfunctional requirements for the MapR reference architecture are described in this section.
Functional requirements
The key functional requirements for a big data solution include:
• Support for a variety of application types, including batch and real-time analytics
• Support for industry-standard interfaces, so that existing applications can work with MapR
• Support for real-time streaming and processing of data
• Support for a variety of data types and databases
• Support for a variety of client interfaces
• Support for large volumes of data
MapR supports mission-critical and real-time big data analytics across different industries. MapR is used across financial services, retail, media, healthcare, manufacturing, telecommunications, and government organizations and by leading Fortune 100 and Web 2.0 companies. The MapR platform for big data can be used for a variety of use cases, from batch applications that use MapReduce with data sources such as clickstreams to real-time applications that use sensor data. The MapR platform for Apache Hadoop integrates a growing set of functions, including MapReduce, file-based applications, interactive SQL, NoSQL databases, search and discovery, and real-time stream processing. With MapR, data does not need to be moved to specialized silos for processing; data can be processed in place.
Figure 1 shows MapR key capabilities that meet the functional requirements stated earlier.
Figure 1: MapR key capabilities
MapReduce

MapR provides the required performance for MapReduce operations on Hadoop and publishes performance benchmarking results. The MapR architecture is built in C/C++ and harnesses distributed metadata with an optimized shuffle process, enabling MapR to deliver consistent high performance.
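As an illustration of the MapReduce programming model itself (generic Hadoop, not specific to MapR), the following sketch shows a word-count mapper and reducer that can be submitted through Hadoop Streaming; the script name is hypothetical and the location of the Hadoop Streaming JAR varies by installation.

```python
#!/usr/bin/env python
# Minimal word-count mapper/reducer for Hadoop Streaming.
# Run as the mapper with "map" and as the reducer with "reduce".
import sys

def run_map():
    # Emit "word<TAB>1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print("%s\t%d" % (word.lower(), 1))

def run_reduce():
    # Streaming sorts mapper output by key, so counts for a word arrive together.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, 0
        count += int(value)
    if current is not None:
        print("%s\t%d" % (current, count))

if __name__ == "__main__":
    run_map() if sys.argv[1] == "map" else run_reduce()
```

A typical submission would use the Hadoop Streaming JAR shipped with the distribution, for example: hadoop jar <path-to-hadoop-streaming.jar> -files wordcount.py -mapper "wordcount.py map" -reducer "wordcount.py reduce" -input <input-dir> -output <output-dir>, where the JAR path and directories depend on the installation.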
File-based applications

MapR is a 100% Portable Operating System Interface (POSIX) compliant system that fully supports random read-write operations. By supporting an industry-standard Network File System (NFS), users can mount a MapR cluster and run any file-based application, written in any language, directly on the data residing in the cluster. All standard tools in the enterprise including browsers, UNIX® tools, spreadsheets, and scripts can access the cluster directly without any modifications.
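To make the POSIX and NFS point concrete, the short sketch below uses nothing but standard Python file I/O against a cluster that has been NFS-mounted on the client; the mount point and directory names are hypothetical.

```python
# Read and append to files in a MapR cluster that has been NFS-mounted on the
# client; the /mapr/my.cluster.com mount point and paths below are placeholders.
import csv, os

CLUSTER_ROOT = "/mapr/my.cluster.com"          # hypothetical NFS mount point
logs_dir = os.path.join(CLUSTER_ROOT, "user/analytics/logs")

# Ordinary directory listing and file I/O work because the mount is POSIX-like.
for name in sorted(os.listdir(logs_dir)):
    print(name, os.path.getsize(os.path.join(logs_dir, name)))

# Append a record to a file in the cluster; no Hadoop API is required.
with open(os.path.join(logs_dir, "ingest.csv"), "a", newline="") as f:
    csv.writer(f).writerow(["2014-05-01T12:00:00", "sensor-42", 21.7])
```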
SQL

There are a number of applications that support SQL access against data contained in MapR, including Hive, Hadapt, and others. MapR is also leading the development of Apache Drill, which brings American National Standards Institute (ANSI) SQL capabilities to Hadoop.
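As one hedged illustration of SQL access, the following sketch queries a Hive table over HiveServer2 using the PyHive client; the host name, port, user, table, and columns are placeholders, and other SQL clients for Hive or Hadapt would follow the same pattern.

```python
# Query Hive tables stored in the MapR cluster from Python through HiveServer2.
# Host, port, username, and the clickstream table are assumed placeholders.
from pyhive import hive

conn = hive.connect(host="hive-node.example.com", port=10000, username="analyst")
cursor = conn.cursor()

# Standard SQL against data that lives in MapR-FS.
cursor.execute(
    "SELECT page, COUNT(*) AS hits "
    "FROM clickstream WHERE dt = '2014-05-01' "
    "GROUP BY page ORDER BY hits DESC LIMIT 10"
)
for page, hits in cursor.fetchall():
    print(page, hits)

cursor.close()
conn.close()
```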
Database

MapR has removed the trade-offs that organizations face when looking to deploy a NoSQL solution. Specifically, MapR delivers ease of use, dependability, and performance advantages for HBase applications. MapR provides scalability, strong consistency, reliability, and continuous low latency with an architecture that does not require compactions or background consistency checks.
Search

MapR integrates enterprise-grade search. On a single platform, customers can now perform predictive analytics, full search and discovery, and conduct advanced database operations. The MapR enterprise-grade search capability works directly on Hadoop data but can also index and search standard files without having to perform any conversion or transformation.
Stream processing

MapR provides a simplified architecture for real-time stream computational engines such as Storm. Streaming data feeds can be written directly to the MapR platform for Hadoop for long-term storage and MapReduce processing.
Nonfunctional requirements
Customers require their big data solution to be easy, dependable, and fast. Here is a list of the key nonfunctional requirements:
Easy
• Ease of development
• Easy management at scale
• Advanced job management
• Multitenancy

Dependable
• Data protection with snapshots and mirroring
• Automated self-healing
• Insight into software/hardware health and issues
• High availability (HA) and business continuity (99.999% uptime)

Fast
• Superior performance
• Scalability

Secure
• Strong authentication and authorization
• Kerberos support
• Data confidentiality and integrity
The MapR solution provides features and capabilities that fulfill the nonfunctional requirements requested by customers. The blue boxes in Figure 1 illustrate the key MapR capabilities that meet the nonfunctional requirements. MapR provides dependability, ease-of-use, and speed to Hadoop, NoSQL, database, and streaming applications in one unified big data platform.
This full range of applications and data sources benefits from the MapR enterprise-grade platform and unified architecture for files and tables. The MapR platform provides high availability, data protection, and disaster recovery to support mission-critical applications. The MapR platform also makes it easier to use existing applications and solutions by supporting industry-standard interfaces, such as NFS. To support a diverse set of applications and users, MapR also provides multitenancy features and volume support. These features include support for heterogeneous hardware within a cluster and data and job placement control so that applications can be selectively run in a cluster to take advantage of faster processors or solid-state drives (SSDs).
The subsequent sections in this document describe the reference architecture that meets the business needs and the functional and nonfunctional requirements.
Architectural overview
Figure 2 shows the main features of the MapR IBM System x reference architecture. Users can log in to the MapR client from outside the firewall using Secure Shell (SSH) on port 22 to access the MapR solution in the corporate network. MapR provides several interfaces that allow administrators and users to perform administration and data functions depending on their roles and access level. Hadoop application programming interfaces (APIs) can be used to access data. MapR APIs can be used for cluster management and monitoring. MapR data services, management services, and other services run on the nodes in the cluster. Storage is a component of each node in the cluster. Data can be ingested into MapR Hadoop storage either through the Hadoop APIs or NFS, depending on the needs of the customer.
Figure 2: MapR architecture overview
Component model
MapR is a complete distribution that includes Spark, Shark, HBase, Pig, Hive, Mahout, Cascading, Sqoop, Flume, Storm, and more. The MapR distribution is 100% API-compatible with Hadoop (MapReduce, HDFS, and HBase). The M5 Edition includes advanced high availability and data protection features, such as JobTracker HA, No NameNode HA, snapshots, and mirroring. The M7 Edition supports enterprise mission-critical deployments.
MapR M7 includes the following features:
• MapR Control System and Heatmap: The MapR Control System (MCS) provides full visibility into cluster resources and activity. The MCS dashboard includes the MapR Heatmap, which provides visual insight into node health, service status, and resource utilization, organized by the cluster topology (for example, data centers and racks). Designed to manage large clusters with thousands of nodes, the MapR Heatmap shows the health of the entire cluster at a glance. Because the number of nodes, files, and volumes can be very high, filters and group actions are also provided to select specific components and perform administrative actions directly. The Heatmap interfaces are designed for managing the smallest to the largest clusters, and command-line interface (CLI) and Representational State Transfer (REST) access are available as well.
• MapR No NameNode high availability: The MapR Hadoop distribution is unique because it was designed for high availability. MapR is the only Hadoop distribution designed with no single point of failure. Other Hadoop distributions have a single primary NameNode, and when the NameNode goes down, the entire cluster becomes unavailable until the NameNode is restarted. In those cases where other distributions are configured with multiple NameNodes, the entire cluster becomes unavailable during the failover to a secondary NameNode. With MapR, file metadata is replicated and distributed, so that there is no data loss or downtime even in the face of multiple disk or node failures.
• MapR JobTracker high availability: MapR JobTracker HA improves recovery time objectives and provides for a self-healing cluster. Upon failure, the MapR JobTracker automatically restarts on another node in the cluster. TaskTrackers automatically pause and then reconnect to the new JobTracker. Any currently running jobs or tasks continue without losing progress or failing.
• MapR storage services: MapR stores data in a distributed shared system that eliminates contention and the expense of data transport and retrieval. Automatic, transparent client-side compression reduces network resources and reduces footprint on disk, while direct block device I/O provides throughput at hardware speed without additional resources. As an additional performance boost, with MapR, you can read files while they are still being written. The MapR No NameNode architecture scales linearly with the number of nodes, providing virtually unlimited file support. Adding nodes increases the number of files supported, to more than a trillion files containing over 1000 PB of data.
• MapR Direct Access NFS: MapR Direct Access NFS makes Hadoop radically easier and less expensive to use by letting the user mount the Hadoop file system from a standard NFS client. Unlike the write-once system found in other Hadoop distributions, MapR allows files to be modified and overwritten, and enables multiple concurrent reads and writes on any file. Users can browse files, automatically open associated applications with a mouse click, or drag files and directories into and out of the cluster. Additionally, standard command-line tools and UNIX applications and utilities (such as grep, tar, sort, and tail) can be used directly on data in the cluster. With other Hadoop distributions, the user must copy the data out of the cluster in order to use standard tools.
• MapR job metrics: The MapR job metrics service provides in-depth access to the performance statistics of your cluster and the jobs that run on it. With MapR job metrics, you can examine trends in resource use, diagnose unusual node behavior, or examine how changes in your job configuration affect the job's execution.
• MapR snapshots: MapR provides snapshots that are atomic and transactionally consistent. MapR snapshots provide protection from user and application errors with flexible schedules to accommodate a range of recovery point objectives. MapR snapshots can be scheduled or performed on demand. Recovering from a snapshot is as easy as dragging the directory or files to the current directory. MapR snapshots offer high performance and space efficiency. No data is copied in order to create a snapshot; as a result, a snapshot of a petabyte volume can be performed in seconds. A snapshot operation does not have any impact on write performance because MapR uses redirect-on-write to implement snapshots. All writes in MapR go to new blocks on disk, which means that a snapshot only needs to retain references to the old blocks and does not require copying data blocks.
• MapR mirroring: MapR makes data protection easy and built in. Going far beyond replication, MapR mirroring means that you can set policies around your recovery time objectives (RTO) and mirror your data automatically within your cluster, between clusters (such as a production and a research cluster), or between sites.
• MapR volumes: MapR volumes make cluster data both easy to access and easy to manage by grouping related files and directories into a single tree structure so they can be easily organized, managed, and secured. MapR volumes provide the ability to apply policies including replication factor, scheduled mirroring, scheduled snapshots, data placement and topology control, quotas and usage tracking, and administrative permissions.
• MapR tables: MapR-FS enables you to create and manipulate tables in many of the same ways that you create and manipulate files in a standard UNIX file system. MapR tables are implemented directly within MapR-FS, yielding a familiar and open-standards API that provides a high-performance data store for tables. MapR-FS is written in C and optimized for performance; as a result, MapR-FS runs significantly faster than JVM-based Apache HBase. With M7, there are no region servers, additional processes, or any redundant layer between the application and the data residing in the cluster. The M7 zero-administration approach includes automatic region splits and self-tuning, with no downtime required for any operation, including schema changes. The MapR implementation of the HBase API provides enterprise-grade high availability (HA), data protection, and disaster recovery features for tables on a distributed Hadoop cluster. MapR tables can be used as the underlying key-value store for Hive, or any other application requiring a high-performance, high-availability key-value data store. Because MapR tables are API-compatible with HBase, many legacy HBase applications can continue to run without modification (see the sketch following this list).
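Because MapR tables expose the HBase API, existing HBase client code and tools generally apply. The sketch below is a minimal illustration using the happybase Python client, assuming an HBase Thrift gateway is available on the cluster; the host, port, table name, row keys, and column family are placeholders.

```python
# Access an HBase API-compatible MapR table from Python through an HBase
# Thrift gateway; host, port, table, and column family are placeholders.
import happybase

conn = happybase.Connection("thrift-gateway.example.com", port=9090)
table = conn.table("sensor_readings")

# Write a cell and read it back, exactly as with Apache HBase.
table.put(b"sensor-42|2014-05-01T12:00:00", {b"metrics:temp_c": b"21.7"})
row = table.row(b"sensor-42|2014-05-01T12:00:00")
print(row[b"metrics:temp_c"])

# Range scans work the same way for time-series style row keys.
for key, data in table.scan(row_prefix=b"sensor-42|2014-05-01"):
    print(key, data)

conn.close()
```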
These features are implemented by a set of MapR components as shown in the MapR component model in Figure 3.
Figure 3: MapR component model
MapR includes several open source projects, many of which are shown in the Hadoop ecosystem box:

• Apache Flume – A distributed, reliable, and highly available service for efficiently moving large amounts of data around a cluster.
• Apache Spark – A fast and general engine for large-scale data processing.
• Shark – An open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users.
• Apache Hadoop – A framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
• Apache HBase – The Hadoop database, a distributed, scalable, big data store.
• Apache HCatalog – A table and storage management service for data created using Apache Hadoop.
• Apache Hive – A data warehouse system for Hadoop that facilitates easy data summarization, ad hoc queries, and the analysis of large data sets stored in Hadoop-compatible file systems.
• Apache Mahout – A scalable machine learning library.
• Apache Oozie – A workflow coordination manager.
• Apache Pig – A language and runtime for analyzing large data sets, consisting of a high-level language for expressing data analysis programs and an infrastructure for evaluating those programs.
• Apache Sqoop – A tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases.
• Apache Whirr – A set of libraries for running cloud services.
• Apache ZooKeeper – A distributed service for maintaining configuration information, providing distributed synchronization, and providing group services.
• Cascading – An application framework for Java™ developers to quickly and easily develop robust data analytics and data management applications on Apache Hadoop.
MapR data services
All cluster nodes can run data or worker services, which run the MapReduce tasks over a subset of the data. Cluster nodes also store the MapR File System (MapR-FS) data. In large clusters, some nodes may be dedicated to management services. Each node hosts the following data services:
• MapR-FS – Provides distributed file services.
• TaskTracker – Runs map and reduce tasks for MapReduce clusters.
• HBase RegionServer (optional) – Maintains and serves table regions in an HBase cluster.
MapR management services
MapR management services can run on any node. In multi-rack configurations, these services should be on nodes spread across racks. Here is the list of MapR management services:
• Container Location Database (CLDB) – Manages and maintains container location information and replication.
• JobTracker – A Hadoop service that farms out MapReduce tasks to specific nodes in a MapReduce cluster.
• ZooKeeper – Coordinates activity and keeps track of management service locations on the cluster.
• HBase Master (optional) – Monitors all RegionServer instances in an HBase cluster and manages metadata changes. For high availability, more than one node can be configured to run HBase Master. In this case, only one HBase Master is active while additional HBase Masters are available to take over HBase management if the active HBase Master fails.
Other optional MapR services
MapR offers two other optional services that can be run on multiple nodes in the cluster:
• NFS server (Gateway) – Provides NFS access to the distributed file system. The NFS server is often run on all nodes in the cluster to allow local mounting of the cluster file system from any node.
• WebServer – Provides the MapR Control System graphical user interface (GUI) and REST API.
MapR provides support for many client interfaces, several of which were described in the architecture overview or feature list. Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC) can be used to access data in the MapR Hadoop cluster. A MapR CLI provides an additional way to manage the cluster and services.
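As one illustration of the REST interface served by the WebServer, the sketch below lists cluster nodes and creates a volume. It assumes the default WebServer port of 8443 and a self-signed certificate; the host, credentials, JSON field names, and volume parameters shown are assumptions that should be checked against the MapR documentation for the installed release.

```python
# Query cluster state through the REST API served by the MapR WebServer.
# Host, credentials, field names, and volume parameters are assumptions.
import requests

BASE = "https://mcs-node.example.com:8443/rest"   # default WebServer port assumed
AUTH = ("mapr", "mapr-password")                  # placeholder credentials

# List the nodes in the cluster and print their reported health.
resp = requests.get(BASE + "/node/list", auth=AUTH, verify=False)
resp.raise_for_status()
for node in resp.json().get("data", []):
    print(node.get("hostname"), node.get("health"))

# The same pattern drives administrative actions, for example creating a volume.
resp = requests.get(
    BASE + "/volume/create",
    params={"name": "project-a", "path": "/project-a", "replication": 3},
    auth=AUTH,
    verify=False,
)
print(resp.json().get("status"))
```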
Hadoop and the MapR solution are operating system independent. MapR supports many Linux® operating systems. Red Hat Linux and SUSE Linux are supported with the IBM System x reference architecture. The details about the versions can be found at:
http://www.mapr.com/doc/display/MapR/Requirements+for+Installation
Operational model
This section describes the operational model for the MapR reference architecture. The operational model focuses on a cluster of nodes to support a given amount of data.
In order to illustrate the operational model for different sized customer environments, four different models are provided for supporting different amounts of data. Throughout this document, these are referred to as the starter rack, half rack, full rack, and multirack configuration sizes. Note that the multirack operational model is three times larger than the full rack model; data amounts that fall between the predefined sizes can be extrapolated by using different multiples of the full rack operational model.
A predefined configuration for the MapR solution consists of cluster nodes, networking, power, and a rack. The predefined configurations can be implemented as is or modified based on specific customer requirements, such as lower cost, improved performance, and increased reliability. Key workload requirements such as the data growth rate, sizes of datasets, and data ingest patterns help in determining the proper configuration for a specific deployment. A best practice when designing a MapR cluster infrastructure is to conduct the necessary testing and proof of concepts against representative data and workloads to ensure that the proposed design achieves the necessary success criteria.
Cluster nodes
The MapR reference architecture is implemented on a set of nodes that make up a cluster. Nodes are implemented on System x3650 M4 BD servers with locally attached storage. MapR runs well on a homogenous server environment with no need for different hardware configurations for management and data services. Server nodes can run three different types of services:
• Data (worker) services for storing and processing data • Management (control) services for coordinating and managing the cluster • Miscellaneous services (optional) for file and web serving
Unlike other Hadoop distributions that require different server configurations for management nodes and data nodes, the MapR reference architecture requires only a single MapR node hardware configuration. Each node is then configured to run one or more of the mentioned services. In large clusters, a node could be dedicated to running only management services.
Each node in the cluster is an IBM System x3650 M4 BD server. Each predefined node is made up of the components shown in Table 1. The Intel® Xeon® processor E5-2650 v2 is recommended to provide sufficient performance. A minimum of 64 GB of memory is recommended for most MapReduce workloads, with 128 GB or more recommended for HBase, Spark, and memory-intensive MapReduce workloads. A choice of 3 TB or 4 TB drives is suggested depending on the amount of data that needs to be stored. For the hard disk drive (HDD) controller, JBOD (just a bunch of disks) is the best choice for a MapR cluster. It provides excellent performance and, when combined with the Hadoop default of 3x data replication, also provides significant protection against data loss. The use of Redundant Array of Independent Disks (RAID) with data disks is discouraged with MapR. MapR provides an automated way to set up and manage storage pools. (RAID can be used to mirror the OS, which is described in a later section.)
Nodes can be customized according to client needs.
Component – Predefined configuration
• System: System x3650 M4 BD
• Processor: 2 x Intel Xeon processor E5-2650 v2, 2.6 GHz, 8-core
• Memory (base): 64 GB – 8 x 8 GB 1866 MHz RDIMM (minimum)
• Disk (OS) (a): 2 x 3 TB NL SATA 3.5 inch or 2 x 4 TB NL SATA 3.5 inch
• Disk (data) (b): 12 x 3 TB NL SATA 3.5 inch (36 TB total) or 12 x 4 TB NL SATA 3.5 inch (48 TB total)
• HDD controller: 6 Gb JBOD controller
• Hardware storage protection: None (JBOD). By default, MapR maintains a total of three copies of data stored within the cluster. The copies are distributed across data servers and racks for fault recovery.
• Hardware management network adapter: Integrated 1GBaseT adapter
• Data network adapter: Mellanox ConnectX-3 EN dual-port SFP+ 10GbE adapter

Table 1: Node predefined configuration

(a) OS drives may be of a smaller size than the data drives. OS drives are used for the operating system, the MapR application, and software monitoring. A separate RAID controller can be configured for the two OS drives.
(b) Data drives should be of the same size, either all 3 TB or all 4 TB.
The reference architecture recommends the storage-rich System x3650 M4 BD model for several reasons:
• Storage capacity – The nodes are storage-rich. Each of the fourteen 3.5 inch drives has raw capacity up to 4 TB, for a total of 56 TB per node and over a petabyte per rack.
• Performance – This hardware supports the latest Intel Xeon processors based on the Intel Ivy Bridge microarchitecture.
• Flexibility – Server hardware uses embedded storage, resulting in simple scalability: just add more nodes.
• More PCIe slots – Up to two PCIe slots are available. They can be used for network adapter redundancy and increased network throughput.
• Better power efficiency – The server offers a common form factor (CFF) power supply unit (PSU), including Platinum 80 PLUS options.
Networking
With respect to networking, the reference architecture specifies two networks: a data network and an administrative/management network. Figure 4 shows the networking configuration for MapR installed on a cluster with one rack.
Data network
The data network is a private cluster data interconnect among nodes used for data access, moving data across nodes within a cluster, and ingesting data into the MapR file system. The MapR cluster typically connects to the customer’s corporate data network. One top of rack switch is required for the data network used by MapR. Either a 1GbE or 10GbE switch can be used. A 1Gb Ethernet switch is sufficient for some workloads. A 10Gb Ethernet switch can provide extra I/O bandwidth for added performance. This rack switch for the data network has two physical, aggregated links connected to each of the nodes in the cluster. The data network is a private virtual local area network (VLAN) or subnet. The two Mellanox 10GbE ports of each node can be link aggregated to the recommended G8264 rack switch for better performance and improved HA.
Alternatively, MapR can automatically take advantage of multiple data networks provided by multiple switches. Even when only one switch is used, it can be configured with multiple subnets, because it has ports available and the cluster nodes have multiple data network interfaces, allowing multiple network connections to each node. MapR automatically uses additional network connections to increase bandwidth.
The recommended 10GbE switch is the IBM System Networking RackSwitch™ G8264. The recommended 1GbE switch is the IBM RackSwitch G8052. The enterprise-level IBM RackSwitch G8264 has the following characteristics:
• Up to sixty-four 1Gb/10Gb SFP+ ports
• Four 40Gb QSFP+ ports
• Support for 1.28 Tbps of non-blocking throughput
• Energy-efficient, cost-effective design
• Optimized for applications requiring high bandwidth and low latency
The customer can optionally use edge nodes to exchange data within the cluster. Edge nodes can also help control access to the cluster and can be configured to control access to the data network, administration network, or both.
Hardware management network
The hardware management network is a 1GbE network used for in-band OS administration and out-of-band hardware management. In-band administrative services, such as SSH or Virtual Network Computing (VNC), running on the host operating system allow administration of the cluster nodes. Out-of-band management, through the integrated management module II (IMM2) within the System x3650 M4 BD server, allows hardware-level management of cluster nodes, such as node deployment or basic input/output system (BIOS) configuration. Hadoop has no dependency on the IMM2. Based on customer requirements, the administration links and management links can be segregated onto separate VLANs or subnets. The administrative/management network is typically connected directly to the customer’s administrative network. When the in-band administrative services on the host operating system are used, MapR should be configured to use only the data network. By default, MapR uses all the available network interfaces.
The reference architecture requires one 1Gb Ethernet top of rack switch for the hardware management network. The two 10Gb uplinks between the G8052 and G8264 top of rack switches (as shown in Figure 4) are optional, however they can be used in customer environments that require faster routing and access over the administration network to all the nodes in the cluster. Administration users are also able to access all the nodes in the cluster through the customer admin network, as shown in Figure 5. This rack switch for the hardware management network is connected to each of the nodes in the cluster using two physical links (one for in-band OS administration and one link for out-of-band IMM2 hardware management). On the nodes, the administration link should connect to port 1 on the integrated 1GBaseT adapter and the management link should connect to the dedicated IMM2 port.
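As a sketch of how the out-of-band path can be scripted, the following example polls node power state through each IMM2 using IPMI over LAN via ipmitool; it assumes IPMI over LAN is enabled on the IMM2 interfaces, and the addresses and credentials are placeholders. Tools such as xCAT provide the same capability at scale.

```python
# Poll node power state out-of-band through each node's IMM2 using IPMI over
# LAN; assumes IPMI is enabled on the IMM2 interfaces. Addresses and
# credentials below are placeholders.
import subprocess

IMM_USER = "USERID"          # placeholder IMM2 account
IMM_PASS = "PASSW0RD"        # placeholder IMM2 password
imm_addresses = ["10.0.1.%d" % i for i in range(101, 121)]   # one IMM2 per node

for addr in imm_addresses:
    cmd = ["ipmitool", "-I", "lanplus", "-H", addr,
           "-U", IMM_USER, "-P", IMM_PASS, "chassis", "power", "status"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(addr, (result.stdout or result.stderr).strip())
```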
The recommended switch is IBM RackSwitch G8052 with the following features:
• Forty-eight 1GbE RJ45 ports
• Four standard 10GbE SFP+ ports
• Low 130 W power rating and variable speed fans to reduce power consumption
Figure 4: MapR rack network configuration
Multirack network
The data network in the predefined reference architecture configuration consists of a single network topology. The single G8264 data network switch within each rack represents a single point of failure. Addressing this challenge can be achieved by building a redundant data network using an additional IBM RackSwitch G8264 top of rack switch per rack and appropriate additional IBM RackSwitch G8316 core switches per cluster. In this case, the second Mellanox 10GbE port can be connected to the second IBM RackSwitch G8264.
Figure 5 shows how the network is configured when the MapR cluster is installed across more than one rack. The data network is connected across racks by two aggregated 40GbE uplinks from each rack’s G8264 switch to a core G8316 switch.
A 40GbE switch is recommended for interconnecting the data network across multiple racks. IBM System Networking RackSwitch G8316 is the recommended switch. A best practice is to have redundant core switches for each rack to avoid a single point of failure. Within each rack, the G8052 switch can optionally be configured to have two uplinks to the G8264 switch to allow propagation of the administrative/management VLAN across cluster racks through the G8316 core switch. For large clusters, the IBM System Networking RackSwitch G8332 is recommended because it provides a better cost value per 40Gb port than the G8316. Figure 6 shows a large cluster example using the G8332 core switch.
Many other cross rack network configurations are possible and may be required to meet the needs of specific deployments or to address clusters larger than three racks.
If the solution is initially implemented as a multirack solution, or if the system grows by adding additional racks, the nodes that provide management services should be distributed across racks to maximize fault tolerance.
Figure 5: MapR cross rack network configuration
Predefined configurations
Four predefined configurations for the MapR reference architecture are highlighted in Table 2. The table shows the amount of space for data and the number of nodes that each predefined configuration provides. Storage space is described in two ways: the total amount of raw storage space when using 3 TB or 4 TB drives (raw storage) and the amount of space for the data the customer has (available data space). Available data space assumes the use of Hadoop replication with three copies of the data and 25% of capacity reserved for efficient file system operation and to allow time to increase capacity if needed. Available data space might increase significantly with MapR automatic compression. The estimates in Table 2 do not include additional space freed up by using compression because compression rates can vary widely based on file contents.
                                 Starter rack   Half rack   Full rack   Multirack
Storage space using 3 TB drives
  Raw storage                    108 TB         360 TB      720 TB      2,160 TB
  Available data space           27 TB          90 TB       180 TB      540 TB
Storage space using 4 TB drives
  Raw storage                    144 TB         480 TB      960 TB      2,880 TB
  Available data space           36 TB          120 TB      240 TB      720 TB
Number of nodes                  3              10          20          60

Table 2: Predefined configurations
Figure 6: MapR large cluster configuration
The number of nodes required in the cluster to support these four predefined configurations is shown in Table 2. These are estimates for highly available clusters. Three nodes are required to support a customer deployment that has 36 TB of data, ten nodes are needed to support a customer deployment that has 120 TB of data, and so on.
When estimating disk space within a MapR Hadoop cluster, the following must be considered:
• For improved fault tolerance and improved performance, the MapR file system replicates data blocks across multiple cluster data nodes. By default, the file system maintains three replicas.
• Compression ratio is an important consideration in estimating disk space and can vary greatly based on file contents. MapR provides automatic compression, and available data space might increase significantly as a result. If the customer’s data compression ratio is not available, assume a compression ratio of 2.5.
• To ensure efficient file system operation and to allow time to add more storage capacity to the cluster if necessary, reserve 25% of the total capacity of the cluster.
Assuming the default three replicas maintained by the MapR file system, the raw data disk space, and the required number of nodes can be estimated using the following equations:
Total raw data disk space = (User data, uncompressed) * (4 / compression rate)
Total required nodes = (Total raw data disk space) / (Raw data disk per node)
You should also consider future growth requirements when estimating disk space.
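The sizing arithmetic above can be captured in a few lines of code. The sketch below simply reproduces the equations (3x replication plus 25% headroom yields the factor of 4) and rounds up to whole nodes; the default values reflect the Table 1 node with 12 data drives.

```python
# Sizing arithmetic from the equations above: 3x replication plus 25% headroom
# gives the multiplication factor of 4; raw disk per node assumes 12 data drives.
import math

def mapr_cluster_size(user_data_tb, compression_rate=2.5, drive_tb=4, drives_per_node=12):
    raw_disk_tb = user_data_tb * 4 / compression_rate   # total raw data disk space
    raw_per_node_tb = drive_tb * drives_per_node         # 36 TB or 48 TB per node
    nodes = math.ceil(raw_disk_tb / raw_per_node_tb)     # round up to whole nodes
    return raw_disk_tb, nodes

# The example that follows in Table 3: 250 TB of uncompressed data, 4 TB drives.
raw, nodes = mapr_cluster_size(250)
print(raw, nodes)   # 400.0 TB of raw data disk, 9 nodes
```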
Based on these sizing principles, Table 3 demonstrates an example for a cluster that needs to store 250 TB of uncompressed user data. The example shows that the MapR cluster needs 400 TB of raw disk to support 250 TB of uncompressed data. The 400 TB is for data storage and does not include OS disk space. Nine nodes, or nearly a half rack, would be required to support a deployment of this size.
Description                                       Value
Size of uncompressed user data                    250 TB
Compression rate                                  2.5x
Size of compressed data                           100 TB
Storage multiplication factor                     4
Raw data disk space needed for MapR cluster       400 TB
  - Storage needed for MapR-FS 3x replication     300 TB
  - Storage reserved for headroom                 100 TB
Raw data disk per node (with 4 TB drives)         48 TB
Minimum number of nodes required                  9

Table 3: Example of storage sizing with 4 TB drives
Number of nodes for MapR management services
The nodes run MapR data services, management services, and other optional services. The number of nodes recommended for running management services and data services vary based on the size of the configuration. MapR is very flexible in its ability to use any node for any management service. Depending on workloads and HA requirements, multiple nodes could be dedicated to a single management service, multiple management services, or both management and data services. The number of nodes running management services can be customized based on specific workloads and HA requirements. Table 4 shows the number of nodes that should run management services depending on the size of cluster.
Fewer than 40 nodes – 3 to 5 nodes running management services
Dedicated management nodes are not required. Management services run on nodes that also run data and optional services.
• CLDB on two or three nodes.
• JobTracker or HBase Master on two or three nodes.
• ZooKeeper on three nodes (run ZooKeeper on an odd number of nodes).
Reduce the number of task slots on servers running both data and management services to ensure that processor and memory resources are available to management services. WebServer and NFS server can also be run on nodes running management services. For faster failover recovery times, avoid running ZooKeeper and CLDB services on the same servers.

40 to 100 nodes – 5 to 7 nodes running management services
Dedicated management nodes are not required. Management services run on nodes that also run data and optional services.
• CLDB on two to four nodes.
• JobTracker or HBase Master on two or three nodes.
• ZooKeeper on three or five nodes (run ZooKeeper on an odd number of nodes).
Reduce the number of task slots on servers running both data and management services to ensure that processor and memory resources are available to management services. WebServer and NFS server can also be run on nodes running management services. Spread management services across racks and across nodes to avoid running an instance of each management service on a single node. For faster failover recovery times, avoid running ZooKeeper and CLDB services on the same servers.

More than 100 nodes – 7 or more nodes running management services
Dedicate nodes to management services. Do not run data or optional services on the same nodes running management services.
• CLDB on three or more nodes.
• JobTracker or HBase Master on two or three nodes.
• ZooKeeper on five nodes (run ZooKeeper on an odd number of nodes).
On very large clusters, dedicate nodes to running only CLDB, and dedicate nodes to running only ZooKeeper. Spread management services across racks and across nodes to avoid running an instance of each management service on a single node.

Table 4: Number of nodes running MapR management services
In clusters up to 100 nodes, management services typically reside on nodes that also provide data services.
For a very small cluster that does not require failover or high availability, all the management services and data services can run on one node. However, HA is recommended and requires at least three nodes running management services. Even if high availability is not required, MapR M7 provides snapshots, mirrors, and multiple NFS servers. Also, the HA features of M7 provide a mechanism for administrators to perform rolling upgrades of management processes without any downtime on the cluster.
A small cluster with less than 40 nodes should be set up to run the management services on three to five nodes for high availability.
A medium-sized cluster should be set up to run the management services on at least five nodes with ZooKeeper, CLDB, and JobTracker nodes distributed across racks. This environment provides failover and high availability for all critical services.
For clusters over 100 nodes, some nodes can be dedicated to management services. A large cluster can be set up to run the management services on a minimum of seven nodes with these nodes distributed across racks. In a large cluster, isolate the CLDB service from other services by placing them on dedicated nodes. In addition, in large clusters, ZooKeeper services should be isolated from other services on dedicated nodes.
To reduce recovery time upon node failure, avoid running CLDB and ZooKeeper on the same node.
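For planning purposes, the guidance in Table 4 can be encoded as a simple lookup, as in the sketch below; treat the returned counts as starting points only, because workload and HA requirements can shift them.

```python
# A simple lookup that encodes the Table 4 guidance on how many nodes should
# run MapR management services for a given cluster size (planning input only).
def management_service_layout(total_nodes):
    if total_nodes < 40:
        return {"management_nodes": "3-5", "cldb": "2-3",
                "jobtracker_or_hbase_master": "2-3", "zookeeper": "3",
                "dedicated_management_nodes": False}
    if total_nodes <= 100:
        return {"management_nodes": "5-7", "cldb": "2-4",
                "jobtracker_or_hbase_master": "2-3", "zookeeper": "3 or 5",
                "dedicated_management_nodes": False}
    return {"management_nodes": "7 or more", "cldb": "3 or more",
            "jobtracker_or_hbase_master": "2-3", "zookeeper": "5",
            "dedicated_management_nodes": True}

# Example: a full multirack configuration of 60 nodes.
print(management_service_layout(60))
```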
Deployment diagram
The reference architecture for the MapR solution requires two separate networks: one for hardware management and one for data. Each network requires a rack switch. An additional data network switch can be configured for high availability and increased throughput. One rack switch occupies 1U of space in the rack.
Figure 7 shows an overview of the architecture in three different one-rack sized clusters without network redundancy: a starter rack, a half rack and a full rack. Figure 8 shows a multirack-sized cluster without network redundancy. The intent of the four predefined configurations is to ease initial sizings for customers and to show example starting points for different sized workloads. The reference architecture is not limited to these four sized clusters. The starter rack configuration consists of three nodes and a pair of rack switches. The half rack configuration consists of 10 nodes and a pair of rack switches. The full rack
configuration (a rack fully populated) consists of 20 nodes and a pair of rack switches. The multirack contains a total of 60 nodes; 20 nodes and a pair of switches in each rack.
A MapR implementation can be deployed as a multirack solution to support larger workload and deployment requirements. In the “Networking” section of this document, you can see the networking configuration across multiple nodes and multiple racks.
Figure 7: Starter rack, half rack, and full rack MapR predefined configurations
Figure 8: Multirack MapR predefined configuration
Deployment considerations
The predefined node configuration can be tailored to match customer requirements. Table 5 shows the common ways to adjust the predefined configuration. The Value options are available for customers who require a cost-effective solution. Performance options are for customers who want top performance. The Enterprise option offers redundancy. A combination of options, such as Value and Enterprise or Performance and Enterprise, can be considered as well.
The Intel Xeon processor E5-2650 v2 is recommended to provide sufficient performance; however, smaller or larger processors may be used. A minimum of 64 GB of memory is recommended for most MapReduce workloads, with 96 GB or more recommended for HBase and memory-intensive MapReduce workloads. A choice of 3 TB or 4 TB drives is suggested depending on the amount of data that needs to be stored. IBM also offers 2 TB drives, which may meet the storage density requirements of some big data analytics workloads. SAS drives may be substituted in place of Serial Advanced Technology Attachment (SATA) drives, but IBM has found that SATA drives offer the same performance for less cost.
Node
• Value: 2 x Intel Xeon processor E5-2630 v2, 2.6 GHz, 6-core (1600 MHz RDIMM)
• Enterprise: 2 x Intel Xeon processor E5-2650 v2, 2.6 GHz, 8-core (1866 MHz RDIMM)
• Performance: 2 x Intel Xeon processor E5-2650 v2, 2.6 GHz, 8-core (1866 MHz RDIMM) or 2 x E5-2680, 2.8 GHz, 10-core (1866 MHz RDIMM)

Memory – base
• Value: 64 GB (8 x 8 GB)
• Enterprise: 64 GB (8 x 8 GB)
• Performance: 64 GB (8 x 8 GB) or 128 GB (16 x 8 GB)

Disk (OS)
• Value: 1 x 3 TB 3.5 inch
• Enterprise: 2 x 3 TB 3.5 inch (mirrored)
• Performance: 1 x 3 TB 3.5 inch

Disk (data)
• All options: 36 TB (12 x 3 TB NL SATA 3.5 inch) or 48 TB (12 x 4 TB NL SATA 3.5 inch)

HDD controller
• All options: 6 Gb JBOD controller

Available data space* (per node)
• All options: 9-22 TB with 3 TB drives or 12-30 TB with 4 TB drives

Data network switch
• Value: 1GbE switch with 4 x 10GbE uplinks (IBM G8052)
• Enterprise: Redundant switches
• Performance: 10GbE switch with 4 x 40GbE uplinks (IBM G8264)

Hardware management network switch
• All options: 1GbE switch with 4 x 10GbE uplinks (IBM G8052)

Table 5: Additional hardware options

* Available data space assumes the use of Hadoop replication with three copies of the data and 25% of capacity reserved for efficient file system operation and to allow time to increase capacity if needed. Available data space may increase significantly with MapR automatic compression. Because compression can vary widely by file contents, this estimate provides a range from no compression up to 2.5 times compression. Some data may have even greater compression.
Systems management
The mechanism for systems management within the MapR solution is different from other Apache Hadoop distributions. The standard Hadoop distribution places the management services on separate servers from the data servers. In contrast, MapR management services are distributed across the same System x nodes that are used for data services.
Storage considerations
Each server node in the reference architecture has internal directly attached storage. External storage is not used in this reference architecture.
In situations where higher storage capacity is required, the main design approach is to increase the amount of data disk space per node. Using 4 TB drives instead of 3 TB drives increases the total per-node data disk capacity from 36 TB to 48 TB, a 33% increase. Consider using the same size drives for the OS to simplify maintenance to one type of disk drive. When increasing data disk capacity, you must be cognizant of the balance between performance and throughput. For some workloads, increasing the amount of user data stored per node can decrease disk parallelism and negatively impact performance. In this case, and when 4 TB drives provide insufficient capacity, higher capacity can be achieved by increasing the number of nodes in the cluster.
Performance considerations
There are a couple of approaches to increasing cluster performance: increasing node memory and using a high-performance job scheduler and MapReduce framework. Often, improving performance comes at increased cost and you need to consider the cost/benefit trade-offs of designing for higher performance.
In the MapR predefined configuration, node memory can be increased to 96 GB by using twelve 8 GB RDIMMs. Even larger memory configurations might provide greater performance depending on the workload.
Architecting for lower cost
There are two key modifications that can be made to lower the cost of a MapR reference architecture solution. When considering lower-cost options, it is important to ensure that customers understand the potential lower performance implications of a lower cost design. A lower cost version of the MapR reference architecture can be achieved by using lower cost node processors and lower cost cluster data network infrastructure.
The node processors can be substituted with the Intel Xeon E5-2630 v2 2.6 GHz 6-core processor. This processor requires 1600 MHz RDIMMs, which may also lower the per-node cost of the solution.
Using a lower cost network infrastructure can significantly lower the cost of the solution, but can also have a substantial negative impact on intra-cluster data throughput and cluster ingest rates. To use a lower cost network infrastructure, use the following substitutions to the predefined configuration:
• Within each node, substitute the Mellanox 10GbE dual-port SFP+ network adapter with the additional ports on the integrated 1GBaseT adapter within the System x3650 M4 BD server.
• Within each rack, substitute the IBM RackSwitch G8264 top of rack switch with the IBM RackSwitch G8052.
• Within each cluster, substitute the IBM RackSwitch G8316 core switch with the IBM RackSwitch G8264.
• Though the network wiring schema is the same as that described in the networking section, the media types and link speeds within the data network change. The data network within a rack that connects the cluster nodes to the lower cost G8052 top of rack switch is now based on two aggregated 1GBaseT links per node. The physical interconnect between the admin/management network and the data network within each rack is now based on two aggregated 1GBaseT links between the admin/management network G8052 switch and the lower cost data network G8052 switch. Within a cluster, the racks are interconnected through two aggregated 10GbE links between the substitute G8052 data network switch in each rack and a lower cost G8264 core switch.
Architecting for high ingest rates
Architecting for high ingest rates is not a trivial matter. It is important to have a full characterization of the ingest patterns and volumes. The following questions provide guidance on the key factors that affect ingest rates; a simple throughput sketch follows the list:
• On what days and at what times are the source systems available or unavailable for ingest?
• When a source system is available for ingest, for what duration does it remain available?
• Do other factors impact the day, time, and duration constraints on ingest?
• When ingests occur, what is the average and maximum size of ingest that must be completed?
• What factors impact ingest size?
• What is the format of the source data (structured, semi-structured, unstructured)?
• Are there any data transformation or cleansing requirements that must be met during ingest?
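Once the ingest window and volume are characterized, a sketch like the following can convert the requirement into the sustained network throughput the cluster must absorb. The 10 TB volume and 4-hour window are illustrative numbers only.

    # Convert an ingest requirement (volume within an availability window) into the
    # sustained throughput the cluster data network must absorb. Illustrative only.

    def required_throughput_gbps(ingest_tb, window_hours):
        """Sustained rate in Gbit/s needed to land ingest_tb within window_hours."""
        bits = ingest_tb * 1e12 * 8          # terabytes -> bits
        seconds = window_hours * 3600
        return bits / seconds / 1e9

    # Example: 10 TB must be ingested during a 4-hour nightly window.
    rate = required_throughput_gbps(ingest_tb=10, window_hours=4)
    print(f"Required sustained ingest rate: {rate:.2f} Gbit/s")

Comparing this figure with the rack uplink and node link capacities described earlier shows quickly whether the predefined or lower cost network design can meet the ingest requirement.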
Scaling considerations
The Hadoop architecture is designed to be linearly scalable. When the capacity of the existing infrastructure is reached, the cluster can be scaled out simply by adding nodes. Identically configured nodes are typically best because they maintain the same ratio of storage and compute capability. A MapR cluster is scaled by adding System x3650 M4 BD nodes and network switches and, optionally, by adding management services and optional services on those nodes. The MapR No NameNode architecture allows linear scalability to trillions of files and thousands of petabytes.
As the capacity of existing racks is reached, new racks can be added to the cluster. Note, however, that some workloads may not scale completely linearly.
When designing a new MapR reference architecture implementation, future scale out should be a key consideration in the initial design. There are two key aspects to consider: networking and management. Both of these aspects are critical to cluster operation and become more complex as the cluster infrastructure grows.
The cross rack networking configuration described in Figure 8 is designed to provide robust network interconnection of racks within the cluster. As additional racks are added, the predefined networking topology remains balanced and symmetrical. If there are plans to scale the cluster beyond one rack, a best practice is to initially design the cluster with multiple racks – even if the initial number of nodes would fit within one rack. Starting with multiple racks can enforce proper network topology and prevent future reconfiguration and hardware changes. As racks are added over time, multiple G8316 switches may be required for greater scalability and balanced performance.
Also, as the number of nodes within the cluster increases, so do many of the tasks of managing the cluster, such as updating node firmware or operating systems. Building a cluster management framework as part of the initial design and proactively considering the challenges of managing a large cluster will pay off significantly in the long run. xCAT, an open source project that IBM supports, is a scalable distributed computing management and provisioning tool that provides a unified interface for hardware control,
discovery, and operating system deployment. Within the MapR reference architecture, the System x server integrated management modules (IMM2) and the cluster management network provide an out-of-band management framework that management tools, such as xCAT, can use to facilitate or automate the management of cluster nodes. Training is required to fully use the capabilities of xCAT. See the “Resources” section for more information about xCAT.
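As an illustration of how such a framework might be driven from a script, the sketch below wraps two xCAT commands (rpower for out-of-band power status and xdsh for parallel command execution). The node group name "maprnodes" is hypothetical, and the snippet assumes xCAT is installed and configured; consult the xCAT documentation for the commands available in a given installation.

    # Minimal sketch of scripting xCAT from Python for routine cluster checks.
    # Assumes xCAT is installed and a node group named "maprnodes" is defined
    # (the group name is hypothetical).
    import subprocess

    def run(cmd):
        """Run an xCAT command and return its output as text."""
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

    # Query power state of every node in the group through the out-of-band path.
    print(run(["rpower", "maprnodes", "stat"]))

    # Run a command in parallel on all nodes, for example to confirm kernel levels
    # before a rolling firmware or operating system update.
    print(run(["xdsh", "maprnodes", "uname -r"]))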
Proactive planning for future scale out and the development of a cluster management framework as part of the initial cluster design provide a foundation for future growth that can minimize hardware reconfigurations and cluster management issues as the cluster grows.
High availability considerations
When implementing a MapR cluster on System x, consider availability requirements as part of the final hardware and software configuration. Hadoop is typically considered a highly reliable solution, and MapR enhancements make it highly available. Hadoop and MapR best practices provide significant protection against data loss, and MapR ensures that failures are managed without causing an outage. Redundancy can be added to make a cluster even more reliable, so give some consideration to both hardware and software redundancy.
Networking considerations
If network redundancy is a requirement, use an additional switch in the data networks. Optionally, a second redundant switch can be added to ensure high availability of the hardware management network. The hardware management network does not affect the availability of MapR-FS or Hadoop functionality, but it can impact the management of the cluster, so its availability requirements must also be considered.
MapR provides application-level network interface card (NIC) bonding for higher throughput and high availability. When using multiple NICs, customers can choose either MapR application-level bonding, or OS-level bonding combined with switch-based aggregation of some form matching the OS bonding configuration. Virtual Link Aggregation Groups (vLAG), an IBM BNT switch feature, can be used between redundant switches.
If 1 Gbps data network links are used, it is recommended that more than one be used per node to increase throughput.
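If OS-level bonding is chosen, a small check like the following can confirm that every slave link in a bond is up. It is only a sketch: it assumes the Linux bonding driver is in use and that the bond interface is named bond0, which may differ in a given installation.

    # Check the health of an OS-level bond by reading the Linux bonding driver's
    # status file. Assumes the bonding driver is in use and the bond is named
    # "bond0"; adjust the interface name for the actual configuration.
    from pathlib import Path

    def bond_slaves_up(bond="bond0"):
        """Return a dict of slave interface -> True if its MII status is 'up'."""
        text = Path(f"/proc/net/bonding/{bond}").read_text()
        status = {}
        slave = None
        for line in text.splitlines():
            if line.startswith("Slave Interface:"):
                slave = line.split(":", 1)[1].strip()
            elif slave and line.strip().startswith("MII Status:"):
                status[slave] = line.split(":", 1)[1].strip() == "up"
                slave = None
        return status

    if __name__ == "__main__":
        for iface, up in bond_slaves_up().items():
            print(f"{iface}: {'up' if up else 'DOWN'}")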
Hardware availability considerations
With no single point of failure, redundancy in server hardware components is not required for MapR. MapR automatically and transparently handles hardware failures that result in the loss of any node in the cluster running any data or management service. MapR's default three-way replication of data ensures that no data is lost, because two additional replicas of the data are maintained on other nodes in the cluster. MapReduce tasks or HBase region servers from failed nodes are automatically started on other nodes in the cluster. Failure of a node running any management service is automatically and transparently recovered, as described for the following services:
• All ZooKeeper services are available for read operations, with one acting as the leader for all writes. If the node running the leader fails, the remaining nodes elect a new leader. Most commonly, three ZooKeeper instances are used to allow HA operations. In some large clusters, five ZooKeeper instances are used to allow fully HA operations even during maintenance windows that affect ZooKeeper instances. The number of ZooKeeper instances that must be run in a cluster depends on the cluster's high availability requirement, but it should always be an odd number. ZooKeeper requires a quorum of (N/2)+1 to elect a leader, where N is the total number of ZooKeeper nodes (see the quorum sketch after this list). Running more than five ZooKeeper instances is not necessary.
• All CLDB services are available for read operations, with one acting as the write master. If the node running the master CLDB service goes down, another running CLDB automatically becomes the master. A minimum of two instances is needed for high availability.
• One JobTracker service is active. Other JobTracker instances are configured but not running. If the active JobTracker goes down, one of the configured instances automatically takes over without requiring any job to restart. A minimum of two instances is needed for high availability.
• If running HBase on the cluster, one HBase Master service is active. Other HBase Master instances are configured but not running. If the active HBase Master goes down, one of the configured instances automatically takes over HBase management. A minimum of two instances is needed for high availability. (MapR also offers M7, which provides a native NoSQL database through HBase APIs without requiring any HBase Master or RegionServer processes.)
• All NFS servers are active simultaneously and can present an HA NFS server to nodes external to the cluster. To do this, specify the virtual IP addresses for two or more NFS servers for NFS high availability. Additionally, use round-robin Domain Name System (DNS) across multiple virtual IP addresses for load balancing in addition to high availability. For NFS access from within the cluster, NFS servers should be run on all nodes in the cluster, and each node should mount its local NFS server.
• The MapR WebServer can run on any node in the cluster to run the MapR Control System. The web server also provides a REST interface to all MapR management and monitoring functions. For HA, multiple active web servers can be run, with users connecting to any web server for cluster management and monitoring. Note that even with no web server running, all monitoring and management capabilities are available through the MapR command line interface.
• Within racks, switches and nodes have redundant power feeds, with each power feed connected to a separate PDU.
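The following sketch illustrates the quorum arithmetic in the ZooKeeper item above: with a quorum of (N/2)+1, an ensemble of N instances tolerates N minus quorum simultaneous failures, which is why even ensemble sizes add no resilience.

    # Quorum arithmetic for a ZooKeeper ensemble: a leader needs (N // 2) + 1 votes,
    # so an ensemble of N instances tolerates N - quorum simultaneous failures.

    def zk_tolerance(n):
        quorum = n // 2 + 1
        return quorum, n - quorum

    for n in (3, 4, 5):
        quorum, failures = zk_tolerance(n)
        print(f"{n} ZooKeeper instances: quorum={quorum}, tolerated failures={failures}")

    # 3 and 4 instances both tolerate only one failure, which is why odd ensemble
    # sizes are used; 5 tolerates two, allowing one planned outage plus one failure.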
Storage availability
• RAID disk configuration is not necessary and should be avoided in MapR clusters because RAID negatively impacts performance. MapR provides automated setup and management of storage pools. The three-way replication provided by MapR-FS provides higher durability than RAID configurations because multiple node failures might not compromise data integrity.
• If the default 3x replication is not sufficient for availability requirements, the replication factor can be increased on a file, volume, or cluster basis. Replication levels higher than 5 are not normally used. Mirroring of MapR volumes within a single cluster can be used to achieve very high replication levels for higher durability or for higher read bandwidth. Mirrors can also be used between clusters. MapR mirrors efficiently by copying only the changes to the mirror. Mirrors are useful for load balancing or disaster recovery.
• MapR also provides manual or scheduled snapshots of volumes to protect against human error and programming defects. Snapshots are useful for rollback to a known data set. A short maprcli sketch of these operations follows this list.
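The sketch below shows how a volume replication factor and a manual snapshot might be set from a script by wrapping the maprcli command line tool. The volume and snapshot names are hypothetical, and the flags shown are believed to match the maprcli volume commands but should be verified against the MapR documentation for the installed release.

    # Sketch of creating a volume with an explicit replication factor and taking a
    # snapshot using the maprcli command line tool. Names are hypothetical; verify
    # the flags against the MapR documentation for the installed release.
    import subprocess

    def maprcli(*args):
        subprocess.run(["maprcli", *args], check=True)

    # Create a volume with a replication factor of 3 (the MapR default).
    maprcli("volume", "create", "-name", "sales_data",
            "-path", "/sales_data", "-replication", "3")

    # Take a manual snapshot to protect against human error before a risky job.
    maprcli("volume", "snapshot", "create",
            "-volume", "sales_data", "-snapshotname", "before_cleanup")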
Software availability considerations
• Operating system availability
One of the hard disk drives on each node can be used to mirror the operating system drive; RAID 1 should be used to mirror the two drives. If OS mirroring is not used, the disk that would have been used for mirroring is available for MapR data storage.
Using the same disk for the operating system and data is not recommended because it can compromise performance.
• NameNode availability
The MapR Hadoop distribution is unique because it was designed with a No NameNode architecture for high availability; MapR is the only Hadoop distribution designed with no single point of failure. Other Hadoop distributions have a single primary NameNode, and when the NameNode goes down, the entire cluster becomes unavailable until the NameNode is restarted. In cases where other distributions are configured with multiple NameNodes, the entire cluster still becomes unavailable during the failover to a secondary NameNode. With MapR, the file metadata is replicated, distributed, and persistent, so there is no data loss or downtime even in the face of multiple disk or node failures.
• JobTracker availability
MapR JobTracker HA improves recovery time objectives and provides a self-healing cluster. Upon failure, the MapR JobTracker automatically restarts on another node in the cluster. TaskTrackers automatically pause and then reconnect to the new JobTracker, and any currently running jobs or tasks continue without losing progress or failing.
• NFS availability
You can easily set up a pool of NFS nodes with HA and failover using virtual IP addresses. If one node fails, the virtual IP addresses will be automatically reassigned to the next NFS node in the pool.
It is also common to place an NFS server on every node where NFS access to the cluster is needed.
Migration considerations
If migration of data or applications to MapR is required, consider the type and amount of data to be migrated and its source. Most data types can be migrated into MapR-FS, but it is necessary to understand the migration requirements to verify viability. Standard Hadoop tools such as distcp can be used to migrate data from other Hadoop distributions. For data in a POSIX file system, NFS mount the MapR cluster and use standard Linux commands to copy the files into the MapR Hadoop cluster. Either Sqoop or database import/export tools with MapR NFS can be used to move data between databases and MapR Hadoop, as illustrated in the sketch that follows.
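The following sketch illustrates the three migration paths described above. All hostnames, paths, cluster names, and database details are hypothetical placeholders, and the tool options should be verified against the documentation for the installed Hadoop, MapR, and Sqoop releases; this is a sketch of the approach, not a prescribed procedure.

    # Sketch of the three migration paths described above. Hostnames, paths, and
    # database details are hypothetical; verify tool options against the installed
    # Hadoop, MapR, and Sqoop documentation.
    import subprocess

    def run(cmd):
        subprocess.run(cmd, check=True)

    # 1. Copy data from another Hadoop distribution with distcp, run on the MapR cluster.
    run(["hadoop", "distcp", "hdfs://oldcluster:8020/warehouse/events", "/warehouse/events"])

    # 2. Copy POSIX files through the MapR NFS mount with standard Linux commands.
    run(["cp", "-r", "/data/archive", "/mapr/my.cluster.com/archive"])

    # 3. Import a relational table with Sqoop.
    run(["sqoop", "import",
         "--connect", "jdbc:mysql://dbhost/sales",
         "--table", "orders",
         "--target-dir", "/warehouse/orders"])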
Also consider whether applications need to be modified to take advantage of Hadoop functionality. Because the MapR read/write file system can be mounted by a standard NFS client, much of the significant application migration effort required by other Hadoop distributions can often be avoided.
Appendix 1: Bill of material
This section contains a bill of materials list for each of the core components of the reference architecture. The bill of materials includes the part numbers, component descriptions, and quantities. Table 6 shows how many core components are required for each of the predefined configuration sizes.
Size          Component                                     Quantity
Small         Node                                          3
              Administration/Management network switch      1
              Data network switch                           1
              Rack                                          1
              Cables                                        3
Medium        Node                                          10
              Administration/Management network switch      1
              Data network switch                           1
              Rack                                          1
              Cables                                        10
Large         Node                                          20
              Administration/Management network switch      1
              Data network switch                           1
              Rack                                          1
              Cables                                        20
Very large    Node                                          100
              Administration/Management network switch      5
              Data network switch                           5
              Rack                                          5
              Cables                                        100
Table 6: Mapping between predefined configuration sizes and the bill of materials.
Node
Part number Description Quantity
5466AC1 -SB- IBM System x3650 M4 BD 1
A4T6 1U Riser Card, Two PCIe x8 Slots 1
A4T7 1U Riser Card, One PCIe x8 slot (For Slotless RAID only) 1
A47H Solarflare SFN5162F 2x10GbE SFP+ Performant Adapter for IBM System x 1
5977 Select Storage devices – no IBM-configured RAID required 1
A3W9F IBM 4TB 7.2K 6Gbps NL SATA 3.5 inch G2HS HDD 14
A4RW -SB- IBM System x 900W High Efficiency Platinum AC Power Supply 1
A4WC -SB- System Documentation and Software-US English 1
A4S4 Intel Xeon Processor E5-2650 v2 8C 2.6GHz 20MB Cache 1866MHz 95W 1
A4SA Addl Intel Xeon Processor E5-2650 v2 8C 2.6GHz 20MB Cache 1866MHz 95W 1
A3QG 8GB (1x8GB, 1Rx4, 1.5V) PC3-14900 CL13 ECC DDR3 1866MHz LP RDIMM 8
A3MW N2115 SAS/SATA HBA for IBM System x 1
A4RQ -SB- x3650 M4 Planar 1
A4RG X3650 M4 BD Chassis ASM w/o Planar 1
6311 2.8m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 1
A4RR -SB-3.5" Hot Swap BP Bracket Assembly , 12x 3.5" 1
A4RS 3.5" Hot Swap Cage Assembly, Rear, 2 x 3.5"(Cage Only) 1
2306 Rack Installation >1U Component 1
A4RH -SB- BIOS GBM 1
A4RJ -SB- 1U RISER CAGE - SLOT 2 1
A4RK -SB- 1U BUTTERFLY RISER CAGE - SLOT 1 1
A4RT -SB- Rear Backplane, 2x3.5" HDD Extension To Front Array (14HDD Array) 1
A4RP -SB- Label GBM 1
A50F -SB- 2x2 HDD BRACKET 1
A48R 2U Bracket for Solarflare Dual Port 10GbE SFP+ Adapter 1
A4TJ -SB- Cable, IPASS, 760mm for 6Gb cards 1
A207 Rail Kit for x3650 M4 BD, x3630 M4 and x3530 M4 1
A20P Package for x3650 M4 BD and x3630 M4 1
A2M3 Shipping Bracket for x3650 M4 BD and x3630 M4 1
Table 7: Sample node bill of materials.
Administration / Management network switch
Part number Description Quantity
7309HC1 IBM System Networking RackSwitch G8052 (Rear to Front) 1
6311 2.8m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 2
A3KP IBM System Networking Adjustable 19" 4 Post Rail Kit 1
2305 Rack Installation of 1U Component 1
Table 8: Sample administration / management network switch bill of materials.
Data network switch
Part number Description Quantity
7309HC3 IBM System Networking RackSwitch G8264 (Rear to Front) 1
6311 2.8m, 10A/100-250V, C13 to IEC 320-C14 Rack Power Cable 2
A3KP IBM System Networking Adjustable 19" 4 Post Rail Kit 1
2305 Rack Installation of 1U Component 1
Table 9: Sample data network switch bill of materials.
Rack
Part number Description Quantity
1410RC4 E1350 42U rack cabinet 1
6033 DPI Single-phase 60A/208V C13 Enterprise PDU (US) 4
2202 Cluster 1350 Ship Group 1
2304 Rack Assembly - 42U Rack 1
2310 Cluster Hardware and Fabric Verification - 1st Rack 1
Table 10: Sample rack bill of materials.
Cables
Part Number Description Quantity
A1PJ 3m IBM Passive DAC SFP+ Cable 2
2323 IntraRack CAT5E Cable Service 2
Table 11: Sample cables bill of materials.
This bill of materials information is for the United States; part numbers and descriptions may vary in other countries. Other sample configurations are available from your IBM sales team. Components are subject to change without notice.
Resources
• IBM System x3650 M4 BD (MapR Node)
  - On ibm.com: ibm.com/systems/x/hardware/rack/x3650m4bd/
  - On IBM Redbooks: ibm.com/redbooks/abstracts/tips0889.html
• IBM RackSwitch G8052 (1GbE Switch)
  - On ibm.com: ibm.com/systems/networking/switches/rack/g8052
  - On IBM Redbooks: ibm.com/redbooks/abstracts/tips0813.html
• IBM RackSwitch G8264 (10GbE Switch)
  - On ibm.com: ibm.com/systems/networking/switches/rack/g8264
  - On IBM Redbooks: ibm.com/redbooks/abstracts/tips0815.html
• MapR
  - MapR main website: http://www.mapr.com
  - MapR products: http://www.mapr.com/products
  - MapR M7 overview: http://www.mapr.com/products/m7
  - MapR No NameNode Architecture: http://www.mapr.com/products/only-with-mapr/namenode-ha
  - MapR Resources: http://www.mapr.com/resources
  - MapR products and differentiation: http://www.mapr.com/products/only-with-mapr
  - MapR starting point: http://www.mapr.com/doc/display/MapR/Start+Here
  - Planning your MapR Deployment: http://www.mapr.com/doc/display/MapR/Planning+the+Deployment
  - MapR CLDB: http://mapr.com/doc/display/MapR/Isolating+CLDB+Nodes
  - MapR ZooKeeper: http://mapr.com/doc/display/MapR/Isolating+ZooKeeper+Nodes
• Open source software
  - Hadoop: http://hadoop.apache.org
  - Flume: http://incubator.apache.org/flume
  - HBase: http://hbase.apache.org
  - Hive: http://hive.apache.org
  - Oozie: http://incubator.apache.org/oozie
  - Pig: http://pig.apache.org
  - ZooKeeper: http://zookeeper.apache.org
• Other resources
  - xCAT: http://xcat.sourceforge.net/
Trademarks and special notices
© Copyright IBM Corporation 2014.
References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Information is provided "AS IS" without warranty of any kind.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM office or IBM authorized reseller for the full text of the specific Statement of Direction.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
Photographs shown are of engineering prototypes. Changes may be incorporated in production models.
Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.