A Review on Security and Privacy Challenges of Big Data
Manbir Singh, Mariia Talalaeva, Ali Syed, Azeem Mohammad, Gullu Ekici, Malka N. Halgamuge
School of Computing and Mathematics, Charles Sturt University, Melbourne, Victoria 3000, Australia
Abstract - Big data raises a growing number of confidentiality and security issues. New technology undoubtedly brings benefits, convenience and efficiency; at the same time, technological advances are accompanied by threats that can pose serious risks. Data privacy is a major concern for researchers around the globe. A question that remains to be answered is what exactly can be done to resolve the confidentiality and privacy issues of big data. To address it, content for this chapter was collected from 57 articles published between 2007 and 2016 and analyzed in search of solutions to big data confidentiality issues. The articles cover different industries, including healthcare, finance, robotics, web applications, social media, and mobile communication. The selected journal articles were used to make a comparative analysis of security issues in different areas and of the solutions proposed for them. This chapter consists of four main parts: introduction, materials and method, results, and discussion and conclusion. The inquiry aims to identify the security issues of big data in various areas and to suggest solutions by analyzing the results. The content analysis suggests that internet applications and financial institutions deal with specific security problems, whereas social media and other industries deal with the confidentiality of sensitive information and have heightened privacy concerns. Both kinds of issue are addressed in this study, and the results highlight gaps that can be researched further. Data for this chapter was gathered by analyzing studies that each deal with a particular confidentiality issue. After analysis and evaluation, suggestions for confronting each confidentiality issue are presented together with the algorithm and method used.
This research addresses gaps in the literature by highlighting the security and privacy issues that large companies face as a result of recent technological advancements in the corporate world. In doing so, it could shed light on the issues of big data and provide directions for future research to solve them.
Index Terms— Big Data, Security, Privacy.
I INTRODUCTION
A significant portion of information technology research effort goes into analyzing and monitoring data about events on servers, networks and other connected devices. Big data is a fairly new concept in the modern technological world. As the use of big data has increased, the problem of security has become very important (Kim, Kim & Chung, 2013). This chapter covers different aspects of big data security, in particular the challenges related to big data variety, velocity and volume. The amount of sensitive information that needs to be protected is constantly increasing (Islam, 2014), and in the era we live in, information needs to be protected in every profession. Insufficient protection can bring various security challenges (Faulkner & Kritzstein & Zimmerman, 2011; Zhang & Dou & Pei & Nepal & Yang & Liu & Chen 2015; Abou-Tair & Berlik & Kelter 2007). The notion of big data came into use not long ago; it refers to the large amounts of information that companies produce and need to store. This information is analyzed for trends in order to increase future sales; however, the growing volume of data is subject to storage issues, as a safe storage system has not yet been designed. In fact, any amount of generated information raises security and privacy issues. Confidentiality of sensitive information becomes a perplexing issue for companies if they do not devote considerable time, effort, and resources to it. Big data is an area concerned with managing a great number of datasets, much of whose data is unstructured and has been gathered from different sources (Ferretti & Pierazzi & Colajanni 2014; Chang & Tsai & Lin 2009). Traditional access control mechanisms are no longer sufficient to ensure privacy as demands grow, which creates the need for a fine-grained access control mechanism that reflects every aspect of privacy. One such framework is the ontology-driven XACML framework (Abou-Tair & Kelter 2007). Providing privacy in the cloud is much more complex (Huang & Du, 2014). Most data privacy preserving techniques target small data sets and fall short at scale; to address this, an algorithm has been designed with MapReduce that gains high scalability by performing computation in parallel. This method is called local record anonymization (Huang & Du 2014). The hybrid cloud is a very different approach and is difficult to implement. The idea is to separate sensitive data from non-sensitive data and store them in different trusted clouds. This method of isolation is best suited to processing image files and is used to deal with data at rest (Zhang & Dou & Pei & Yang & Chen, 2015).
Multilevel identity encryption is applied at both the file level and the block level to protect data. This approach helps to make use of the cloud provider while maintaining transparency (Raghuwanshi & Rajagopalan 2014).
Preserving the privacy of information is one big issue; providing security for IT systems is another. There are many security risks in big data. Privacy leakage is one of the most dangerous and has already caused many problems (Abou-Tair & Berlik & Kelter 2007). A whole range of data therefore needs different computational techniques to make it secure. The first step in cloud security is to secure the entry points. This helps to detect possible attacks (Pham & Syed & Mohammad & Halgamuge 2010) and to alert users by means of an intrusion detection system (Tan & Nagar & He & Nanda 2014). Encryption is another way of making data safe; in local area networks, VPN encryption is used to safeguard data (Mengke & Xiaoguang & Jianqiu & Jianjian 2016). For web applications, the randomization-based Random4 encryption algorithm is also used (Tan & Nagar & He & Nanda, 2014). Large corporations are prompted to use the MuteDB architecture, which incorporates data encryption, key management, authorization, and authentication. This architecture provides a scalable solution for guaranteeing the confidentiality of information in the database (Ferretti & Pierazzi & Colajanni & Marchetti 2014).
Security of medical data falls into the same category: cloud-based technologies and attribute-based encryption are combined for secure storage and retrieval (Syed & Teja 2014). Another approach is to use a pocket-sized computer called the Raspberry Pi, which ensures that regional data is collected and kept isolated (Feng & Onafeso & Liu 2016). The data mining process is reviewed below.
II DATA PROCESSING METHOD
Knowledge Discovery from Data (KDD), often treated as a synonym for ‘data mining’, is a method used to discover information from data while avoiding leakage. Every day millions of bytes of data are generated throughout the world, and by analyzing trends in the data, researchers understand the need for organizing it. This process in turn allows companies to grow and remain competitive in the market, for example by analyzing seasonal trends and launching products in peak seasons. Research in this area is therefore significant and requires further development. KDD usually involves four steps, which are performed iteratively. They are discussed below.
Step 1. Data processing: This step identifies inconsistent or missing data fields, removes or repairs them, and reintegrates the cleaned records into the data pool. The data is presented in a form that can be read quickly to reach potential results.
Step 2. Data Transformation: This step transforms data into forms appropriate for mining. The data is often not available in a suitable form, so it must be sorted and converted into a type that can yield useful information.
Step 3. Data Mining: Various methods and algorithms are employed to extract information from the data pool.
Step 4. Pattern Evaluation: After the information is extracted, patterns are evaluated to obtain knowledge about trends.
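The four steps above can be sketched as a small pipeline. This is only an illustrative sketch, not a method taken from the reviewed studies: the sales records, the quarter labels and all function names are assumptions introduced here to show how each step feeds the next.

```python
from collections import Counter

# Hypothetical raw records: (month, product, units_sold); None marks a missing field.
RAW = [
    ("2016-01", "umbrella", 5),
    ("2016-01", "umbrella", None),   # missing units
    ("2016-06", "umbrella", 40),
    ("2016-06", "sunscreen", 3),
    ("2016-12", "sunscreen", 55),
    ("2016-12", None, 7),            # missing product
]

def preprocess(records):
    """Step 1: drop records that contain missing fields."""
    return [r for r in records if all(v is not None for v in r)]

def transform(records):
    """Step 2: convert 'YYYY-MM' strings into quarter labels suitable for mining."""
    def quarter(month):
        return "Q%d" % ((int(month.split("-")[1]) - 1) // 3 + 1)
    return [(quarter(month), product, units) for month, product, units in records]

def mine(records):
    """Step 3: aggregate units sold per (quarter, product)."""
    totals = Counter()
    for quarter, product, units in records:
        totals[(quarter, product)] += units
    return totals

def evaluate(totals):
    """Step 4: report the peak quarter (and its units) for each product."""
    peak = {}
    for (quarter, product), units in totals.items():
        if product not in peak or units > peak[product][1]:
            peak[product] = (quarter, units)
    return peak

def kdd(records):
    return evaluate(mine(transform(preprocess(records))))

print(kdd(RAW))
```

In practice the steps are iterated: the evaluation of patterns may send the analyst back to clean or transform the data differently.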
PRIVACY
Every day, large amounts of data are generated and processed in an array of industries, so the privacy of data is a big concern. Four different roles are involved: (i) the data provider, (ii) the data collector, (iii) the data miner, and (iv) the decision maker. These are the parties involved in collecting data and deriving knowledge from it. Each faces different challenges in privacy protection, as discussed below.
Approaches to privacy protection by:
A. Data Provider:
Data providers can provide data voluntarily according to the demands of the data collector. Data collectors can also retrieve data from the providers’ daily activities. Internet companies now have a strong motivation to track users’ movements over the Internet so that valuable information can be extracted from the data produced by users’ online activities. However, providers can limit the data collectors’ access to this data in several ways. Anti-tracking tools, for example AdBlock, can block advertisements on sites and stop tracking scripts. Encryption tools encrypt data into ciphertext, which is not readable, so that the data can be transferred safely.
B. Data Collector:
The original data retrieved from the data provider generally contains sensitive information. If experts do not take specific precautions before passing the data on to the data miner, the sensitive information can be disclosed to the public, and confidentiality becomes a troubling issue for companies. One remedy is generalization, which replaces a value with its parent value and is a good way to hide sensitive information. Permutation de-associates the relationship between quasi-identifiers and numerical attributes by dividing the data set into groups and shuffling the information within the groups. Perturbation modifies the data with some false values to hide the original values. This includes adding noise, swapping data, and generating synthetic data.
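The three techniques described for the data collector can be illustrated with a minimal sketch. The records, the city-to-region hierarchy, the group size and the noise scale below are all hypothetical values chosen for illustration; real anonymization schemes choose them according to a formal privacy model.

```python
import random

# Hypothetical records and generalization hierarchy (city -> parent region).
RECORDS = [
    {"city": "Melbourne", "salary": 50000},
    {"city": "Geelong", "salary": 62000},
    {"city": "Sydney", "salary": 71000},
    {"city": "Perth", "salary": 58000},
]
PARENT = {"Melbourne": "Victoria", "Geelong": "Victoria", "Sydney": "New South Wales"}

def generalize(records, field="city"):
    """Replace a value with its parent value; unknown values are suppressed to '*'."""
    return [{**r, field: PARENT.get(r[field], "*")} for r in records]

def permute(records, field="salary", group_size=2, seed=0):
    """Shuffle a numeric field within fixed-size groups so that it is no longer
    tied to the quasi-identifiers of the same record."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(records), group_size):
        group = records[i:i + group_size]
        values = [r[field] for r in group]
        rng.shuffle(values)
        out.extend({**r, field: v} for r, v in zip(group, values))
    return out

def perturb(records, field="salary", scale=1000.0, seed=0):
    """Add zero-mean Gaussian noise to a numeric field."""
    rng = random.Random(seed)
    return [{**r, field: r[field] + rng.gauss(0.0, scale)} for r in records]

print(generalize(RECORDS))
print(permute(RECORDS))
print(perturb(RECORDS))
```

Each function returns new records and leaves the originals untouched, so the collector can compare the utility of the released data against the raw data before passing it on.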
C. Data Miner:
The data miner applies an algorithm to the data obtained from the data collector. Two types of privacy issue can put confidentiality at risk in this process. First, information can be leaked when the data is directly observed. Second, the data mining results themselves may leak private information. Several approaches are helpful. The heuristic distortion approach resolves how to select the appropriate data sets for data modification; this method works by replacing certain attributes of data items with a particular symbol. The probabilistic distortion approach distorts data through random numbers generated from a predefined probability distribution function. The reconstruction-based approach generates a database from scratch that is compatible with a set of non-sensitive rules.
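As a rough illustration of two of these ideas, the sketch below replaces sensitive items with a placeholder symbol and distorts binary data with random flips drawn from a fixed probability. The transaction data, the `?` symbol and the flip probability are assumptions made for this example and are not taken from the reviewed papers.

```python
import random

def block_attributes(transactions, sensitive_items, symbol="?"):
    """Replace each sensitive item in a transaction with a placeholder symbol."""
    return [[symbol if item in sensitive_items else item for item in t]
            for t in transactions]

def probabilistic_distortion(bits, flip_probability=0.1, seed=0):
    """Flip each 0/1 value independently with a predefined probability."""
    rng = random.Random(seed)
    return [b ^ 1 if rng.random() < flip_probability else b for b in bits]

# Hypothetical market-basket data in which "insulin" is considered sensitive.
baskets = [["bread", "insulin"], ["milk"], ["insulin", "milk"]]
print(block_attributes(baskets, {"insulin"}))

# Hypothetical binary attribute (1 = item purchased) for ten customers.
print(probabilistic_distortion([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], flip_probability=0.2))
```

A higher flip probability hides individual values better but degrades the accuracy of any pattern mined from the distorted data, which is the trade-off these approaches have to balance.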
D. Decision Maker:
The ultimate goal of data mining is to provide information to the decision maker; however, to achieve this objective it is compulsory to meet the confidentiality rules and regulations that protect people. At first glance, it seems that the decision maker has no responsibility, but in actuality decision makers have a duty of care. If the results are disclosed to competitors, the decision makers will suffer a loss. In addition, the openness, freedom and anonymity of the Internet pose a great challenge for tracing the provenance of information. The decision maker must also assess five aspects of information: authority, accuracy, objectivity, currency and coverage.
SECURITY
Security has always been an issue; however, with the recent growth of big data, conventional security mechanisms that were designed to secure small-scale or static data are no longer adequate. There are often loopholes in a system that allow intruders to exploit services, and some security challenges are prevalent across the field. Most organizations that deal with sensitive information are constantly under threat of data theft. Table 1 below displays the types of security challenges and threats that need attention.
III MATERIAL AND METHOD
This chapter categorizes different security and privacy issues in big data according to the type of issue and several parameters. The data was collected through a content analysis of 24 publications from 2007 to 2016, drawn from different industries including healthcare, finance, robotics, web applications, social media, and mobile communications. An agile approach was used for the project because the data depends on IT and involves massive computation (“The 5 Methodology Milestones for Big Data”, 2015). Data about security issues and solutions was collected from different journal articles, then structured and analyzed. The data is displayed in a table for better comprehension, showing the area of the security issue, the solution, the algorithm used for the solution, and accompanying general and technical remarks. After the data was analyzed, results and conclusions were drawn objectively.
A. Data Collection Method: -
Data was collected from twenty-four different studies published between 2007 and 2016. All these studies are available from the IEEE databases.
B. Data Analysis Method: -
Data was analyzed by categorizing the collected data in a table. The table includes attributes such as the volume of data, the application area, and the prevailing issues. It also records the method adopted to solve each issue and the algorithm selected for that purpose.
Table 1: - Different Security Challenges and Solutions.
Security Challenges | Solution for Data Security
1. Real-Time Monitoring: Real-time monitoring has always been a big issue on account of the number and frequency of security alerts that are generated. It is now easy to rectify any loopholes or dangers, but it takes a lot of effort to find such threats (Kizza 2015). | Layered Protection: In computer hardware, the security of an information system is built up through layers. The security of the outer layers relies on the security of the inner layers. The more layers there are, the better the security (Kuhn & Walsh & Fries 2005).
2. Granular Audits: When the real-time monitoring system does not capture an attack, audits are needed (Lee et al.). | Protection of Different Domains: DNS can be divided into local region, network perimeter, network transmission, and infrastructure. Different technologies are therefore used in various procedures to secure these areas in order to establish a distributed security system (Stouffer & Falco & Scarfone 2011).
3. Secure Computations in Distributed Systems: In these systems, parallel computation and storage are used to process massive amounts of data. Securing the mappers and the data in the presence of an untrusted mapper becomes the primary concern (Montlick, 1996). | Hierarchical Protection: Since the importance of the same information differs between institutes, hierarchical protection is needed. In this case, different access control measures are used so that a particular user can access only specific parameters (Zissis & Lekkas, 2012).
4. Secure Data Storage and Transaction Logs: Data and transaction logs are stored in multiple tiers. Moving data manually among tiers is not an issue in itself; nonetheless, given the amount of data that is generated, auto-tiering is needed for management. However, auto-tiering does not keep track of where the data is stored, so maintaining 24/7 availability is a big issue (Montlick, 1996). | Time-Sharing Protection: Information security in big data is a dynamic process. Taking time into consideration can greatly enhance the security of big data (Vashist, 2015).
5. Endpoint Evaluation: Many large organizations require data collection from various sources. A key challenge here is to validate the input; validating and filtering data can be daunting because of the challenges posed by untrusted data sources (Goel & Hong 2015). | 3KDEC Algorithm: A symmetric key block encipherment algorithm that presents a practical solution to the problem in which numeric data is converted to an alphanumeric type, so that the encrypted data is not stored in the existing numeric fields (Kaur & Dhindsa & Singh, 2009).
Table 2: - Description of Algorithms and Other Key Terms Used in This Chapter
Algorithm/Key Term | Description
MapReduce (Zhang & Dou & Pei & Nepal & Yang & Liu & Chen 2015) | The MapReduce programming model is for generating and processing data sets with a distributed, parallel algorithm on clusters.
AES (Huang & Du 2014) | The AES (Advanced Encryption Standard) algorithm is a symmetric block cipher used to protect sensitive and confidential information by encrypting it into an unreadable form.
Multilevel identity encryption (Raghuwanshi & Rajagopalan 2014) | This is an extension of normal encryption. Here the identity of the data in the cloud is encrypted and protected by multiple layers.
ORAM (Li & Guo, 2014, April) | Oblivious RAM (Random Access Memory) allows clients to access their data on a remote server without revealing their access patterns.
XACML framework (Abou-Tair & Berlik 2007) | eXtensible Access Control Markup Language is used to implement attribute-based access control policies.
Data masking (Motiwalla & Li 2010) | The method of creating inauthentic but structurally similar versions of data for testing and training purposes.
Random4 (Avireddy et al 2012) | Application-specific encryption algorithm used to prevent SQL (Structured Query Language) injection.
StarLight (Faulkner & Kritzstein & Zimmerman, 2011) | This tool gathers information from different sources, such as visual intelligence and geospatial data, to alert staff at sea ports.
3DES (Islam & Islam, 2014) | The triple data encryption algorithm is a symmetric key block cipher that applies DES (Data Encryption Standard) three times to each data block.
VPN (Mengke & Xiaoguang & Jianqiu & Jianjian, 2016) | A virtual private network (VPN) extends a private network across a public network. A VPN is created by establishing a virtual point-to-point connection through dedicated connections and traffic tunneling.
OBEX (Krishnan & Helberg & Merve, 2016) | Object exchange (OBEX) is a communication protocol that facilitates the exchange of binary objects between devices.
MuteDB Architecture (Ferretti & Pierazzi & Colajanni & Marchetti, 2014) | This architecture incorporates data encryption, authentication, authorization and key management to assure the confidentiality of data in the cloud.
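To make the MapReduce entry in Table 2 more concrete, the following is a minimal single-machine sketch of the programming model, using word counting as the example job. It is an illustration under stated assumptions, not the anonymization algorithm of the cited work: a thread pool stands in for the cluster nodes, and the function names are introduced here for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(mapped):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key, here by summing them."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(chunks, workers=2):
    # Each chunk is mapped independently; threads stand in for cluster nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))

chunks = [["big data security"], ["Big data privacy"]]
print(word_count(chunks))
```

Because each chunk is mapped without reference to the others, the map phase scales out by adding workers, which is the property that scalable privacy-preserving algorithms built on MapReduce rely on.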
IV RESULTS
This study has collected and analyzed data from different studies, and the results are displayed in the tables below. The observations show that industries holding sensitive customer data are trying to preserve privacy, whereas industries that perform their computations in real time are working hard to keep those computations secret. Moreover, most of the solutions are based on encryption algorithms. In some areas, for example finance and health, role-based access control can be applied as a solution. In addition, most of the visible solutions are directed at protecting access to big data; there is not much concern with the security aspect of big data itself.
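Role-based access control, mentioned above as a solution for finance and health, can be sketched in a few lines. The roles, permissions, hierarchy and user names below are hypothetical and serve only to show how a role hierarchy determines what a user may do; they do not reproduce any system from the reviewed studies.

```python
# Hypothetical roles and the permissions granted directly to each role.
ROLE_PERMISSIONS = {
    "clerk": {"read_record"},
    "nurse": {"update_vitals"},
    "doctor": {"prescribe"},
}
# Role hierarchy: a role inherits every permission of its junior roles.
ROLE_JUNIORS = {"doctor": ["nurse"], "nurse": ["clerk"], "clerk": []}
# Hypothetical user-to-role assignment.
USER_ROLES = {"alice": "doctor", "bob": "clerk"}

def permissions_of(role):
    """Collect the permissions of a role, including inherited ones."""
    perms = set(ROLE_PERMISSIONS.get(role, set()))
    for junior in ROLE_JUNIORS.get(role, []):
        perms |= permissions_of(junior)
    return perms

def is_allowed(user, permission):
    """A user is allowed an action only if the user's role grants it."""
    role = USER_ROLES.get(user)
    return role is not None and permission in permissions_of(role)

print(is_allowed("alice", "read_record"))  # doctor inherits from nurse and clerk
print(is_allowed("bob", "prescribe"))      # clerk has no such permission
```

Access decisions are made against roles rather than individual users, so changing what a group of staff may do means editing one role instead of many user entries.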
TABLE 3 – Security Issues in Big Data: Algorithms used in published papers
No. | Article | Application | Security Issue | Algorithms Used | Data Size | Method Used
1 | Krishnan, Helberg, Merve (2016) | Health and Finance | Privacy of customers’ and employees’ information | FPE (Format Preserving Encryption) algorithm that combines: one algorithm to encrypt, another algorithm to decrypt and one algorithm to sample | Dataguise | This helps to decrease data breaches risk. Dataguise is created to detect, protect and also handle compliance to regulatory mandates
2 | Moura et. al. (2016) | Cloud computing | Users’ privacy and management of big data | Functional encryption algorithm consisting of Key Generation, Encryption, Decryption, and Evaluation | Homomorphic encryption | Keeps private information secure. This form of encryption allows certain computations to be executed on ciphertext and creates encrypted results.
3 | Swarna et. al. (2016) | Cloud environment | Privacy of transmitting and stored information | Ring signatures include only two algorithms: Sign and Verify. | Ring signature | This type of signature can be executed by any group member. Ring signature uses PSA algorithm
4 | Moura et. al. (2016) | Social network | Privacy of users’ information | Rendering algorithm | User rights management | Secure storage of information. The user identifies rights and limits the content; all users are registered securely
5 | Xiaoguang et al. (2016) | Local area networks | Interface security | VPN encryption | ----- | Background linkage for managers including remote lock, data wiping, and automatic alarm.
6 | Charishma et. al. (2015) | Business organizations | Privacy of company’s and employees’ information | Cryptographic algorithms | Analytic tool Splunk | Provides log management by taking and analyzing the logs using certain patterns.
7 | Chandankere (2015) | Cloud storage | Privacy of stored information | Group signature algorithms consist of four algorithms: KeyGen, Sign, Verify, and Trace. KeyGen and Sign are randomized, Verify and Trace are deterministic. | Dynamic encryption and group signature | Enables secure sharing of information. Dynamic encryption enables transmitting encrypted data to group member by adding members to admin. Group signature enables disclosure of identity by admin in case of dispute.
8 | Gang (2015) | Social network | Privacy of users | K-anonymity algorithm | Anonymity protection | Anonymity protection is used to protect data that can include relationship, attributes and identity anonymities.
 | Usman (2015) | Business organizations | Privacy of company’s and employees’ information | Symmetric encryption algorithm | AES encryption | This type of encryption makes data unreadable for attackers. It contains operations, including substitutions and permutations.
9 | Gang (2015) | Multimedia | Privacy of transmitting media files | Watermarking detection algorithms (cryptographic algorithms) | Data Watermarking | “Data Watermarking” is applied to protect copyright. “Data Watermarking” relates to the identification information that is inserted imperceptibly.
10 | Kaur et.al. (2015) | E-Commerce | Privacy of customers’ information | Rivest Shamir Adleman (RSA) algorithm | A new E-banking security system | In the new system activities and functions are s This new system is grounded on a strong access control.
11 | Jyothirmai et. al. (2015) | HealthCare | Privacy of patients’ information | Access control algorithms (cryptographic algorithms) | Role-Based Access Control | Role-Based Access Control is a tool for managing access of data and making data safe. Role-Based Access Control is able to manage different policies of access control which is grounded on role hierarchies.
12 | Krishnan et al. (2015) | Mobile phone | Intrusion detection in Mobile phones | BLA, Bluetooth object exchange (OBEX) Protocol | Covers 60% of Bluetooth market | Using Bluetooth logging agent, and also using database rules to authenticate.
13 | Gang (2015) | Commercial companies | Privacy of company’s and employees’ information | Access control algorithms (cryptographic algorithms) | Role-based access control | Roles are able to be generated and behavior of each user can be checked. It is grounded on authorization of “Users-Object” and role optimization.
14 | Feng et al. (2015) | Healthcare | Regional secure data process/collect to limit issues in future health care | Raspberry Pi | All digital healthcare industry | Raspberry Pi is a pocket-sized computer used in forensic medicine, and forensic etymology.
15 | Raji et al. (2015) | Social networking | Privacy challenges in online social networks (OSN) | P2P-ONS Architecture | Can include Facebook, Twitter, Messenger, etc. | Architecture composed of privacy enabled start-up for user’s social communication and adaptive replica for ensuring availability of shared data.
17 | Gang (2015) | Business environment | Privacy of company’s and employees’ information | Clustering algorithm | Access control (risk adaptive) | Risk adaptive access control is appropriate when it is not clear which data is accessible for users. This method uses information theory and statistical methods to identify quantization algorithm.
18 | Gang (2015) | Commercial organizations | Privacy of company’s and employees’ information | Provenance graph algorithms | Data Provenance | Data Provenance is used through labelling, so it is able to check if the results are correct, to differentiate the data in the table or to update the data.
19 | Gadepally et al. (2015); Kepner et al. (2014) | Bioinformatics and social media | Privacy of patients’ information in bioinformatics, personal information of users in social media | Graph algorithms that are created using associative arrays | Computing on masked data | Computations are allowed to be executed directly on masked data, as the authorized recipients are allowed to unmask data. CMD (Computing on Masked Data) includes methods of cryptographic encryption and associative arrays that represent big data
20 | Wagh et. al. (2014) | Educational organizations | Privacy of education resources, and personal information of users, integrity | Digital signature: a randomized KeyGen algorithm, a randomized Sign algorithm, and a deterministic Verify algorithm. | Digital signature, data encryption, access control | Access control is achieved through verification by transmitting secure data using data encryption and digital signature.
21 | Merkel (2014) | Health and Finance | Privacy of customer’s information | Mathematical algorithms | Bash tool | It simplifies a complex process. It is used mostly as an intermediary
22 | Hsu et al. (2014) | Group communication in social media | Privacy of users’ information | Changed RSA algorithm, that relies on NP class | Group key transfer protocol | This protocol protects from attacks and decreases system implementation overhead. This protocol is based on LSSS (Linear Secret Sharing Scheme) and DH key agreement and does not have online KGC (Key Generator Center)
23 | Pace (2014) | Scientific computing | Privacy of scientists’ works | ARC algorithm for storage pool | Data management | Data management simplifies data workflow and makes it secure. Data management assigns to pool service management.
24 | Bertine et al. (2014) | Web application | Encrypted data is not secure | PPDM and PPDA | Sheer amount of data in cloud | Using cryptography to encrypt data.
25 | Ferretti et al. (2014) | Cloud database services | Scalable solution to guarantee confidentiality of information in the database. | MuteDB architecture | Cloud providers | Data encryption, authentication and authorization to form new MuteDB.
26 | Huang et al. (2014) | Web application and Hybrid cloud service providers | Privacy of image data stored in public cloud | AES algorithm | Image Data | Dividing images into blocks and shuffling them to make them unrecognizable.
27 | Mehak (2014) | Cloud storage | Privacy of stored information | Map Reducer Algorithms including Master Key Algorithm | Hadoop | Enables processing of big data sets. It parallelizes processing of data across computers in a cluster
28 | Mirarab et. al. (2014) | E-Commerce | Privacy of customers’ information | Steganography: least significant bit algorithm (LSB) | Encryption, Steganography | Encryption and Steganography make E-Commerce more safe and secure. Encryption is executed through Elliptic Curve Cryptography. LSB (Least Significant Bit) Steganography is used for image compression.
29 | Syed et al. (2014) | Cloud services including medical data, confidential defense records, etc. | Encryption of data in the database is not sophisticated enough to provide enhanced security. | Encryption for frequent access node | Potential to enhance the security of all cloud providers. | Combining cloud-based technologies and attribute-based encryption for secure storage and retrieval.
30 | Raghuwanshi et al. (2014) | Cloud service providers | Privacy of data at rest in cloud | Multilevel identity encryption | Cloud vendors and consumers | By using encryption and verification services both at file and block storage level.
40 | Li et al. (2014) | Google Drive, Dropbox, Amazon S3, SkyDrive, iCloud, Egnyte, OneDrive, etc. | Privacy preserving data access to cloud | ORAM algorithm | Geo-distributed cloud sites | Using ORAM (Oblivious Random Access Memory) for load balancing without revealing the access patterns
41 | Tan et al. (2014) | Any cloud-based application e.g. one drive, cloud | Cloud security | Secured entry points | Terabytes. | Sensing attacks and alerting the user by intrusion detection system and data leak prevention system.
43 | Islam et al. (2014) | Multimedia like email, music, and images | Dealing with structured data and unstructured data | Statistical learning algorithms | Infinite | Text analysis by filtering, clustering and classification, building security node.
44 | Abawajy et al. (2014) | Robotics and control system | Malware detection | LIME classifier | Massive and expected to grow exponentially | A designer has to initialize a four-tier LIME (Large Iterative Multitier Ensemble Classifiers) classifier by specifying which ensemble meta-classifier will operate at the fourth tier.
45, 46 | Islam et al. (2014) | Financial system | Security to unstructured data | 3DES | 1200-1400 Exabytes | Digital certificate, using hash functions.
47 | Tankard (2012) | Multi-silo environment | Privacy and law of data protection | Symmetric encryption key algorithms | Vormetric | Vormetric manages data access control that combines storage elements, policy management and data encryption
48 | Kaplan (2012) | Business organizations | Privacy of company’s and employees’ information | Cryptographic algorithms | Encryption | It protects the system from attacks. Encryption protects information by encoding messages.
15
Pramila et al. (2012)
Healthcare Current system to diagnose the patient has slight range and is not secure
Using location tracking technology, telediagnosis, Access using PKC
Covers the person suffering from Alzheimer's Disease
A new method proposed with the long-range outdoor environment with GPS (Global Positioning System) and fine-grained distributed data access control.
18
Viddy et al. (2012)
Web applications or websites
Web application is suffering security attack especially SQL injection attack
Random4 encryption algorithm
Have potential to provide security to web traffic
Using an encryption algorithm based on Randomization.
20
Faulkner et al. (2011)
Military Port has very serious security issues; they lack port specific security technologies to alert security personnel in case of any hazard or danger.
StarLight uses Visual intelligence, entity detection, and intrusion
Commercial and military ports
StarLight combines information from different sources and integrates text, geospatial and temporal data to alert security staff.
21
Motiwalla et al. (2010)
Healthcare
Privacy preserving for healthcare data.
Data masking
Spent $39.4 billion in 2008
Changing the data values by using noise perturbation, data aggregation, and data swapping.
22
Yun et al. (2010)
Database or Data Warehouse of any company
Information security structure for database processor
Following the Plan-Do-Check-Act cycle to implement the nine principles established by the OECD
Covers all organizations that have a database
Using a structure that complies with legal requirements and addresses conflicts between consumer and database processor.
23
Chang et al. (2009)
Mobile Phones
RFID-enabled credit cards lack a sophisticated computation mechanism for authentication.
A new RFID system based on mobile phones
Has the potential to be expanded to credit-card-related computations.
Proposed an efficient and secure mechanism using mobile devices like RFID (Radio-Frequency Identification) readers together with credit cards.
23
Diehn et al. (2007)
Any information system
Privacy in enterprises
Ontology-driven XACML Framework
Any amount of computational data.
Using ontology to generate access control policies that provide fine-grained access to diverse data sources.
23
Skinner et al. (2007)
Virtual community including gaming, and multimedia
Privacy protections in virtual communities are ineffective, which poses a threat to data integrity.
Hippocratic Security Method
Virtual world
Monitoring the use of personal information through Hippocratic database principle to enforce Hippocratic policies.
23
Dimitropoulos (2007)
Healthcare
Privacy of patients’ information
Probabilistic matching algorithms (PMA) that are made to match patients and records
Identification Management System (IMS) and Master Patient Index (MPI)
IMS, MPI and PMA concentrate on provider and patient identification, making the service secure. IMS and MPI are designed to function within a regional health organization or HIE (Health Information Exchange).
23
Agrawal & Srikant, (2000).
Enterprises with “Bring your own” environment
Privacy and data protection of employees’ information and company’s information
RSA (Rivest-Shamir-Adleman) public-key cryptographic algorithm applied by the issuing CA (Certificate Authority)
SonicWall
SonicWall products provide email security, mobile access security and network security. They collect input from many sensors, defend against threats and report them.
After analyzing the data in Table 3, the following conclusions can be drawn.
a. Healthcare: Preserving the privacy of data is a critical issue. Healthcare information is very sensitive, so it should not be easily disclosed, retrieved or leaked by hackers, as this may cause financial detriment. Techniques used to avoid this include anonymizing records with MapReduce, data masking, collecting data securely with a Raspberry Pi device, and using location-tracking technology when treating patients.
b. Web Application: Web applications suffer the majority of data breaches. Violations occur not only at the back end but also during data transmission and data collection. The best solution is to verify and identify the owner, encrypt data before transmission, encrypt frequently accessed nodes, and secure entry points using specific architectures for authentication, authorization and key management.
c. Mobile Devices: In mobile devices, the first issue is verifying the user, which can be a problem. Another fundamental problem is securing transactions from all devices. Possible solutions include using a concrete architecture such as OBEX, monitoring the behavior of data and the network to detect intrusions, and using RFID-based mobile phones.
d. Social Networking: Privacy of and access to shared data cause the majority of the problems. Two-way authentication of a user is one possible privacy-preserving solution.
e. Finance: This application area is under close scrutiny, as the majority of attacks and threats target it for monetary gain. Both the privacy of sensitive information and data stored in the cloud or in databases are at risk. User authentication and authorization are among the first underlying challenges that need immediate attention. Banks use intrusion detection systems to detect threats; however, this is not safe enough. Various encryption algorithms are used to make computations and transactions secure.
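The data masking approach noted for healthcare data (changing data values through noise perturbation, data aggregation and data swapping, as summarized in Table 3 for Motiwalla et al. (2010)) can be illustrated with a minimal Python sketch. It is not the implementation from any of the reviewed articles: the function names, the Gaussian noise model, the rank-based grouping, the group size and the sample ages are all assumptions made for illustration only.

```python
import random
from statistics import mean


def perturb(values, scale, rng):
    """Noise perturbation: add zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0.0, scale) for v in values]


def aggregate(values, group_size):
    """Data aggregation: replace each value with the mean of its group.

    Values are grouped by rank, so each group holds neighbouring values.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    masked = [0.0] * len(values)
    for start in range(0, len(order), group_size):
        group = order[start:start + group_size]
        group_mean = mean([values[i] for i in group])
        for i in group:
            masked[i] = group_mean
    return masked


def swap(values, rng):
    """Data swapping: shuffle a column so values no longer line up with records."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    rng = random.Random(42)            # fixed seed only to make runs repeatable
    ages = [30, 42, 35, 61, 58, 47]    # hypothetical patient ages (assumed data)
    print("perturbed: ", perturb(ages, 2.0, rng))
    print("aggregated:", aggregate(ages, 2))
    print("swapped:   ", swap(ages, rng))
```

In this sketch, aggregation and swapping keep the column total unchanged, while perturbation preserves it only on average; which trade-off is acceptable would depend on how the masked data is to be analyzed.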
V DISCUSSION
Big data is a fairly new concept in IT, and research in the area is not yet thorough enough; more research needs to be done. An important gap in many articles is that research is biased towards traditional methods, leaving Big Data under-researched. Firstly, most articles consider only one area, which makes their findings one-sided and incomplete. This limitation applies to all of the articles used except Gang (2015), which presents different areas and different solutions for security issues. For instance, areas such as social networks, multimedia, commercial organizations, companies and the business environment, and solutions such as anonymity protection, data watermarking, data provenance, role-based access control and risk-adaptive access control, are all aspects that scholars can look into. To draw more conclusive results, most articles need to consider a wider range of information. Moreover, the articles lack a comparison of different solutions and an explanation of why or how they could be applied. The literature chosen for this study also has gaps in the area of application, namely how certain security measures could be applied. Secondly, most articles give no detailed information about solutions, including the algorithms used for particular problems; case studies, for example, would be valuable for understanding and deriving solutions in a particular area. Exceptions are the articles of Dimitropoulos (2007) and Kaur et al. (2015). Thirdly, many articles present security issues in a general form, without specific information related to a particular problem in an area. In particular, descriptions of the most common security and privacy issues do not specify what kind of privacy is meant or what kind of information needs to be protected. Technological development brings a variety of benefits, but it can also bring threats that result in breaches of privacy, and companies that make important information public can face hefty fines.
Big data is a new area that refers to the vast amount of information that needs to be analyzed and stored, and this must be done in a way that eliminates confidentiality breaches. There are many security issues in different areas of big data, and sensitive information needs to be protected. Although considerable research on big data has been done, the area is still new, and more research is needed because many questions remain unanswered. More recent research on data security is needed in the form of a comparative analysis of security issues in different areas of big data. The results show that security issues are similar across different areas, that the same solutions can apply in different areas, and that many solutions are grounded in encryption algorithms. Moreover, protecting access is very significant in big data. To summarize, this research has added knowledge about big data security issues and highlighted research gaps in the area. However, the area remains neglected and requires continuing research covering different aspects of big data.
V CONCLUSION
In conclusion, this chapter has analyzed data obtained from 57 peer-reviewed scientific publications from 2007 to 2016. This study has highlighted certain gaps in the literature and addressed solutions for various privacy and security issues in different areas of big data. The provider company needs to ensure a secure infrastructure and the protection of customers’ information, and this data has to comply with confidentiality standards. In some areas, such as health and finance, solving security issues is key to the effective and successful operation of the company. Many different technologies have been created to protect against security issues; however, existing technology cannot completely solve them, and research in this area is continuing. Although considerable research has been done on big data issues, it is still a fairly new advancement in IT, and many questions and aspects of big data security problems have not been answered or covered sufficiently. In particular, there is a lack of comparative analysis of security issues in different areas that draws out solutions by comparing and contrasting studies. Therefore, this work aimed to find and compare important security issues in different areas of big data and to find solutions to them. This analysis is also important because it provides grounds for further research and enriches existing information about big data.
To address these gaps and highlight security and privacy issues of big data, tools and techniques were used to find possible answers to particular issues. The data was first categorized and then grouped under different parameters. The analysis found that web applications and financial institutions are dealing with security problems, and each problem is resolved in a different way. Social media and other industries dealing with sensitive information have individual privacy concerns, which are treated with a uniform approach. This research has addressed gaps in the literature by highlighting the security and privacy issues that big companies face as a result of recent technological advancements in corporate settings. Evaluating these gaps may shed some light on big data issues and give future researchers directions for solving them.
REFERENCES
A Survey of Information Security in HealthCare Sector. (2015). Retrieved from http://www.ijera.com/special_issue/NCDATES/CSE/PART-3/CSE%20135-2935.pdf
Abawajy, J. H., Kelarev, A., & Chowdhury, M. (2014). Large iterative multitier ensemble classifiers for security of big data. IEEE Transactions on Emerging Topics in Computing, 2(3), 352-363.
Agrawal, R., & Srikant, R. (2000, May). Privacy-preserving data mining. In ACM Sigmod Record (Vol. 29, No. 2, pp. 439-450). ACM.
Avireddy, S., Perumal, V., Gowraj, N., Kannan, R. S., Thinakaran, P., Ganapthi, S., ... & Prabhu, S. (2012, June). Random4: an application specific randomized encryption algorithm to prevent SQL injection. In 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications (pp. 1327-1333). IEEE.
Bertino, E., & Samanthula, B. K. (2014). Security with privacy-A research agenda. In Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), 2014 International Conference on (pp. 144-153). IEEE.
Chandankere, B. (2015). Secure Data Sharing in an Untrusted Cloud. Journal of Engineering Research and Applications, 5(8).
Chang, A. Y., Tsai, D. R., Tsai, C. L., & Lin, Y. J. (2009, October). An improved certificate mechanism for transactions using radio frequency identification enabled mobile phone. In 43rd Annual 2009 International Carnahan Conference on Security Technology (pp. 36-40). IEEE.
Charishma, P., & Venkatesh, K. (2015). Big Data Security Analytic Solution using Splunk. Journal of Engineering Research and Applications, 5(4).
Choi, C. (2013). A new type of security chip guards against big data snooping. Scientific American, 309(6).
Computing on Masked Data to improve the Security of Big Data. (2015). Retrieved from http://arxiv.org/pdf/1504.01287v1.pdf
Computing on Masked Data: a High Performance Method for Improving Big Data Veracity. (2014). Retrieved from https://arxiv.org/ftp/arxiv/papers/1406/1406.5751.pdf
Dataguise reveals five big data security pitfalls. (2015). Retrieved from http://search.proquest.com.ezproxy.csu.edu.au/docview/1667178648?OpenUrlRefId=info:xri/sid:primo&accountid=10344
Dell's says its new solutions allow customers to tackle BYOD, big data and security concerns. (2013). Entertainment Close-Up. Retrieved from http://search.proquest.com.ezproxy.csu.edu.au/docview/1468158864?accountid=10344
Dhiah el Diehn, I., Berlik, S., & Kelter, U. (2007). Enforcing privacy by means of an ontology driven xacml framework. In Third International Symposium on Information Assurance and Security (pp. 279-284). IEEE.
Faulkner, L. L., Kritzstein, B. P., & Zimmerman, J. J. (2011). Security infrastructure for commercial and military ports. In OCEANS'11 MTS/IEEE KONA (pp. 1-6). Program Book Art. No. 6107174.
Feng, X., Onafeso, B., & Liu, E. (2016). Investigating Big Data Healthcare Security Issues with Raspberry Pi. In Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM), 2015 IEEE International Conference on (pp. 2329-2334). IEEE.
Ferretti, L., Pierazzi, F., Colajanni, M., & Marchetti, M. (2014). Scalable architecture for multi-user encrypted SQL operations on cloud database services. IEEE Transactions on Cloud Computing, 2(4), 448-458.
Gang, Z. (2015). Research on Privacy Protection in Big Data Environment. Journal of Engineering Research and Applications, 5(5).
Goel, S., & Hong, Y. (2015). Security Challenges in Smart Grid Implementation. In Smart Grid Security (pp. 139). Springer London.
Hsu, C., Zeng, B., & Zhang, M. (2014). A novel group key transfer for big data security. Applied Mathematics and Computation, 2014(249).
Huang, X., & Du, X. (2014, April). Achieving big data privacy via hybrid cloud. In Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on (pp. 512-517). IEEE.
Internet Security Flaws in the Age of Big Data. (2014). Retrieved from http://search.proquest.com.ezproxy.csu.edu.au/docview/1638234956?OpenUrlRefId=info:xri/sid:primo&accountid=10344
Islam, M. R., & Islam, M. E. (2014, December). An approach to provide security to unstructured Big Data. In Software, Knowledge, Information Management and Applications (SKIMA), 2014 8th International Conference on (pp. 1-5). IEEE.
Kaur, K., Dhindsa, K. S., & Singh, G. (2009, March). Numeric to Numeric Encryption of Databases: Using 3Kdec Algorithm. In Advance Computing Conference, 2009. IACC 2009. IEEE International (pp. 1501-1505). IEEE.
Kaur, K., Pathak, A., Kaur, P., & Kaur, K. (2015). E-Commerce Privacy and Security System. Journal of Engineering Research and Applications, 5(5).
Kim, S., Kim, N., & Chung, T. (2013). Attribute Relationship Evaluation Methodology for Big Data Security. IT convergence and security. Retrieved from http://ieeexplore.ieee.org.ezproxy.csu.edu.au/stamp/stamp.jsp?tp=&arnumber=6717808
Kizza, J. M. (2015). Introduction to computer network vulnerabilities. In Guide to Computer Network Security (pp. 87-103). Springer London.
Kuhn, D. R., Walsh, T. J., & Fries, S. (2005). Security considerations for voice over IP systems. NIST special publication, 800-58.
Lee, W., Stolfo, S. J., Chan, P. K., Eskin, E., Fan, W., Miller, M., ... & Zhang, J. (2001). Real time data mining-based intrusion detection. In DARPA Information Survivability Conference & Exposition II, 2001. DISCEX'01. Proceedings (Vol. 1, pp. 89-100). IEEE.
Li, P., & Guo, S. (2014). Load balancing for privacy-preserving access to big data in cloud. In Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on (pp. 524-528). IEEE.
Li, Y., & Xiangsheng, L. (2010, October). Information security structure for database processer. In 2010 International Conference on Computer Application and System Modeling (ICCASM 2010) (Vol. 15, pp. V15-97). IEEE.
Mehak, G. (2014). Improving Data Storage Security in Cloud using Hadoop. Journal of Engineering Research and Applications, 4(9).
Mirarab, A., & Kenari, A. (2014). A New Framework for Secure M-Commerce. Journal of Engineering Research and Applications, 4(11).
Montlick, T. F. (1996). U.S. Patent No. 5,561,446. Washington, DC: U.S. Patent and Trademark Office.
Motiwalla, L., & Li, X. (2010, July). Value added privacy services for healthcare data. In 2010 6th World Congress on Services (pp. 64-71). IEEE.
Pace, A. (2014). Technologies for large data management in scientific computing. International Journal of Modern Physics C: Computational Physics & Physical Computation, 25(2).
Pham, D. V., Halgamuge, M. N., Syed, A., & Mendis, P. (2010, July). Optimizing windows security features to block malware and hack tools on USB storage devices. In Progress in electromagnetics research symposium.
Pham, D. V., Syed, A., Mohammad, A., & Halgamuge, M. N. (2010). Threat analysis of portable hack tools from USB storage devices and protection solutions. In Information and Emerging Technologies (ICIET), 2010 International Conference on (pp. 1-5). IEEE.
Pramila, R. S., Nargunam, A. S., & Affairs, A. (2012, March). A study on data confidentiality in early detection of Alzheimer's disease. In Computing, Electronics and Electrical Technologies (ICCEET), 2012 International Conference on (pp. 1004-1008). IEEE.
Privacy and Security Solutions for Interoperable Health Information Exchange: Final Implementation Plans. (2007). Retrieved from https://healthit.ahrq.gov/sites/default/files/docs/page/FIP_0.pdf
Raghuwanshi, D. S., & Rajagopalan, M. R. (2014, January). MS2: Practical data privacy and security framework for data at rest in cloud. In Computer Applications and Information Systems (WCCAIS), 2014 World Congress on (pp. 1-8). IEEE.
Security and Privacy Issues of Big Data. (2016). Retrieved from https://arxiv.org/ftp/arxiv/papers/1601/1601.06206.pdf
Skinner, G., Chang, E., McMahon, M., Aisbett, J., & Miller, M. (2004, November). Shield privacy Hippocratic security method for virtual community. In Industrial Electronics Society, 2004. IECON 2004. 30th Annual Conference of IEEE (Vol. 1, pp. 472-479). IEEE.
Stouffer, K., Falco, J., & Scarfone, K. (2011). Guide to industrial control systems (ICS) security. NIST special publication, 800(82), 16-16.
Swarna, S., & Maryam, S. (2016). Increasing Security Level in Data Sharing Using Ring Signature in Cloud Environment. Journal of Engineering Research and Applications, 6(2).
Syed, S., & Teja, P. S. (2014, November). Novel data storage and retrieval in cloud database by using frequent access node encryption. In Contemporary Computing and Informatics (IC3I), 2014 International Conference on (pp. 353-356). IEEE.
Tan, Z., Nagar, U. T., He, X., Nanda, P., Liu, R. P., Wang, S., & Hu, J. (2014). Enhancing big data security with collaborative intrusion detection. IEEE Cloud Computing, 1(3), 27-33.
Tankard, C. (2012). Big data security. Network Security, 2012(7).
The 5 Methodology Milestones for Big Data. (2015). Retrieved from https://icrunchdatanews.com/5methodology-milestones-big-data/
The Big Picture. (2012). Retrieved from http://search.proquest.com.ezproxy.csu.edu.au/docview/1011329843?rfr_id=info%3Axri%2Fsid%3Aprimo
Usman, I. (2015). The Risk and Challenges of Cloud Computing. Journal of Engineering Research and Applications, 5(12).
Vashist, R. (2015). Cloud Computing Infrastructure for Massive Data: A Gigantic Task Ahead. In Big Data in Complex Systems (pp. 1-28). Springer International Publishing.
Wagh, K., Jathar, R., Bangar, S., & Bhakthadas, A. (2014). Securing Data Transfer in Cloud Environment. Journal of Engineering Research and Applications, 4(5).
Yang, M., Zhou, X., Zeng, J., & Xu, J. (2016). Challenges and Solutions of Information Security Issues in the Age of Big Data. Volume 3, pp. 139-202.
Zhang, X., Dou, W., Pei, J., Nepal, S., Yang, C., Liu, C., & Chen, J. (2015). Proximity-aware local-recoding anonymization with mapreduce for scalable big data privacy preservation in cloud. IEEE Transactions on Computers, 64(8), 2293-2307.
Zissis, D., & Lekkas, D. (2012). Addressing cloud computing security issues. Future Generation Computer Systems, 28(3), 583-592.