Distributed filesystem forensics: XtreemFS as a case study
Ben Martini*, Kim-Kwang Raymond Choo
University of South Australia, GPO Box 2471, Adelaide, SA 5001, Australia
a r t i c l e i n f o
Article history:
Received 6 May 2014
Received in revised form 26 August 2014
Accepted 29 August 2014
Available online 11 October 2014
Keywords:
Big data
Digital forensics
Distributed filesystem
Infrastructure as a Service (IaaS)
Storage as a Service (StaaS)
Distributed filesystem forensics
Cloud storage forensics
a b s t r a c t
Distributed filesystems provide a cost-effective means of storing high-volume, velocity and
variety information in cloud computing, big data and other contemporary systems. These
technologies have the potential to be exploited for illegal purposes, which highlights the
need for digital forensic investigations. However, there have been few papers published in
the area of distributed filesystem forensics. In this paper, we aim to address this gap in
knowledge. Using our previously published cloud forensic framework as the underlying
basis, we conduct an in-depth forensic experiment on XtreemFS, a Contrail EU-funded
project, as a case study for distributed filesystem forensics. We discuss the technical and
process issues regarding collection of evidential data from distributed filesystems,
particularly when used in cloud computing environments. A number of digital forensic
artefacts are also discussed. We then propose a process for the collection of evidential data
from distributed filesystems.
© 2014 Elsevier Ltd. All rights reserved.
Introduction
In recent years, the amount of data captured, stored and
disseminated in electronic-only form has increased exponentially (see Beath et al., 2012) and, unsurprisingly, big
data has constantly been ranked as one of the top ten
technology trends (see Casonato et al., 2013; Chua, 2013)
including by the United States National Intelligence Council
(2012). A Gartner report, for example, has forecasted that
“big data will generate [US]$232 billion in revenue cumulatively from 2011 to 2016” (Casonato et al., 2013, p. 4). A
widely accepted definition of big data is from Gartner,
which defines it as “high-volume, -velocity and -variety
information assets that demand cost-effective, innovative
forms of information processing for enhanced insight and
decision making” (Beyer and Laney, 2012, p. 2).
There are, however, large technology overheads and
significant costs associated with the processing, storage
and dissemination of big data. Businesses and governments
will continue to be under pressure to deliver more with less
especially in today's economic landscape. Business and
government users have recognised the advantages of cloud
computing for processing and storing big data. For
example, a report by the Info-communications
Development Authority of Singapore (2012, p. 6), pointed
out that “[e]arly adopters of Big Data on the cloud would be
users deploying Hadoop clusters on the highly scalable and
elastic environments provided by Infrastructure-as-a-Service (IaaS) providers such as Amazon Web Services
and Rackspace, for test and development, and analysis of
existing datasets”. This is unsurprising as cloud computing
offers users the capacity, scalability, resilience, efficiency
and availability required to work with high-volume, velocity and variety information.
Cloud computing (like other networked cyberinfrastructure) is subject to criminal exploitation (Choo,
2010; Chonka and Abawajy, 2012; Patel et al., 2013). In a
digital investigation, one would need to gather evidence of
an incident or crime that has involved electronic devices
(e.g. computer systems and their associated networks), a
process known as digital forensics. The latter is increasingly
being used in the courts in Australia and overseas. Many
* Corresponding author.
E-mail address: [email protected] (B. Martini).
Contents lists available at ScienceDirect
Digital Investigation
journal homepage: www.elsevier.com/locate/diin
http://dx.doi.org/10.1016/j.diin.2014.08.002
1742-2876/© 2014 Elsevier Ltd. All rights reserved.
conventional forensic tools have focused upon having
physical access to the media that stores the data of potential interest. However, in a cloud computing environment it
is often not possible or feasible to access the physical media
that stores the user's data (Martini and Choo, 2012).
Distributed filesystems can support cloud computing environments by providing data fragmentation and distribution, potentially across the globe and within numerous
datacentres. This presents significant technical and jurisdictional challenges in the identification and seizure of
evidential data by law enforcement and national security
agencies in criminal investigations (Hooper et al., 2013) as
well as by businesses in civil litigation matters.
A number of researchers and practitioners have
emphasised the need for cloud computing specific digital
forensics guidelines (Birk and Wegener, 2011; National
Institute of Standards and Technology, 2011; Zatyko and
Bay, 2012), and we believe this need extends to the underlying infrastructure which supports cloud computing.
While a number of published papers have provided a sound
grounding for the research required in cloud forensics by
highlighting the issues for digital forensic researchers and
practitioners (Birk and Wegener, 2011; Martini and Choo,
2012) there are relatively few technical papers discussing
the forensic collection of evidential data from cloud servers
or underlying supporting infrastructure such as distributed
filesystems.
Much of the existing literature has focused on the
Software as a Service (SaaS) component of cloud computing
(e.g. Dropbox, Skydrive and Google Drive) (Marty, 2011;
Chung et al., 2012; Dykstra and Sherman, 2013; Hale,
2013; Martini and Choo, 2013; Federici, 2014; Quick et al.,
2014) rather than the Infrastructure as a Service (IaaS)
implementations that support these services. Researchers
such as Dykstra and Riehl (2013) and Hay et al. (2011)
identified the various legal and technical challenges in
conducting forensic investigation of cloud IaaS; but there
has been little in-depth forensic investigation of a
distributed filesystem that is used in, or analogous to those
used in public or private cloud installations. One reason this
may not have been thoroughly explored is due to the difficulties in accessing (for research purposes) a public cloud
computing environment of significant scale that makes use
of a distributed storage environment.
In recent years, a number of researchers have examined distributed filesystems and the implicit issues for
forensic investigations on these complex systems. Cho
et al. (2012) conducted a preliminary study of Hadoop's
distributed filesystem (HDFS). Hegarty et al. (2011)
discuss a technique for distributed signature detection
for the purpose of detecting the file signatures of illicit
files in distributed filesystems. They note that existing
signature techniques are unlikely to perform well in a
distributed filesystem environment with a significant
quantity of data stored. Almulla et al. (2013, p. 3) discuss
a range of cloud forensic issues including the underlying
role of ‘distributed computing’ and, in turn, distributed
filesystems. The authors note the significant impact
distributed filesystems have on forensics in terms of the
requirement for a practitioner to ‘rebuild files from a
range of filesystems’.
In this paper, we use XtreemFS as a case study to provide
a better understanding of both the technical and process
issues regarding collection of evidential data from distributed filesystems which are commonly used in cloud
computing environments. XtreemFS, currently funded by
the Contrail EU project (Contrail, 2013), is an open source
example of a general purpose and fault-tolerant distributed
and replicated filesystem that can be deployed for cloud
and grid infrastructures to support big data initiatives
(XtreemFS, 2013). To provide fault-tolerant file replication,
the stored file data is generally split and replicated across
multiple storage servers. In a cloud deployment, the data is
also likely to be extensively distributed at the physical level
within datacentres.
We chose to focus on a single distributed filesystem as
this allows us to conduct an in-depth analysis of the client
and, particularly, the servers to fully understand the potential evidential data that can be collected as part of a
forensic investigation. XtreemFS (like most underlying
infrastructure) does not receive substantial attention in
mainstream technical media. However, it has received significant attention in the academic community, with many
researchers choosing to analyse it or implement it as the
underlying infrastructure in larger projects. Most
commonly, in the literature, XtreemFS is implemented in
cloud computing or grid computing (commonly understood to be one of the predecessors of cloud). For example,
Kielmann et al. (2010) describe the role of XtreemFS in
supporting XtreemOS and its suitability to integrate with
IaaS services. Pierre and Stratan (2012) integrate XtreemFS
into their proposed ‘ConPaaS’ system which as the name
describes is a Platform as a Service cloud environment.
Enke et al. (2012) also implement XtreemFS (including a
number of its advanced features) for the purpose of managing cloud data replication in their work to analyse
distributed big datasets in Astronomy and Astrophysics.
Krüger et al. (2014) note that XtreemFS has also been used
to provide distributed data management in the MoSGrid
science gateway (Molecular Simulation Grid), an EU-funded project. Kleineweber et al. (2014) selected
XtreemFS for the underlying filesystem into which they
integrated their reservation scheduler for object based filesystems (as an extension) to handle storage QoS in cloud
environments. In addition to implementing or extending
XtreemFS, other researchers such as Dukaric and Juric
(2013) and Petcu et al. (2013) have also noted its use
when discussing filesystems generally in the cloud environment. This body of research demonstrates the
contemporary applicability of XtreemFS in the cloud environment and, as such, makes it an appropriate choice as a
case study for forensic investigation in this paper.
Cloud forensics framework
The digital forensics framework used in this paper is
based on our previously published work (Martini and Choo,
2012), which we have previously validated using ownCloud
(Martini and Choo, 2013; Quick et al., 2014). The framework
is based upon the stages outlined by McKemmish (1999)
and the National Institute of Standards and Technology
(Kent et al., 2006) but differs in a number of significant
ways. The iterative nature of this framework is integral to a
successful investigation in a complex client/server environment as presented in XtreemFS. The client can be used
to identify the existence of cloud services and to collect any
data stored by the client. Hence, forensic analysis of the
client is generally carried out before analysis of the server
environment. The following four stages outline the high
level process and order that a forensic practitioner should
follow when conducting forensic investigations in the
cloud computing environment.
1. Evidence Source Identification and Preservation: This
phase is concerned with identifying sources of evidence in a digital forensics investigation. During the
first iteration, sources of evidence identified will
generally be via a physical device (e.g. desktop computers, laptops and mobile devices) in possession of the
suspect. However, in the case of a distributed filesystem
used in cloud computing, the filesystem client may only
exist on the cloud server nodes. This, however, does not
prevent it from being the first point of identification
and may lead to other components of the filesystem.
During the second iteration, this phase is concerned
with identifying other components of the environment
or cloud which may be relevant to the case, possible
evidence stored by the filesystem custodian (e.g. system administrator) and processes for preservation of
this potential evidence. Preservation is integral to the
integrity of forensic investigations and as such proper
preservation techniques must be maintained regardless of the evidence source.
2. Collection: This phase is concerned with the actual
capture of the data. There are various methods of
evidential data collection suited for the various cloud
computing platforms and deployment models. While
IaaS may result in the collection of virtual disks and
memory, and SaaS may result in an export from the
relevant cloud software, the collection of distributed
filesystems supporting cloud computing installations
may be considerably more involved. Another consideration for distributed filesystems is the likelihood of
remote hosting. If the filesystem is hosted outside the
jurisdiction of the investigating law enforcement agency
(LEA), the agency should use the appropriate legal
instrument to gain lawful access to the filesystem remotely.
3. Examination and Analysis: This phase is concerned with
the examination and analysis of forensic data. Examination and analysis are key components of a forensic
investigation dealing with distributed filesystems. Examination will be integral to gaining a complete understanding of the operating components in the filesystem,
and analysis will be integral to reconstruction.
4. Reporting and Presentation: This phase is concerned
with legal presentation of the evidence collected. This
phase remains very similar to the frameworks of
McKemmish and NIST (Martini and Choo, 2012). In
general, the report should include information on all
processes, the tools and applications used and any limitations to prevent false conclusions from being reached
(see US NIJ, 2004).
Contributions and outline
We regard the contributions of this paper to be threefold:
1. Provide technical insights on forensic analysis of the
XtreemFS underlying infrastructure and IaaS instances;
2. Propose processes for the collection of electronic evidence from XtreemFS (see Fig. 1 in Summary section)
and distributed filesystems used in the cloud computing
environment based on the technical findings from the
previous contribution (see Fig. 2 in Data storage section);
and finally
3. Validate our published cloud forensic framework
(Martini and Choo, 2012).
In the next section, we provide an overview of the
XtreemFS architecture and discuss the role of the various
components. The findings are discussed in the context of
these components. We follow the first three stages of the
cloud forensic framework (see Cloud forensics framework
section) to outline the recommended process for collection of forensic artefacts and potential evidence. The
Collecting evidence from a distributed filesystem – a
process section presents our proposed high level process
for collection of electronic evidence from distributed filesystems used in cloud environments. The last section
concludes this paper.
Experiment environment
XtreemFS architecture overview
XtreemFS is a virtual network-provisioned filesystem,
which is used to deliver backend storage services for a
cloud service provider by providing key services such as
replication and striping. It is one example of a number of
products available with similar feature sets (other examples include GlusterFS (Gluster, 2014), BeeGFS (Fraunhofer,
2014) and Ceph (Ceph, 2014)). It is important to make the
distinction between backend and frontend storage systems
in the cloud computing environment as both are
commonplace. For the purposes of this research, frontend
cloud storage systems refer to cloud storage that is purchased by users to store their personal files (popular examples include Dropbox, Skydrive and Google Drive)
whereas backend cloud storage systems are used by the
cloud provider to support IaaS, PaaS or SaaS services. For
example a backend storage system would be used to store
the virtual machines that are hosted as part of an IaaS cloud
or the databases and other files used by a SaaS system. In
most cases backend storage is not provided directly to users
however it is acknowledged that in some IaaS instances
users may access this type of storage directly.
Two major features provided by XtreemFS are striped
and replicated filesystem services, which are achieved using
three main components, namely the Directory Service
(DIR), the Metadata and Replica Catalog(s) (MRC) and the
Object Storage Device(s) (OSD) (Stender et al., 2012). These
components work together to provide the virtual
filesystem to network users, and the various components
communicate between each other and with clients as
described below.
The DIR is responsible for maintaining a registry of all
services and volumes provided by the XtreemFS service
(Stender et al., 2012). In this respect all other parts of the
XtreemFS architecture (including clients) regularly
communicate status and other information to the DIR service. This indicates a possible source of forensic identification information about an XtreemFS instance, which may
include the (virtual) location of the OSD storage on a local
or remote network and the location and number of clients
connected to the environment.
The MRC is responsible for storing and managing all of
the metadata that XtreemFS generates and stores regarding
the file content. Stender et al. (2012, p. 271) define metadata as “a collective term for all types of data that need to
be managed by a filesystem except for file content”. Metadata forms a critical part of many forensic and civil litigation investigations (see Irwin v. Onondaga County
Resource Recovery, 72 AD3d 314 – NY: Appellate Div., 4th
Dept. 2010; Ellis v. Toshiba America Information Systems,
Inc., Cal: Court of Appeal, 2nd Appellate Dist., 1st Div. 2013;
Voltage Pictures, LLC v. Does 1-31, Dist. Court, SD Georgia
2013), and is reportedly collected by the National Security
Agency as part of the agency's PRISM program (United
States Foreign Intelligence Surveillance Court, 2011;
Gellman and Lindeman, 2013; Greenwald and Ball, 2013).
As such, the MRC could be a potential source of forensic
identification data and data for examination and analysis.
Consideration must also be given to the preservation of the
metadata stored on the MRC.
The OSD is responsible for storing the actual file data
sent by the clients in the XtreemFS instance. A variety of
structures are used to store the data depending on the
striping, replication, and other options selected on a per
volume (or per file) basis; and this is discussed further in
the findings section of the paper. The OSD would likely be
the focal component of the XtreemFS system for a forensic
practitioner as it stores the file content data that a client has
added to the virtual filesystem. This makes the OSD a key
component for evidence preservation (to ensure that the
data stored by an individual of interest is not modified
intentionally or unintentionally) and forensic analysis.
Having more than one OSD in an XtreemFS instance would
be commonplace; in fact, multiple OSDs are mandatory to make
use of some of the key features of XtreemFS (e.g. replication and
striping). An XtreemFS instance with multiple OSDs allows
for file content to be split and replicated across a number of
physical devices potentially in a number of different
physical locations.
XtreemFS uses the concept of ‘volumes’ to virtually
segregate data. Volumes can be used in a number of ways,
for example, permissions and default policies (e.g. replication and striping) can be applied to volumes. Volumes
are also the primary administrative unit for a client, for
example, clients mount volumes, set policies on volumes
and create snapshots of volumes, etc. An individual volume could conceivably be created for each user of the
XtreemFS system; alternatively, the POSIX permissions
system integrated into XtreemFS could be used to
segregate users. We believe that a common implementation may be to leave authentication and permissions management to higher level applications. For
example, a file sync application using XtreemFS as a
backend could use a single volume, with each file owned
by a single service user relying on the file sync application server to enforce permissions and provide
authentication.
Finally the XtreemFS client is used to connect to the
XtreemFS filesystem. The client (and related tools) is
responsible for all administrative operations and for facilitating local access to the virtual filesystem. Administrative
operations include creating volumes, mounting volumes,
setting volume and file policies, and checking status (e.g.
replication and striping) of files and volumes. While it is not
expected that there will be a wealth of data of forensic
interest available on a host with a disconnected client, a
connected client may provide a forensic practitioner with
valuable identification information (see XtreemFS Client
section) and potentially logical collection opportunities.
Experiment environment overview
The experiment environment consists of a number of
virtual machines providing various XtreemFS features (e.g.
DIR, MRC, OSD and client) depending on the specific
experiment. The virtual machines (VMs) simplified forensic
disk image collection and would simulate a cloud configuration where physical hosts house both distributed
compute and storage functions. Clients were also hosted as
VMs.
A number of client and server VMs were created using
CentOS 6.4 as their operating system and ext4 for their local
filesystem. We used the prebuilt XtreemFS packages built
for the CentOS 6 operating system.
Findings
This section will discuss the three main architectural
components of the XtreemFS system and the client in the
context of the relevant phases of the cloud forensics
framework with a view to understanding the filesystem
and data of forensic interest available from the XtreemFS
system, ultimately producing a list of high-level artefacts
that should be investigated in all distributed filesystem
forensic investigations. As XtreemFS is an advanced and
complex environment, it is beyond the scope of a single
paper to cover every possible combination of its potentially
enabled features. Hence, we focus on the features most
relevant to a forensic researcher or practitioner.
Directory service (DIR)
The DIR stores the data needed to define and locate the
various technical components in an XtreemFS instance. For
a practitioner commencing an investigation with an identified XtreemFS instance, this would be the logical starting
point to determine the components and extent of the
XtreemFS installation.
Three artefact types of potential value to a forensic
investigation exist on the DIR server:
- Volatile environment metadata: As the directory
service is responsible for maintaining a record of the
various components in the XtreemFS environment, a
range of environmental metadata of interest should
exist. This can include the logical network location
(generally an IP address) of the various filesystem nodes
and unique identifiers for those nodes. Other data of
interest (where available) includes data about the individual nodes such as node type/responsibilities,
ownership information and configuration (especially in
terms of authentication).
- Non-volatile environment metadata: While in many
cases the DIR may store the majority of directory data in
volatile storage as it is expected to change (semi)
frequently, some data may be committed to non-volatile
storage (e.g. disk). This includes all of the metadata listed above; the most prevalent example would be logging data, but it may also include backups.
- Configuration files: Configuration files can be an
invaluable source of information for a forensic
researcher and practitioner seeking to gain a better
understanding of the operation of the system with a
view to collecting evidence from the individual components. Configuration information of interest may
include network information (addresses/ports used,
etc.), authentication information (systems, passwords,
secrets, etc.) and operational information (local storage
locations, database formats, etc.).
Evidence source identification and preservation
The most obvious use of the DIR component would be as
part of identification and assisting in preservation of the
distributed storage data. Within this context we are seeking
to find and decode any available data on the components of
the XtreemFS instance being investigated.
Before the value of the DIR component can be assessed it
must be located. The most obvious method of locating the
DIR server would be to analyse the currently mounted filesystem on a client (discussed in XtreemFS Client section). If
this is infeasible (and the practitioner has access to the LAN
segment where the DIR server is assumed to be hosted) the
practitioner could potentially exploit the DIR auto detection
feature available to the XtreemFS service components. The
XtreemFS installation documentation notes that MRC and
OSD components can “autodiscover” the DIR using a UDP
broadcast. This was tested by enabling the UDP autodiscover feature on an OSD in the experiment environment
and capturing the resulting packets. It was noted that the
OSD sent a UDP packet to the broadcast address of the LAN
on the DIR service port (32638) to which the DIR responded. While this functionality can be optionally disabled in
the DIR configuration, it is enabled by default and as such
replicating this UDP broadcast is a feasible method for a
practitioner to locate the DIR server(s) on a LAN segment.
This was tested successfully as part of our experiments. We
took this concept further and sent the UDP packet contents
captured from a DIR discovery to the broadcast address on
the OSD service port (32640). This resulted in all OSDs on
the LAN segment responding with an error message. A
practitioner could make use of this method (or a similar
derivation) to locate all of the XtreemFS OSDs in a LAN
segment where other forms of network discovery have been
intentionally or unintentionally disabled.
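The discovery technique above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the payload is a placeholder byte rather than a genuine XtreemFS autodiscovery datagram (in practice the bytes captured from a legitimate exchange, as described above, would be replayed), and the ports are the defaults noted in the text (32638 for the DIR, 32640 for OSDs).

```python
import socket

# Default XtreemFS service ports (as noted in the text above).
DIR_SERVICE_PORT = 32638
OSD_SERVICE_PORT = 32640

def probe_broadcast(port, payload=b"\x00", broadcast="255.255.255.255",
                    timeout=2.0):
    """Send a UDP broadcast to `port` and collect (ip, reply) pairs.

    NOTE: `payload` is a placeholder; a practitioner would replay the
    datagram captured from a legitimate autodiscovery exchange instead.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(timeout)
    responders = []
    try:
        sock.sendto(payload, (broadcast, port))
        # Collect every reply until the timeout elapses.
        while True:
            try:
                data, addr = sock.recvfrom(4096)
                responders.append((addr[0], data))
            except socket.timeout:
                break
    finally:
        sock.close()
    return responders
```

Any host that answers on the DIR or OSD port, even with an error message as observed for the OSDs in our experiments, thereby reveals itself as an XtreemFS component on the LAN segment.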
Once the DIR has been located and the practitioner has
gained access to the operating system (with the assistance of an administrator or using standard digital forensics methods), the practitioner should focus on
understanding the configuration of the DIR in this
installation. The DIR service has a plaintext configuration
file located (by default) at '/etc/xos/xtreemfs/
dirconfig.properties'. This configuration file contains a range of
directives for the DIR service. These include the capacity
to enable debug logging, configuration of the service port
(for communication with other components of the
XtreemFS system) and the HTTP port (for accessing
administrative status information), directives to enable
SSL communication (disabled by default), monitoring
configuration (email and SNMP) and ‘BabuDB’ (XtreemFS
database) configuration directives. The configuration file
also contains the DIR service UUID; extensive use is made
of UUIDs in the XtreemFS system, with most DIR-
addressable items being assigned a UUID.
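As a sketch of triaging such a properties-style configuration file, a few lines of Python suffice; the directive names and values in the sample text below are illustrative assumptions, not an authoritative excerpt of a real dirconfig.properties file.

```python
def parse_properties(text):
    """Parse Java-style 'key = value' properties text into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments ('#' or '!' in properties files).
        if not line or line.startswith(("#", "!")):
            continue
        if "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# Illustrative excerpt only; real directive names and values should be
# confirmed against the seized configuration file.
sample = """\
# XtreemFS DIR configuration (illustrative)
uuid = UUID:dir-host-0001
listen.port = 32638
http_port = 30638
babudb.baseDir = /var/lib/xtreemfs/dir/database
"""

config = parse_properties(sample)
print(config["uuid"], config["http_port"])  # → UUID:dir-host-0001 30638
```

Extracting the service and HTTP ports this way lets a practitioner confirm, before any network interaction, where the DIR can be reached and where its local database resides.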
All XtreemFS services provide an HTTP service to present
system administrators with status information. The default
port for the DIR HTTP status service is 30638. By default,
the status service can be accessed without authentication.
However, if an administrator password is enabled in the
configuration file, it will be required to access the status
page. When loaded in a web browser, the HTTP status
service provides a range of information that could be of
interest to a forensic practitioner, including the
following:
- Address mapping: IP address to universally unique
identifier (UUID) mapping for each network accessible
service in the XtreemFS installation. This provides two
methods for a forensic practitioner to ensure they preserve and collect all services (generally on different
physical devices) of interest, these being IP address (for
provider and device location) and UUID (which can be
confirmed using the service configuration file once access has been arranged to the host).
- Service registry: A range of registry information for each
component listed with the directory service. Generic
mappings in this category include the service type (e.g.
‘SERVICE_TYPE_OSD’ or ‘SERVICE_TYPE_VOLUME’),
name (generally the service type and UUID, volume
name for volumes), static_status (online, locked, etc.)
and last updated (in UNIX timestamp and converted to
local timezone). Specific services (MRC, OSD, volumes,
etc.) also have specific entries in the registry such as free
disk space and total/used RAM and load. For a forensic
practitioner the focus in the service registry is to
determine what each UUID referenced throughout the
system refers to (e.g. a specific volume, an OSD or an
MRC, etc.) through the use of the type and UUID (or
name) entries.
- Configurations: Each XtreemFS service listed with the
DIR has a range of configuration details listed against its
UUID in the configurations table. These can include the
ports on which the service itself and HTTP monitoring
can be accessed which may be useful if a practitioner is
considering accessing the service via the network. There
is also an indication if SSL is enabled on the service. One
directive of particular interest in the MRC configuration
entry is "authentication_provider", which notes the type of
authentication enabled on the MRC. By default the
"NullAuthenticationProvider" is used, which makes use of
local users and groups on the clients. However, other
authentication schemes can be enabled which could
complicate a logical extraction using the XtreemFS
client.
Two of the most interesting OSD configuration entries
include the “storage_layout” and the “object_dir” that
indicate the layout and location on host storage of the
objects stored with the individual OSD instance
respectively. With this information a practitioner can
begin to plan for collection from the OSD (if the practitioner is seeking to undertake a physical collection).
This is discussed further as part of the high-level evidence collection process (see Collecting evidence from a
distributed filesystem – a process section).
- Network monitoring (packet collection) of the service
port is another potential source of data for a forensic
practitioner in terms of monitoring the hosts connecting
(or attempting to connect) to the XtreemFS services.
Collection & examination and analysis
In most cases a practitioner would seek to collect the
environment directory metadata held by the DIR service for
use in identifying other components of the XtreemFS
installation or for rebuilding parts of the installation in a
post event laboratory situation where required. Examples
of this data are outlined in Evidence source identification
and preservation section. Collecting the data via the HTTP
status interface discussed as part of identification was, in
our experience, one of the most straightforward and
comprehensive methods of collecting the current state of
the XtreemFS instance directory. However, it is only
possible to collect this data from the running XtreemFS DIR
server, which may not always be available to a forensic
practitioner who, for example, may be relying on offline
physical/logical copies of the DIR server.
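A logical collection of the status page could be scripted along the following lines; the host address in the usage comment is hypothetical, and if an administrator password has been set in the configuration file the request would additionally need credentials.

```python
import urllib.request

def status_url(host, port=30638):
    """Build the URL of an XtreemFS HTTP status service (DIR default port)."""
    return "http://%s:%d/" % (host, port)

def collect_status_page(host, port=30638, timeout=5):
    """Fetch and return the raw HTML of a status page for preservation."""
    with urllib.request.urlopen(status_url(host, port),
                                timeout=timeout) as resp:
        return resp.read()

# Hypothetical usage: preserve the DIR status page to a file for later
# hashing and analysis.
# with open("dir_status.html", "wb") as f:
#     f.write(collect_status_page("10.0.0.5"))
```

Capturing the raw HTML (rather than transcribing values by hand) preserves the address mappings, service registry and configuration tables exactly as served at collection time.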
The XtreemFS DIR service stores some metadata in a
“BabuDB” database (a key pair based database system
closely linked with XtreemFS and the XtreemFS developers). This database keeps a binary log which could be
of use if a practitioner is working with an offline copy of the
DIR service or potentially looking for historical data on the
components which were part of the XtreemFS instance but
have since been removed. Unfortunately examination and
analysis of the databases binary log is not a straightforward
process for the DIR service, and we tried different methods,
such as rebuilding the database using the BabuDB Java library and raw keyword searches, with varying levels of
success as discussed below.
The database files for the DIR service are located by
default in “/var/lib/xtreemfs/dir/” where there are two
subdirectories named “database” and “db-log”. When
collecting evidence from a DIR server, the “/var/lib/
xtreemfs/dir/” directory and all subdirectories and files
should be collected. The “database” directory contains a
“config.db” file that appears to store configuration information for the DIR databases and two subdirectories
named “dirdb” and “dirdbver”. These subdirectories are
the names for the two databases used by the DIR service,
and in our experiments they did not store any persistent
files. However, the name of the database will be needed if
the practitioner chooses to use the BabuDB Java library to
examine the database. The "db-log" directory generally contains a number of files (the specific number depending on the length of time the XtreemFS instance has been operating and on configuration settings) which appear to be snapshots of the database binary log. These snapshots use the extension ".dbl" with a filename of the format "1.nnnn", where n is any number of integers, presumably a type of sequence number. BabuDB can recreate the last state of the database
using these snapshots, which we attempted using the
BabuDB JAR library with some success. Usage examples for
the BabuDB library, which can be adapted for the purposes
of reconstruction, are provided by the developers
(BabuDB, 2012). This provided access to the UUID and IP
address of the MRC and OSD components of the XtreemFS
instance in its state before the collection. Another method of analysis a practitioner could undertake on the database's binary log files is a raw keyword search (for IP addresses) using a forensic tool such as Guidance Software's EnCase. This would allow the practitioner to potentially locate all IP addresses associated with DIR-listed components for the life of the database logs. We tested this method with success and found that the service type (e.g. MRC, OSD) could often be decoded from the associated port number.
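Outside a commercial tool such as EnCase, the same raw search can be approximated with a short script; a sketch (the db-log path follows the default noted above, and we assume addresses appear as ASCII text, which binary-encoded entries would evade):

```python
import re
from pathlib import Path

# IPv4-like dotted quads encoded as ASCII bytes (an assumption; addresses
# stored in a binary form would be missed by this sketch).
IPV4_PATTERN = re.compile(rb"(?:\d{1,3}\.){3}\d{1,3}")

def find_ip_candidates(data):
    """Return unique IPv4-like strings found in a bytes blob."""
    return sorted({m.decode("ascii") for m in IPV4_PATTERN.findall(data)})

def scan_db_logs(log_dir="/var/lib/xtreemfs/dir/db-log"):
    """Scan every BabuDB snapshot (*.dbl) in `log_dir` for IP candidates."""
    hits = {}
    for dbl in Path(log_dir).glob("*.dbl"):
        found = find_ip_candidates(dbl.read_bytes())
        if found:
            hits[dbl.name] = found
    return hits
```

As with any keyword search, candidate hits should be validated against other sources (e.g. the DIR status page or network records) before being relied upon.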
The DIR server should have its system time recorded
(particularly any offset from real time) as this may be
needed later to verify/explain the timestamps recorded by
the system. This record should also be made for any other
XtreemFS system component examined.
The DIR component stores an operations log at “/var/
log/xtreemfs/dir.log” which should be collected and analysed. In our experiments, this log did not contain any
entries which we deemed to be of general interest to a
forensic practitioner. However, this would vary on a case-by-case basis. It should also be noted that the DIR supports
different logging (verbosity) levels which are noted in the
DIR configuration file. In our experiments, we left this
value at the default which does not result in particularly
verbose logging. More information of interest may, however, be logged if the administrator had raised the level of
logging before the commencement of the forensic
investigation.
Metadata and replica catalog (MRC)
The MRC stores a broad range of metadata relating to
the volumes in the XtreemFS instance and the files/directories stored within these volumes. This includes high
level data such as the number of OSDs (locations) a file is
striped/replicated across and the OSD identifier (UUID)
down to file level metadata critical for practitioners such as
filename, size, temporal information (access, creation,
modification times) and permissions/ownership information. A practitioner can use this information (combined with information gained from the DIR service) to not only locate components of files of interest but also potentially gain a better understanding of the provenance of the files in the XtreemFS instance.
B. Martini, K.-K.R. Choo / Digital Investigation 11 (2014) 295–313
Three artefact types of potential value to a forensic investigation exist on the MRC server:
Volatile and non-volatile construct metadata – While the DIR service provides metadata for the service components that comprise an XtreemFS instance, the MRC stores metadata which defines constructs internal to the XtreemFS filesystem (e.g. volume metadata).
Volatile and non-volatile file metadata – This ranges from (1) high-level metadata, such as the number of OSDs (locations) a file is striped/replicated across and the OSD identifiers (e.g. UUID), down to (2) low-level file metadata critical for practitioners, such as filename, size, temporal information (access, creation, modification times) and permissions/ownership information.
Configuration files – Configuration information of interest for understanding the environment and MRC operations may include network information (addresses/ports used, etc.), authentication information (systems, passwords, secrets, etc.) and operational information (temporal information, structured metadata formats, etc.).
Evidence source identification and preservation
Using the data gathered from the DIR service a practitioner should be able to identify the logical (network)
location of the MRC (e.g. via datacentre switch/routing
examination). Once the practitioner has used standard
forensics methods to gain access to the server hosting the
MRC service (either remotely or locally) they should
commence identification by reviewing/confirming the
configuration of the service using the configuration files.
The MRC service has a plaintext configuration file located
(by default) at “/etc/xos/xtreemfs/mrcconfig.properties”.
This configuration file has many similar directives to the
DIR configuration file which we have reviewed (e.g. port
numbers for the service and its HTTP monitoring facility,
SSL configuration, service UUID). However, there are also
MRC configuration directives which may be of interest. It
should be noted that much of the information available
in the configuration file is also available in the configuration section of the DIR HTTP status page (excluding any
passwords or secrets). The MRC has a configuration
directive for defining the DIR service hostname/IP
address (dir_service.host) and port (dir_service.port)
along with directives for defining replica DIR service
hostname/IP and ports (e.g. dir_service2.host). This may
be of interest for a practitioner (if this information has
not already been collected) to determine the entities
which are hosting and have access to the XtreemFS
instance. A number of file temporal metadata configuration directives are set at the MRC level; these include whether file access times are updated (no_atime) and the method/configuration for determining the current time for file temporal information across the distributed filesystem.
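The configuration files use a simple Java-style key = value (".properties") layout, so directives such as dir_service.host can be extracted programmatically; a minimal sketch (the replica-directive naming pattern beyond dir_service2 is our assumption):

```python
def parse_properties(text):
    """Parse a Java-style .properties file into a dict, ignoring comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blanks and comment lines
        if "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def dir_endpoints(props):
    """Collect primary and replica DIR endpoints from MRC properties.
    The dir_service2/dir_service3 naming is assumed from the text's
    dir_service2.host example."""
    endpoints = []
    for prefix in ("dir_service", "dir_service2", "dir_service3"):
        host = props.get(prefix + ".host")
        if host:
            endpoints.append((host, props.get(prefix + ".port")))
    return endpoints
```

The same parser applies to the DIR and OSD configuration files, which share this format.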
The MRC HTTP status page (by default available on port
30636) lists a number of configuration directives which
may be of use to a forensic practitioner. As with the DIR
status page, the MRC status page can be accessed by default
without authentication. However, if an administrator
password is enabled in the MRC configuration file, the
administrator password will be required to access the status page. The page lists configuration information (e.g. the
network location of the directory service), various load and
request statistics and volume information. The volume information is of particular interest to a practitioner if they
are searching for a particular volume or are seeking data on
volume sizes/configurations to determine if physical
acquisition will be feasible. This information allows the
practitioner to determine the type of volumes hosted by
the MRC in terms of striping, access and replication policies.
It also provides information on selectable OSDs (to determine which OSDs store data for the volume) and the
number of files and occupied disk space for the volume.
This information can be useful for the identification stage of
the forensic process when a practitioner is making decisions on the best method of evidence collection for the
XtreemFS environment.
The MRC is also responsible for providing authentication services for the virtual filesystem. At the time of
research, XtreemFS listed support in its installation guide
for two types of authentication. “org.xtreemfs.common.auth.NullAuthProvider” (which is the
default) relies upon the user information provided from the
local operating system by the client. Depending on the
environment in which XtreemFS is being used, this could
be a security issue. For example, a user which has root
privileges on any PC which has network access to the
XtreemFS instance could potentially access all files in an
XtreemFS volume. From a digital forensic perspective, this
is a mixed blessing. If a practitioner has access to the
XtreemFS system via the network, then it would be
straightforward to connect as a root user and collect evidence. However, this introduces issues of provenance as
other unknown individuals may have had the potential to
use the same method to access/modify files without the file/volume "owners'" knowledge. The other authentication
provider included in XtreemFS is “org.xtreemfs.common.auth.SimpleX509AuthProvider” which
makes use of X.509 certificates for client authentication.
While this is a much more secure solution, it is not enabled
by default. When enabled SSL is used for authentication
between peer services and between the clients and services, it also (optionally but by default) encrypts all traffic
as it traverses the XtreemFS instance. SSL encryption and
the use of X.509 certificates are discussed further in the
client analysis section (XtreemFS Client section). However, it should be noted that, unlike the default NullAuth configuration, if X.509 authentication is enabled, practitioners will not be able to connect to collect evidence using the XtreemFS client without the relevant user key, a "superuser" key or a host/service key (which reverts the authentication to trusting the local client for user credentials) – standard POSIX access denied errors are received if unauthorised access is attempted.
Collection
Collection of MRC data is critical not only as the metadata may provide a form of evidence but also for the
reconstruction of the XtreemFS environment if it is not
currently functioning or fully accessible. It is also critical if a
practitioner seeks to reconstruct files from physical extractions of the relevant OSD components. Once the practitioner has access to the MRC server there are two
methods of collecting the MRC database.
As discussed in Collection & examination and analysis
section the database binary logs can be collected and
replayed using the BabuDB APIs to form the last state of the
database. This may be the only feasible method of collection if the MRC component of the XtreemFS instance has been shut down, or if the practitioner would like to use standard forensic methods to attempt to collect previous versions of the MRC to determine whether files existed in previous versions of the environment metadata. However, in our
experiments, we found this method more difficult to
execute and we had mixed results in collection of usable
data using this method (e.g. some reconstructions of the
database resulted in corrupt output). Similarly to the DIR component, the MRC database files are located by default in "/var/lib/xtreemfs/mrc/", where there are two subdirectories named "database" and "db-log". The database
collection and examination techniques discussed in
Collection & examination and analysis section can be
similarly applied if the practitioner needs to use this
method of collection.
Unlike the DIR component, the MRC component includes
a much simpler method for collecting the current state of
the MRC database. This collection is achieved by using the
“xtfs_mrcdbtool” which is part of the “xtreemfs-tools”
package. The latter would likely be installed on the MRC host but can also be used from another host on the network which has access to the MRC, or could potentially be installed on and run from removable storage on the MRC.
“xtfs_mrcdbtool” allows a user to dump and restore the MRC
database to/from a human readable XML format. Assuming
that SSL/certificate based authentication and/or an administrator password has not been enabled, the syntax for the
tool is as follows: “xtfs_mrcdbtool -mrc pbrpc://
localhost:32636 dump /tmp/mrcdump.xml”. This
command connects to the MRC on localhost and dumps the
MRC database to a file in “/tmp/” named “mrcdump.xml”.
The practitioner can then open this file in a text editor to
determine that the dump has been successful and copy the
file to external media as a method of collection.
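A practitioner may wish to script this dump so the collected XML is hashed immediately for the evidence record; a sketch wrapping the documented command (the hashing step is our addition, not part of xtfs_mrcdbtool):

```python
import hashlib
import subprocess

def build_dump_command(out_path="/tmp/mrcdump.xml",
                       mrc_url="pbrpc://localhost:32636"):
    """Build the documented xtfs_mrcdbtool dump invocation as an
    argument list (arguments as given in the text)."""
    return ["xtfs_mrcdbtool", "-mrc", mrc_url, "dump", out_path]

def dump_and_hash(out_path="/tmp/mrcdump.xml",
                  mrc_url="pbrpc://localhost:32636"):
    """Run the dump, then return a SHA-256 digest of the resulting XML
    so the collected file can be verified later."""
    subprocess.run(build_dump_command(out_path, mrc_url), check=True)
    with open(out_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```

Recording the digest at collection time supports the integrity of the dump if it is later copied to external media.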
Similarly to the DIR component (see Collection &
examination and analysis section), the MRC stores an operations log at “/var/log/xtreemfs/mrc.log” which should be
collected and analysed. Also similarly to the DIR log, little
information of general forensic interest was found in this
log from our experiments, although the logging level
selected may affect this result.
Examination and analysis
Assuming that the practitioner has collected the MRC
database using the “xtfs_mrcdbtool” the resulting XML file
can be partially analysed using any text editor. If the practitioner has collected the database files manually, the examination and analysis processes will be similar to those discussed in the Collection & examination and analysis section. For the remainder of this section, we will assume that the practitioner has dumped the MRC database to an XML file and is seeking to examine and analyse its contents.
Although most of the XML file is plaintext, some attributes
are encoded using BASE64 which makes the values unreadable until they are decoded. We wrote an application to
parse the XML file, display its content and decode any
BASE64 encoded attributes.
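The core of such an application is small; the sketch below parses the dump with Python's standard library and decodes Base64-encoded "VALUE" attributes of "ATTR" elements in place (element and attribute names follow the structure listed below; error handling is omitted):

```python
import base64
import xml.etree.ElementTree as ET

def decode_mrc_dump(xml_text):
    """Parse an xtfs_mrcdbtool XML dump and decode Base64 "VALUE"
    attributes of "ATTR" elements in place, returning the root element."""
    root = ET.fromstring(xml_text)
    for attr in root.iter("ATTR"):
        if attr.get("ENC", "").upper() == "BASE64" and "VALUE" in attr.attrib:
            raw = base64.b64decode(attr.get("VALUE"))
            attr.set("VALUE", raw.decode("utf-8", errors="replace"))
    return root
```

The decoded tree can then be walked or re-serialised for reporting.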
In our experiments, we found the broad (non-comprehensive) structure of the MRC XML file to be as follows; elements are listed as dot points and their attributes are indented:
The root element was "FILESYSTEM"
- "DBVERSION" – This is a numerical identifier of the database version.
The next element was "VOLUME"
- "ID" – The UUID for the volume, generated by the XtreemFS system.
- "NAME" – The volume name entered by the user which created the volume.
- "ACPOLICY" – The numerical identifier for the "Authorization Policy" (or access control policy) used on the volume. These numerical identifiers and their meanings are defined in the "XtreemFS Installation and User Guide".
The next element was "DIR"; this represents directories in the volume, but the first entry represents the root directory of the volume.
- "ID" – The file ID assigned by XtreemFS for the directory. The file ID is an integer. Generally, the root directory for the volume is "1" and the integer is incremented with the addition of files/directories.
- "NAME" – The name assigned by the user for the directory; the volume name for the root directory.
- "UID" and "GID" – The user and group, respectively, which own the directory; this is represented in the XML dump as a username (e.g. "root"). Depending on the authentication service used (see Evidence source identification and preservation section), this information may not be reliable.
- "ATIME", "CTIME", "MTIME" – Accessed, created and modified times respectively; these are represented as POSIX timestamps in UTC. Notably, during our experiments, the CTIME and MTIME values were updated when the contents of the folder were accessed or changed. Timestamps should be accurate between XtreemFS nodes as server clocks are synced with the DIR as part of the system's operation. The practitioner should check the DIR server (and other nodes for completeness) for any offset between real time and system time.
- "RIGHTS" – The numerical representation of the POSIX permissions applied to the directory. The numerical representation is somewhat different from the standard representation used for POSIX permissions. Owner, group and others are the three classes of permissions, with read, write and execute being the three types. For the "others" class, 1, 2 and 4 represent execute, write and read respectively; for the "group" class, 8, 16 and 32 represent execute, write and read respectively; and for the "owner" class, 64, 128 and 256 represent execute, write and read respectively. These numbers are summed depending on the permissions applied to the directory and stored as a single number. For example, if the owner has read and write permissions, the group has read and others have execute (traverse for directories), the permission stored would be 417 (i.e. 128 + 256 + 32 + 1 = 417). Table 1 provides a visual summary of this representation.
- "W32ATTRS" – This was reported as the numerical value of "0" in our experiments, presumably as we did not use the Windows version of XtreemFS.
The next element was "ATTRS", which encloses the "ATTR" element(s) that contain attributes applicable to the DIR. These "ATTR"s were only noted in relation to the volume's root directory, and not other directories or files, in our experiments. The "ATTRS" element did not have attributes in our experiments, and the "ATTR" elements had four attributes: "KEY", "VALUE", "ENC" and "OWNER". All "VALUE"s were encoded in Base64 (as noted in the "ENC" attribute); these values were decoded by our application and are represented in plaintext below. The "KEY"s are listed and discussed below.
- "allowSnaps" – This true or false value represents whether snapshots have been enabled on the volume.
- "osdPol" – These comma-separated numerical values represent the OSD policies applicable to this volume. These values include policies for filtering, grouping and sorting OSDs for the volume. This may be of interest to a practitioner who wants to determine why particular OSDs were used to store file stripes, but it will not allow a practitioner to determine which OSDs were selected at the time of file creation and, as such, the current location of the file stripes.
- "sp" – A textual representation of the striping policy which applies to the volume, in a similar format to that displayed in the "xtfsutil" client application. The format of this string is described in the "XtreemFS Installation and User Guide"; it denotes the type of striping (only "RAID0" is supported at the time of research), the stripe size in KB and the number of stripes to create (if that quantity of OSDs is available).
- "acPol" – This appears to replicate the "ACPOLICY" attribute in the "VOLUME" element.
- "volId" – This appears to replicate the "ID" attribute in the "VOLUME" element.
- "replPol" – This appears to store the replication policy for the volume; as we did not comprehensively study replication (and it was disabled for this experiment), this value was blank.
- "scrubber.latestscrub" – The POSIX timestamp representing the last time a scrub operation (filesystem clean-up) was run on the volume. Notably, this entry includes the extra "OWNER" attribute, which lists the username of the user which started the scrubbing operation.
The next element was "FILE" in our experiments; this represents the metadata for the first file we added to the directory. However, it could also have been a "DIR" element; the XML representation mimics the hierarchical nature of the filesystem.
- "ID" – The file ID (an integer) assigned by XtreemFS for the file; this is a critical identifier if the practitioner is seeking to manually locate/reconstruct files. Files in the XtreemFS system are addressed by the volume UUID and their file ID.
- "NAME" – The name assigned by the user for the file; the "filename".
- "SIZE" – The size of the file in bytes.
- "EPOCH" – A numerical representation, set to 0 and 1 in our experiments.
- "ISSUEDEPOCH" – A numerical representation, set to 0 and 1 in our experiments.
- "UID" and "GID" – The user and group, respectively, which own the file; this is represented in the XML dump as a username (e.g. "root"). Depending on the authentication service used (see Evidence source identification and preservation section), this information may not be reliable.
- "ATIME", "CTIME", "MTIME" – Accessed, created and modified times respectively; these are represented as POSIX timestamps in UTC. Timestamps should be accurate between XtreemFS nodes as server clocks are synced with the DIR as part of the system's operation. The practitioner should check the DIR server (and other nodes for completeness) for any offset between real time and system time.
- "RIGHTS" – The numerical representation of the POSIX permissions applied to the file. As discussed above, the numerical representation used to store permissions in the XtreemFS database is somewhat different from the standard representation used for POSIX permissions. File rights representations also differ somewhat from directory rights. While the same numbers are used to represent the permission types for the three classes, file permissions are subtracted from the 16-bit integer limit (32,768). The file permissions are first summed depending on the permissions applied to the file; this number is then subtracted from the 16-bit integer limit and stored as a single number. For example, if the file owner has read and write permissions, the group has read and others have execute, the permission stored would be 32,351 (i.e. 32,768 - 417 = 32,351, where 417 = 128 + 256 + 32 + 1).

Table 1
Summary of the XtreemFS directory RIGHTS scheme.

Class                       Others         Group           Owner              XtreemFS directory RIGHTS
Type                        E    W    R    E    W    R     E     W     R
XtreemFS permission value   1    2    4    8    16   32    64    128   256
XtreemFS value total        1              32              128 + 256 = 384    417

Note: E, W and R denote Execute, Write and Read respectively.
- "W32ATTRS" – This was reported as the numerical value of "0" in our experiments, presumably as we did not use the Windows version of XtreemFS.
- "READONLY" – A true or false value noting whether the file is read-only after creation in the XtreemFS system.
The next element was "XLOCLIST" in our experiments; this begins the metadata on the location of file stripes and replicas. "XLOCLIST" notes a version number and the replication policy (e.g. "WqRq") that determines how replicas are created and whether replicas are read-only or read/write. This attribute is blank if replication is not enabled.
The next element is "XLOC", which contains striping and replication attributes for the file.
- "PATTERN" – A string representation of the striping policy for the file; this may differ from the volume striping policy, as the striping policy can be changed at the file/directory level.
The next element is "OSD", which contains the OSD location for the file.
- "LOCATION" – The UUID of the OSD that stores a stripe/replica of the file. This information is critical to a practitioner looking to manually reconstruct files in the XtreemFS instance, as it indicates which OSDs need to be accessed to collect file stripes for reconstruction.
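The RIGHTS encodings described above (summed per-class values for directories; the same sum subtracted from 32,768 for files) can be sketched as:

```python
# Per-class values for execute, write and read, as described above.
RIGHTS_VALUES = {
    "others": {"x": 1, "w": 2, "r": 4},
    "group":  {"x": 8, "w": 16, "r": 32},
    "owner":  {"x": 64, "w": 128, "r": 256},
}

def encode_dir_rights(owner="", group="", others=""):
    """Sum the per-class values for a directory's permissions
    (e.g. owner="rw" contributes 256 + 128)."""
    return sum(RIGHTS_VALUES[cls][p]
               for cls, perms in (("owner", owner), ("group", group),
                                  ("others", others))
               for p in perms)

def encode_file_rights(owner="", group="", others=""):
    """File rights are the directory-style sum subtracted from the
    16-bit limit (32,768)."""
    return 32768 - encode_dir_rights(owner, group, others)
```

With the worked example above, encode_dir_rights(owner="rw", group="r", others="x") yields 417 and encode_file_rights with the same arguments yields 32,351.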
A summary of the attributes of common forensic interest to a practitioner is provided in Table 2.
Object storage device (OSD)
The OSDs will be the core components of forensic interest, as the OSDs store the data stripes which allow the practitioner to reconstruct files. The OSDs also offer the potential for a practitioner to recover deleted file parts using existing forensic techniques on the underlying filesystem. The practitioner's identification and analysis of
previous components (DIR and MRC) will provide the
practitioner with the information they need to determine
which OSDs in the network contain the data they are
seeking to collect and allow the practitioner to identify the
relevant individual files on the OSDs.
Two artefact types of potential value to a forensic investigation exist on the OSD server(s):
Non-volatile file data – The OSD is responsible for the content of files stored in the XtreemFS filesystem, which forms the majority of the non-volatile data of interest on the OSD(s). Other data that should be collected includes any local metadata (e.g. local filesystem permissions) and log data.
Configuration files – Configuration information of interest for understanding the environment and OSD operations may include network information (addresses/ports used, etc.), authentication information (systems, passwords, secrets, etc.) and operational information (physical data storage locations, data structure formats, etc.).
Evidence source identification and preservation
The OSD is the most likely component in the XtreemFS
architecture to be installed numerous times across a
number of disparate physical devices. Regardless of
whether the administrator is seeking to use XtreemFS
striping or replication functions (or a combination of both),
multiple OSDs are fundamental to the effective operation
(e.g. replication for higher IOPS e Input/Output Operations
Per Second) of the XtreemFS system. Identification and
preservation of these components is particularly critical for
a forensic practitioner as OSDs store the actual data
uploaded by the clients.
Table 2
Summary of attributes of forensic interest.

Element               Attribute/key          Rationale
VOLUME                ID                     The volume ID (UUID) is necessary to manually
                                             reconstruct files stored on the OSDs.
                      NAME                   The volume name is potentially useful to determine
                                             the owner of the volume and/or its contents.
DIR                   ID, NAME               The directory ID and name can be useful in reporting.
                                             They may also assist in determining provenance.
                      UID & GID, RIGHTS      The UID and GID owners and their associated rights
                                             for the directory may be useful in determining who
                                             had access to a directory and data provenance.
                      CTIME, ATIME, MTIME    The created, accessed and modified times may be
                                             useful in determining when the directory's contents
                                             were accessed or modified.
ATTRS (volume root)   allowSnaps             Used to determine if snapshots are enabled on the
                                             volume.
                      sp                     Used to confirm the type of striping used, the size of
                                             individual stripes and the potential number of stripes
                                             created for files in this volume.
FILE                  ID, NAME               The file ID and name can be useful in reporting. They
                                             may also assist in determining provenance.
                      UID & GID, RIGHTS      The UID and GID owners and their associated rights
                                             for the file may be useful in determining who
                                             created/had access to a file.
                      CTIME, ATIME, MTIME    The created, accessed and modified times may be
                                             useful in determining when files were last added,
                                             accessed (when enabled) and changed.
XLOC                  PATTERN                The specific stripe pattern used for an individual file.
                                             It should be noted that this may differ from the
                                             volume striping pattern.
OSD                   LOCATION               The UUID of an OSD which stores the stripe(s) of the
                                             file. There is generally more than one OSD entry for a
                                             file with striping (or replication) enabled.

The DIR component that the practitioner will have analysed as part of an earlier phase provides the practitioner with the network location of the OSDs, which should be useful in locating the physical OSD host devices for preservation and collection. As with the DIR and MRC components, the OSD component has configuration directives that define which ports are used for communication and how the DIR is located, and it provides a basic status page. The
OSD configuration file is located (by default) at “/etc/xos/
xtreemfs/osdconfig.properties”. The configuration file
contains similar directives to the other two configuration
files discussed (such as the listen/http ports, DIR host/port
and SSL configuration). However, it also contains some
important OSD specific directives.
Perhaps the most important directive from the
perspective of a forensic practitioner is the “object_dir”
directive, which specifies the filesystem location of the OSD
storage. This directive is set (by default) to "/var/lib/xtreemfs/objs/". However, it could, and presumably often would, be changed to suit individual implementations.
The forensic practitioner can use this directive to start the
search for the physical system/media that holds the OSD
data. For example, the data could be stored on a local disk
(or disk array) in the server or it could be hosted on a NAS
(Network Attached Storage) device or SAN (Storage Area
Network). In the latter case, the “object_dir” directive
would point to a mount point which the operating system
provides to the remote storage. The use of this mount point
(either directly or using the XtreemFS system) could be the
most feasible method for a practitioner to collect a logical
copy of data stored by the XtreemFS environment as access
to the remote storage device may be complicated by
authentication or other technical requirements.
The OSD HTTP status interface provides runtime statistics on the OSD, mostly in terms of usage (as with the DIR and MRC, these only require authentication when it is enabled in the configuration file). However, these statistics are of
limited interest to a forensic investigation. The availability
of this interface (and port), however, could be a method of
network based identification (as with the other XtreemFS
components).
Collection
The collection procedure for the OSD will vary
depending on the requirements of the practitioner. As OSDs generally store stripes of files, a practitioner may choose
use the XtreemFS client tools (discussed in XtreemFS Client
section) to mount the XtreemFS filesystem and collect the
relevant files. However, if for any reason (e.g. lack of
network access, authentication issues, the need for a more
thorough physical collection of file stripes) the practitioner
chooses not to use this method, they will need to manually
locate and reconstruct the files of relevance. Using the information collected from the MRC (and DIR), the practitioner can locate and collect only the relevant files' parts,
rather than needing to collect an entire XtreemFS installation (potentially consisting of many OSDs). In a medium
to large XtreemFS environment, this could be the only
feasible option for a forensic practitioner as physical
collection of the entire XtreemFS environment (which may
consist of thousands of devices) would not be feasible.
Once a practitioner has examined the metadata
collected from the MRC and identified the files/volumes
they need to collect, they can identify the OSD(s) that store
the file stripes by UUID. This UUID can be used with the
information collected from the DIR to locate the OSD on the
network and, in turn, locate the device physically (or where
appropriate access it remotely). Using the “object_dir” and
mount point information available (as discussed above), a
practitioner can determine where the data is being physically stored and make a decision as to whether they will
collect the file stripes logically (e.g. by gathering the
available filesystem level metadata and then copying the
relevant stripes from the mount point on the OSD to
external media) or collect a physical image of the storage device or relevant parts of the device. If a practitioner seeks to collect only individual file stripes (logically
or physically), they will need to understand the hierarchy of
the “object_dir” which is discussed in detail below.
Similarly to the DIR and MRC components (see
Collection & examination and analysis and Examination
and analysis sections), the OSD component stores an operations log at “/var/log/xtreemfs/osd.log” which should be
collected and analysed. As we found that the default logging settings did not provide significant detail of interest
for forensic purposes, we changed the logging level
(verbosity) on one of the OSDs in our experiment environment to the highest level, 7 e “debug”. With this level of
logging enabled, we found that much more relevant information was logged and, as such, would be available as
part of a forensic investigation. The usefulness of the information presented would still be case specific. However,
as this level of logging records data to the granularity of
individual file stripes, this has the potential to be significantly more useful than standard logging. We note that it is,
of course, not possible to change the logging level after an
event requiring forensic investigation has occurred.
Therefore, when implementing XtreemFS we advise users
to select an appropriate level of logging (preferably
verbose) to ensure that an evidence trail can be followed at
a later date.
Examination and analysis
The “object_dir” directory contains a number of subdirectories and metadata files which relate to the individual
stripes stored by the OSD. A simple directory listing on an
OSD that is currently storing data will reveal a number of
directories which have a name consisting of two hexadecimal characters (e.g. 7E or 22). In our experiments, we
found that a further three two-hexadecimal character directories are located under the initial directory (e.g. a path
of [obj_dir]/AA/BB/CC/DD/). In the last directory, a further
directory exists with a name composed of the UUID of the
volume followed by a colon followed by the file ID number
(see Examination and analysis section) on that volume (e.g.
9be62ga4-21q1-1763-rq15-kap1021p4aq9:7). In this
directory are the file stripes stored by the OSD. Configuration of the striping metadata on the volume or individual
file at the MRC will determine the size and quantity of
stripes stored in this directory. This can mean that all of the
necessary stripes are stored within the selected directory or
only a subset of the stripes may be available on this
particular OSD.
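As a rough illustration of this layout, the following Python sketch (our own helper, not part of XtreemFS) walks an OSD's “object_dir” and lists directories matching the [obj_dir]/XX/XX/XX/XX/[volume UUID]:[file ID]/ pattern described above; the name pattern is an assumption based on what we observed in our experiments:

```python
import os
import re

# Matches the terminal stripe directory name: a 36-character volume UUID,
# a colon, then the numeric file ID (pattern is our assumption based on
# the layout observed in our experiments).
FILE_DIR_RE = re.compile(r"^[0-9A-Za-z-]{36}:\d+$")

def find_stripe_dirs(object_dir):
    """Return paths of directories that appear to hold file stripes."""
    hits = []
    for root, dirs, _files in os.walk(object_dir):
        for d in dirs:
            if FILE_DIR_RE.match(d):
                hits.append(os.path.join(root, d))
    return sorted(hits)
```

Such a listing gives the practitioner the volume UUID and file ID pairs present on an OSD, which can then be cross-referenced with the metadata collected from the MRC.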
B. Martini, K.-K.R. Choo / Digital Investigation 11 (2014) 295–313

This implies that individual file reconstruction is
possible without relying upon the live XtreemFS environment to extract the data. However, in any reasonably sized
implementation of the environment, it would not be
feasible for a practitioner to locate and reconstruct files of
interest without the metadata collected as discussed. Many
forensic searches are based on keywords and known file
hashes, and these methods may not be useful when files are
split into an arbitrary number of parts. In cases where cloud
computing is involved, a practitioner may initially seek to
collect all evidence from a particular user. However, it is
clear that a practitioner would not be able to easily determine which files or stripes are owned by a particular user
from the OSD's hexadecimal representation. To collect evidence at the low level that is commonly used with traditional devices (bitstream images), a practitioner must
understand the technical method used by the storage
platform to allocate file components.
In the case of XtreemFS, there are two key encodings
that a forensic practitioner should understand in order to
reconstruct files from a physical image of the OSD(s) – the
directory path for a particular file or files (determined in
part from analysis of the MRC data) and the stripe
filenames.
The directory path that is created on the OSD(s) to store
the file stripes is determined based upon the volume UUID
and file ID. This file structure format is referred to as
“HashStorageLayout” in the XtreemFS configuration directives. Based on our analysis and review of the XtreemFS
source code, we were able to create an application that
accepts a volume UUID and file ID (e.g. 9be62ga4-21q1-
1763-rq15-kap1021p4aq9:7) and returns the path which
stores the stripes on the OSD (e.g. /var/lib/xtreemfs/objs/
D0/BE/B6/53/9be62ga4-21q1-1763-rq15-kap1021p4aq9:7/
). This path is derived from the Java String “hashCode”
function, which then requires a number of other transformations before resulting in four pairs of hexadecimal
characters (D0BEB653 in the above example) that are then
used to create the directory path.
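A simplified Python sketch of this derivation follows. The Java hashCode reimplementation is standard, but the subsequent masking and formatting steps are our assumptions (and the helper names are ours); they should be validated against the HashStorageLayout source for the XtreemFS version under investigation:

```python
def java_string_hash(s):
    """Reimplementation of Java's String.hashCode():
    h = 31*h + char, with 32-bit overflow semantics."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

def stripe_dir_for(object_dir, file_id):
    """Hypothetical path derivation: hash the '[volume UUID]:[file ID]'
    string, render the unsigned 32-bit result as eight uppercase hex
    characters, and use the four pairs as nested directory names."""
    hex_hash = "%08X" % java_string_hash(file_id)
    pairs = [hex_hash[i:i + 2] for i in range(0, 8, 2)]
    return "%s/%s/%s/" % (object_dir.rstrip("/"), "/".join(pairs), file_id)
```

For example, stripe_dir_for("/var/lib/xtreemfs/objs", "[volume UUID]:[file ID]") returns a path of the same shape as the D0/BE/B6/53 example above.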
The stripe filenames consist of three sixteen-hexadecimal-character blocks concatenated together to
form a 48-character filename (e.g. 000000000000000010
0000000000000000000000000000000). Reviewing the
source code, the HashStorageLayout createFileName function provides the following hints: the first block is referred
to as “objNo”, the second as “objVersion” and the third as
“checksum”. objNo appears to refer to the stripe sequence
number, and in our experiments, we found the files must be
reconstructed in the correct order based upon these object
(stripe) sequence numbers. The decimal sequence number
is encoded using a hexadecimal representation of a “long”
type number (i.e. an unsigned integer in base 16) and
reversed before being padded to 16 characters by
appending “0”s. For example, while stripe two is represented as “1000000000000000” (zero represents the first
stripe as expected), stripe eleven is represented as
“A000000000000000”, stripe sixteen is represented as
“F000000000000000” and stripe seventeen is represented
as “0100000000000000” (note the reversed ordering). In
our experiments, we were able to successfully manually
reconstruct a number of small and medium sized files,
proving the concept that manual reconstruction is possible
if the practitioner has sufficient knowledge of the operation
of the environment.
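This encoding and its inverse can be expressed compactly. The following Python sketch (our own helper functions, not XtreemFS code) encodes and decodes the objNo block and reconstructs a file from a set of stripes, under the assumptions stated above (a single replica, all stripes present, objVersion and checksum handling out of scope):

```python
def encode_obj_no(obj_no):
    """Hex-encode the stripe sequence number, reverse it, and pad with
    trailing zeros to 16 characters (objNo 0 is the first stripe)."""
    return ("%X" % obj_no)[::-1].ljust(16, "0")

def decode_obj_no(block):
    """Inverse: strip the trailing-zero padding, un-reverse, parse hex."""
    return int(block.rstrip("0")[::-1] or "0", 16)

def reconstruct(stripes):
    """Concatenate stripe contents ordered by the objNo encoded in the
    first 16 characters of each 48-character filename. Assumes a single
    replica with all stripes present."""
    ordered = sorted(stripes.items(),
                     key=lambda kv: decode_obj_no(kv[0][:16]))
    return b"".join(data for _name, data in ordered)
```

Note that because "%X" produces no leading zeros, the reversed string never ends in "0" (except for objNo 0 itself), so stripping the trailing padding is unambiguous.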
The last sequence of sixteen hexadecimal characters
represents the checksum of the object (individual file
stripe) when enabled. Checksums are not enabled by
default and are enabled on a per OSD basis. Considering the
importance of file integrity to forensic investigations, we
elected to enable checksums on one of our OSDs to gain a
better understanding of their operation in the XtreemFS
environment. We tested both the Adler32 (default) and
CRC32 checksums. While the OSD configuration file indicates that MD5 and SHA-1 are also supported, with our
experiment environment configuration, we found that
when these checksums were enabled we were unable to
assign objects to these OSDs. As such, we were unable to
test the MD5 and SHA-1 checksums. Similarly to the other
parts of the object identifier, the checksum is represented
in hexadecimal and reversed. Other than this representation format difference, in our experiments the checksums
for the objects (calculated using other tools) matched the
checksums recorded in the object IDs. A practitioner could therefore potentially make use of this functionality to
ensure that logical copies of the objects are error free or to
determine if objects have been modified after they have
been stored (although presumably if the file data can be
modified by an attacker the checksum can also be
modified).
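Under the same reversed-hex reading, a logical copy of a stripe can be verified against the checksum block in its filename. The sketch below uses Python's zlib module; the helper functions are ours, and the assumption that the checksum block is encoded like the other blocks should be confirmed against the OSD configuration in use:

```python
import zlib

def encode_block(value):
    """Reversed-hex, zero-padded 16-character encoding (as for objNo)."""
    return ("%X" % value)[::-1].ljust(16, "0")

def verify_stripe(data, checksum_block, algo="adler32"):
    """Recompute the stripe checksum (Adler32 by default, CRC32
    optionally) and compare it with the third 16-character block
    of the 48-character object filename."""
    calc = zlib.adler32(data) if algo == "adler32" else zlib.crc32(data)
    return encode_block(calc & 0xFFFFFFFF) == checksum_block
```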
XtreemFS client
The XtreemFS client applications can provide significant utility to a forensic practitioner seeking to extract
evidence from the distributed filesystem environment. It
must be acknowledged that in a number of situations,
the manual file reconstruction methods discussed above
would be necessary. However, it may also be possible for
a practitioner to use the XtreemFS client tools in a more
automated fashion to collect the files from the environment. The various types of metadata collected can
then be used, combined with the files collected, to
create a relatively comprehensive (logical) representation of the data stored by a particular user in the
environment.
If a practitioner has access to the mounted filesystems
on a client either live (using commands such as mount) or
based upon logged data or memory captures, this information can be used to simplify the location of the storage
nodes in the environment (e.g. by identifying the DIR
instance in an XtreemFS environment). In the standard
mounted filesystem listing on a Linux system the following
basic structure is used:
[mounted device] on [mountpoint] type [type]
([options])
e.g. /dev/sda1 on /mnt/hdd type ext4 (rw, nosuid,
noatime)
XtreemFS mount-points appeared as follows in our
experiments:
xtreemfs@dir.local/volumeName on /mnt/volumeName type fuse (rw, nosuid, nodev, noatime)
An example mount command for an XtreemFS volume
(named “volumeName” with a DIR hostname of “dir.local”
to a local mountpoint of “/mnt/volumeName”) is as follows:
mount.xtreemfs dir.local/volumeName /mnt/volumeName
A number of artefacts of interest to a forensic practitioner can be determined from this listing. Firstly, the string
“xtreemfs@” as part of the mounted device indicates that
this is an XtreemFS connection. If a practitioner is searching
a node for a connection to XtreemFS, this is one obvious
method of verifying the connection. Secondly, “dir.local” is
the hostname for the DIR service in the XtreemFS environment. This allows a practitioner to (logically) locate the
DIR as required at the commencement of the forensic
process. After the DIR hostname, the volume name on the
XtreemFS instance is provided; this potentially allows a
practitioner to narrow their focus to the volume in
question.
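For triage across many clients, the artefacts above can be extracted from mount listings automatically. The following Python sketch (the pattern and function are ours) pulls the DIR hostname, volume name and mount-point from mount output; the exact listing format may vary between client versions, so the pattern reflects only what we observed in our experiments:

```python
import re

# Matches lines such as:
#   xtreemfs@dir.local/volumeName on /mnt/volumeName type fuse (rw, ...)
MOUNT_RE = re.compile(
    r"xtreemfs@(?P<dir_host>[^/\s]+)/(?P<volume>\S+) "
    r"on (?P<mountpoint>\S+) type fuse"
)

def parse_xtreemfs_mounts(mount_output):
    """Return a list of dicts describing XtreemFS mounts found
    in the output of the `mount` command."""
    return [m.groupdict() for m in MOUNT_RE.finditer(mount_output)]
```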
If a practitioner wishes to mount the volume using a
locally connected machine (e.g. an acquisition laptop), they
will need to use the “mount.xtreemfs” tool. If SSL and X.509
permissions are not enabled, this can be as simple as
“mount.xtreemfs [DIR hostname]/[volume name] [/local
mount-point]”. However, when SSL and X.509 are enabled,
the practitioner will need to have access to a certificate (as
discussed in Evidence source identification and
preservation section) and its passphrase as they are used
with both the “--pkcs12-file-path” (pointing to the certificate file) and “--pkcs12-passphrase” (providing the passphrase) parameters with the mount command.
If a practitioner has access to a client with a mounted
XtreemFS volume, they can use the “xtfsutil” utility to
interrogate the volume for XtreemFS specific information.
xtfsutil is a powerful tool for both gathering information
from and managing XtreemFS volumes. In its most basic
form, the command is run with only one parameter: the
path to a mount-point or a file or directory under that
mount-point. When used in this manner, xtfsutil provides a
range of information including items of interest for three
types of objects: a volume mount-point, a directory and a
file (see Table 3).
Summary
This section summarises the process a forensic practitioner may follow to conduct a successful forensic investigation of an XtreemFS instance. It should be read in
conjunction with (and is adapted from) Collecting evidence
from a distributed filesystem – a process section, which
provides a generic process for evidence collection from
distributed filesystems. Collecting evidence from the DIR,
MRC and OSD components will require varying levels of
access depending on the type of data and acquisition
method the practitioner selects. Volatile data such as
component status pages will require access to a web
browser on the host or access to the (V)LAN on which the
status pages are hosted and may require an administrator
password (if enabled). Access to non-volatile data such as
databases and object stripes may require root or administrator access on the XtreemFS hosts (depending on
configuration) if a practitioner is seeking to collect them
while the OS is running. However, if a practitioner cannot
Table 3
xtfsutil key attributes.

- XtreemFS file Id (all types): One of the most important identifiers (discussed further in Examination and analysis section) for locating files on OSDs. A unique file/directory identifier consisting of the volume UUID and file/directory number.
- XtreemFS URL (volume, directory): The URL used to connect to the volume, including the protocol, hostname/IP address of the DIR, port number and volume name in the following format: [protocol]://[hostname]:[port number]/[volume name]/[directory name where appropriate], e.g. pbrpcs://DIR:32638/xtfsvolume/dirname
- Owner (all types): The name of the POSIX user or certificate (depending on authentication type enabled) which owns the item.
- Group (all types): The name of the POSIX group which owns the item (when X.509 is being used, this is derived from the OU listed in the certificate).
- Type (all types): Textual representation of the object type, e.g. volume, directory and file.
- Free/Used Space (volume): Free space and used space on the volume, delimited by a slash.
- Num. Files/Dirs (volume): Number of files and directories on the volume, delimited by a slash.
- Access Control p. (volume): The access control policy applied to the volume (e.g. POSIX).
- OSD Selection p. (volume): Numerical representation of the OSD selection policy.
- Replica Selection p. (volume): Textual representation of the OSD selection policy (“default” in our experiments).
- Default Striping p. (volume, directory): Textual representation of the volume's default striping policy (as discussed in Examination and analysis section).
- Default Repl. p. (volume, directory): Textual representation of the volume's default replication policy (“not set” or “none (not replicated)” in our experiments).
- Snapshots enabled (volume, directory): A textual (“yes” or “no”) representation of whether snapshots are enabled on this volume.
- Selectable OSDs (volume): A list of UUIDs, IP addresses and port numbers for the currently selectable OSDs for this volume.
- Replicas (file): The number of replicas (e.g. Replica 1, Replica 2, …, Replica N) with a number of sub-values. These include the “Striping policy”, a textual representation of the striping policy applicable to this individual file (which may differ from the default striping policy used on the directory or volume). The OSD(s) storing objects for each replica are also listed (e.g. OSD 1, OSD 2, …, OSD N) with the UUID, IP address and port noted.
gain access to an administrative account on the host, then
the techniques discussed in this paper should allow them
to manually reconstruct files of interest.
It must be noted, however, that this is a somewhat
time-consuming practice and, as such, practitioners should take
every opportunity to collect the majority of files from the
distributed filesystem using a client connected to its
mounted volumes.
1. Identify XtreemFS usage – The initial steps a practitioner will undertake when investigating an XtreemFS
instance will depend on the means by which they
identified the use of XtreemFS. Most commonly, we
expect this to be by locating an XtreemFS client or
being informed of the use of XtreemFS by a system
administrator. Regardless of identification source, the
practitioner's aim is to collect any logical data they have
access to and locate (physically and/or logically) the DIR.
In the case of an XtreemFS client (with a currently
mounted remote filesystem), we highlight the method
by which the environment's directory service (DIR) can
be located in XtreemFS Client section. We also recommend that a practitioner take a logical copy of the data
that they are able to access via the currently mounted
volumes on the client. They should also check the client
mount logs, Linux ‘fstab’, etc. (where available) and
attempt to mount (read only) any volumes found (see
the mount example in XtreemFS Client section).
Generally, to access this information (and potentially to
access data on the mounted filesystems), they will
require root or administrator access to the client or they
could power off the client and conduct a standard
forensic analysis on the client device that may reveal
persistent data on mounted filesystems.
If the practitioner is satisfied with the logical data
which they have collected from the client and does not
wish to pursue further collection opportunities (which
require physical access to the XtreemFS hosts or
cooperation by the system administrators), they may
choose to end the process at this point. However, if a
practitioner is looking to collect the maximum potential evidential data possible, they will need to proceed
with the next step of the process.
If a practitioner is commencing their investigation by
discussing the environment with a system administrator
(who mentions the existence of XtreemFS), they should
ask where the physical environment hosts (e.g. servers
and storage systems) are located and should also request
any authentication (e.g. X.509) certificates/passphrases,
etc. and, if possible, to be set up on a client connected to
the XtreemFS environment with a “superuser” key (as
discussed in Evidence source identification and
preservation and XtreemFS Client sections). A practitioner can then mount the volumes (as read only) using
the XtreemFS client and collect logical data from the
volumes (and can use the DIR status page to ensure they
are connected to all hosted volumes).
Another potential identification opportunity we envision is via network broadcast interception (searching
for the network artefacts, such as “autodiscover”
requests, discussed in Evidence source identification
and preservation section) or sending broadcasts
(using the “autodiscover” method). This can be used to
logically locate (preferably) the DIR, or other XtreemFS
components that can then be used to locate the DIR.
However, in many cases, this seems less feasible as an
initial identification method.
2. Collect and Examine DIR Information – Once the DIR
has been logically or physically located, the practitioner
should collect volatile data from the DIR status page
(from a web browser on the host or via its (V)LAN). They
should also obtain root access to the host (via a remote
or local shell) and collect the non-volatile artefacts
outlined in Collection & examination and analysis section. If they have physical access, they could also collect
the virtual machine disk or physical disks hosting the
DIR to obtain the non-volatile artefacts.
The collected artefacts from the DIR server should be
examined to determine the logical and/or physical
location of the MRC and OSD hosts in the XtreemFS
environment. This will allow the practitioner to collect
and analyse the filesystem metadata and select files of
interest to ensure their stripes are collected from the
relevant OSDs.
3. Collect and Examine MRC Information – Collecting
the MRC metadata database is the next step after the
components have been identified and located using the
DIR metadata. Physically locating the MRC host should
be possible using existing techniques (e.g. datacentre
switch/routing examination) once the MRC IP address
has been determined. Collecting the database files
directly will generally require root access to the MRC
host (either console or remotely). It may be possible to
use the “xtfs_mrcdbtool” from a standard user account
or from another PC on the XtreemFS (V)LAN. However, it
will not be possible if authentication is enabled unless
the practitioner possesses the keys and/or admin passphrase. If the practitioner has root access to the host
then the database constituent files can be obtained (as
described in Collection section). However, once root
access has been obtained, the practitioner may be able
to collect the required authentication keys/passphrase
(as described in the MRC configuration file) to use the
“xtfs_mrcdbtool” which will simplify the collection.
Once the metadata database has been obtained, it
should be analysed to determine which volumes/directories/files are of interest and have not yet been
collected (via the client) or merit further investigation.
The artefacts that are available to make this determination are discussed in Examination and analysis section and summarised in Table 1. The OSD UUIDs should
be noted and cross-referenced with the logical OSD
location (IP address) data collected from the DIR to
determine which OSDs need to be collected.
4. Collect, Examine and Analyse OSD Information –
Using the OSD identification data obtained from the DIR
(logical location) and MRC (stripes of interest), a practitioner can begin collection of the relevant OSD data.
The primary items for collection for an OSD are the
constituent stripes that form individual files on the
filesystem. These stripes can be logically collected (e.g.
copied) from the running host operating system's underlying filesystem (e.g. ext4) if the practitioner has
local or remote root access to the OSD host. However, if a
practitioner is attempting to conduct a more in-depth
analysis, they may wish to collect forensic images of
the physical disks that support the host's underlying
filesystem. This process would obviously require physical access. However, it does not rely upon the practitioner having access to any operating system level
credentials.
The process for analysing and ultimately reconstructing
the filesystem stripes into files is discussed in
Examination and analysis section. Checksums are also
discussed in this section, and if enabled, provide a
useful method for practitioners to determine that their
copying process for file stripes is maintaining file
integrity.
Fig. 1 provides a graphical abstract of the steps outlined above. Table 2 also provides a summary of the
types of stored metadata artefacts that are referenced
in Fig. 1.
Fig. 1. XtreemFS forensic process. Adapted from Fig. 2.
Collecting evidence from a distributed filesystem – a process
This research demonstrates that a process must be
followed to ensure the collection of data and metadata to
the furthest possible extent from a distributed filesystem
environment. If a practitioner followed existing practice
and attempted to acquire a bitstream image of the storage
devices (in this case the OSDs), it is clear that a large
amount of metadata (available at the MRC) would be
missed. Metadata stored by the DIR may also be integral as
part of evidence collection or environment reconstruction.
For example, the practitioner may experience difficulty in
locating all of the data storage devices in a medium to
large sized installation without referring to the central
directory source for the distributed filesystem (the DIR in
this case).
With these considerations in mind, we propose the
following process for collecting evidence from a distributed
filesystem. This process contains elements which fall
within the stages of the cloud forensics framework (Martini
and Choo, 2012) outlined in Cloud forensics framework
section. The process consists of the following three
elements:
1. Directory Services: Locate directory services, collect
data and examine the data to determine the filesystem
components in use and their logical locations.
2. Metadata Storage: Using directory services data locate
the metadata storage server, collect the metadata and
examine it to determine which files/directories/volumes/etc. may be of forensic interest.
3. Data Storage: Using the environment metadata
collected from the directory service and the selected
target information obtained from the metadata storage,
collect relevant data storage devices or logical datasets
(e.g. stripes, filesystems) and reconstruct using the
collected metadata.
Each of the process elements is discussed in detail
below (also see Fig. 2).
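To make the flow of data between the three elements concrete, the following Python sketch outlines the process as a pipeline. The Evidence container and the three collector callables are hypothetical placeholders for whatever tooling a given distributed filesystem requires; they are not real APIs:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    """Illustrative container for material gathered at each stage."""
    directory: dict = field(default_factory=dict)  # component -> location
    metadata: dict = field(default_factory=dict)   # item id -> attributes
    objects: dict = field(default_factory=dict)    # item id -> data

def collect(dir_source, metadata_source, data_source):
    ev = Evidence()
    # 1. Directory services: enumerate components and logical locations.
    ev.directory = dir_source()
    # 2. Metadata storage: examine metadata, keep items of interest.
    ev.metadata = {i: m for i, m in metadata_source(ev.directory).items()
                   if m.get("of_interest")}
    # 3. Data storage: collect and reconstruct only the selected items,
    #    using the directory and metadata gathered above.
    ev.objects = {i: data_source(ev.directory, i) for i in ev.metadata}
    return ev
```

The essential point the sketch captures is the dependency ordering: data storage collection is scoped by the metadata examination, which is in turn located via the directory services.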
Directory services
The directory services component of a distributed filesystem maintains metadata on the nodes in the filesystem. This may be implemented in a number of ways,
for example, in the XtreemFS experiments discussed in
Findings section, directory services were centralised to one
(or a specific number of) DIR server(s). In other architectures, this could be a more distributed model where each
node stores directory information or even a peer-to-peer
model where limited directory information is stored with
each node.
Regardless of the technical implementation, a practitioner should seek to gather the environment's directory
metadata as the first step. This would allow them to gain a
better understanding of the technical operation and node
logical location (e.g. IP address) in the distributed filesystem. Identification of these components will most
likely be via assistance from a system administrator or via
analysis of a cloud node (e.g. VM host server) or filesystem
client. Once the directory information has been located, it
should be preserved and collected. The specific methods
required for these steps will depend on the technical
implementation of the distributed filesystem.
Analysis of the directory data should consist of decoding
(where necessary) and documenting all relevant details for
each of the components/nodes in the distributed filesystem. Items of interest include logical location (e.g. IP
address), type (e.g. metadata or data store), capacity,
unique identifiers, et cetera. This information will be useful
in reporting in terms of documentation, chain of custody
requirements and provenance.
Metadata storage
Using the information gathered from the directory
services the practitioner should now be able to identify
the sources of metadata and potentially have an idea of
the types of metadata available in the environment. This
metadata may be stored with the data as part of the data
storage component. However, it is likely in a distributed
system that some or all metadata will be stored separately in a centralised or decentralised database. Once the
metadata has been located, it should be preserved and
collected before being examined and analysed.
Examination of metadata will likely involve reducing
the scope of the identification and collection tasks within the
data storage stage of the process. The exact method used
will depend on the case. However, a common reduction
method would be to focus on metadata that indicates a
data owner who is a suspect or related in some way to a
suspect. In the XtreemFS example, this may be possible via
the “owner” metadata which can indicate which POSIX
user or certificate holder has access to the volume or has
stored an individual file in the volume. In some cases this
information is not reliable (e.g. if the default authentication provider is used in XtreemFS) or is unavailable (e.g.
when permissions are stored in the application which the
distributed filesystem supports). In this case the practitioner may need to rely upon permissions data available at
other levels.
Other types of metadata that may be of interest include
temporal data for both reporting and confirming provenance. File hashes are also useful if available for similar
reasons and to allow the practitioner to search the filesystem environment for various files of known interest by
hash.
Data storage
Once the necessary data has been examined and analysed from both the directory services and metadata store,
the practitioner should have the required information to
locate and reconstruct the files distributed throughout the
filesystem. Similarly to the process for metadata location,
the practitioner should be able to use data gathered from
the directory service to identify the nodes which store data
in the environment. Using the information collected so far,
as part of the process, the practitioner can now determine if
it would be advantageous to complete a physical collection
(i.e. collection of a bitstream image from the physical
storage devices) of the data storage nodes or attempt to use
the information gathered from systems administrators,
client nodes or compute nodes in a cloud environment to
connect to the filesystem and collect the evidential data
logically (i.e. collected using filesystem client applications).
In either event, the practitioner should ensure that they
follow forensic preservation techniques (write blocking,
etc.) as they would with a regular data collection. The
practitioner should also ensure they collect any additional
metadata which may be stored with the data objects (e.g.
local filesystem temporal and permissions data) and on the
data nodes (e.g. log data).
If the practitioner has undertaken a logical collection,
they should be able to match the metadata and files
collected to produce a complete representation of the data
of interest which has been stored. If, however, a practitioner chooses to undertake a physical collection, they will
most likely need to reconcile and reconstruct the individual
data objects into the original data stored by the user. This is
facilitated by the data collected at the metadata and,
possibly, the directory services stages. An in-depth technical understanding of the distributed filesystem is likely
required to reconstruct the files in a precise manner which
would be required as part of a forensic investigation. These
details may be discussed in the literature (in sources like
this paper), but a practitioner is advised to also conduct
their own experiments to enhance their understanding and avoid any potential issues with software version
changes.
Conclusion
With the increasing digitalisation of data and use of
services such as cloud computing to process, store and
disseminate big data, there will be more opportunities for
exploitation of large datasets (e.g. in corporate or state-sponsored espionage) and consequently, the continued
development of the digital forensic discipline is more
important than ever. An effective investigative process is
one that follows well-researched and documented processes, which allow digital forensic practitioners
to identify and preserve, collect, examine and analyse
electronically stored information from information
communication technologies that would be admissible in a
court of law (Butler and Choo, 2013; Quick et al., 2014).
In this paper, we conducted an in-depth forensic
investigation of XtreemFS, a distributed filesystem that is
commonly implemented in cloud computing environments. Findings from our study contributed to a detailed
understanding of both the technical and process issues
regarding collection of electronic evidence from distributed
filesystems generally. Based on our findings, we put forth
the following technical insights:
Directory services provide practitioners with an overview of the internal setup of the distributed filesystem.
This allows them to gain the necessary understanding of
the environment configuration (i.e. both physical and
logical) to undertake the collection of evidential data.
Metadata storage holds critical data that facilitates the
collection of valid and complete evidential data. Examples include temporal data, which is useful in
reporting and determination of provenance in a
forensic context.

Fig. 2. Our proposed distributed filesystem forensic process.
Data storage allows the practitioner to reconstruct the
files of interest stored in the distributed filesystem using
both environment data and metadata. This results in a
complete representation of the stored data of interest.
This research also highlighted the importance of a
forensically sound process such as our proposed distributed filesystem forensic process (see Fig. 2) in order to
provide clear guidance to digital forensic practitioners in
their investigation, from evidence source identification and
preservation, to collection of volatile, non-volatile and
network data, to examination and analysis of the preserved
data, and reporting and presentation in a court of law.
Future work includes validating our framework and the
proposed process with other similar distributed filesystem
products such as GlusterFS, FhGFS and Ceph. Another
aspect of future work would be to develop forensic processes for cloud/distributed filesystems where APIs can be
used for object storage and retrieval (e.g. Amazon S3) using
a similar approach to the one presented in Martini and
Choo (2014).
Acknowledgements
The first author is supported by both the University of
South Australia and the Defence Systems Innovation
Centre. The views and opinions expressed in this article are
those of the authors alone and not the organisations with
whom the authors are or have been associated/supported.
References
Almulla S, Iraqi Y, Jones A. Cloud forensics: a research perspective. In: 9th International conference on innovations in information technology (IIT); 2013. p. 66–71.
BabuDB. Usage example Java – babudb – BabuDB usage in Java – an embedded non-relational database for Java and C++ – Google Project Hosting; 2012 [viewed 25.08.14].
Beath C, Becerra-Fernandez I, Ross J, Short J. Finding value in the information explosion. MIT Sloan Manag Rev 2012;53(4):18–20.
Beyer MA, Laney D. The importance of 'big data': a definition. G00235055. Gartner; 2012.
Birk D, Wegener C. Technical issues of forensic investigations in cloud computing environments. In: 6th International workshop on systematic approaches to digital forensic engineering (IEEE/SADFE); 2011. p. 1–10.
Butler A, Choo K-KR. IT standards and guides do not adequately prepare IT practitioners to appear as expert witnesses: an Australian perspective. Secur J 2013.
Casonato R, Beyer MA, Adrian M, Friedman T, Logan D, Buytendijk F, et al. Top 10 technology trends impacting information infrastructure, 2013. G00249318. Gartner; 2013.
Ceph. Home – Ceph. Inktank Storage; 2014 [viewed 25.08.14].
Cho C, Chin S, Chung KS. Cyber forensic for Hadoop based cloud system. Int J Secur Its Appl 2012;6(3):83–90.
Chonka A, Abawajy J. Detecting and mitigating HX-DoS attacks against cloud web services. In: 15th International conference on network-based information systems (NBiS); 2012. p. 429–34.
Choo K-KR. Cloud computing: challenges and future directions. Trends Issues Crime Crim Justice 2010;400:1–6.
Chua F. Digital Darwinism: thriving in the face of technology change; 2013 [viewed 29.11.13].
Chung H, Park J, Lee S, Kang C. Digital forensic investigation of cloud storage services. Digit Investig 2012;9(2):81–95.
Contrail. Technology – contrail-project; 2013 [viewed 29.11.13].
Dukaric R, Juric MB. Towards a unified taxonomy and architecture of cloud frameworks. Future Gener Comput Syst 2013;29(5):1196–210.
Dykstra J, Riehl D. Forensic collection of electronic evidence from infrastructure-as-a-service cloud computing. Richmond J Law Technol 2013;XIX(1):1–47.
Dykstra J, Sherman AT. Design and implementation of FROST: digital forensic tools for the OpenStack cloud computing platform. Digit Investig 2013;10(s1):S87–95.
Enke H, Partl A, Reinefeld A, Schintke F. Handling big data in astronomy and astrophysics: rich structured queries on replicated cloud data with XtreemFS. Datenbank-Spektrum 2012;12(3):173–81.
Federici C. Cloud data imager: a unified answer to remote acquisition of cloud storage areas. Digit Investig 2014;11(1):30–42.
Fraunhofer. BeeGFS/FhGFS – the Fraunhofer parallel cluster file system; 2014 [viewed 25.08.14].
Gellman B, Lindeman T. Inner workings of a top-secret spy program. Wash Post, 29 June 2013 [viewed 25.11.13].
Gluster. Write once, read everywhere – Gluster. Red Hat; 2014 [viewed 25.08.14].
Greenwald G, Ball J. The top secret rules that allow NSA to use US data without a warrant. Guard, 20 June 2013 [viewed 12.11.13].
Hale JS. Amazon cloud drive forensic analysis. Digit Investig 2013;10(3):259–65.
Hay B, Nance K, Bishop M. Storm clouds rising: security challenges for IaaS cloud computing. In: 44th Hawaii international conference on system sciences (HICSS); 2011. p. 1–7.
Hegarty R, Merabti M, Shi Q, Askwith B. Forensic analysis of distributed service oriented computing platforms. In: 12th Annual post graduate symposium on the convergence of telecommunications, networking and broadcasting; 2011.
Hooper C, Martini B, Choo K-KR. Cloud computing and its implications for cybercrime investigations in Australia. Comput Law Secur Rev 2013;29(2):152–63.
Info-communications Development Authority of Singapore. Big data [chapter 4]. Co-creating the future: infocomm technology roadmap 2012; 2012. p. 1–52.
Kent K, Chevalier S, Grance T, Dang H. Guide to integrating forensic techniques into incident response. SP800-86. Gaithersburg, MD, USA: U.S. Department of Commerce; 2006.
Kielmann T, Pierre G, Morin C. XtreemOS: a sound foundation for cloud infrastructure and federations. In: Desprez F, et al., editors. Grids, P2P and services computing. Springer US; 2010. p. 1–5.
Kleineweber C, Reinefeld A, Schütt T. QoS-aware storage virtualization for cloud file systems. In: 1st ACM international workshop on programmable file systems; 2014. p. 19–26.
Krüger J, Grunzke R, Herres-Pawlis S, Hoffmann A, de la Garza L, Kohlbacher O, et al. Performance studies on distributed virtual screening. BioMed Res Int 2014;2014.
Martini B, Choo K-KR. An integrated conceptual digital forensic framework for cloud computing. Digit Investig 2012;9(2):71–80.
Martini B, Choo K-KR. Cloud storage forensics: ownCloud as a case study. Digit Investig 2013;10(4):287–99.
Martini B, Choo K-KR. Remote programmatic vCloud forensics: a six-step collection process and a proof of concept. In: 13th International conference on trust, security and privacy in computing and communications, Beijing, China; 2014. p. 935–42.
Marty R. Cloud application logging for forensics. In: 2011 ACM symposium on applied computing; 2011. p. 178–84.
McKemmish R. What is forensic computing? Trends Issues Crime Crim Justice 1999;118:1–6.
National Institute of Standards and Technology. Challenging security requirements for US Government cloud computing adoption (Draft). Gaithersburg: U.S. Department of Commerce; 2011.
Patel A, Taghavi M, Bakhtiyari K, Júnior JC. An intrusion detection and prevention system in cloud computing: a systematic review. J Netw Comput Appl 2013;36(1):25–41.
Petcu D, Macariu G, Panica S, Craciun C. Portable cloud applications – from theory to practice. Future Gener Comput Syst 2013;29(6):1417–30.
Pierre G, Stratan C. ConPaaS: a platform for hosting elastic cloud applications. IEEE Internet Comput 2012;16(5):88–92.
Quick D, Martini B, Choo K-KR. Cloud storage forensics. 1st ed. Waltham, MA, USA: Syngress, an imprint of Elsevier; 2014.
Stender J, Berlin M, Reinefeld A. XtreemFS – a file system for the cloud. In: Data intensive storage services for cloud environments. IGI Global Press; 2012. p. 267–85.
United States Foreign Intelligence Surveillance Court. Memorandum opinion; 2011 [viewed 29.11.13].
United States National Institute of Justice (US NIJ). Forensic examination of digital evidence: a guide for law enforcement. Washington, DC: U.S. Department of Justice; 2004.
United States National Intelligence Council. Global trends 2030: alternative worlds; 2012 [viewed 29.11.13].
XtreemFS. XtreemFS – fault-tolerant distributed file system; 2013 [viewed 12.11.13].
Zatyko K, Bay J. The digital forensics cyber exchange principle. Forensic Mag 2012:13–5. December 2011–January 2012.