The PHOENIX filesystem

This guide describes the PHOENIX filesystem itself, not any particular software implementation that produces a PHOENIX filesystem.

GENERAL and PROTECTED folders

At the root level of the PHOENIX filesystem, there are two subdirectories, GENERAL and PROTECTED:

PHOENIX
├── GENERAL
└── PROTECTED

Data that are not encrypted at rest are stored under the GENERAL folder. Data that are encrypted at rest are stored under the PROTECTED folder.

Note

Types of data that are encrypted at rest tend to include GPS, onsite interviews, and voice recordings.

Study folders

Under the GENERAL and PROTECTED folders are STUDY folders:

PHOENIX
├── GENERAL
│   └── STUDY_A
└── PROTECTED
    └── STUDY_A

Note

Study folder names should contain only letters, numbers, and underscores [A-Za-z0-9_].
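The naming rule above can be checked mechanically. The following is a minimal sketch in Python; the helper name is_valid_name is hypothetical and is not part of Lochness or any PHOENIX tooling.

```python
import re

# Character class quoted in the note above: letters, digits, and underscores.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_name(name: str) -> bool:
    """Return True if `name` uses only the characters allowed in PHOENIX names.

    Hypothetical helper for illustration only.
    """
    # fullmatch requires the whole string to match, so "STUDY A" is rejected.
    return NAME_PATTERN.fullmatch(name) is not None


print(is_valid_name("STUDY_A"))  # True
print(is_valid_name("STUDY A"))  # False
```

The same check applies to any other PHOENIX name that is restricted to this character set.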

Study folder permissions

Each STUDY folder is assigned the default permissions rwx------. Individual user permissions are then added using POSIX.1e access control lists. To add read (ls) and execute (cd) permissions on the STUDY_A folder for user jdoe, you would issue the following command:

setfacl -m u:jdoe:rx /PHOENIX/GENERAL/STUDY_A /PHOENIX/PROTECTED/STUDY_A

Warning

Many, but not all, filesystems support POSIX.1e access control lists. For example, some versions of PANASAS do not support them at all, while other filesystems, such as NFSv4, may use different tools or a syntax that differs from the one shown above.
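As an illustration of the default rwx------ mode, here is a minimal Python sketch that creates a STUDY folder under both top-level folders and restricts it to its owner. It assumes a POSIX system and uses a temporary directory in place of a real PHOENIX root; access control list entries would still be added separately, for example with setfacl as shown above.

```python
import os
import stat
import tempfile

# A temporary directory stands in for the PHOENIX root (an assumption for this sketch).
phoenix_root = tempfile.mkdtemp()

study_modes = {}
for top in ("GENERAL", "PROTECTED"):
    study_dir = os.path.join(phoenix_root, top, "STUDY_A")
    os.makedirs(study_dir)
    # chmod explicitly: a mode passed to makedirs would be filtered by the umask.
    os.chmod(study_dir, 0o700)
    # Record the mode in ls-style notation for inspection.
    study_modes[top] = stat.filemode(os.stat(study_dir).st_mode)

print(study_modes)
```

On a POSIX filesystem both entries should read drwx------, that is, a directory accessible only to its owner.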

Subject folders

Within each STUDY folder are individual SUBJECT folders. Subject names should be unique across PHOENIX.

PHOENIX
├── GENERAL
│   └── STUDY_A
│       └── SUBJECT_1
└── PROTECTED
    └── STUDY_A
        └── SUBJECT_1

Warning

While subject names should be unique across PHOENIX, this is not enforced by Lochness in any way.

Note

Subject names should contain only letters, numbers, and underscores [A-Za-z0-9_].

Data type folders

Within each SUBJECT folder, there are folders for each DATA TYPE:

PHOENIX
├── GENERAL
│   └── STUDY_A
│       └── SUBJECT_1
│           └── DATA_TYPE
└── PROTECTED
    └── STUDY_A
        └── SUBJECT_1
            └── DATA_TYPE

Some example DATA TYPE names include actigraphy, mri, phone, and surveys.

Note

Data type names should contain only letters, numbers, and underscores [A-Za-z0-9_].

Raw and processed folders

Within each DATA TYPE folder, there are folders for raw and processed data:

PHOENIX
├── GENERAL
│   └── STUDY_A
│       └── SUBJECT_1
│           └── DATA_TYPE
│               ├── raw
│               └── processed
└── PROTECTED
    └── STUDY_A
        └── SUBJECT_1
            └── DATA_TYPE
                ├── raw
                └── processed

raw

The raw folders are the bedrock of the PHOENIX filesystem. These folders are typically populated by data aggregation software. The user designated to run the data aggregation software should be the only user with write permissions on these folders. All other users should be granted read-only permissions.

processed

The processed folders are assigned the permissions rwxrwxrwt, which allows any user who has been granted access to the parent STUDY folder to write files. Because these folders have the sticky bit set, only the owner of a file (or the owner of the folder) is allowed to rename or delete it.
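A world-writable folder with the sticky bit corresponds to octal mode 1777, which ls displays as drwxrwxrwt. The following Python sketch, assuming a POSIX system and a temporary directory standing in for a DATA TYPE folder, sets that mode and reads it back.

```python
import os
import stat
import tempfile

# Temporary directory standing in for a DATA_TYPE folder (an assumption).
data_type_dir = tempfile.mkdtemp()
processed = os.path.join(data_type_dir, "processed")
os.mkdir(processed)

# 0o1777 = rwx for user, group, and other, plus the sticky bit (S_ISVTX).
os.chmod(processed, 0o1777)

mode = os.stat(processed).st_mode
mode_string = stat.filemode(mode)       # ls-style notation
has_sticky = bool(mode & stat.S_ISVTX)  # True when the sticky bit is set

print(mode_string, has_sticky)
```

The trailing t in the mode string replaces the final x and indicates that both the execute permission for others and the sticky bit are set.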

Note

Folders must be named raw and processed in lowercase letters.

Product folders (optional)

Within each raw folder, there may be folders for each data-capturing PRODUCT. This allows multiple products that capture the same type of data to be clearly differentiated:

PHOENIX
├── GENERAL
│   └── STUDY_A
│       └── SUBJECT_1
│           └── DATA_TYPE
│               └── raw
│                   └── PRODUCT
│
└── PROTECTED
    └── STUDY_A
        └── SUBJECT_1
            └── DATA_TYPE
                └── raw
                    └── PRODUCT

Some product names include Actiwatch2 and GENEActiv.

Note

Product names should contain only letters, numbers, and underscores [A-Za-z0-9_].
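The folder hierarchy described so far can be summarized as a small path-building function. This is a sketch only: the function name phoenix_path and its parameters are hypothetical, and the /PHOENIX root is taken from the setfacl example earlier in this guide.

```python
from pathlib import PurePosixPath
from typing import Optional


def phoenix_path(root: str, protected: bool, study: str, subject: str,
                 data_type: str, stage: str,
                 product: Optional[str] = None) -> PurePosixPath:
    """Build a folder path following the PHOENIX layout (hypothetical helper)."""
    if stage not in ("raw", "processed"):
        raise ValueError("stage must be 'raw' or 'processed' (lowercase)")
    if product is not None and stage != "raw":
        # This guide only describes PRODUCT folders under raw.
        raise ValueError("product folders are only described under raw")
    top = "PROTECTED" if protected else "GENERAL"
    path = PurePosixPath(root, top, study, subject, data_type, stage)
    # The PRODUCT level is optional, so append it only when given.
    return path / product if product else path


print(phoenix_path("/PHOENIX", False, "STUDY_A", "SUBJECT_1",
                   "actigraphy", "raw", "GENEActiv"))
# /PHOENIX/GENERAL/STUDY_A/SUBJECT_1/actigraphy/raw/GENEActiv
```

PurePosixPath is used so that the sketch only manipulates path strings and never touches the disk.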

Raw file integrity

As raw files are being downloaded from each data source, the file contents are stored within hidden files. These hidden files should be ignored by end users. A file is renamed to its visible file name only after the download is considered successful. If the file contents can be verified using a checksum, a byte count, or some other means, the file is verified before it is made visible.
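The download-then-rename pattern described above can be sketched in Python as follows. This is not the implementation used by any particular data aggregator: the function name save_verified, the leading-dot scheme for hidden names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import os
import tempfile
from typing import Iterable


def save_verified(dest_dir: str, name: str, chunks: Iterable[bytes],
                  expected_sha256: str) -> str:
    """Write chunks to a hidden file, verify them, then rename to the visible name."""
    hidden = os.path.join(dest_dir, "." + name)  # hidden-name scheme is an assumption
    visible = os.path.join(dest_dir, name)
    digest = hashlib.sha256()
    with open(hidden, "wb") as fo:
        for chunk in chunks:
            fo.write(chunk)
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        # Verification failed: discard the partial file, never expose it.
        os.remove(hidden)
        raise ValueError(f"checksum mismatch for {name}")
    # os.replace is an atomic rename when both paths are on the same filesystem.
    os.replace(hidden, visible)
    return visible


# Usage with an in-memory payload and a temporary folder standing in for raw.
raw_dir = tempfile.mkdtemp()
checksum = hashlib.sha256(b"hello world").hexdigest()
saved = save_verified(raw_dir, "example.txt", [b"hello ", b"world"], checksum)
print(os.listdir(raw_dir))
```

Because the visible name only appears after the rename, a reader of the raw folder never sees a partially written file under its final name.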

Raw file naming convention

As a general rule, files will always preserve their original names or they will be assigned a name provided by the originating data source. When the originating data source does not provide a file name, an appropriately descriptive file name will be generated automatically.

Metadata files

In PHOENIX, all data for a subject are downloaded and organized under unique SUBJECT folders. To accomplish this, the data aggregation software must understand how to query for data belonging to the SUBJECT within each data source. This is achieved using metadata files. Each STUDY folder must contain a metadata file:

PHOENIX
└── GENERAL
    └── STUDY_A
        └── STUDY_A_metadata.csv

The metadata file should be named with the study name followed by a _metadata.csv suffix. The data aggregator is largely driven by these PHOENIX metadata files. The minimal contents of a metadata file should look like this:

Active,Subject ID,Consent Date
1,SUBJECT_1,2019-01-01

For convenience, here’s the same file rendered as a table:

Active | Subject ID | Consent Date
1      | SUBJECT_1  | 2019-01-01

You must add additional columns to this file for each supported data source that you wish to pull data from.
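To make the file format concrete, here is a minimal Python sketch that reads the minimal metadata file shown above with the standard csv module. It assumes that Active is a 1/0 flag marking whether a subject should be processed; the function name active_subjects is hypothetical.

```python
import csv
import io

# Contents of the minimal STUDY_A_metadata.csv shown above.
METADATA_CSV = """Active,Subject ID,Consent Date
1,SUBJECT_1,2019-01-01
"""


def active_subjects(fileobj):
    """Yield the rows whose Active column is 1 (assumed meaning: subject is active)."""
    for row in csv.DictReader(fileobj):
        if row["Active"].strip() == "1":
            yield row


rows = list(active_subjects(io.StringIO(METADATA_CSV)))
print(rows[0]["Subject ID"], rows[0]["Consent Date"])
# SUBJECT_1 2019-01-01
```

Any extra columns added for a data source appear as additional keys in each row dictionary.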

See also

You can read much more about the supported data sources on the data sources page.

As a brief example, here is what a metadata file looks like when we add a Beiwe column:

Active | Subject ID | Consent Date | Beiwe
1      | SUBJECT_1  | 2019-01-01   | beiwe.example:5e2311:abcde

This instructs the data aggregator that SUBJECT_1 should have data in the Beiwe instance beiwe.example, under the study 5e2311, under the patient abcde.
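The three colon-separated parts of the Beiwe value can be pulled apart as in the following Python sketch. The names BeiweRef and parse_beiwe are hypothetical, and the sketch assumes a cell holds exactly one instance:study:patient reference.

```python
from typing import NamedTuple


class BeiweRef(NamedTuple):
    """Hypothetical container for one Beiwe metadata value."""
    instance: str
    study: str
    patient: str


def parse_beiwe(value: str) -> BeiweRef:
    """Split an 'instance:study:patient' metadata value into its three parts."""
    parts = value.strip().split(":")
    # Require exactly three non-empty fields.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected instance:study:patient, got {value!r}")
    return BeiweRef(*parts)


print(parse_beiwe("beiwe.example:5e2311:abcde"))
# BeiweRef(instance='beiwe.example', study='5e2311', patient='abcde')
```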