Note: Data integrity systems for WhoWasInCommand.com

Tom Longley edited this page Sep 24, 2018 · 1 revision

By TL for DataMade / 20180514

What are we trying to do?

The ongoing nature of our research means that there will always be more to do, and areas of our dataset that could be better. This does not diminish the value of the dataset or prevent us from producing analysis, but it does mean that we need to preserve the complete integrity of our dataset at different points in time, enabling us to:

  • Expose and demonstrate the progressive improvement in coverage and quality of our dataset; and,
  • Discover, correct and preserve our mistakes.

To realise progressive improvement we wish to implement a review system, where new additions and updates to records are checked by a second reviewer for accuracy. It also means we must have a bias in favour of not deleting records that have mistakes, but either correcting them or - where we just got something wrong that can’t be fixed - removing them from inclusion in future versions of the dataset. To make this work in practice, we need to look at the interplay between:

  • Data capture and removal processes;
  • Version control systems; and,
  • Public access to the data.

What are the basic rules of the system?

  • Guests can only access records that are reviewed and published. Access means viewing records and downloading data.
  • When a completely new record is created, it must be both reviewed and published before a guest can access it.
  • Updates to existing records are stored as new versions of the record and placed in a staging area along with a “review request”. Guests cannot see updates that are placed in the staging area.
  • Review requests show a reviewer the proposed changes to the records, which can be either approved or rejected by SFM team members.
  • If a review request is approved, the new version of the existing record can be published and the public view of the record will be updated with the new changes.
  • Where records are not published, it is likely they are just “works in progress” (or “scratch” or “stub” records).
  • Where a review request is rejected, the updated version remains staged but unpublished. The version of the record visible to guests will not change.
  • Any record can be unpublished, which removes Guest access to that record.
  • Any record - whether published or unpublished - can be placed in the trash, which automatically un-publishes it and removes it from Guest access.
  • “Trashed” records are still visible to SFM but will not be included in search results unless specified (an “Include trashed results” option will be provided). “Trash” is a soft-delete feature: using it signals our intention to remove a record whilst respecting that it had a place in our data at a point in time.
  • Records placed in the trash can at any time be restored to the staging area, where they will need to be re-published before Guest access is restored.
  • In rare cases, we will need to completely remove all trace of a record from our dataset using a “Hard delete” feature. This will only be done by an administrator, and will include safeguards to protect the integrity of the dataset.
  • A set of unpublished, unreviewed records can be grouped together as a project. This feature will enable us to work privately on a set of data, and provide a hook for making changes to publication status.
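
The rules above can be read as a small state model. What follows is a minimal sketch in Python, not the WhoWasInCommand implementation: the names `Record` and `guest_can_access` and the three boolean flags are assumptions made for illustration. The check that a trashed record must be restored before it is published is also an assumption, drawn from the rule that restored records return to the staging area.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record carrying only the status flags described in the rules."""
    name: str
    reviewed: bool = False
    published: bool = False
    trashed: bool = False

    def publish(self) -> None:
        # Assumption: a trashed record has to be restored before publishing.
        if self.trashed:
            raise ValueError("restore the record from the trash before publishing")
        self.published = True

    def unpublish(self) -> None:
        # Un-publishing removes guest access but keeps the record for the team.
        self.published = False

    def trash(self) -> None:
        # Soft delete: trashing automatically un-publishes the record.
        self.trashed = True
        self.published = False

    def restore(self) -> None:
        # A restored record comes back unpublished and must be re-published.
        self.trashed = False


def guest_can_access(record: Record) -> bool:
    """Guests only see records that are reviewed, published and not trashed."""
    return record.reviewed and record.published and not record.trashed
```

In this sketch, calling `trash()` on a published record and then `restore()` leaves it unpublished, so `guest_can_access` stays false until `publish()` is called again.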

How does this work in common use cases?

Creating a new record

  • Tom creates a new record.
  • The new record is by default unpublished, and a guest cannot access any version of the new record.
  • Michel reviews the record and approves it, creating a new version of the record indicating Michel has reviewed it
  • Michel or Tom publishes the record
  • A guest can access this version of the record, and see both the version that Tom created and the version indicating that Michel approved it

Updating a record

  • Tom makes a change to a published record, creating a new version containing the changes
  • Because the record is published, a guest can still view older versions of the record but not the new version with the changes Tom made
  • Michel approves the changes Tom made, creating a new version of the record indicating that Michel has approved of the changes, which has the effect of automatically updating the published record
  • A guest can now access the record with the changes Tom made.
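
The update flow above depends on keeping every version while guests see only the approved one. The sketch below shows one possible way to model that, assuming a hypothetical `VersionedRecord` class with a pointer to the public version; none of these names come from the WhoWasInCommand codebase. It models an already published record, where approval moves the public pointer automatically.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    number: int
    author: str
    note: str
    data: dict


@dataclass
class VersionedRecord:
    """Hypothetical record that keeps every version and a pointer to the public one."""
    versions: List[Version] = field(default_factory=list)
    public_version: Optional[int] = None  # version number guests currently see
    pending_review: Optional[int] = None  # staged version awaiting review

    def _add(self, author: str, note: str, data: dict) -> Version:
        version = Version(len(self.versions) + 1, author, note, dict(data))
        self.versions.append(version)
        return version

    def propose_update(self, author: str, data: dict) -> Version:
        # The change is stored as a new version and staged with a review request.
        version = self._add(author, "update", data)
        self.pending_review = version.number
        return version

    def approve(self, reviewer: str) -> Version:
        # Approval is recorded as a new version, then the public pointer moves to it.
        if self.pending_review is None:
            raise ValueError("no review request to approve")
        staged = self.versions[self.pending_review - 1]
        approved = self._add(reviewer, f"approved version {staged.number}", staged.data)
        self.public_version = approved.number
        self.pending_review = None
        return approved

    def reject(self) -> None:
        # Rejected changes stay stored but unpublished; the public view is unchanged.
        if self.pending_review is None:
            raise ValueError("no review request to reject")
        self.pending_review = None

    def guest_view(self) -> Optional[dict]:
        if self.public_version is None:
            return None
        return self.versions[self.public_version - 1].data
```

Keeping the approval as its own version means the history shows both who made a change and who signed it off, which matches the use case where a guest sees both Tom's version and Michel's approval.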

Un-publishing a record

  • Tom un-publishes a record
  • Michel can still access the record
  • A guest can no longer access any version of the record

Updating an un-published record

  • Tom makes a change to an un-published record, creating a new version of the record with the changes in it

Placing a record in the Trash (soft delete)

  • Tom places a published record in the Trash
  • The record is un-published
  • The record is flagged as Trash, and can still appear in search results for Tom and Michel when they choose to include trashed results
  • Michel and Tom can still access the record
  • A guest can no longer access any version of the record

Restoring a record from the Trash (un-doing a soft delete)

  • Tom restores a record from the Trash
  • A guest cannot access any version of the restored record, as the record is unpublished
  • Tom or Michel can publish the record
  • A guest can now access any version of the restored record

Deleting a record (hard delete)

  • Only an administrator can hard delete a record, which means removing all versions and any other trace of the record from WhoWasInCommand
  • Before a hard delete can be actioned, WhoWasInCommand will show the administrator which other records are impacted by the hard delete.
  • Linked records that are dependencies for other records will not be deleted.
  • Linked records that are uniquely dependent on the record flagged for hard delete will also be deleted.
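
The two dependency rules can be sketched as a preview function that an administrator would see before confirming. This is an illustration under stated assumptions: the `links` mapping and the function name `hard_delete_impact` are hypothetical, and the sketch only looks at direct links, not chains of dependencies.

```python
from typing import Dict, Set


def hard_delete_impact(target: str, links: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Preview a hard delete of `target`.

    `links` maps each record id to the ids of the records it links to
    (a hypothetical structure). A linked record is kept when any other
    record also links to it, and deleted when only `target` does.
    """
    deleted: Set[str] = set()
    kept: Set[str] = set()
    for candidate in links.get(target, set()):
        used_elsewhere = any(
            candidate in linked
            for owner, linked in links.items()
            if owner != target
        )
        if used_elsewhere:
            kept.add(candidate)
        else:
            deleted.add(candidate)
    return {"deleted": deleted, "kept": kept}
```

For example, if `org1` links to `src1` and `src2`, and `org2` also links to `src2`, a hard delete of `org1` would remove `src1` with it and keep `src2`.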

Creating a new project

  • Tom is starting work updating the Kenya Police Force and needs to work on a lot of records at once, but wishes to keep them all unpublished
  • Tom can assign an arbitrary (but unique) tag to these records, enabling their quick retrieval and allowing other features to operate on them as a group (publish, approve, trash)
  • After completing research, Tom submits all records for review.
  • Michel approves all the review requests and publishes the complete set of records at once.
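
A project tag of this kind can be sketched as a plain field on each record plus helpers that act on every record sharing the tag. The class `ProjectRecord`, the helper names and the example tag are all hypothetical and only illustrate the grouping idea.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ProjectRecord:
    """Hypothetical record with an optional project tag."""
    name: str
    project: Optional[str] = None
    approved: bool = False
    published: bool = False


def in_project(records: Iterable[ProjectRecord], tag: str) -> List[ProjectRecord]:
    # Retrieve every record carrying the given project tag.
    return [record for record in records if record.project == tag]


def approve_and_publish_project(records: Iterable[ProjectRecord], tag: str) -> List[ProjectRecord]:
    # Batch action: approve and publish the whole project in one step.
    selected = in_project(records, tag)
    for record in selected:
        record.approved = True
        record.published = True
    return selected


records = [
    ProjectRecord("Unit 1", project="kenya-police"),
    ProjectRecord("Unit 2", project="kenya-police"),
    ProjectRecord("Unrelated unit"),
]
done = approve_and_publish_project(records, "kenya-police")
```

After the last line, the two tagged records are approved and published while the untagged record is left untouched.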

What tools will we need to implement this?

The “Dashboard” area should have the following features:

An “eyeball” tool

A single, fast-performing, information-dense tabular view of all records that enables the user to:

  • Quickly find any record
  • See all records that have active review requests
  • Jump to an editable view of a record to complete a review request
  • Query all records by version metadata including project, date/time of most recent update (approved review requests), reviewer name, publication status
  • Perform batch approve, publish and trash actions on multiple records
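
The querying behaviour listed above could be sketched as a single filter over rows of version metadata. The `Row` fields and the `query` function below are assumptions made for illustration; a real dashboard would more likely push these filters into the database or search index.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Row:
    """One line of the hypothetical tabular view."""
    name: str
    last_updated: datetime
    published: bool = False
    trashed: bool = False
    needs_review: bool = False
    project: Optional[str] = None
    reviewer: Optional[str] = None


def query(rows: Iterable[Row], *, project: Optional[str] = None,
          reviewer: Optional[str] = None, published: Optional[bool] = None,
          needs_review: Optional[bool] = None,
          updated_since: Optional[datetime] = None,
          include_trashed: bool = False) -> List[Row]:
    """Return rows matching every filter that was supplied."""
    matches = []
    for row in rows:
        # Trashed rows are hidden unless the user asks to include them.
        if row.trashed and not include_trashed:
            continue
        if project is not None and row.project != project:
            continue
        if reviewer is not None and row.reviewer != reviewer:
            continue
        if published is not None and row.published != published:
            continue
        if needs_review is not None and row.needs_review != needs_review:
            continue
        if updated_since is not None and row.last_updated < updated_since:
            continue
        matches.append(row)
    return matches
```

Calling `query(rows, needs_review=True)` would list everything with an active review request, and the result could then be passed to a batch approve, publish or trash action.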

Activity log

An immutable log of every CRUD (create, read, update, delete) action taken by SFM staff on the website, sortable by date, user, and action type.
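
One way to picture such a log is an append-only collection of frozen entries. The sketch below assumes hypothetical names (`LogEntry`, `ActivityLog`) and fields. Immutability here is only enforced at the Python object level; a production log would also need protection at the storage layer.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class LogEntry:
    """A single logged action; frozen so it cannot be modified after creation."""
    timestamp: datetime
    user: str
    action: str  # for example "create", "read", "update" or "delete"
    record_id: str


class ActivityLog:
    """Append-only log: entries can be added and read, never changed or removed."""

    SORT_KEYS = ("timestamp", "user", "action")

    def __init__(self) -> None:
        self._entries: Tuple[LogEntry, ...] = ()

    def append(self, entry: LogEntry) -> None:
        # Build a new tuple rather than mutating the existing one in place.
        self._entries = self._entries + (entry,)

    def sorted_by(self, key: str) -> List[LogEntry]:
        # Sort a copy of the entries by date, user or action type.
        if key not in self.SORT_KEYS:
            raise ValueError(f"cannot sort by {key!r}")
        return sorted(self._entries, key=lambda entry: getattr(entry, key))
```

The class deliberately offers no update or delete method, so the only way to change the log is to add a new entry.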

Trash can

  • View, sort and facet records that have been placed in the trash can
  • Restore one or many records from the trash can in a single batch.
  • Initiate a “hard delete” on any single item in the trash can.