Technical
Important: A new DDS knowledge base wiki has been created to collate the large amounts of detailed technical information (for example, specifications, documents, and schemas) that are currently available.
See wiki.discoverydataservice.org for more information.
This section describes the high-level technical details of the Discovery Data Service: the architecture that underpins the data service software, and the testing and assurance processes.
Architecture
The following diagram explains the underlying architecture and the basic publication to subscription pathways that data must follow.
Publisher data is identified by organisation and software format/version. For example, EMIS CSV 5.6 or Adastra XML 1.0.
Note: It is assumed that published data will be accompanied by an ODS (Organisation Data Service, NHS) code, or similar.
Important: Existing data processing and sharing agreements are checked to validate that the DDS has permission to process data from that organisation and in that format/version, and then share that data with specified subscribers.
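As an illustration only, the check described above can be sketched in Python. This is a minimal sketch, not the DDS implementation: the `FeedKey` and `Agreements` names, the in-memory storage, and the example organisation code and subscriber identifier are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeedKey:
    """Identifies published data by organisation and software format/version."""
    ods_code: str   # organisation code accompanying the data (hypothetical value below)
    software: str   # for example "EMIS"
    fmt: str        # for example "CSV"
    version: str    # for example "5.6"


@dataclass
class Agreements:
    """Hypothetical in-memory stand-in for processing and sharing agreements."""
    processing: set = field(default_factory=set)  # FeedKeys the service may process
    sharing: dict = field(default_factory=dict)   # FeedKey -> set of subscriber ids

    def permitted_subscribers(self, key: FeedKey) -> set:
        """Return the subscribers that may receive this feed.

        An empty set means the data must not be shared, either because no
        processing agreement covers this organisation and format/version,
        or because no sharing agreement names a subscriber for it.
        """
        if key not in self.processing:
            return set()
        return set(self.sharing.get(key, set()))


# Example with made-up identifiers.
emis_feed = FeedKey("ORG001", "EMIS", "CSV", "5.6")
adastra_feed = FeedKey("ORG001", "Adastra", "XML", "1.0")
agreements = Agreements(
    processing={emis_feed},
    sharing={emis_feed: {"subscriber-a"}},
)
```

In this sketch the same organisation can be permitted for one format/version and refused for another, which mirrors the requirement that permission is checked per organisation and per format/version.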
Published data
See the DDS knowledge base wiki - current published data for more details.
Current data sets
See the DDS knowledge base wiki - current data sets for more details.
Data assurance & testing
The Discovery Data Service receives data from publisher organisations that could have:
- Multiple IT systems that are responsible for capturing the data.
- Multiple mechanisms of publishing the data to the DDS.
- Several message or file formats.
- Multiple content taxonomies or code schemes, some standard and some local, and some with no code schemes at all.
This many-to-many relationship between the technical data exchange formats and an individual patient record creates a significant challenge when we try to aggregate and link the data for direct care and secondary uses.
Our approach is to test every system, extract mechanism, format, taxonomy and code scheme independently against clinical or operational scenarios to make sure that the system is fit for purpose.
When data is moved from one system to another, it always loses some data or some context; this is referred to as degrade. The objective of the assurance process is to prove that the mechanism involved in transferring the data is fit for purpose and that the data content is good enough for the subscriber use cases.
The overall assurance process starts with a publisher data entry scenario and ends with a subscriber data usage scenario. Test scenarios and test packs are derived from the overall specification, modified by the system extract capability, and narrowed to the scenario of interest. There is no requirement to test the entire data service at any point.
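One way to picture how a test pack is derived is as a filter over the specification. The following Python sketch assumes, purely for illustration, that the specification, the extract capability and the scenario can each be expressed as a collection of field names; the `derive_test_pack` function and the field names are hypothetical and are not taken from the DDS specifications.

```python
def derive_test_pack(specification, extract_capability, scenario_fields):
    """Narrow a specification to the fields worth testing for one scenario.

    specification      -- ordered list of fields in the overall specification
    extract_capability -- fields the publisher system is able to extract
    scenario_fields    -- fields the scenario of interest depends on

    Returns (in_scope, gaps): the fields to test, in specification order,
    and the scenario fields that cannot be tested for this extract.
    """
    in_scope = [
        name for name in specification
        if name in extract_capability and name in scenario_fields
    ]
    gaps = sorted(set(scenario_fields) - set(in_scope))
    return in_scope, gaps


# Example with made-up field names.
specification = [
    "patient.nhs_number",
    "patient.date_of_birth",
    "observation.code",
    "observation.value",
    "medication.code",
]
extract_capability = {
    "patient.nhs_number",
    "patient.date_of_birth",
    "observation.code",
    "medication.code",
}
scenario_fields = {"patient.nhs_number", "observation.code", "observation.value"}

in_scope, gaps = derive_test_pack(specification, extract_capability, scenario_fields)
```

Reporting the gaps alongside the test pack makes any loss of data or context visible for that scenario, rather than leaving it to be discovered by the subscriber.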
Integral to the Discovery Data Service is a comprehensive set of tools for monitoring all access and configuration changes on our cloud platform. We log everything a user does, from sign in/sign out to any change to a server or configuration item, to create a complete audit trail. Access rules are continually reviewed, with only minimal permissions applied unless a change is scheduled to be made. All access is governed by the Clinical Effectiveness Group (CEG) Barts & the London Queen Mary University Information Security Policy.
Notifications are generated as data is received at any of our endpoints. This data is then processed into the service, with audit logs generated along the way to ensure accuracy and consistency.
Our data testing and assurance team have developed an advanced audit log, which is implemented in every transform. This means that we can view every transform result to make sure that all data published into the DDS is validated at every step in the process, and it allows technical and clinical teams to validate and sign off the accuracy of the data feed.
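The idea of an audit log that is present in every transform can be sketched with a Python decorator. This is a minimal sketch under stated assumptions, not the DDS audit log: the `audited` decorator, the in-memory `AUDIT_LOG` list and the `normalise_code` transform are hypothetical, and a real log would be written to durable storage with appropriate handling of patient data.

```python
import functools
from datetime import datetime, timezone

# Hypothetical in-memory audit log; a real service would persist entries.
AUDIT_LOG = []


def audited(transform):
    """Wrap a transform so that every call records its input and result."""
    @functools.wraps(transform)
    def wrapper(record):
        entry = {
            "transform": transform.__name__,
            "at": datetime.now(timezone.utc).isoformat(),
            "input": dict(record),
        }
        try:
            result = transform(record)
        except Exception as exc:
            # Failures are logged too, then re-raised for the caller to handle.
            entry["status"] = "failed"
            entry["error"] = repr(exc)
            AUDIT_LOG.append(entry)
            raise
        entry["status"] = "ok"
        entry["output"] = dict(result)
        AUDIT_LOG.append(entry)
        return result
    return wrapper


@audited
def normalise_code(record):
    """Example transform (hypothetical): tidy the 'code' field of a record."""
    return {**record, "code": record["code"].strip().upper()}
```

Because the logging lives in the wrapper rather than in each transform, every transform result is recorded in the same shape, which is what allows the results to be reviewed and signed off step by step.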
For technical and API queries, please contact info@discoverydataservice.org