Paper Review: the Babylon Chatbot

June 29, 2018

[For convenience I collect here and slightly rearrange and update my Twitter review of a recent paper comparing the performance of the Babylon chatbot against human doctors. As ever my purpose is to focus on the scientific quality of the paper, identify design weaknesses, and suggest improvements in future studies.]


Here is my peer review of the Babylon chatbot as described in the conference paper at

Please feel free to correct any misunderstandings I have of the evaluation in the tweets that follow.

To begin, the Babylon engine is a Bayesian reasoner. That’s cool. Not sure if it qualifies as AI.
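To make the style of reasoning concrete, here is a toy naive-Bayes diagnostic ranker in Python. Every disease, prior and likelihood below is invented for illustration; this is emphatically not Babylon's actual model, which the paper does not disclose.

```python
# Toy Bayesian diagnostic ranker: P(disease | findings) via Bayes' rule,
# assuming conditional independence of findings (naive Bayes).
# All priors and likelihoods are invented, illustrative values only.

PRIORS = {"flu": 0.05, "cold": 0.20, "meningitis": 0.001}

# P(finding | disease) -- invented for illustration
LIKELIHOODS = {
    "flu":        {"fever": 0.9, "stiff_neck": 0.05},
    "cold":       {"fever": 0.2, "stiff_neck": 0.01},
    "meningitis": {"fever": 0.8, "stiff_neck": 0.9},
}

def posterior(findings):
    """Return diseases ranked by posterior probability given the findings."""
    scores = {}
    for disease, prior in PRIORS.items():
        p = prior
        for f in findings:
            p *= LIKELIHOODS[disease].get(f, 0.01)  # small default likelihood
        scores[disease] = p
    total = sum(scores.values())
    return sorted(((d, p / total) for d, p in scores.items()),
                  key=lambda x: -x[1])

ranked = posterior(["fever", "stiff_neck"])
```

Note how sensitive the ranking is to the priors: that is exactly why the paper's silence on how its priors were tuned (see below) makes replication hard.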

The evaluation uses artificial patient vignettes which are presented in a structured format to human GPs or a Babylon operator. So the encounter is not naturalistic. It doesn’t test Babylon in front of real patients.

In the vignettes, patients were played by GPs, some of whom were employed by Babylon. So they might know how Babylon liked information to be presented and unintentionally advantaged it. Using independent actors, or ideally real patients, would have had more ecological validity.

A human is part of the Babylon intervention because a human has to translate the presented vignette and enter it into Babylon. In other words, the human heard information that they then translated into terminology Babylon recognises. The impact of this human is not explicitly measured. For example, rather than being a ‘mere’ translator, the human may occasionally have had to make clinical inferences to match case data to Babylon’s capability. If true, the knowledge to do that would be external to Babylon and yet contribute to its performance.

The vignettes were designed to test known capabilities of the system. Independently created vignettes exploring other diagnoses would likely have resulted in a much poorer performance. This tests Babylon on what it knows, not on what it might find ‘in the wild’.

It seems the presentation of information was in the OSCE format, which is artificial and not how patients might present. So there was no real testing of consultation and listening skills that would be needed to manage a real world patient presentation.

Babylon is a Bayesian reasoner but no information was presented on the ‘tuning’ of priors required to get this result. This makes replication hard. A better paper would provide the diagnostic models to allow independent validation.

The quality of differential diagnoses by humans and Babylon was assessed by one independent individual. In addition, two Babylon employees also rated differential diagnosis quality. Good research practice is to use multiple independent assessors and to measure inter-rater reliability.

The safety assessment has the same flaw. Only one independent assessor was used, and no inter-rater reliability measures are presented even when several in-house assessors are added. Non-independent assessors bring a risk of bias.
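For reference, agreement between two assessors is conventionally quantified with Cohen’s kappa, which corrects raw agreement for chance. The safety ratings below are invented placeholders, shown only to illustrate the calculation:

```python
# Cohen's kappa for two raters rating the same items.
# Values above ~0.6 are conventionally read as substantial agreement.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    # chance agreement from each rater's marginal rating frequencies
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    return (observed - expected) / (1 - expected)

# Invented safety ratings for six differential diagnoses
a = ["safe", "safe", "unsafe", "safe", "unsafe", "safe"]
b = ["safe", "unsafe", "unsafe", "safe", "unsafe", "safe"]
kappa = cohens_kappa(a, b)
```

Reporting a statistic like this for the in-house and independent assessors would have let readers judge how much the ratings can be trusted.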

For further evaluation, additional vignettes were used, based on MRCGP tests. However, any vignettes outside the Babylon system’s capability were excluded. They only tested Babylon on vignettes it had a chance to get right.

So, whilst it might be OK for limited testing to allow Babylon to answer only questions it is good at, the humans had no reciprocal right to exclude vignettes they were not good at. This is a fundamental bias in the evaluation design.

A better evaluation model would have been to draw a random subset of cases and present them to both GPs and Babylon.

No statistical testing is done to check if the differences reported are likely due to chance variation. A statistically rigorous study would estimate the likely effect size and use that to determine the sample size needed to detect a difference between machine and human.
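As a sketch of what such planning looks like, the standard normal-approximation sample-size formula for comparing two proportions can be computed directly. The 0.70 vs 0.55 accuracy figures are hypothetical effect sizes, not numbers taken from the paper:

```python
# Approximate per-group sample size to detect a difference between two
# proportions (e.g. human vs machine top-1 accuracy), two-sided alpha = 0.05,
# 80% power, using the normal approximation. Effect sizes are hypothetical.
import math

def sample_size_two_proportions(p1, p2):
    z_alpha = 1.96    # two-sided 5% significance
    z_beta = 0.8416   # 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical: humans at 70% top-1 accuracy vs machine at 55%
n = sample_size_two_proportions(0.70, 0.55)
```

Even a generous hypothetical difference of 15 percentage points requires over 160 cases per group, far more than the roughly 50 vignettes each doctor saw here.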

For the first evaluation study, the methods section tells us: “The study was conducted in four rounds over consecutive days. In each round, there were up to four “patients” and four doctors.” That should mean each doctor, and Babylon, saw “up to” 16 cases.

Table 1 shows Babylon used on 100 vignettes while doctors typically saw about 50. This makes no sense. Possibly they lump in the 30 Semigran cases reported separately, but that still does not add up. Further, as the methods for the Semigran cases were different, they cannot be added in any case.

There is a problem with Doctor B, who completes 78 vignettes. The others do about 50. Further, looking at Table 1 and Fig 1, Doctor B is an outlier, performing far worse diagnostically than the others. This unbalanced design may mean average doctor performance is penalised by Doctor B or by the additional cases they saw.

Good research practice is to report who the study subjects are and how they were recruited. All we know is that these were locum GPs paid to do the study. We should be told their age, experience and level of training, perhaps where they were trained, and whether they were independent or had a prior link to the researchers doing the study. We would like to understand whether B was somehow “different”, because their performance certainly was.

Removing B from the data set and recalculating results shows humans beating Babylon on every measure in Table 1.

With such a small sample size of doctors, the results are thus very sensitive to each individual case and doctor, and adding or removing a doctor can change the outcomes substantially. That is why we need statistical testing.
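To illustrate how fragile a four-doctor average is, a leave-one-out recalculation shows how much a single outlier moves the group mean. The accuracy scores here are invented, not the paper’s data:

```python
# Sensitivity of a small-sample mean to one outlier: drop each doctor
# in turn and recompute the group mean. All scores are invented.

def leave_one_out_means(scores):
    return {name: sum(v for n, v in scores.items() if n != name)
                  / (len(scores) - 1)
            for name in scores}

scores = {"A": 0.82, "B": 0.52, "C": 0.80, "D": 0.78}  # B is the outlier
overall = sum(scores.values()) / len(scores)            # 0.73
without = leave_one_out_means(scores)                   # dropping B gives 0.80
```

With only four subjects, one doctor shifts the headline average by seven percentage points; a hypothesis test (or confidence interval) is the only way to know whether such a difference is signal or noise.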

There is also a Babylon problem. It sees on average about twice as many cases as the doctors. As no rule is provided for how the additional cases seen by Babylon were selected, there is a risk of selection bias: what if, by chance, the ‘easy’ cases were seen only by Babylon?

A better study design would allocate each subject exactly the same cases, to allow meaningful and direct performance comparison. To avoid known biases associated with presentation order of cases, case allocation should be random for each subject.
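A minimal sketch of that design, with every subject (human or machine) receiving the identical case set in an independently randomized presentation order:

```python
# Balanced allocation: every subject gets the same case set, each in an
# independently randomized order to control for presentation-order effects.
import random

def allocate(cases, subjects, seed=0):
    rng = random.Random(seed)  # seeded so the allocation is reproducible
    schedule = {}
    for s in subjects:
        order = cases[:]       # identical case set for every subject
        rng.shuffle(order)     # independent presentation order per subject
        schedule[s] = order
    return schedule

schedule = allocate(list(range(50)), ["GP_A", "GP_B", "Babylon"])
```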

For the MRCGP questions, Babylon’s diagnostic accuracy is measured by its ability to identify a disease within its top 3 differential diagnoses. It identified the right diagnosis in its top 3 in 75% of 36 MRCGP CSA vignettes, and 87% of 15 AKT vignettes.

For the MRCGP questions we are not given Babylon’s performance when the measure is the top differential. Media reports compare Babylon against historical MRCGP human results. One assumes humans had to produce the correct diagnosis, and were not asked for a top 3 differential.

There is a huge clinical difference between placing a disease somewhere in your top few differential diagnoses and making it the top one you elect to investigate. It is also an unfair comparison if Babylon is rated by a top 3 differential and humans by a top 1. Clarity on this aspect would be valuable.
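The gap between the two measures is easy to see in code. The vignettes and diagnoses below are invented, purely to show how top-1 and top-3 scores can diverge on the same rankings:

```python
# Top-k accuracy: is the true diagnosis anywhere in the first k entries of
# the ranked differential? Comparing a top-3 machine score against a top-1
# human score can flatter the machine. Data below are invented.

def top_k_accuracy(predictions, truths, k):
    hits = sum(truth in ranked[:k]
               for ranked, truth in zip(predictions, truths))
    return hits / len(truths)

# Invented ranked differentials for four vignettes
differentials = [
    ["migraine", "tension headache", "sinusitis"],
    ["angina", "GORD", "MI"],
    ["UTI", "pyelonephritis", "cystitis"],
    ["eczema", "psoriasis", "scabies"],
]
truths = ["migraine", "MI", "cystitis", "scabies"]

top1 = top_k_accuracy(differentials, truths, 1)  # 0.25
top3 = top_k_accuracy(differentials, truths, 3)  # 1.0
```

The same system scores 25% or 100% depending solely on the measure chosen, which is why the paper needs to state explicitly which measure applies to whom.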

In closing, we are only ever given one clear head to head comparison between humans and Babylon, and that is on the 30 Semigran cases. Humans outperform Babylon when the measure is the top diagnosis. Even here though, there is no statistical testing.

So, in summary, this is a very preliminary and artificial test of a Bayesian reasoner on cases for which it has already been trained.

In machine learning this would be roughly equivalent to in-sample reporting of performance on the data used to develop the algorithm. Good practice is to report out of sample performance on previously unseen cases.
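A minimal hold-out sketch: tune only on the training split and report performance on vignettes the system has never seen. The vignette names are synthetic placeholders:

```python
# Out-of-sample evaluation: hold out a test split the system never sees
# during development, and report performance on that split only.
import random

def train_test_split(items, test_fraction=0.3, seed=42):
    rng = random.Random(seed)   # seeded for reproducibility
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

vignettes = [f"vignette_{i}" for i in range(100)]
train, test = train_test_split(vignettes)
# Develop and tune on `train` only; report accuracy on `test` only.
```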

The results are confounded by artificial conditions and use of few and non-independent assessors.

There is lack of clarity in the way data are analysed and there are numerous risks of bias.

Critically, no statistical testing is performed to tell us whether any of the differences seen mean anything. Further, given the small sample of GPs tested, this study was unlikely to be adequately powered to detect a difference, if one exists.

So, it is fantastic that Babylon has undertaken this evaluation, and has sought to present it in public via this conference paper. They are to be applauded for that. One of the benefits of going public is that we can now provide feedback on the study’s strength and weaknesses.



Evidence-based health informatics

February 11, 2016

Have we reached peak e-health yet?

Anyone who works in the e-health space lives in two contradictory universes.

The first universe is that of our exciting digital health future. This shiny gadget-laden paradise sees technology in harmony with the health system, which has become adaptive, personal, and effective. Diseases tumble under the onslaught of big data and miracle smart watches. Government, industry, clinicians and people off the street hold hands around the bonfire of innovation. Teeth are unfeasibly white wherever you look.

The second universe is Dickensian. It is the doomy world in which clinicians hide in shadows, forced to use clearly dysfunctional IT systems. Electronic health records take forever to use, and don’t fit clinical work practice. Health providers hide behind burning barricades when the clinicians revolt. Government bureaucrats in crisp suits dissemble in velvet-lined rooms, softly explaining the latest cost overrun, delay, or security breach. Our personal health files get passed by street urchins hand-to-hand on dirty thumb drives, until they end up in the clutches of Fagin-like characters.

Both of these universes are real. We live in them every day. One is all upside, the other mostly down. We will have reached peak e-health the day that the downside exceeds the upside and stays there. Depending on who you are and what you read, for many clinicians, we have arrived at that point.

The laws of informatics

To understand why e-health often disappoints requires some perspective and distance. Informed observers again and again see the same pattern of large technology-driven projects sucking up all the e-health oxygen and resources, and then failing to deliver. Clinicians see that the technology they can buy as a consumer is more beautiful and more useful than anything they encounter at work.

I remember a meeting I attended with Branko Cesnik. After a long presentation about a proposed new national e-health system, focusing entirely on technical standards and information architectures, Branko piped up: “Excuse me, but you’ve broken the first law of informatics”. What he meant was that the most basic premise for any clinical information system is that it exists to solve a clinical problem. If you start with the technology, and ignore the problem, you will fail.

There are many corollary informatics laws and principles. Never build a clinical system to solve a policy or administrative problem unless it is also solving a clinical problem. Technology is just one component of the socio-technical system, and building technology in isolation from that system just builds an isolated technology [3].

Breaking the laws of informatics

So, no e-health project starts in a vacuum of memory. Rarely do we need to design a system from first principles. We have many decades of experience to tell us what the right thing to do is. Many decades of what not to do sits on the shelf next to it. Next to these sits the discipline of health informatics itself. Whilst it borrows heavily from other disciplines, it has its own central reason to exist – the study of the health system, and of how to design ways of changing it for the better, supported by technology. Informatics has produced research in volume.

Yet today it would be fair to say that most people who work in the e-health space don’t know that this evidence exists, and if they know it does exist, they probably discount it. You might hear “N of 1” excuse making, which is the argument that the evidence “does not apply here because we are different” or “we will get it right where others have failed because we are smarter”. Sometimes system builders say that the only evidence that matters is their personal experience. We are engineers after all, and not scientists. What we need are tools, resources, a target and a deadline, not research.

Well, you are not different. You are building a complex intervention in a complex system, where causality is hard to understand, let alone control. While the details of your system might differ, from a complexity science perspective, each large e-health project ends up confronting the same class of nasty problem.

The results of ignoring evidence from the past are clear to see. If many of the clinical information systems I have seen were designed according to basic principles of human factors engineering, I would like to know what those principles are. If most of today’s clinical information systems are designed to minimize technology-induced harm and error, I will hold a party and retire, my life’s work done.

The basic laws of informatics exist, but they are rarely applied. Case histories are left in boxes under desks, rather than taught to practitioners. The great work of the informatics research community sits gathering digital dust in journals and conference proceedings, and does not inform much of what is built and used daily.

None of this story is new. Many other disciplines have faced identical challenges. The very name Evidence-based Medicine (EBM), for example, is a call to arms to move from anecdote and personal experience, towards research and data driven decision-making. I remember in the late ‘90s, as the EBM movement started (and it was as much a social movement as anything else), just how hard the push back was from the medical profession. The very name was an insult! EBM was devaluing the practical, rich daily experience of every doctor, who knew their patients ‘best’, and every patient was ‘different’ to those in the research trials. So, the evidence did not apply.

EBM remains a work in progress. All you need to do today is see a map of clinical variation to understand that much of what is done remains without an evidence base to support it. Why is one kind of prosthetic hip joint used in one hospital, but a different one in another, especially given the differences in cost, hip failure and infection? Why does one developed country have high caesarean section rates when a comparable one does not? These are the result of pragmatic ‘engineering’ decisions by clinicians – to attack the solution to a clinical problem one way, and not another. I don’t think healthcare delivery is so different to informatics in that respect.

Is it time for evidence-based health informatics?

It is time we made the praxis of informatics evidence-based.

That means we should strive to see that every decision that is made about the selection, design, implementation and use of an informatics intervention is based on rigorously collected and analyzed data. We should choose the option that is most likely to succeed based on the very best evidence we have.

For this to happen, much needs to change in the way that research is conducted and communicated, and much needs to happen in the way that informatics is practiced as well:

  • We will need to develop a rich understanding of the kinds of questions that informatics professionals ask every day;
  • Where the evidence to answer a question exists, we need robust processes to synthesize and summarize that evidence into practitioner actionable form;
  • Where the evidence does not exist and the question is important, then it is up to researchers to conduct the research that can provide the answer.

In EBM, there is a lovely notion that we need problem oriented evidence that matters (POEM) [1] (covered in some detail in Chapter 6 of The Guide to Health Informatics). It is easy enough to imagine the questions that can be answered with informatics POEMs:

  • What is the safe limit to the number of medications I can show a clinician in a drop-down menu?
  • I want to improve medication adherence in my Type 2 Diabetic patients. Is a text message reminder the most cost-effective solution?
  • I want to reduce the time my docs spend documenting in clinic. What is the evidence that an EHR can reduce clinician documentation time?
  • How gradually should I roll out the implementation of the new EHR in my hospital?
  • What changes will I need to make to the workflow of my nursing staff if I implement this new medication management system?

EBM also emphasises that the answer to any question is never an absolute one based on the science, because the final decision is also shaped by patient preferences. A patient with cancer may choose a treatment that is less likely to cure them, because it is also less likely to have major side-effects, which is important given their other goals. The same obviously holds in evidence-based health informatics (EBHI).

The Challenges of EBHI

Making this vision come true would see some significant long term changes to the business of health informatics research and praxis:

  • Questions: Practitioners will need to develop a culture of seeking evidence to answer questions, and not simply do what they have always done, or what their colleagues do. They will need to be clear about their own information needs, and to be trained to ask clear and answerable questions. There will need to be a concerted partnership between practitioners and researchers to understand what an answerable question looks like. EBM has a rich taxonomy of question types, and the questions in informatics will be different, emphasizing engineering, organizational, and human factors issues amongst others. There will always be questions with no answer, and that is the time experience and judgment come to the fore. Even here though, analytic tools can help informaticians explore historical data to find the best historical evidence to support choices.
  • Answers: The Cochrane Collaboration helped pioneer the development of robust processes of meta-analysis and systematic review, and the translation of these into knowledge products for clinicians. We will need to develop a new informatics knowledge translational profession that is responsible for understanding informatics questions, and finding methods to extract the most robust answers to them from the research literature and historical data. As much of this evidence does not typically come from randomised controlled trials, other methods than meta-analysis will be needed. Case libraries, which no doubt exist today, will be enhanced and shaped to support the EBHI enterprise. Because we are informaticians, we will clearly favor automated over manual ways of searching for, and summarizing, the research evidence [2]. We will also hopefully excel at developing the tools that practitioners use to frame their questions and get the answers they need. There are surely both public good and commercial drivers to support the creation of the knowledge products we need.
  • Bringing implementation science to informatics: We know that informatics interventions are complex interventions in complex systems, and that the effect of these interventions vary depending on the organisational context. So, the practice of EBHI will of necessity see answers to questions being modified because of local context. I suspect that this will mean that one of the major research challenges to emerge from embracing EBHI is to develop robust and evidence-based methods to support localization or contextualisation of knowledge. While every context is no doubt unique, we should be able to draw upon the emerging lessons of implementation science to understand how to support local variation in a way that is most likely to see successful outcomes.
  • Professionalization: Along with culture change would come changes to the way informatics professionals are accredited, and reaccredited. Continuing professional education is a foundation of the reaccreditation process, and provides a powerful opportunity for professionals to catch up with the major changes in science, and how those changes impact the way they should approach their work.


There comes a moment when surely it is time to declare that enough is enough. There is an unspoken crisis in e-health right now. The rhetoric of innovation, renewal, modernization and digitization makes us all want to be believers. The long and growing list of failed large-scale e-health projects, and the uncomfortable silence that hangs when good people talk about the safety risks of technology, make some think that e-health is an ill-conceived if well-intentioned moment in the evolution of modern health care. This does not have to be.

To avoid peak e-health we need to not just minimize the downside of what we do by avoiding mistakes. We also have to maximize the upside, and seize the transformative opportunities technology brings.

Everything I have seen in medicine’s journey to become evidence-based tells me that this will not be at all easy to accomplish, and that it will take decades. But until we do, the same mistakes will likely be rediscovered and remade.

We have the tools to create a different universe. What is needed is evidence, will, a culture of learning, and hard work. Less Dickens and dystopia. More Star Trek and utopia.

Further reading:

Since I wrote this blog a collection of important papers covering the important topic of how we evaluate health informatics and choose which technologies are fit for purpose has been published in the book Evidence-based Health Informatics.


  1. Slawson DC, Shaughnessy AF, Bennett JH. Becoming a medical information master: feeling good about not knowing everything. The Journal of Family Practice 1994;38(5):505-13
  2. Tsafnat G, Glasziou PP, Choong MK, et al. Systematic Review Automation Technologies. Systematic Reviews 2014;3(1):74
  3. Coiera E. Four rules for the reinvention of healthcare. BMJ 2004;328(7449):1197-99


An Italian translation of this article is available.


Clinical communication, interruption and the design of e-health

September 23, 2014

The different ways clinicians interact do not just shape the success of the communication act. Our propensity to interrupt each other, and to multitask as we handle communication tasks alongside other duties, has a direct effect on how well we carry out everything we do. Interruption, for example, has the capacity to distort human memory processes, leading to memory lapses as well as memory distortions.

Earlier this year I was interviewed by Dr. Robert Wachter, the Editor of the Agency for Healthcare Research and Quality (AHRQ) WebM&M. In that interview we covered the roles that interruption and multitasking play in patient safety, discussing both their risks, as well as strategies for minimising their effects. The interview also looked at the implications these communication and task management styles have for the design of information technologies.

The transcript of the interview as well as the podcast are available here.

A related 2012 editorial on the research challenges associated with interruption appeared in BMJ Quality & Safety.

It is clear that our clinical information systems are not designed to be used in busy, interrupt-driven environments, and that they suffer because of it. Not only do they not fit the way people of necessity communicate and work, they lead to additional risks and have the potential to harm patients. It perplexes me that information system designers still work on the blind assumption that their users are giving their full attention to the software systems they have built. E-health systems need to be tolerant of interruption, and must be designed to support recovery from such events. Memory prompts, task markers, and retention of context once an action has been completed, are essential for the safe design of e-health systems.


Clinical Safety of the Australian Personally Controlled Electronic Health Record (PCEHR)

November 29, 2013

Like many nations, Australia has begun to develop nation-scale E-health infrastructure. In our case it currently takes the form of a Personally Controlled Electronic Health Record (PCEHR). This is a Federal government-created and -operated system that caches clinical documents, including discharge summaries; summary care records that are electively created, uploaded and maintained by a designated general practitioner; and some administrative data, including information on medications prescribed and dispensed, as well as claims data from Medicare, our universal public health insurer. It is personally controlled in that consumers must elect to opt in to the system, and may elect to hide documents or data should they not wish them to be seen – a point that has many clinicians concerned but is equally celebrated by consumer groups.

With ongoing concerns in some sectors about system cost, apparent low levels of adoption and clinical use, as well as a change in government, the PCEHR is now being reviewed to determine its fate. International observers might detect echoes of recent English and Canadian experiences here, and we will all watch with interest to see what unfolds after the Committee reports.

My submission to the PCEHR Review Committee focuses only on the clinical safety of the system, and the governance processes designed to ensure clinical safety. You can read my submission here.

As background to the submission, I have written about clinical safety of IT in health care for several years now, and some of the more relevant papers are collected here:

  1. J. S. Ash, M. Berg, E. Coiera, Some Unintended Consequences of Information Technology in Health Care: The Nature of Patient Care Information System Related Errors, Journal American Medical Informatics Association, 11(2),104-112, 2004.
  2. Coiera E, Westbrook J, Should clinical software be regulated? Medical Journal of Australia. 2006:184(12);600-1.
  3. Coiera E, Westbrook JI, Wyatt J (2006) The safety and quality of decision support systems, Methods Of Information In Medicine 45: 20-25 Suppl. 1, 2006.
  4. Magrabi F, Coiera E. Quality of prescribing decision support in primary care: still a work in progress. Medical Journal of Australia 2009; 190 (5): 227-228.
  5. Coiera E, Do we need a national electronic summary care record? Medical Journal of Australia 2011; 194(2), 90-2.
  6. [Paywall] Coiera E, Kidd M, Haikerwal M, A call to national e-health clinical safety governance, Med J Aust 2012; 196 (7): 430-431.
  7. Coiera E, Aarts J, Kulikowski K. The Dangerous Decade. Journal of the American Medical Informatics Association, 2012;19(1):2-5.
  8. [Paywall] Magrabi F, Aarts J, Nohr C, et al. A comparative review of patient safety initiatives for national health information technology. International journal of medical informatics 2012;82(5):e139-48.
  9. [Paywall] Coiera E, Why E-health is so hard, Medical Journal of Australia, 2013; 198(4),178-9.

Along with research colleagues I have been working on understanding the nature and extent of likely harms, largely through reviews of reported incidents in Australia, the US and England. A selection of our papers can be found here:

  1. Magrabi F, Ong M, Runciman W, Coiera E. An analysis of computer-related patient safety incidents to inform the development of a classification. Journal of the American Medical Informatics Association 2010;17:663-670.
  2. Magrabi F, Li SYW, Day RO, Coiera E, Errors and electronic prescribing: a controlled laboratory study to examine task complexity and interruption effects. Journal of the American Medical Informatics Association 2010 17: 575-583.
  3. Magrabi F, Ong, M, Runciman W, Coiera E, Patient Safety Problems Associated with Healthcare Information Technology: an Analysis of Adverse Events Reported to the US Food and Drug Administration, AMIA 2011 Annual Symposium, Washington DC, October 2011, 853-8.
  4. Magrabi F, Ong MS, Runciman W, Coiera E. Using FDA reports to inform a classification for health information technology safety problems. Journal of the American Medical Informatics Association 2012;19:45-53.
  5. Magrabi, F., M. Baker, I. Sinha, M.S. Ong, S. Harrison, M. R. Kidd, W. B. Runciman and E. Coiera. Clinical safety of England’s national programme for IT: A retrospective analysis of all reported safety events 2005 to 2011. International Journal of Medical Informatics 2015;84(3): 198-206.

Submission to the PCEHR Review Committee 2013

November 29, 2013

Professor Enrico Coiera, Director Centre for Health Informatics, Australian Institute of Health Innovation, UNSW

Date: 21 November 2013

The Clinical Safety of the Personally Controlled Electronic Health Record (PCEHR)

This submission comments on the consultations during PCEHR development, barriers to clinician and patient uptake and utility, and makes suggestions to accelerate adoption. The lens for these comments is patient safety.

The PCEHR like any healthcare technology may do good or harm. Correct information at a crucial moment may improve care. Misleading, missing or incorrect information may lead to mistakes and harm. There is clear evidence nationally and internationally that health IT can cause such harm [1-5].

To mitigate such risks, most industries adopt safety systems and processes at software design, build, implementation and operation. User trust that a system is safe enhances its adoption, and forces system design to be simple, user focused, and well tested.

The current PCEHR has multiple safety risks including:

  1. Using administrative data (e.g. PBS data and Prescribe/Dispense information) for clinical purposes (ascertaining current medications) – a use never intended;
  2. Using clinical documents (discharge summaries) instead of fine-grained patient data e.g. allergies. Ensuring data integrity is often not possible within documents (e.g. identifying contradicting, missing or out of date data);
  3. Together these create an electronic form of a hybrid record with no unitary view of the clinical ‘truth’. Hybrid records can lead to clinical error by impeding data search or by triggering incorrect decisions based on a partial view of the record [6];
  4. Shifting the onus for data integrity to a custodian GP avoids the PCEHR operator taking responsibility for data quality (a barrier to GP engagement and a risk because integrity requires sophisticated, often automated checking);
  5. No national process or standards to ensure that clinical software and updates (and indeed the PCEHR) are clinically safe.

The need for clinical safety to be managed within the PCEHR was fed into the PCEHR process formally [7], via internal NEHTA briefings, at public presentations at which PCEHR leadership were present and was clear from the academic literature. Indeed, a 2010 MJA editorial on the risks and benefits of likely PCEHR architectures highlighted recent evidence suggesting many approaches were problematic. It tongue-in-cheek suggested that perhaps GPs should ‘curate’ the record, only to then point out the risks with that approach [8].

Yet, at the beginning of 2012, no formal clinical safety governance arrangements existed for the PCEHR program. The notable exception was the Clinical Safety Unit within NEHTA, whose limited role was to examine the safety of standards as designed, but not as implemented. There was no process to ensure software connecting to the PCEHR was safe (in the sense that patients would not be harmed from the way information was entered, stored, retrieved or used), only that it interoperated technically. No ongoing safety incident monitoring or response function existed, beyond any internal processes the system operator might have had.

Concerns that insufficient attention was being paid to clinical safety prompted a 2012 MJA editorial on the need for national clinical safety governance, both for the PCEHR and for E-health more broadly [9]. In response, a clinical governance oversight committee was created within the Australian Commission on Safety and Quality in Health Care (ACSQHC) to review PCEHR incidents monthly, but with no remit to look at clinical software that connects to the PCEHR. There is, however, no public record of how clinical incidents are determined, what incidents are reported, their risk levels or resulting harms, nor how they are made safe. A major lesson from patient safety is that open disclosure is essential to ensure patient and clinician trust in a system, and to maximize dissemination of lessons learned. This lack of transparency is likely a major barrier to uptake, especially given the sporadic media reports of errors in PCEHR data (such as incorrect medications) with the potential to lead to harm.

We recently reviewed governance arrangements for health IT safety internationally, and found that a wide variety of arrangements is possible, from self-certification through to regulation [10]. The English NHS has a mature approach that ensures clinical software connecting to the national infrastructure complies with safety standards, closely monitors incidents, and has a dedicated team to investigate and make safe any reported near miss or actual harm.

Our recent awareness of large-scale events across national e-health systems – where potentially many thousands of patient records are affected at once – is another reason PCEHR and national e-health safety should be a priority. With the English NHS, we recently completed an analysis of 850 of their incidents; 23% (191) were large-scale, involving between 10 and 66,000 patients each. Tracing all affected patients becomes difficult when dealing with a complex system composed of loosely interacting components, such as the PCEHR.


  1. A whole of system safety audit and risk assessment of the PCEHR and feeder systems should be conducted, using all internal data available, and made public. The risks of using administrative data for clinical purposes and the hybrid record structure need immediate assessment.
  2. A strong safety case for the continued use of administrative data needs to be made, or it should be withdrawn from the PCEHR.
  3. We need a whole-of-system (not just PCEHR) approach: software and updates that are designed, tested and certified as safe; active monitoring for harm events; and a response function to investigate and make safe the root causes of any event. Without this, it is not possible, for example, to certify that a GP desktop system that interoperates with the PCEHR is built and operated safely when it uploads to or downloads from the PCEHR.
  4. Existing PCEHR clinical safety governance functions need to be brought together in one place. The nature, size, and structure of this function, and the degree to which it is legislated to mandate safety, is a discussion that must be had. Such bodies exist in other industries, e.g. the Civil Aviation Safety Authority (CASA). The ACSQHC is a possible home for this function but would need substantial changes to its mandate, resourcing, remit, and skill set.
  5. Reports of incidents and their remedies need to be made public in the same way that aviation incidents are reported. This will build trust amongst the public and clinicians, contribute to safer practice and design, and mitigate negative press when incidents invariably become public.


[See parent blog for links to papers that are not linked here]

1. Coiera E, Aarts J, Kulikowski C. The dangerous decade. Journal of the American Medical Informatics Association 2012;19:2-5

2. Patient safety problems associated with healthcare information technology: an analysis of adverse events reported to the US Food and Drug Administration. AMIA Annual Symposium Proceedings; 2011. American Medical Informatics Association.

3. Institute of Medicine. Health IT and Patient Safety: Building Safer Systems for Better Care. Washington, DC: The National Academies Press, 2012.

4. Sparnon E, Marella WM. The Role of the Electronic Health Record in Patient Safety Events. Pa Patient Saf Advis 2012;9(4):113-21.

5. Coiera E, Westbrook J. Should clinical software be regulated? Med J Aust 2006;184(12):600-01.

6. Sparnon E. Spotlight on Electronic Health Record Errors: Paper or Electronic Hybrid Workflows. Pa Patient Saf Advis 2013;10(2).

7. McIlwraith J, Magrabi F. Submission. Personally Controlled Electronic Health Record (PCEHR) System: Legislation Issues Paper 2011.

8. Coiera E. Do we need a national electronic summary care record? Med J Aust 2011 (online 9/11/2010);194(2):90-92.

9. Coiera E, Kidd M, Haikerwal M. A call for national e-health clinical safety governance. Med J Aust 2012;196(7):430-31.

10. Magrabi F, Aarts J, Nohr C, et al. A comparative review of patient safety initiatives for national health information technology. Int J Med Inform 2013;82(5):e139-48.
