AI Development: It’s Just Science (with bells on it)

Science & AI Development

People I speak to within the broader science community are often shocked at how readily they can understand what’s involved in the development of artificial intelligence (AI) products. More than once, the response has been, “You mean like in real science?”

In any field of science, we are taught scientific principles and processes. The research methods built on these bring consistency to our efforts to understand, explain or model the real world on the one hand, and on the other make repeatability a cornerstone of reporting our results. And AI development? Yes! Shockingly (or not so shockingly), the process for developing an efficacious AI that meets standards and can be regulated (for example, for use in a medical setting) follows the same route. Why would we bother being rigorous?

Because AI developed with scientific rigour is worth developing further, will be far more likely to make it into the real world, and the efforts of clinician researchers will ultimately be of more value for patient care.

For developing useful algorithms for healthcare, science expects a clear explanation of: the processes used to source the data (e.g. medical images); the ethics and permissions associated with the patient data; and, how the quality and validity of the dataset is assured. But this is also where the expectation of later proving fairness – i.e. lack of bias – in AI development starts being met. A good dataset is one in which statistical and other methods can be applied to examine and explain the biases in the dataset. There should also be a sufficient quantity and quality of data. And anonymisation should not remove access to the metadata. Good science at the start means that the AI that is developed down the line can be shown to be fair.
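As a purely illustrative sketch of that idea – the records and field names below are invented, not taken from any real dataset – examining a dataset for imbalance can start by simply making subgroup proportions explicit:

```python
from collections import Counter

# Hypothetical curated records: each medical image keeps its metadata,
# so subgroup balance can be examined rather than guessed at.
records = [
    {"scan_id": "s1", "age_group": "adult", "sex": "F"},
    {"scan_id": "s2", "age_group": "adult", "sex": "M"},
    {"scan_id": "s3", "age_group": "paediatric", "sex": "F"},
    {"scan_id": "s4", "age_group": "adult", "sex": "F"},
]

def subgroup_proportions(records, field):
    """Share of each subgroup for one metadata field."""
    counts = Counter(r[field] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}

print(subgroup_proportions(records, "age_group"))
# {'adult': 0.75, 'paediatric': 0.25}
```

In practice the same report would be run for every subgroup of interest (race/ethnicity, income, age and so on), and the imbalances explained or corrected before model development begins.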

Just like in other sciences where the aim is to develop products – rather than simply to do research – the early phases of machine learning model development must take account of the requirements of the intended patient population. The intended patients for the AI will share the same diseases, but the subgroups will be diverse in multiple ways: by race/ethnicity, income and age, to name a few (FDA, January 2021). Bias – favouritism towards some things, people or groups over others – has become a uniquely important hurdle for trustworthy AI development. Bias must be identified and eliminated at the model development phase if the clinician researcher wants to reassure potential investors that harm will not occur down the line from using a biased AI in decision making.

Not addressing bias early on at the scientific level introduces a barrier to software development funding, and to developing convincing proofs that the tool works everywhere it needs to without favouring one group over another. Therefore, a robust methodological framework for algorithmic development is needed.

Imagine that a team of researchers is building a radically new wound cover… and they lose the list of ingredients while working on it in the lab. The exemplar wound cover can be shown to be self-cauterising – amazingly strong and light! But without the method, nobody can make more, and there is no way to assure regulators that it contains no problematic ingredients such as latex. Nor can the method of making it be patented.

As you can now see, much of AI development relies upon the same basic processes found in “real” science, and it should respect the same principles of reliability, validity and repeatability. That is, the AI developers should be able to hold up their work and describe in detail the data used to develop the model, thereby demonstrating the validity and quality of the AI.

In the following hypothetical scenario, an AI is developed using imaging data comprising MRI scans of bone joints. The AI tool is expected to support the diagnosis of arthritis. Those who produced the AI are questioned by a paediatrician who wants to know whether it could also be used to help diagnose child patients. A fair question to ask would be, “Did the imaging data used to develop this AI include MRI scans from children, or only adults?” Reasonably, we should expect the AI developers to answer such questions with confidence. If they employed suitable processes and supporting software to curate their datasets, then performing such checks should not be a problem.
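Assuming the dataset was curated with per-scan metadata – the records and the `patient_age` field below are invented for illustration – answering the paediatrician is a one-line query:

```python
# Hypothetical curated MRI records; "patient_age" is an assumed
# metadata field retained during anonymisation.
scans = [
    {"scan_id": "mri_001", "joint": "knee", "patient_age": 54},
    {"scan_id": "mri_002", "joint": "hip", "patient_age": 61},
    {"scan_id": "mri_003", "joint": "knee", "patient_age": 47},
]

def contains_paediatric_scans(scans, cutoff_age=18):
    """True if any scan comes from a patient under the cutoff age."""
    return any(s["patient_age"] < cutoff_age for s in scans)

print(contains_paediatric_scans(scans))  # False: adults only
```

The point is not the code itself but what makes it possible: anonymisation that preserves the metadata, so the question can still be answered years after the dataset was assembled.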

What about those extra bells?

Let’s take the ANSI-accredited standard for AI in healthcare (CTA-2090), developed by the US Consumer Technology Association (CTA) Artificial Intelligence working group.

Within this are three sections on trust. Human Trust and Regulatory Trust concern how the Software as a Medical Device (SaMD) performs, whereas Technical Trust, the most detailed of the three, is much more specifically about how the AI is developed. The Technical Trust requirements show, point by point, that an SaMD with AI needs to be developed from a high-quality dataset assembled with scientific rigour. This is the main difference between standards designed for SaMD in general and for SaMD with AI.

The Technical Trust requirements can mostly be satisfied only when the dataset is being assembled and curated, and are specific to artificial intelligence. In summary, they state that:

  • An AI must be fair, which means minimising not only the bias inherent in the dataset but also the bias that can arise when datasets are combined or joined.
  • Data security and privacy should be GDPR compliant. Personal health information must be appropriate from the moment it is collected, and throughout its use, storage and management; the AI developer should be able to provide assurance for each of these requirements.
  • The data used to train the model should be high-quality, relevant and reliable, so it is worth stating whether any independent third party has assessed the data for integrity and trustworthiness. There should also be sufficient detail to allow others to repeat the work, whether for validation or for further development.
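One way to support the repeatability point – sketched here with Python’s standard library, and making no claim about any particular platform or standard-mandated mechanism – is to publish a deterministic fingerprint of the training dataset alongside the model:

```python
import hashlib
import json

def dataset_fingerprint(records):
    """Content hash of a curated dataset: the same records, in any
    order, always yield the same fingerprint, so others can verify
    they are working with exactly the same data version."""
    canonical = json.dumps(sorted(records, key=lambda r: r["scan_id"]),
                           sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = [{"scan_id": "s1", "label": "arthritis"},
     {"scan_id": "s2", "label": "healthy"}]
b = list(reversed(a))  # same data, different order

print(dataset_fingerprint(a) == dataset_fingerprint(b))  # True
```

Any change to the records changes the fingerprint, so a published model can state unambiguously which data version it was trained on.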


These additional requirements relate directly to AI development that is trustworthy and ethical. They are paramount because they ultimately increase the adoption of AI applications in medicine, so that both patients and clinicians can benefit from this technology.

Choosing AI Development Tools

The domain expert, the model developer and the owner of the AI solution will need a secure, encrypted, scalable and ethical platform on which to build their high-quality datasets if they aim to develop an AI that meets standards and can be accepted by regulators for real-world use – such as the Machine Learning Operations (MLOps) platform we have built. The platform is designed directly to meet the needs of AI developers in healthcare, biomedicine and pharmaceuticals. What’s more, it enables model developers to return to any version of their datasets and inspect them securely, as and when required.

Reliable and Repeatable Deployment

If the development of AI follows conventional scientific methods, then its deployment and maintenance should follow well-understood processes too. The production and delivery of other highly regulated products – be they drugs, electronics for hazardous areas or other “risky” products – are all managed through agreed international standards and legal regulatory regimes which engender trust.

The regulatory frameworks for the use of AI in high-risk environments are only now emerging, but the development of newly agreed standards – and the repurposing of existing standards for quality systems, data security and coding that have already been shown to be effective – will be the cornerstone of future trustworthy AI.

The reality is that if AI development is just science, then AI deployment is just engineering.

The process of high-quality AI development does not need to be esoteric – and neither should it be. It’s just science. With bells on it.

Images: Courtesy of Mart Production.