Speech recognition has been on the brink of major success for decades, so it feels. Rather than set of generic “when will it be mainstream” questions, I was keen to catch up with Martin Held, Senior Product Manager, Healthcare at Nuance, to find out how things stood in this, specific and highly relevant context.
- How do you see the potential for speech recognition in the healthcare sector?
Right now, the most gain will be from general documentation, enabling people to dictate instead of type, to get text out faster. In some areas of healthcare, things are pretty structured – you have to fill forms electronically, with drop-down lists and so on. That’s not a primary application for speech, but anything that requires free text, there’s no comparison or alternative. Areas where handwritten notes are put into notes fields, that’s a good application. Discharge notes can be also be very wordy.
From a use case perspective, we’ve done analysis on how much time teams are spending on documentation and it’s huge — three quarters of medical practices are spending half of their time on documentation alone. In the South Tees Emergency department, we did a study where use of speech recognition reduced documentation time by 40%. In another study with Dukinfield, a smaller practice, by introducing our technology they were able to see 4 more patients (about a 10% increase) per day.
- What has happened over the past 5 years in terms of performance improvements and innovation?
In these scenarios, it’s a question of “can it work, can it perform” across a range of input devices. General speech recognition has improved so much that we are in the upper 90% range straight out of the gate. Now none of our products require training, based on new technology that was introduced using deep neural networks and machine learning.
In healthcare, we have also added cloud computing and changed the architecture: we put a lightweight client on the end-point machine or device, which streams audio to a back-end recognition server hosted in Microsoft Azure. We announced recently the general availability of Dragon Medical One — cloud-based recognition.
Still connectivity is a big issue, in particular for mobile situations, such as a community nurse — it’s not always possible to use recognition back in the car, if a mobile signal is poor for example. We are looking at technology that could record, then transcribe later.
- How have you addressed the privacy and risk implications?
We are certified to connect to N3 network, allowing NHS entities to connect according to requirements around governance and privacy, for example patient confidentiality. Offering a service through the NHS N3 network requires an Information Governance Statement of Compliance and submission of IG Toolkit through NHS Digital — this involves a relatively long and detailed certification process, including disaster recovery, Nuance internal processes and practices, employees with access and so on.
We are also offering input via the public Internet, as encryption and other technologies are secure so customers can connect through these means. So, for example, we can use mobile phones as an input device. We are not trying to build mobile medical devices, we know how difficult that is, but we are looking to replace the keyboard (which is not a medical device!)
As a matter of best practice, it is still required that the doctor has to sign the discharge or confirm an entry in electronic medical record system, whether it has been typed or dictated. So generated text is always a reference: and that will need to stay there. It’s more than five years before the computer can be seen as taking this responsibility from the doctor. Advice similarly can only be guidance.
- How do you see the market need for speech recognition maturing in healthcare?
Right now we’re still very much in an enablement situation with our customers, helping with their documentation needs. From a recognition perspective we can see the potential of moving from enablement to augmentation, making it simpler and hands-free, moving to more of a virtual assistant approach for a single person. In the longer-term, further out, we have the potential to do that for multiple people at the same time, for example a clinician, parent and child.
We’re also looking at the coding side of things — categorising disease, treatment, length of stay and so on from patient documentation. Codes are used for multiple things – reimbursement with insurance, negotiation between GPs, primary and secondary care about services to provide in future, with commissioner and trust to negotiate on payment levels. For primary care, doctors do coding but in secondary care, it’s done by a coder looking through a record after the discharge of a patient. If data is incomplete or non-specific, trusts can miss out on funding. Nuance already offers Natural Language Understanding based-coding products in the US, and these are being evaluated for the specifics of the healthcare market in the UK.
So we want to help turn documentation into something that can be easily analysed. Our technology cannot just recognise what you say, but in natural language understanding we can analyse the text and match against codes, potentially opening the door to offering prompts. For example, if doctor diagnoses a COPD, the clinician may need to ask if patient is a smoker, which will have a consequence in the code.
- How does Nuance see the next 5 years panning out, in terms of measuring success for speech recognition?
We believe speech recognition is ready to deliver a great deal of benefit to healthcare, gaining efficiency and freeing up clinical staff. In terms of the future, we recently showed a prototype of a virtual assistant that combines a lot of technologies, including biometrics, complete speech control, text analysis and meaning extraction, and also appropriate selection — so the machine can distinguish between a command and whether I just wanted to say something.
This combination should make the reaction a lot more human — we call this conversational artificial intelligence. Another part of this is about making text to speech as human as possible. Then combining that with cameras and microphones in the environment, for example pointing at something and saying, give me more information about ‘this’. That’s all longer term, but the virtual assistant and video are things we are working on.
My take: healthcare needs all the help it can get
So, does speech recognition have a place? Over the past couple of decades of use, we have learned that we generally do not like talking into thin air, and particularly not to a computer: the main change over recent years, the reduction in training time, has done little to reduce this very psychological blocker, which means that speech recognition remains in a highly useful, yet relatively limited niche of auto-transcription.
Turning specifically to the healthcare industry, a victim of its own science-led success: it is difficult to think of an industry vertical in which staff efficiency is more important. In every geography, potential improvements to patient outcomes are being stymied by a lack of funds, symptomized by waiting lists, bed shortages and so on, while being burdened by the weight of ever-increasing bureaucracy.
Even if speech recognition could knock one or two percentage points off the time taken to execute a clinical pathway, the overall savings could be massive. Greater efficiency also opens the door to higher potential quality, as clinicians can focus on ‘the job’ rather than the paperwork.
For the future, use of speech recognition beyond the note-taking this also links to the potential for improved diagnosis, through augmented decision making, and indeed, improved patient safety as technology provides more support to what is still a highly manual industry. This will take time, but our general habits are changing as the likes of Alexa and Siri make us more comfortable about talking to inanimate objects.
Overall, progress may be slow for speech recognition particularly in healthcare, but it is heading in the right direction. One day, our lives might depend on it.