Voices in AI – Episode 43: A Conversation with Markus Noga

In this episode, Byron and Markus discuss machine learning and automation.
Byron Reese: This is Voices in AI, brought to you by GigaOm. I’m Byron Reese. Today, my guest is Markus Noga. He’s the VP of Machine Learning over at SAP. He holds a Ph.D. in computer science from the Karlsruhe Institute of Technology, and prior to that spent seven years over at Booz Allen Hamilton, helping businesses adopt IT and transform themselves through it. Welcome to the show, Markus.
Markus Noga: Thank you, Byron, and it’s a pleasure to be here today.
Let’s start off with a question I have yet to have two people answer the same way. What is artificial intelligence?
That’s a great one, and it’s certainly something that few people can agree on. I think the textbook definition mostly defines it by analogy with human intelligence, and human intelligence is also notoriously tricky and hard to define. I define human intelligence as the ability to deal with the unknown, bring structure to the unstructured, and answer novel questions in a surprisingly resourceful and mindful way. Artificial intelligence itself, rather more playfully, is the thing that is always three to five years out of reach. We love to focus on what can be done today, what we call machine learning and deep learning, which can already deliver tremendous value for businesses and for individuals.
But in what sense is it artificial? Is it artificial in the way artificial turf is artificial? Is it really turf, or does it just look like it? Or is it artificial only in the sense that we made it? Or, put another way, is artificial intelligence actually intelligent, or does it just behave intelligently?
You’re going very deep here, into things like Searle’s Chinese room argument, about the man in a room with a handbook of rules for transcribing Chinese symbols so as to carry on an intelligent conversation. The question is who or what is having the intelligent conversation. Is it the book? Certainly not. Is it the man mindlessly transcribing the symbols? Certainly not. Is it maybe the system of the man, the book, and the room itself that generates these intelligent-seeming responses? I guess I’m coming down on the output-oriented side here. I try not to think too hard about inner states or qualia, or about whether the neural networks we’re building have sentient experience or experience qualia. For me, what counts is whether we can solve real-world problems in a way that’s compatible with intelligence. Everything else I would leave to the philosophers, Byron.
We’ll get to the part where we can talk about the effects of automation, what we can expect, and all of that. But don’t you think that, at some level, understanding that question informs you to some degree as to what’s possible? What kinds of problems should we point this technology at? Or do you think it’s entirely academic, with no real-world implications?
I think it’s extremely profound, and it could unlock a whole new curve of value creation. It’s also something that, in dealing with real-world problems today, we may not have to answer, and this is maybe also something specific to our approach. You’ve seen all these studies that say X percent of activities can be automated with today’s machine learning, and Y percent could be automated with better natural language and speech processing capabilities, and so on. There’s tremendous value to be had by going after all this low-hanging fruit and doing applied engineering, bringing ML and deep learning into an application context. Then we can bide our time until there is a full answer to strong AI and some of the deeper philosophical questions. What is available now is already delivering tremendous value, and will continue to do so over the next three to five years. That’s with my business hat on: it’s what I focus on together with the teams I’m working with. The other question is one that I find tremendously interesting for weekend and evening conversations.
Let me ask you a different one. You started off by defining artificial intelligence in terms of human intelligence. When you’re thinking of a problem that you’re going to try to use machine intelligence to solve, are you inspired in any way by how the brain works, or is that just a completely different way of doing it? Or do we learn how intelligence, with a capital I, works by studying the brain?
I think that’s a multi-level answer, because clearly the architectures that do really well in deep learning today are to a large degree neurally inspired. Multi-layered deep networks with a local connection structure, using the things we call convolutions that people use so successfully in computer vision, closely resemble some of the structures you see in the visual cortex, with its vertical columns, for example. There’s a strong argument that both these structures and the self-referential recurrent networks people use a lot for video and text processing these days are very, very deeply neurally inspired. On the other hand, we’re also seeing that a lot of the approaches that make ML very successful today are about as far from neurally inspired learning as you can get.
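A minimal sketch of the “local connection structure” and shared weights of a convolution, assuming a one-dimensional signal for simplicity: each output depends only on a small window of neighbouring inputs, and the same kernel weights are reused at every position. The function name `conv1d` and the example kernel are illustrative choices, not something described in the conversation; like most deep learning libraries, it actually computes a cross-correlation (the kernel is not flipped).

```python
def conv1d(signal, kernel):
    """Slide one shared kernel across the signal ("valid" positions only).

    Each output looks at just len(kernel) neighbouring inputs (local connectivity),
    and every position reuses the same weights (weight sharing).
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


step = [0, 0, 1, 1, 1, 0]
edge_kernel = [-1, 1]  # responds where the signal rises (+1) or falls (-1)
print(conv1d(step, edge_kernel))  # [0, 1, 0, 0, -1]
```

In a real convolutional layer the kernel weights are learned rather than hand-picked, and images use two-dimensional kernels over many channels.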
Example one: we struggled as a discipline with neurally inspired transfer functions, which were all nice, and biological, and smooth, and we couldn’t really train deep networks with them because they would saturate. One of the key enablers for modern deep learning was to step away from the biological analogy of smooth signals and go to something like the rectified linear unit, the ReLU function, as an activation, and that has been a key part of being able to train very deep networks. Another example: when a human or an animal learns, we don’t tend to give them 15 million cleanly labeled training examples and expect them to go over these examples 10 times in a row to arrive at something. We’re much closer to one-shot learning, being able to recognize a person with a top hat on their head just on the basis of one description or one image that shows us something similar.
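To make the saturation point concrete, here is a toy sketch that assumes a chain of single-unit layers all seeing the same pre-activation, with weights ignored entirely. The logistic sigmoid stands in for the smooth “biological” transfer functions; its derivative never exceeds 0.25, so the product of local derivatives that backpropagation multiplies together shrinks geometrically with depth. ReLU, by contrast, passes a derivative of exactly 1 for positive inputs. Real networks are far more complicated; this only illustrates the direction of the effect.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid: at most 0.25 (at x = 0), near zero for large |x|.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    # Derivative of ReLU(x) = max(0, x): 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def chained_grad(grad_fn, x, depth):
    """Product of local derivatives across `depth` layers (toy: same input at every layer)."""
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(x)
    return g


for x in (0.0, 2.0, 5.0):
    print(f"x={x}: sigmoid'={sigmoid_grad(x):.4f}  relu'={relu_grad(x):.1f}")

print("sigmoid, 10 layers at x=0:", chained_grad(sigmoid_grad, 0.0, 10))  # 0.25**10
print("relu, 10 layers at x=1:", chained_grad(relu_grad, 1.0, 10))        # 1.0
```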
So clearly, the most successful approaches today share some deep neural inspiration as a basis, but also depart into computationally tractable and very, very different kinds of implementations than the networks we see in our brains. I think both of these themes are important in advancing the state of the art in ML, and there’s a lot going on. In areas like one-shot learning, for example, researchers are right now trying to mimic more of the way the human brain, with an active working memory and rich associations, is able to process new information, and there’s almost no resemblance to what convolutional networks and recurrent networks do today.
Let’s go with that example. If you take a small statue of a falcon and put it in a hundred photos (sometimes it’s upside down, sometimes it’s lying on its side, sometimes it’s half in water, sometimes it’s obscured, sometimes it’s in shadows), a person just goes “boom boom boom boom boom” and picks them out, right and left, with no effort. You know, one-shot learning. What do you think a human is doing? It is an instance of some kind of transfer learning, but what do you think is really going on in the human brain, and how do you map that to computers? How do you deal with that?
This is an invitation to speculate on the topic of falcons, so let me try. I think that, clearly, our brains have built a representation of the real world around us, because we’re able to create that representation even though the visual and other sensory stimuli that reach us are not in fact as continuous as they seem. Standing in the room here having this conversation with you, my mind creates the illusion of a continuous space around me, but in fact I’m getting discrete snapshots from my eyes as they saccade and jump around the room. The illusion of a continuous presence, the continuous sharp resolution of the room, is just that: an illusion, because our minds have built very, very effective mental models of the world around us that compress information and make it tractable on an abstract level.
Some of the research going on right now is trying to exploit these notions by using a lot of unsupervised training with some very simple assumptions behind it: basically, the mind doesn’t like to be surprised and would therefore like to predict what’s next. This leverages very, very powerful unsupervised training approaches, where you can use any kind of data that’s available and don’t need to label it, to learn representations without supervision. These approaches seem to be very successful, and they’re beating a lot of the traditional approaches, because you have access to far larger corpora of unlabeled information, which means you can train better models.
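The core trick of “predict what’s next” is that raw text supplies its own training targets. The sketch below swaps in a drastically simplified stand-in, a count-based next-word (bigram) model rather than the neural approaches being described, purely to show that no human labeling is involved. The function names and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_next_word(corpus):
    """Count, for each word, which words follow it in raw, unlabeled text.

    The "label" for every position is simply the next word in the corpus,
    so no human annotation is needed.
    """
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())  # .get does not create missing keys
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the mind predicts what comes next and the mind dislikes surprise"
model = train_next_word(corpus)
print(predict_next(model, "the"))       # mind
print(predict_next(model, "surprise"))  # None
```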
Now is that it a direct analogy to what the human brain does? I don’t know. But certainly it’s an engineering strategy that results in world-leading performance on a number of very popular benchmarks right now, and it is, broadly speaking, neutrally-inspired. So, I guess bringing together what our brains do and what we can do in engineering is always a dance between the abstract inspiration that we can get from how biology works, and the very hard math and engineering in getting solutions to train on large-scale computers with hundreds of teraflops in compute capacity and large matrix multiplications in the middle. It’s advances on both sides of the house that make ML advance rapidly today.
Then take a similar problem, or tell me if this is a similar problem. When you’re doing voice recognition and there’s somebody outside with a jackhammer, it’s annoying, but a human can separate those two things and hear what you’re saying just fine. For a machine, that’s a really difficult challenge. Now my question to you is, is that the same problem? Is it one trick humans have that we apply in a number of ways? Or is something completely different going on in that example?
I think it’s similar, and you’re onto something, because in the listening example there are some active and some passive components going on. We’re all familiar with the phenomenon of selective hearing when we’re at a dinner party and there are 200 conversations going on in parallel. If we focus our attention on a certain speaker or a certain part of the conversation, we can make them stand out over the din and the noise, because our own minds have some prior assumptions as to what constitutes a conversation, and we can exploit these priors in order to selectively listen in to parts of the conversation. This is partly physical: we hear in stereo. Our ears have certain directional characteristics in the way they pick up certain frequencies, and by turning and inclining our heads the right way, we can already do a lot [with] stereo separation. Whereas if you have a single microphone, and that’s all the signal you get, all these avenues are closed to you.
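One concrete cue a second “ear” provides is the time difference of arrival: the same sound reaches the two sensors a few samples apart, and that delay hints at direction. A minimal sketch, assuming two clean, equal-length sample lists and a brute-force cross-correlation search; real hearing and microphone-array systems combine many more cues, and converting the delay to an angle is not shown. The function name and the sample values are hypothetical.

```python
def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive result means the sound reached the left sensor first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag: compare left[i] with right[i + lag].
        score = sum(
            left[i] * right[i + lag]
            for i in range(len(left))
            if 0 <= i + lag < len(right)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


left = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]  # same click, two samples later
print(estimate_delay(left, right, max_lag=3))  # 2
```

With a single microphone there is no second signal to correlate against, which is one way to see why that avenue closes.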
But I think the main story is one about signals superimposed with noise, whether that’s camera distortions, fog, or poor lighting in the case of the statue we are trying to recognize, or ambient noise or intermittent outages in the case of the audio signal you’re looking into. The two most popular neurally inspired architectures right now [are] convolutional networks, for a lot of things in the image and also natural-text space, and recurrent networks, for a lot of things in audio and time-series signals, but also in the text space. Both share the characteristic that they are vastly more resilient to noise than any hard-coded or programmed approach. I guess the underlying problem is one that, five years ago, would probably have been considered unsolvable, whereas today, with these modern techniques, we’re able to train models that can adequately deal with the challenge as long as the information is present in the signal.
Well, what do you think is happening when a human at a party, to go with that example, hears a conversation and thinks, “Oh, I want to listen to that”? I heard you say that one aspect is making a physical modification to the situation, but you’ve also introduced the idea of consciousness: a person can selectively change their focus. Is that aspect of what the brain is doing, where it’s like, “Oh, wait a minute,” something that’s hard to implement on a machine, or is that not the case at all?
If you take that idea, which in the ML research and engineering communities is currently most popular under the label of attention, or attention-based mechanisms, then certainly it is all over the leading approaches right now, whether it’s the computer vision papers from CVPR just last week or the text processing architectures that return state-of-the-art results right now. They all start to include some kind of attention mechanism, allowing you both to weight outputs by the center of attention and to trace results back to centers of attention, which has two very nice properties. On the one hand, attention mechanisms, nascent as they are today, help improve the accuracy of what models can deliver. On the other hand, the ability to trace the outcome of a machine learning model back to centers and regions of attention in the input can do wonders for the explainability of ML and AI results, which is something users and customers are increasingly looking for. Don’t just give me a result that is as good as my current process, or hopefully a couple of percentage points better; also help me build confidence in it by explaining why things are being classified or categorized or translated or extracted the way they are. To gain human trust in an operating system of humans and machines working together, explainability is going to be big.
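Both properties described above (weighting outputs by attention, and tracing a result back to where the model attended) show up in even the simplest form of attention. The sketch below uses scaled dot-product attention as one common variant; it is not the specific mechanism of any paper or product mentioned here, and the query, key, and value vectors are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query.

    Returns the weighted output and the attention weights; the weights are
    what lets you trace the output back to the inputs that drove it.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [
        sum(w * v[i] for w, v in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return output, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out, weights = attend(query, keys, values)
print("weights:", [round(w, 3) for w in weights])
print("most-attended input:", weights.index(max(weights)))
```

Inspecting `weights` is the toy version of explainability: it says which input contributed most to this particular output.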
One of the peculiar things to me, with regard to strong AI, or general intelligence, is that when you ask folks, “When will we get a general intelligence?” the soonest answer you ever hear is five years. There are very famous people who believe we’re going to have something very soon. At the other extreme, you hear about 500 years, and that worrying about it is like worrying about overpopulation on Mars. My question to you is, why do you think there’s such a wide range in our ideas of when we may make such a breakthrough?
I think it’s because of one vexing property of humans and machines: the things that are easiest for us humans tend to be the things that are hardest for machines, and vice versa. If you look at it today, nobody would dream of having “computer” as a job description. That’s a machine. If you think back 60 or 70 years, “computer” was the job description of people doing manual calculations. “Printer” was a job description, and a lot of other things that we would never dream of doing manually today were being done manually. Think of spreadsheets, potentially the greatest single invention in computing; think of databases; think of things like the enterprise resource planning systems that SAP builds, and the business networks connecting them, or any kind of cloud-based solution. What they deliver is tremendous, and it’s very easy for machines to do, but these tend to be things that are very hard for humans. At the same time, things that are very easy for humans, like seeing a doggie and shouting “doggie,” or seeing a cat and saying “meow,” are something toddlers can do, but until very, very recently, the best and most sophisticated algorithms weren’t able to do them.
I think part of the excitement around ML and deep learning right now is that a lot of these barriers have fallen, and we’re seeing superhuman performance on image classification tasks. We’re seeing superhuman performance on things like Switchboard voice-to-text transcription tasks, and many other tasks that are very easy for humans but used to be impossible for machines are now falling to machines. This generates a lot of excitement right now. Where I think we have to be careful is in [letting] this guide our expectations of the speed of progress in the following years. Human intuition about what is easy and what is hard is traditionally a very, very poor guide to the ease of implementation with computers and with ML.
Example: my son was asking me yesterday, “Dad, how come the car knows where it is and can tell us where to drive?” And I was like, “Son, that’s fairly straightforward. There are all these satellites flying around, and they’re shouting at us, ‘It’s currently 2 o’clock and 30 seconds,’ and we’re just measuring the time between their shouts to figure out where we are, and that gives us our position on the planet. It’s not a great invention; it’s the GPS system. It’s mathematically super hard for a human to do with a slide rule, but it’s very easy for a machine.” And my son said, “Yeah, but that’s not what I wanted to know. How come the machine is talking to us with a human voice? This is what I find amazing, and I would like to understand how that is built.” And I think our intuition about what’s easy and what’s hard is historically a very poor guide for figuring out what the next step and the future of ML and artificial intelligence look like. This is why you’re getting those very broad bands of predictions.
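The “satellites shouting the time” explanation can be turned into a few lines of arithmetic. This is a toy two-dimensional sketch under loudly simplifying assumptions: three transmitters at known positions, a receiver clock perfectly synchronized with theirs, and no atmospheric or relativistic corrections. Real GPS works in three dimensions and needs at least four satellites because the receiver’s clock offset is an extra unknown. All positions and function names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def ranges_from_times(send_times, receive_time):
    """Turn each signal's travel time into a distance (assumes a perfect receiver clock)."""
    return [C * (receive_time - t) for t in send_times]


def trilaterate_2d(beacons, distances):
    """Solve for (x, y) from three known transmitter positions and measured distances.

    Subtracting the first circle equation from the other two cancels the
    quadratic terms and leaves a 2x2 linear system, solved by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("transmitters must not be collinear")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical setup: three transmitters (metres) and a receiver at (300, 400).
beacons = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
true_position = (300.0, 400.0)
receive_time = 0.0
# Simulate the timestamps each transmitter would have "shouted".
send_times = [receive_time - math.dist(b, true_position) / C for b in beacons]

x, y = trilaterate_2d(beacons, ranges_from_times(send_times, receive_time))
print(f"estimated position: ({x:.3f}, {y:.3f})")
```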
Well, do you think that the difference between the narrow or weak AI we have now and strong AI is evolutionary? Are we on the path [where], as machines get somewhat faster, and we get more data, and we get better algorithms, we’re going to gradually get a general intelligence? Or is a general intelligence something very different, a whole different problem from the kinds of problems we’re working on today?
That’s a tough one. Taking the brain analogy, today we’re doing the equivalent of very simple sensory circuits, which can maybe duplicate the first couple of dozen, or maybe a hundred, layers of the way the visual cortex works. We’re starting to make progress on things like one-shot learning, though it’s very nascent, in early-stage research right now. We’re starting to make much more progress in directions like reinforcement learning, but overall it’s very hard to say which additional mechanisms, if any, are at work at large scale. If you look at the biological system of the brain, there’s a molecular level that’s interesting. There’s a cellular level that’s interesting. There is a synaptic interconnection level that’s interesting. There is a macro-interconnection level that’s interesting. I think we’re still far from a complete understanding of how the brain works. Right now we have tremendous momentum and a very exciting trajectory with what our artificial neural networks can do, and, at least for the next three to five years, there seems to be pretty much limitless potential to bring them out into real-world businesses, into real-world situations and contexts, and to create amazing new solutions. Do I think that will really deliver strong AI? I don’t know. I’m an agnostic, so I always fall back to the position that I don’t know enough.
Only one more question about strong AI, and then let’s talk about the shorter-term future. Human DNA converted to code is something like 700 MB, give or take. But what’s uniquely human, compared to, say, a chimp, is only about a 1% difference, only 7 or 8 or 9 MB of code, and that is what gives us a general intelligence. Does that tell us anything about how to build something that can then become generally intelligent? Does it imply to you that general intelligence is actually simple and straightforward, that we can look at nature and say it’s really a small amount of code, and therefore we should be looking for simple, elegant solutions to general intelligence? Or do those two things just not map at all?
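The back-of-the-envelope figures in the question can be checked under two assumptions that are not stated in the conversation: a haploid human genome of roughly 3.1 billion base pairs, and 2 bits per base (four symbols, A, C, G, and T, with no compression).

```latex
\begin{aligned}
3.1\times10^{9}\ \text{bases} \times 2\ \tfrac{\text{bits}}{\text{base}} &= 6.2\times10^{9}\ \text{bits} \\
6.2\times10^{9}\ \text{bits} \div 8\ \tfrac{\text{bits}}{\text{byte}} &\approx 7.75\times10^{8}\ \text{bytes} \approx 775\ \text{MB} \\
1\% \times 775\ \text{MB} &\approx 7.75\ \text{MB}
\end{aligned}
```

Under these assumptions the estimate lands in the same range as the roughly 700 MB total and 7 to 9 MB difference quoted above.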
Certainly, what we’re seeing today is that deep learning approaches to problems like image classification, object detection, image segmentation, video annotation, and audio transcription tend to take orders of magnitude less code than what we dealt with when we handcrafted things. The core of most deep learning solutions to these problems, if you really look at the core model structure, tends to be maybe 500 lines of code, maybe 1,000. That’s within reach of an individual putting it together over a weekend, so the huge democratization that deep learning on big data brings is that a lot of these models that do amazing things are actually very, very small code artifacts. The weight matrices and binary models they generate, on the other hand, tend to be as large as or larger than traditional programs compiled into executables, sometimes orders of magnitude larger. The thing is, they are very hard to interpret, and we’re only at the beginning of explaining what the different weights and the different activations mean. There are some nice early visualizations of this, and there are also some nice visualizations that explain what’s going on with attention mechanisms in artificial networks.
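The gap between a small code artifact and a large learned artifact is easy to quantify. A minimal sketch, assuming a plain fully connected network with a hypothetical layer layout (a 28x28 grayscale image in, two hidden layers of 512 units, 10 classes out): the architecture is one line of configuration, yet it defines hundreds of thousands of learned numbers.

```python
def dense_param_count(layer_sizes):
    """Count weights plus biases in a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical architecture: 784 inputs, two hidden layers of 512, 10 outputs.
sizes = [784, 512, 512, 10]
params = dense_param_count(sizes)
print(params)                                   # 669706
print(f"{params * 4 / 1e6:.2f} MB as float32")  # 4 bytes per weight
```

Models used in practice for vision and speech are typically much larger than this toy layout, which widens the gap further.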
As to explainability of the real networks in the brain, I think that is very nascent. I’ve seen some great papers and results on things like spatial representations in the brain, where, surprisingly, you find triangular grids, or attempts to reconstruct the image hitting the retina by reading, with fMRI scans, the activations in lower levels of the visual cortex. They show that we’re getting closer to understanding the first few layers. Even with the 7 MB or so of difference between chimps and humans that you allude to spelled out for us, there is a whole set of layers of abstraction between the DNA code and the RNA representation, the protein representation, the regulation of all this through methylation and other mechanisms that control the activation of genes, and the interplay of proteins across a living, breathing human brain. All of this adds orders of magnitude of complexity on top of a seven-megabyte difference in A’s, C’s, T’s, and G’s. We live in super exciting times. We live in times where a new record, a new development, a new capability that was unthinkable a year ago, let alone a decade ago, is becoming commonplace, and it’s an invigorating and exciting time to be alive. Still, I struggle to predict the year of general AI based on a straight-line trend.
As exciting as AI is, there’s some fear wrapped up in it as well. The fear is about the effect of automation on employment. You know this, of course; it’s covered so much. There are kind of three schools of thought. One says that we’re going to automate certain tasks, and there will be a group of individuals who do not have the training to add economic value. They will be pushed out of the labor market, and we’ll have perpetual unemployment, like a big depression that never goes away. Then there’s another group that says, “No, no, no, you don’t understand. Everybody is replaceable. Machines can do every single job we have.” And then there’s a third school that says, “No, none of that’s going to happen. The history of 250 years of the Industrial Revolution is that people take these new technologies, even profound ones like electricity, engines, and steam, and just use them to increase their own productivity and to drive wages up. We’re not going to have any unemployment from this, any permanent unemployment.” Which of those three camps, or a fourth, do you fall into?
I think there’s a lot of historical precedent for how technology gets adopted, and there are also numbers on the adoption of technologies in our own day and age that serve as reference points here. For example, one of the things that truly surprised me is that e-commerce, as a share of the overall retail market, is still in the mid-to-high single-digit percentage points, according to surveys that I’ve seen. That totally does not match my personal experience of doing basically all my non-grocery shopping online. But it shows that in the 20 to 25 years of the Internet revolution, tremendous value has been created, including the convenience of having all kinds of stuff at your doorstep with just a single click, and yet it has transformed only a single-digit percentage of the overall retail market. This was one of the most rapid uptakes in history of a new technology with groundbreaking value, decoupling atoms and bits, and it has been playing out over the past 20 to 25 years for all of us to observe.
So, I think while there is tremendous potential for machine learning and AI to drive another Industrial Revolution, we’re also in the middle of all these curves from other revolutions that are ongoing. We’ve had a mobile revolution that unshackled computers and gave everybody what used to be a supercomputer in their pocket. Before that, we had the client-server revolution and the computing revolution in its own right, all of these building on prior revolutions like electricity, the internal combustion engine, or inventions like the printing press. They certainly tend to show accelerating technology cycles. But on the other hand, for something like e-commerce or even mobile, the actual adoption speed has been none too frightening. So for all the tremendous potential that ML and AI bring, I would be hard-pressed to come up with a completely disruptive scenario here. I think we are seeing a technology with tremendous potential for rapid adoption. We’re seeing the potential both to create new value and do new things, and to automate existing activities, which continues past trends. Nobody has computer or printer as their job description today, and job descriptions like social-media influencer, blogger, or web designer did not exist 25 years ago. This is an evolution in the sense of Schumpeterian creative destruction, going on in every industry, in every geography, with every new technology curve that comes in.
I would say fears in this space are greatly overblown today. But fear is real the moment you feel it, so institutions like the Partnership on Artificial Intelligence, which includes the leading technology companies as well as leading NGOs, think tanks, and research institutes, are coming together to discuss the implications of AI, the ethics of AI, and safety and guiding principles. All of these things are tremendously important to make sure that we can adopt this technology with confidence. Just remember that when cars were new, Great Britain had a law that a person with a red flag had to walk in front of the car to warn pedestrians of the danger approaching. That was certainly an instance of fear about technology that was real at that point in time, but that also went away with a better understanding of how cars work and of their tremendous value to the economy.
What do you think of these efforts to require that, when an artificial intelligence makes a ruling or a decision about you, you have a right to know why it made that decision? Is that a manifestation of the red flag in front of the car as well, and is it something that would, if it became the norm, actually constrain the development of artificial intelligence?
I think you’re referring to the implicit right to explanation that is part of the European Union’s privacy reform for 2018, the GDPR. Let me start by saying that this privacy reform is a tremendous step forward, because the simple act of harmonizing the rules and creating one digital playing field across hundreds of millions of European citizens, countries, and nationalities is hugely valuable. We used to have a different data protection regime for each federal state in Germany, so anything that unifies and harmonizes is a huge step forward. I also think that the quest for an explanation is something very human. At our core, we continue to ask “why” and “how.” It is innate to us: when we apply for a job with a company and get rejected, we want to know why. When we apply for a mortgage and are offered a rate that seems high to us, we want to understand why. That’s a natural question, it’s a human question, and it’s an information need that has to be served if we don’t want to end up in a Kafkaesque future where people don’t have a say about their destiny. So certainly, that is hugely important, on the one hand.
On the other hand, we also need to be sure that we don’t hold ML and AI to a stricter standard than we hold humans today, because that could become an inhibitor to innovation. If you ask a company why you didn’t get accepted for a job, they will probably say, “Dear Sir or Madam, thank you for your letter. Due to the unusually strong field of candidates for this particular posting, we regret to inform you that certain others are stronger, and we wish you all the best for your continued professional future.” This is what almost every rejection letter reads like today. Are we asking an AI system that delivers a recommendation today for the same kind of explainability that we ask of a system of humans and computers working together to create a letter like that? Or are we holding it to a much, much higher standard? If it’s the first, that’s absolutely essential. If it’s the second, we’ve got to watch whether we’re throwing out the baby with the bathwater. This is something where I think we need to work together to find the appropriate levels and standards for things like explainability in AI, to fill very abstract phrases like “right to an explanation” with life in a way that can be implemented, that can be delivered, and that can provide satisfactory answers while not unduly inhibiting progress. With a lot of players focused on explainability today, this is an area where we will certainly see significant advances going forward.
If you’re a business owner, and you read all of this stuff about artificial intelligence, and neural nets, and machine learning, and you say, “I want to apply some of this great technology in my company,” how do people spot problems in a business that might be good candidates for an AI solution?
I’d take that and turn it around by asking, “What’s keeping you awake at night? What are the three big things that make you worried? What makes up the largest part of your uncertainty, or of your cost structure, or of the value that you’re trying to create?” Looking at end-to-end processes, it’s usually fairly straightforward to identify cases where AI and ML might be able to help and deliver tremendous value. Use-case identification tends to be the fairly easy part of the game. Where it gets tricky is in selecting and prioritizing these cases, figuring out the right things to build, and finding the data you need in order to make the solution real, because unlike traditional software engineering, this is about learning from data. Without data, you basically can’t start, or at least you have to build some very small simulators in order to create the data you’re looking for.
You mentioned that that’s the beginning of the game, but what makes the news all the time is when AI beats a person at a game. In 1997 you had chess, then you had Ken Jennings in Jeopardy!, then you had AlphaGo and Lee Sedol, and then AI winning at poker. Is it a valid approach to say, “Look around your business and look for things that look like games?” Because games have constrained rules, and they have points, and winners, and losers. Is that a useful way to think about it? Or are the games more like AI’s publicity, a PR campaign, and not really a useful metaphor for business problems?
I think that these very publicized showcases are extremely important to raise awareness and to demonstrate stunning new capabilities. What we see in building business solutions is that I don’t necessarily have to be the human world champion at something in order to deliver value. A lot of business is about processes, about people following flowcharts together with software systems to deliver a repeatable process for things like customer service, IT incident handling, incoming invoice screening and matching, or other repetitive, recurring tasks in the enterprise. And just by addressing the 60-80% of these cases that are easy to serve, we can create tremendous value for enterprises: making processes run faster, making people more productive, and relieving them of the parts of their activities that they regard as repetitive, mind-numbing, and not particularly enjoyable.
The good thing is that in a modern enterprise today, people tend to have IT systems in place where all these activities leave a digital exhaust stream of data, and tapping into that digital exhaust stream and learning from it is the key way to make ML solutions for the enterprise feasible today. This is one of the things where I’m really proud to be working for SAP, because 76% of all business transactions anywhere on the globe, as measured by value, are on an SAP system today. So if you want to train models on the digital information that touches the enterprise, chances are it’s either in an SAP system or in a surrounding system already today. Our overriding approach to identifying what we build in order to make all our SAP enterprise applications intelligent is to look for the intersection between what’s attractive (because I can serve core business processes with faster speed, greater agility, lower cost, more flexibility, or bigger value) and what’s feasible: “Do I have the digital information that I can learn from to build business-relevant functionality today?”
Let’s talk about that for a minute. What sorts of things are you working on right now? What sorts of things have the organization’s attention in machine learning?
It’s really end-to-end digital intelligence on processes, and let me give you an example. If you look at the finance space, which SAP is well known for, there are huge end-to-end processes, like record to report or invoice to record, which deal end-to-end with what an enterprise needs to do in order to buy stuff, receive it, and pay for it, or to sell stuff and get paid for it. These are huge machines with dozens and dozens of process steps, and many individuals in shared service environments who deliver these services. A document like an invoice, for example, is just the tip of the iceberg of a complex orchestration for dealing with it. We’re taking these end-to-end processes, and we’re making them intelligent every step of the way.
When an invoice hits the enterprise, the first question is, “What’s in it?” Today, most of the units in shared service environments extract the relevant information via SAP systems. The next question is, “Do I know this supplier?” If they have merged, changed names, or opened a new branch, I might not have them in my database. That’s a fuzzy lookup. The next step might be, “Have I ordered something like this?” That’s a significant question, because in some industries up to one-third of spending actually doesn’t have a purchase order. Finding people who have ordered this stuff, or related stuff, from this supplier or similar suppliers in the past can be the key to figuring out whether we should approve it or not. Then there’s the question, “Did we receive the goods and services that this invoice is for?” That’s about going through lists and lists of stuff, and figuring out whether the bill of lading for the truck that arrived really contains all the things that were on the truck and all the things that were on the invoice, but no other things. That’s about list matching, list comprehension, document matching, and recommendation and classification systems. It goes on and on like that until the point where we actually put through the payment, and the supplier gets paid for that invoice.
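To make two of the steps above concrete (recognizing a supplier whose name has drifted, and checking an invoice against what actually arrived), here is a minimal Python sketch. It is not SAP’s implementation: the supplier list, the function names `lookup_supplier` and `match_lines`, and the 0.6 similarity cutoff are hypothetical, and it uses plain string similarity from the standard library’s `difflib` in place of a learned model.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical supplier master data; a real system would read this from its vendor records.
KNOWN_SUPPLIERS = [
    "Acme Industrial Supplies GmbH",
    "Globex Logistics AG",
    "Initech Components Ltd",
]


def lookup_supplier(name, known=KNOWN_SUPPLIERS, cutoff=0.6):
    """Fuzzy lookup: return the most similar known supplier, or None if none is close enough."""
    # Map lowercased names back to their original spelling so matching ignores case.
    by_lower = {k.lower(): k for k in known}
    matches = get_close_matches(name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


def match_lines(invoiced, received):
    """List matching: compare invoiced vs. received quantities (item -> quantity).

    Returns (billed_not_received, received_not_billed); both empty means the lists agree.
    """
    inv, rec = Counter(invoiced), Counter(received)
    # Counter subtraction keeps only positive differences.
    return dict(inv - rec), dict(rec - inv)


if __name__ == "__main__":
    print(lookup_supplier("ACME Industrial Supplies"))  # Acme Industrial Supplies GmbH
    print(lookup_supplier("Umbrella Corp"))  # None
    print(match_lines({"bolts": 100, "nuts": 100},
                      {"bolts": 100, "nuts": 80, "washers": 10}))
    # ({'nuts': 20}, {'washers': 10})
```

A production system would replace the string-similarity cutoff with a model trained on past matching decisions, which is the kind of learning from digital exhaust described in the interview; the sketch only shows the shape of the two checks.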
What you see is a digital process that is enabled by very sophisticated IT systems and that routes workflows between many human participants today. What we can do is take the digital exhaust of all the process participants to learn what they’ve been doing, and then put the common, repetitive, mind-numbing part of the process on autopilot. That gains speed, reduces cost, makes people more satisfied with their workday because they can focus on the challenging, interesting, and stimulating cases, and increases customer satisfaction, or in this case supplier satisfaction, because suppliers get paid faster. This end-to-end approach is how we look at business processes, and when my ML and AI group does that, we see an order recommender, an entity extractor, or some kind of translation mechanism at every step of the process. We work hard to turn these capabilities into scalable APIs on our cloud platform that integrate seamlessly with these standard applications, and that’s really our approach to problem-solving. It ties into the underlying data repository of how business operates and how processes flow.
Do you find that your customers are clear about how this technology can be used, and they’re coming to you and saying, “We want this kind of functionality, and we want to apply it this way,” and they’re very clear about their goals and objectives? Or are you finding that people are still finding their sea legs and figuring out ways to apply artificial intelligence in the business, and you’re more having to lead them and say, “Here’s a great thing you could do that you maybe didn’t know was possible?”
I think it’s like everywhere: you’ve got early adopters, innovation promoters, and dealers who actively come to us with cases of their own. You have more conservative enterprises looking to see how things play out and what the results for early adopters are. You have others who have legitimate reasons to focus on burning parts of their house right now, for whom this is not yet a priority. What I can say is that the amount of interest in ML and AI that we’re seeing from customers and partners is tremendous and almost unprecedented, because they all see the potential to take business processes and the way business executes to a completely new level. The key challenge is working with customers early enough, and at the same time working with enough customers in a given setting, to make sure that this is not a highly specific one-off, and to make sure that we’re really rethinking the process with digital intelligence instead of simply automating the status quo. I think this is maybe the biggest risk. We have a tremendous opportunity to transform how business is done today if we truly see this through end-to-end and if we are looking to build out the robots. If we’re only trying to build isolated instances of faster horses, the value won’t be there. This is why we take such an active interest in the end-to-end and integration perspective.
Alright, well, I guess just two final questions. The first is, overall it sounds like you’re optimistic about the transformative power of artificial intelligence and what it can do...
Absolutely Byron.
But I would put to you the question that you put to businesses. What keeps you awake at night? What are the three things that worry you? They don’t have to be big things, but what are the challenges you’re facing or thinking about right now, like, “Oh, I just wish I had better data,” or “If we could just solve this one problem”?
I think the biggest thing keeping me awake right now is the luxury problem of being able to grow as fast as demand and the market want us to. That has all the aspects of organizational scaling and of scaling the product portfolio that we enable with intelligence. Fortunately, we’re not a small start-up with limited resources. We are the leading enterprise software company, and scaling inside such an environment is substantially easier than it would be on the outside. Still, we’ve been doubling every year, and we look set to continue in that vein. That’s certainly the biggest strain and the biggest worry that I face. It’s very old-fashioned things, like leadership development, that I tend to focus a lot of my time on. I wish I had more time to play with models, to play with the technology, and to actually build and ship a great product. What keeps me awake are these more old-fashioned things, like leadership development, that matter the most for where we are right now.
At the very beginning, you said that during the week you’re all about applying these technologies to businesses, and then on the weekend you think about some of these fun problems. I’m curious if you consume science fiction like books or movies, or TV, and if so, is there any view of the future, anything you’ve read or seen or experienced that you think, “Ah, I could see that happening.” Or, “Wow, that really made me think.” Or do you not consume science fiction?
Byron, you caught me out here. The last thing I consumed was actually Valerian and the City of a Thousand Planets, just last night, in the movie theater in Karlsruhe that I went to all the time when I was a student. While not per se occupied with artificial intelligence, it was certainly stunning, and I do consume a lot of the stuff from the ease of it. It provides a view of plausible futures. Most of the things I tend to read are more focused on things like space, oddly enough. So things like The Three-Body Problem, and the fantastic trilogy that it became, really aroused my interest and really made me think. There are others that offer very credible trajectories. I was a big fan of a book called Accelerando, which paints a credible trajectory from today’s world of information technology to an upload culture of digital minds and humans colonizing the solar system and beyond. I think that these escapes are critical to clear my head of day-to-day business and the pressures of delivering product under a given budget and deadlines. Indulging in them allows me to return relaxed, refreshed, and energized every Monday morning.
Alright, well, that’s a great place to leave it, Markus. I want to thank you so much for your time. It sounds like you’re doing fantastically interesting work, and I wish you the best.
Did I mention that we’re hiring? There’s a lot of fantastically interesting work here, and we would love to have more people engaging in it. Thank you, Byron.
Byron explores issues around artificial intelligence and conscious computers in his new book The Fourth Age: Smart Robots, Conscious Computers, and the Future of Humanity.