http://sloanreview.mit.edu/article/is-your-organization-ready-for-the-impending-flood-of-data/
(…)
Q. You’ve said that Internet devices that interact with the physical world will soon be the norm, and that ubiquitous, constantly connected devices will learn on their own, with some verbal instruction from their users. How do you see organizations becoming able to take advantage of these devices?
A. There are a lot of processes where there are sensors, or what I call computer-mediated transactions: you have a computer in the middle of the transaction that can capture a wealth of information about the transaction.
As you know, for instance, GE has a lab on the industrial Internet, and what they’re trying to do is to improve their system monitoring for big devices like airplanes, electricity generators and so on.
And if you look at just mobile phones, they capture a huge amount of information that can be used for navigation and for reminding people to do things and for scheduling. The voice recognition — at least on the Android side — is so good now that it’s quite feasible to give almost all those instructions verbally. And we think, or I think, that the most natural interface for all those things is a verbal one: you speak to your house, you speak to your phone, you speak to your car. And I think we’ll find that is the norm for most kinds of activities of that sort.
Q. Once these devices are collecting data, what about the ability to take that data and then to use it, process it, extract it, visualize and communicate it? What do organizations need to be doing to be able to do this?
A. Well, the challenge for most big companies is that they grow by acquisition. And so you end up with several separate data systems that don’t communicate easily.
Google has tremendous discipline in this respect. Basically, when we acquire a company, we integrate its software into our way of doing things. What’s great about that is you can take an engineer from one project at Google, move them to another project at Google, and they are productive pretty much immediately, because everyone is using the same conventions, the same coding style and the same basic blocks for storing and accessing data.
But Google is virtually unique in that respect. There are very few other companies that are able to do that. And because you have that integrated system, then it’s much easier to access data and use the tools that we’ve developed to do this kind of analytics. So, for example, creating a dashboard for some process. You’ve created some new system, you want to monitor it; you create a dashboard to display that monitor. That’s a half-hour implementation at Google because of these great tools that we’ve developed.
Q. How is Google able to do this where others can’t?
A. It’s very costly, because when you do an acquisition, you bring somebody in, you’ve got to basically redo their system to align with Google’s.
On the other hand, Google’s system of doing things is good. It’s evolved over many years. It’s usually a step up over what people have when they come in. And that’s because Google is a company that’s run by engineers. It was founded by engineers. Larry Page, Sergey Brin and Eric Schmidt, they all have essentially PhDs in computer engineering, and so they were willing to spend the money to make this standardization happen across the company.
Q. And I guess it’s a trade-off between paying up front one time and paying later, with every dashboard, in your example. I used to have a software company, and one of the things that used to drive me nuts was people saying, “Oh, yeah, we’ll just do it this way for now.” I’ve never seen a “for now” that didn’t turn into a “forever.”
A. Yep.
Q. You mentioned that a lot of commonly used data analytics techniques don’t really apply to datasets with millions of observations. We came from a mindset of dealing with 100 or so observations, and so we have some “bad habits,” as I think you called them. Can you give some examples of organizational bad habits? And how do organizations unlearn some of these bad habits?
A. For example, in one case, we’ve seen organizations where they’ve had to monitor a lot of data. They’ll build the system, as you said, “for now”; it’ll handle 90 days of data or something like that. And that’s really bad, because so much data is highly seasonal. In consumer data, there are holiday effects, there are weather effects, there are all sorts of things going on, and you just can’t do anything much with 90 days of data. So you have to have at least two years of data to really get the seasonality right in a lot of cases.
So a lot of these companies, they’re thinking too small or they’re just doing something for the moment. You’ve got to plan for much bigger, if you really want to provide high-quality service to your user base.
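The seasonality point can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not something from the interview: the synthetic daily metric, the size of the yearly cycle, the noise level and the two simple forecasting rules. It compares a forecast built from only the last 90 days with one built from two full years.

```python
import math
import random

rng = random.Random(1)
PERIOD = 365  # assumed length of the seasonal cycle, in days

def demand(day):
    """Synthetic daily metric: a level, a yearly cycle and noise (all assumed)."""
    return 100 + 20 * math.sin(2 * math.pi * day / PERIOD) + rng.gauss(0, 5)

history = [demand(d) for d in range(2 * PERIOD)]             # two years observed
future = [demand(d) for d in range(2 * PERIOD, 3 * PERIOD)]  # the year to forecast

# With only the last 90 days, no yearly pattern is visible, so a simple
# forecast is a flat line at the recent average.
recent = history[-90:]
flat_forecast = [sum(recent) / len(recent)] * PERIOD

# With two years, each future day can be forecast from the same day of
# the year in both earlier years.
seasonal_forecast = [(history[d] + history[d + PERIOD]) / 2 for d in range(PERIOD)]

def mae(forecast, actual):
    """Mean absolute error between a forecast and what actually happened."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

flat_mae = mae(flat_forecast, future)
seasonal_mae = mae(seasonal_forecast, future)
print(f"90-day flat forecast MAE:       {flat_mae:.1f}")
print(f"two-year seasonal forecast MAE: {seasonal_mae:.1f}")
```

On this synthetic series the two-year forecast tracks the yearly cycle and the 90-day forecast cannot, which is the sense in which a 90-day window hides seasonality.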
Another thing we do — this isn’t exactly a managerial issue — but when you look at econometricians and statisticians and so on, they’re used to working with relatively small amounts of data. They’ll do a lot of in-sample forecasting and things like that — see, this regression fits really well. But when you talk to a person who’s used to working with large amounts of data, they’re always going to do out-of-sample forecasting, out-of-sample predicting, because you get a much more realistic estimate of what it is you’re trying to predict.
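A minimal Python sketch of the in-sample versus out-of-sample distinction follows. The data, the noise level and the two toy models are assumptions chosen for illustration; they are not methods described in the interview. One model is a straight-line fit, the other simply memorizes the training points.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, using the closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def fit_nearest(xs, ys):
    """1-nearest-neighbour 'model' that memorizes the training points."""
    pts = list(zip(xs, ys))
    return lambda x: min(pts, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n, rng):
    """Synthetic data (assumed): y = 2x plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 1.0) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(100, rng)   # data the models are fitted on
test_x, test_y = make_data(1000, rng)    # fresh data the models never saw

line = fit_line(train_x, train_y)
nearest = fit_nearest(train_x, train_y)

results = {
    "line": (mse(line, train_x, train_y), mse(line, test_x, test_y)),
    "nearest": (mse(nearest, train_x, train_y), mse(nearest, test_x, test_y)),
}
for name, (in_s, out_s) in results.items():
    print(f"{name:8s} in-sample MSE={in_s:.3f}  out-of-sample MSE={out_s:.3f}")
```

Judged in-sample, the memorizing model looks perfect. Judged on fresh data, the straight line does better, which is why the out-of-sample number is the more realistic estimate.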
So at Google, we have two groups, the statisticians and the machine-learning people — and there’s some overlap in the groups, but I have to say, I think we’ve learned a lot from each other in terms of how to deal with these massive datasets.
Oh, and the one criticism — I will mention, the one thing that the machine-learning guys are not used to doing is taking samples. Because they want to work with a trillion observations, when it might be just as good to take a 5% sample. They find it challenging. And of course in production work, when you’re really doing the production, you may have to be able to deal with data that size. But when you’re doing the analytics, I’ve found that doing sampling is fine for lots of things.
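As a rough illustration of why a 5% sample can be enough for analytics, here is a minimal Python sketch. The population is synthetic and its size and distribution are assumptions; the only claim is that a simple random sample estimates the mean of this particular population closely.

```python
import random
import statistics

rng = random.Random(42)

# Synthetic "full dataset" (assumed): 200,000 skewed, positive values.
population = [rng.lognormvariate(3.0, 0.5) for _ in range(200_000)]

# A 5% simple random sample, drawn without replacement.
sample = rng.sample(population, k=len(population) // 20)

full_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

# Relative gap between the sample estimate and the full-data answer.
rel_error = abs(sample_mean - full_mean) / full_mean

# Standard error of the sample mean, computed from the sample alone.
stderr = statistics.stdev(sample) / len(sample) ** 0.5

print(f"full-data mean: {full_mean:.3f}")
print(f"5% sample mean: {sample_mean:.3f} (standard error {stderr:.3f})")
print(f"relative error: {rel_error:.2%}")
```

The standard error shrinks with the square root of the sample size, so once a sample is in the thousands, adding the remaining 95% of the rows changes an average like this very little. Rare events and fine-grained breakdowns are the cases where the full data may still be needed.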
Q. So, how do we get the new generation of managers to understand the data that’s available, and what could be done with it?
A. Well, there is this problem of getting the data from your point of sale or from your devices, from your customers, into the cloud. So you’ve got to set up that pipeline. And that can be pretty challenging, integrating the systems. But there’s enough commonality at this point that it’s much, much more straightforward than it once was.
So now you’ve got the data available in some data warehouse configuration, and then the question is, how do I access it? How do I feed it into decisions?
How do I utilize that data effectively? That’s where people are now. They say, “Let’s go hire a data scientist or some statisticians. Let’s go hire some data engineers.” And they find out everybody else is trying to hire the same people.
The bottleneck ends up being, in many cases, finding those skilled data scientists to hire. Now, I will say, universities have been very good at creating programs to educate people in this area. At Berkeley, at Stanford, at many other places around the country, they’ve kind of jumped into this — into providing such programs. So I think this shortage is going to be alleviated in a few years.
Q. I’m not going to ask you to reveal the next cool Google thing that’s coming out that we don’t know about, but are there any initiatives that you’re pretty excited about at Google right now?
A. One of the things that I think we’ve been working on and we’re excited about internally and it’s getting a lot of external attention as well is Google Now, which is on the Android phones, and I think there’s also a version for iOS now. That’s basically a personal digital assistant. You mostly interact with Google Now through speech.
And it’s just like having an outstanding administrator who’s watching out for you and reminding you where you parked the car and your calendar appointments and what the weather’s going to be like on that trip you’re taking tomorrow, and on and on and on. It’s just proactive in answering questions.
Q. Proactive is an interesting word choice.
A. Larry Page used to say, “The trouble with Google is that you have to ask it a question. You shouldn’t have to ask it a question; it should just give you the answer.” And that’s what Google Now does. It gives you the answer.
That’s a pretty exciting development. And by the way, I think that’s going to be a big area of competition in the industry, because of course, Apple has Siri, and Microsoft has Cortana, and they’re all going to be competing in providing these personal digital assistant capabilities.
Q. You’re so pro-voice. On a personal level, there’s nothing more annoying to me than when I call in somewhere and get into the phone tree that wants me to speak, and then it doesn’t understand my beautiful Southern accent.
A. Yes, well, the nice thing about Google Now is that the voice recognition can be personalized to you. So it is much more accurate than the generic systems.
How you interact is a choice that you make, but you can also do interaction via the phone screen or via your computer. I think we’re going to move to this more natural way of just asking your house to turn on the lights or open the garage door, or — these sorts of things. It’s a more natural way to communicate.
Q. I assume that the garage door, then, won’t start spouting off how much RAM it has and what version of the BIOS it’s using. I think that’s the other step we’ve got to get through too, where, when you ask the light to turn on, that it just turns on, rather than giving you a lot of details about its wattage.
A. Well, maybe if you’re lonely and need someone to talk to, you could talk to your garage door….