BigInsights interview with Mike Olson, Chief Strategy Officer and Chairman of the Board, Cloudera

Datacon extends support to inaugural BigData and BI Excellence Awards

August 28, 2013

‘Big Data Vendor Landscape Study’ launched at DataCon 2013

October 3, 2013

Cloudera aims to hit US$1 billion not only in market cap but also in revenue, says Cloudera’s Mike Olson

Those of you involved in Big Data will be familiar with Cloudera, a leading brand in the Apache Hadoop-based software and services space.

But for those who aren’t, here’s a quick description: established in 2008 and based in Palo Alto, California, US, Cloudera offers an integrated Big Data platform comprising software, support, training and professional services. The platform has open source Apache Hadoop software at its core and includes additional software that helps enterprises deploy the open source platform and apply big data to critical business problems. Cloudera allows customers to store, process and analyse data reliably, securely and inexpensively, offering a data platform that lets enterprises and organisations look at all their data, structured as well as unstructured.

The BigInsights team, led by Chief Executive Officer (CEO) Raj Dalal, Partner Haima Prakash and Chief Technology Officer (CTO) David Triggs, caught up with Cloudera’s co-founder, Chairman and Chief Strategy Officer, Mike Olson, on his recent visit to Sydney, Australia, as part of the BigInsights Big Data Vendor Landscape Study. Over lunch, Mike delved into several Big Data topics, including Cloudera’s present and future strategies.

In the coming weeks, BigInsights will bring you a three-part series based on excerpts from this extended conversation between the BigInsights team and Mike.

Cloudera's Vision For The Future

Cloudera’s commitment to an open source data platform is absolute, but the company will continue to innovate on top of that platform, in administration and, over the long term, in delivering business value, said Mike Olson, CSO of Cloudera.

“We believe we've got an opportunity not only to hit a billion dollars in market cap, but to cross through a billion dollars in revenue, and that only happens if we're delivering serious value to customers and they keep coming back.

“We want a differentiated product set that generates the revenue that lets us invest back in the open source platform. The late market entrants, and especially the larger companies that have stepped in, don't have the representation in the open source community to really drive that road map. Some of the emerging venture-backed companies don't have IP of their own,” he said.

Here are snapshots of some of Mike’s answers to questions posed by the BigInsights team:

What is Cloudera’s differentiation and leadership strategy in both direct and OEM markets worldwide?

Mike:

Our job is to deliver customer success. Storage, data processing, data analytics: that's fundamentally what CIOs want to be open source. They're afraid of bad vendor behavior from decades of experience, and they want to know that the substrate, just like the operating system they rely on now, Linux, is insulated from bad vendor behavior. So our commitment to an open source data platform is absolute…

We saw early on an opportunity to deliver unique, differentiated IP as Cloudera addressed common problems. We invented Cloudera Manager and have been working on it for three years; it's our IP, and it's best of breed and best in market. We recently introduced BDR, our backup and disaster recovery solution, and Cloudera Navigator, which is basically audit logging and compliance reporting for data access.

We've innovated in security, both in open source and in our management infrastructure. We think that combination gives us the best and most capable platform. Among open source providers, we're the leader in the volume of software we give away: no one writes and gives away more open source software for the Hadoop ecosystem than we do.

We think our strategy is the right one: a hybrid IP strategy aimed at customer success that allows us to build long-lasting relationships. We've got an annual subscription business. If we can't get you to come back every single year because you love the services and you’re profiting from your data, we're doomed. So we think our customers are insulated from bad vendor behavior as a result.

Evolution of the Big Data market

Mike:

What really has to happen is that we need applications focused on real business use cases. I love the technology, but nobody buys the technology. Everybody buys the solution to their business problems. We are now, by the way, seeing applications emerge that do exactly that. So Amdocs, for example, builds a churn management application that they sell into mobile providers on top of Cloudera’s platform and those mobile providers don't actually realize that they're buying Cloudera. That's what we need to see happen broadly in the market.

Do you see Cloudera as a Hadoop company or going beyond that?

Mike:

Hadoop has grown beyond what Google originally designed. The name is going to expand to cover the Big Data platform of the future; it's just too great a name to abandon, right? But what we ship today is much larger than what Hadoop was when Facebook, Yahoo! and others collaborated on it in the very earliest days… what Doug Cutting created.

Impala and Innovations in Big Data processing on Hadoop beyond MapReduce

Mike:

When we started, we saw enormous opportunity in Big Data, beyond the software that then existed. When Google invented this new platform, it invented two things. The first was a storage layer that could take any kind of data in enormous volume very, very cheaply, right? On top of that storage infrastructure, it delivered a new engine for analyzing the data. That engine was called MapReduce. You gang a whole bunch of computers together, take advantage of all their disks and all of their CPUs, and push your analytic jobs down to run on those servers, right on the data; you don't need to move the data out. That was transformational and worked miracles for the web properties. But look, not every single business problem can be solved by MapReduce.
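The “push the job to the data” idea behind MapReduce can be illustrated with a classic word count. The sketch below is a toy, single-process simulation, not Hadoop's actual API: the partition contents and function names are hypothetical, and each inner list simply stands in for the fragment of data stored on one server. The map step runs once per fragment, the shuffle groups intermediate pairs by key, and the reduce step combines them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical cluster: each inner list is the fragment of data held by one node.
# In Hadoop these fragments would be blocks on each server's local disks.
PARTITIONS: List[List[str]] = [
    ["big data on hadoop", "data stays on the node"],
    ["push the job to the data", "not the data to the job"],
]


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Runs against one node's local fragment: emit (word, 1) for every word."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group the intermediate pairs from all nodes by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine all values for each key; summing them yields word counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(partitions: List[List[str]]) -> Dict[str, int]:
    # Map runs once per partition (conceptually on the node that stores it);
    # only the small intermediate pairs move, not the raw data.
    mapped = [map_phase(part) for part in partitions]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(PARTITIONS)["data"])
```

In a real cluster the map calls would run in parallel on different machines; the sequential loop here only shows the data flow.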

Cloudera announced Impala, a high-performance, interactive SQL engine that runs natively, in a distributed way, on your big data Hadoop infrastructure, so you can leverage your investment in SQL and the broad knowledge of that language. Impala is just an engine that goes to the data. It doesn't use any of the MapReduce infrastructure. [It is] an entirely separate scale-out database engine, designed the way you would design one in 2013: a query processing engine. We know how to build distributed query processors, so that's what we've built.

We recently announced the availability of Cloudera Search. We took SolrCloud, the document indexing and search engine, and made it run in the same massively parallel way on all of those servers, each of them looking at its own little fragment of the data. We've also helped our partner SAS redesign its numerical analysis engine so that it can run in a data-parallel way: you install the SAS engine on all the nodes in a Hadoop cluster, and now you can ask numerical analytic questions of a petabyte of data in no time at all.
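The pattern described here, where every server searches only its own fragment and the partial answers are merged, is often called scatter-gather. The sketch below is a toy illustration of that pattern, not SolrCloud's actual API: the shard contents and function names are hypothetical, and scoring is plain term frequency rather than Solr's relevance ranking.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical shards: each server indexes only its own fragment of the documents.
SHARDS: List[Dict[str, str]] = [
    {"d1": "churn model for mobile subscribers", "d2": "quarterly revenue report"},
    {"d3": "mobile network logs", "d4": "subscriber churn churn analysis"},
]


def build_index(shard: Dict[str, str]) -> Dict[str, Set[str]]:
    """Per-shard inverted index: term -> ids of local documents containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in shard.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index


# Each shard builds its index once, locally.
INDEXES: List[Dict[str, Set[str]]] = [build_index(shard) for shard in SHARDS]


def search_shard(
    shard: Dict[str, str], index: Dict[str, Set[str]], term: str
) -> List[Tuple[str, int]]:
    """Runs on one node: score local matches by how often the term occurs."""
    return [(doc_id, shard[doc_id].split().count(term)) for doc_id in index.get(term, set())]


def scatter_gather(term: str, top_k: int = 3) -> List[Tuple[str, int]]:
    # Scatter: every shard answers the query against its own fragment.
    partial: List[Tuple[str, int]] = []
    for shard, index in zip(SHARDS, INDEXES):
        partial.extend(search_shard(shard, index, term))
    # Gather: merge the partial results and keep the global top-k
    # (highest score first, ties broken by document id).
    return sorted(partial, key=lambda hit: (-hit[1], hit[0]))[:top_k]
```

For example, `scatter_gather("churn")` merges one hit from each shard and ranks `d4` (two occurrences) ahead of `d1`.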

The real insight here is that the big scale-out store gives you a way to push different engines down to the data. Our vision is that you want five or 10 or 50 different engines that go visit the data… so the real platform of the future is going to support a variety of ways of getting at the same data. You'd like to be able to search for a data set of interest using Cloudera Search, then use machine learning, analytics and MapReduce to produce a derived table that you then query with Impala, right?
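The workflow Mike outlines (search, then derive a table, then query it with SQL) can be mimicked in miniature to show several engines working over one shared data set. This is a hedged, single-machine sketch: the records and function names are hypothetical, a substring filter stands in for a search engine, a `Counter` aggregation stands in for a MapReduce or analytics job, and Python's built-in `sqlite3` stands in for a SQL engine such as Impala.

```python
import sqlite3
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical raw records shared by every engine below (the "same data").
RECORDS: List[Dict[str, str]] = [
    {"customer": "c1", "event": "dropped call"},
    {"customer": "c2", "event": "plan upgrade"},
    {"customer": "c1", "event": "dropped call"},
    {"customer": "c3", "event": "dropped call"},
]


def search(records: List[Dict[str, str]], phrase: str) -> List[Dict[str, str]]:
    """Engine 1 (stand-in for a search engine): find the records of interest."""
    return [r for r in records if phrase in r["event"]]


def derive_table(records: List[Dict[str, str]]) -> Counter:
    """Engine 2 (stand-in for a MapReduce/analytics job): events per customer."""
    return Counter(r["customer"] for r in records)


def query(derived: Counter) -> List[Tuple[str, int]]:
    """Engine 3 (sqlite3 standing in for a SQL engine): query the derived table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE drops (customer TEXT, n INTEGER)")
    conn.executemany("INSERT INTO drops VALUES (?, ?)", derived.items())
    rows = conn.execute(
        "SELECT customer, n FROM drops WHERE n >= 2 ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Customers with repeated dropped calls, found by chaining the three engines.
    print(query(derive_table(search(RECORDS, "dropped"))))
```

The point of the chain is that no engine owns the data: each one reads what the previous step produced from the same underlying records.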

How does Impala differ from other SQL-on-Hadoop initiatives?

Mike:

In the last four or five years we have driven innovation on the platform. We were the first vendor in this space, we were the first vendor with a Hadoop distribution, we were the first company to add HBase for NoSQL scale-out data delivery at web speed. We announced and delivered Impala to the market, we're the only company today with Search, the only company integrating proprietary products from established vendors like SAS.

We have explained to our competitors and to the market at large what the Big Data platform of the future looks like, and we're flattered that they've acknowledged we're right. So Hortonworks' Stinger was not a forward-looking announcement; it was a reaction. MapR's Drill announcement was likewise made with full awareness of the work we were doing at Cloudera. Our job is to continue to innovate on the platform and drive it forward, but, more significantly, to make sure our customers are successful with Big Data. Having the coolest, most capable, most “feature-full” platform is of no value if we're not solving meaningful business problems for C-level executives.

…

We want to grow for a long time and to remain independent. We believe we've got an opportunity… to cross through a billion dollars in revenue, and that only happens if we're delivering serious value to customers and they keep coming back. So, innovation? Yes, absolutely, and we think we'll continue to do that. More significantly, though: real business value for really important problems.

More in Part Two

