Making Big Data analysis as simple as Excel Pivot with Splunk 6 and Hunk

BigInsights recently met up with Clint Sharp, Director of Product Management for Big Data & Operational Intelligence at Splunk, who was also present at Strata + Hadoop World in New York, to ask him how Splunk could make it easier for IT organisations to gain insights from Big Data.

By: David Triggs, CTO, BigInsights

Part 1

What does Splunk Enterprise have to offer? What makes it successful?

Although Splunk was a newcomer to Hadoop, attendance at its .conf2013 user conference in Las Vegas was double that of 2012, a clear sign of growing interest in Splunk and something Clint was quick to point out at the start of the interview.

“They are coming because of Splunk Enterprise and what it is doing for them across a variety of use cases. Our business is really in about five different segments right now:

  1. IT operations
  2. Application management
  3. Security
  4. Digital intelligence (which is web analytics and things of that nature) and
  5. The Internet of Things

The last is related to connecting devices and things like power utilities and SCADA data.  So if you look at the distribution of the Splunk business, the first three own about 90 per cent of the business.  Security is about a third of the business, application management is about a third, IT operations 20+ per cent, and then there is the other 10 per cent that is digital intelligence followed up with this emerging Internet of Things.

“So most people are coming because their IT ops guys are running complicated application infrastructures, they are doing security, and what they are finding is they have all this data they need to analyse as an IT organisation. We are at a conference where we are talking about data analysis and things like that, but it’s too expensive to go through a rigid data collection and ETL process for your monitoring data. Ultimately, all these projects fail because as soon as there is a developer assigned to work on an IT data warehouse, they are going to get pulled off and put onto something else.

“Splunk is really their IT data warehouse. They are dumping in raw unstructured data and we allow them to query that in whatever form it came, structure it, do dashboards, analytics, etc., which is incredibly powerful. It’s not going to be as fast as a database because we are doing all that work at the last possible moment; we are going through a lot of CPU cycles to do it, but for an IT guy doing an outage investigation, 10, 15, 20 seconds is perfectly fine; the response time for the last four hours of data doesn’t need to come back in 100 milliseconds. The flexibility that this late-binding schema gives us is incredibly powerful in the IT market, and now we are taking that and expanding it to the business and to other use cases, and that’s why you hear me talking about schema all the time.”
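
The ‘late binding schema’ Clint describes is essentially schema-on-read. As a rough, purely illustrative sketch in Python (not Splunk code, and with made-up log lines), raw events can be kept exactly as they arrived and fields extracted only at query time, trading query-time CPU for zero upfront ETL:

```python
import re

# Raw, unstructured events are kept exactly as they arrived (no upfront ETL).
# These sample lines are invented for illustration.
raw_events = [
    '2013-10-28 09:14:02 level=ERROR component=checkout msg="payment timeout"',
    '2013-10-28 09:14:05 level=INFO component=search msg="query ok"',
    '2013-10-28 09:15:41 level=ERROR component=checkout msg="payment timeout"',
]

# The schema is applied at the last possible moment: a field-extraction
# pattern evaluated at query time, not at ingestion time.
FIELD_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')

def search(events, **filters):
    """Extract key=value fields from each raw line, then filter on them."""
    for line in events:
        fields = {k: v.strip('"') for k, v in FIELD_PATTERN.findall(line)}
        if all(fields.get(k) == v for k, v in filters.items()):
            yield fields

# Query the raw data in whatever form it came: find the checkout errors.
errors = list(search(raw_events, level="ERROR", component="checkout"))
print(len(errors), "errors found")
```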

Hunk: Bringing Splunk Enterprise capability to Hadoop

“In Splunk Enterprise we have an agent we call the forwarder,” Clint explained. “The forwarder sits out in the IT infrastructure; you configure it to look at directories and it will grab all the files in a directory, tail them, look for new data coming in and send it off to our indexer tier, which stores and manages it. We keep a copy of all the raw data and then we store an index for rare-term search. So, we are very, very good at ‘needle in the haystack’ searches. We can find one event in a billion nearly instantly. We are optimised for that sort of use case.”
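
One rough way to picture the ‘needle in the haystack’ behaviour (purely illustrative; this is not Splunk’s actual index format) is an inverted index that maps each term to the events containing it, so a rare term resolves to a handful of candidates without scanning everything:

```python
from collections import defaultdict

# Toy inverted index: map each term to the IDs of events that contain it.
# The event data here is invented for illustration.
events = {
    1: "user=alice action=login status=ok",
    2: "user=bob action=login status=ok",
    3: "user=alice action=checkout status=PAYMENT_GATEWAY_TIMEOUT",
}

inverted = defaultdict(set)
for event_id, text in events.items():
    for term in text.replace("=", " ").lower().split():
        inverted[term].add(event_id)

def rare_term_search(term):
    """Return matching events via an index lookup instead of a full scan."""
    return [events[i] for i in sorted(inverted.get(term.lower(), set()))]

print(rare_term_search("payment_gateway_timeout"))  # one event out of the haystack
```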

“With Hunk it’s actually the exact opposite,” Clint pointed out. “You bring us the data stored in Hadoop, however it is stored, however it got there, and we analyse it in place.”

Hunk, however, is much more than just a connector to Hadoop, and it required a significant engineering effort, something that Clint elaborated on. “We have taken and abstracted the data storage layer from the rest of the engine, and that has allowed us to take the same UIs, search language, application development framework and role-based access controls and adapt them to Hadoop. But Splunk Enterprise charges based on data ingestion, so this essentially necessitates a different product, because now we are analysing data at rest. It creates a business problem of, ‘I have this great technology. Now, I need a new product.’ Really, the business model drove the productisation of it. So, it’s not a connector.”
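
A minimal sketch of what ‘abstracting the data storage layer’ could look like in principle (illustrative Python, not Splunk’s architecture): the search logic is written against a storage interface, so the same query can run against locally indexed data or against files left at rest where they already live:

```python
from abc import ABC, abstractmethod

class EventStore(ABC):
    """Abstract storage layer; the search logic above it stays the same."""
    @abstractmethod
    def read_events(self):
        ...

class IndexedStore(EventStore):
    """Stand-in for data that was ingested and indexed up front."""
    def __init__(self, indexed_events):
        self.indexed_events = list(indexed_events)
    def read_events(self):
        return iter(self.indexed_events)

class AtRestStore(EventStore):
    """Stand-in for data analysed in place, wherever it already lives."""
    def __init__(self, paths):
        self.paths = paths
    def read_events(self):
        for path in self.paths:
            with open(path) as f:
                yield from (line.rstrip("\n") for line in f)

def count_matching(store: EventStore, needle: str) -> int:
    """The same search runs unchanged against either backend."""
    return sum(1 for event in store.read_events() if needle in event)

print(count_matching(IndexedStore(["error: disk full", "all good"]), "error"))  # 1
```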

Making Big Data analysis as easy as an Excel Pivot Chart in Splunk 6

A lot of the value in Big Data lies in analysis, visualisation and reporting, but knowing that doesn’t make it easy, and a common theme in discussing Big Data is the difficulty of getting business value from it. One of the things that looks really exciting in the recently released Splunk 6 is its pivot capability. When asked to elaborate, Clint explained, “We have always been ‘raw data to intelligence’, that’s what we do, and Pivot just layers on top of that. So, now, instead of having that same capability only for sys admins, network engineers and maybe even their data guys, maybe their DBAs or data analysts or something like that, I am now giving it to a product manager or to a line-of-business owner. That’s incredibly powerful because now I can have data that came straight out of my web server or straight out of my application, written raw into a log file, into Splunk Enterprise or Hadoop, and I can give it straight to a product manager without a whole expensive BI project where I went and ETL’ed it, put it into a data mart and had BI tools hitting the data mart. Maybe that doesn’t give them all the capabilities that they would have from a full-fledged data warehouse, but for rapid prototyping, for what we are calling exploratory analytics, this is hugely valuable. Then maybe they want to build a data warehouse on that, but what happens if they go through all this effort and realise that there is really no value in that data? That’s where we are really helping. You may have hundreds of thousands of formats that you put into Hadoop or into Splunk or wherever else, and maybe some of that’s valuable, maybe some of that’s total trash (actually a lot of it probably is total trash), but I want to be able to do ad hoc analytics first before I invest in building a really excellent reporting structure on it.”
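
As a toy example of the kind of pivot-style summary being described, here is a purely illustrative Python sketch (invented log lines, not Splunk’s Pivot implementation) that aggregates raw web-server log lines by URL and HTTP status with no ETL or data mart in between:

```python
import re
from collections import Counter

# Raw access-log lines, straight from the web server (illustrative data only).
access_log = [
    '10.0.0.1 - - [22/Jan/2014:10:01:12] "GET /checkout HTTP/1.1" 500 0.83',
    '10.0.0.2 - - [22/Jan/2014:10:01:15] "GET /home HTTP/1.1" 200 0.12',
    '10.0.0.3 - - [22/Jan/2014:10:02:03] "GET /checkout HTTP/1.1" 500 0.91',
    '10.0.0.4 - - [22/Jan/2014:10:02:44] "GET /home HTTP/1.1" 200 0.10',
]

LINE = re.compile(r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

# Pivot: rows = URL path, columns = HTTP status, values = request count.
pivot = Counter()
for line in access_log:
    m = LINE.search(line)
    if m:
        pivot[(m.group("path"), m.group("status"))] += 1

for (path, status), count in sorted(pivot.items()):
    print(f"{path:12} {status}  {count}")
```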

Examples of innovative use cases of Splunk

While Splunk’s strength was still in IT management and security applications, interesting Big Data use cases were starting to emerge too. When asked about this, Clint agreed. “We have numerous examples in telcos across the world,” he said. “That has been a vertical that has been extremely successful in monitoring things like ordering and activation processes, and I know personally because I came from a telco and I implemented this use case. Before I left Cricket, I could see every sale, I could see what every retail rep did, every door that was turning, every call to customer care, and do those sorts of correlations. We have a large American insurance provider that’s using it to watch claims processing and look for errors, look for fraud, look for a lot of interesting correlations.

“In Germany, the Otto Group, a large online retailer, manages its complete fulfilment process with Splunk, and it has done a lot of work to instrument its applications so that it has something like a transaction ID that can be followed through the various steps. They can tell where things are failing at step 4 of a 7-step process and say, ‘Okay, well, I am not actually shipping the devices these people have paid for.’ These are what we call operational intelligence examples.”

One of the advantages of Splunk’s rapid exploratory capability is that sometimes value comes from unlikely sources. “The one that I like to use is a Japanese company that came into the office; it is using Splunk to look at elevator data. Actually, the elevator data is a leading indicator of whether people are going to renew their lease, because if there is no foot traffic to the facility, then odds are they probably aren’t. So, they are using it to predict building lease renewals based off of elevator machine data. There are all sorts of unconventional use cases like that,” Clint said.

With some Big Data companies focusing just on areas like security, Clint said any discussion of use cases needed to acknowledge that most people use Splunk for IT and application management, and security; there are a lot of really good security use-case examples, but the ones that are still emerging are the really interesting ones.

Initial applications of Hunk

With Splunk recently announcing the general availability of Hunk, BigInsights asked Clint about initial applications, especially innovative ones, and the differences from traditional Splunk use cases.

“Give me a quarter after GA and I will have some good stories, but for now people are really just experimenting, and there is some good experimentation going on. A lot of the use cases are around clickstream, web logs, that sort of thing. That seems to be the most prevalent use case, but we also have a healthcare provider wanting to analyse claims data with it. We have a lot of people using it for security use cases. They want to dump everything from packet-capture data to logs, and they want big repositories of a year or two years’ worth of data. So, the difference with what we are seeing with Hadoop is that they want really, really long retention periods for data, and they are creating this sort of data-lake concept where it’s just a dumping ground for data,” he said.

Splunk’s perspective on Hadoop

Clearly, like most established businesses faced with the emergence of a disruptive technology, Splunk has taken time to consider the extent to which Hadoop is competitive or complementary.

In Part 2, Clint talks about Splunk’s view of Hadoop as a Big Data Platform.

About Us

BigInsights is a boutique research & advisory firm focused on Big Data, based in Australia. We help enterprises, vendors & entrepreneurs with:
  • Best practices and ROI on using Big Data technologies for Customer and Operational Insights.
  • Emerging Use cases across industries such as FSI, Telecoms, Manufacturing/Distribution/Retail and Social commerce
  • Big Data technologies
  • Business opportunity for Big Data technology creators