Sunday, October 15, 2017

Spelunking your Splunk – Part I (Explore Your Data)

By Tony Lee

Introduction

Have you ever inherited a Splunk instance that you did not build?  If so, you probably have no idea what data sources are being sent into Splunk.  You probably don’t know much about where the data is being stored.  And you almost certainly do not know which hosts generate the highest volume within the environment.

As consultants, this is the reality on nearly every engagement:  We did not build the environment, and documentation is sparse or inaccurate, if we are lucky enough to have any at all.  So, what do we do?  We could run some fairly complex queries to figure this out, but many of those queries are not efficient enough to search over vast amounts of data or long periods of time, even on highly optimized environments.  All is not lost, though:  we have some tricks (and a handy dashboard) that we would like to share.

Note:  Maybe you did build the environment, but you need a sanity check to make sure you don’t have any misconfigured or runaway hosts.  If so, you will also find value here.

tstats to the rescue!

If you have not discovered or used the tstats command, we recommend that you become familiar with it, even if only at a very high level.  In a nutshell, tstats runs statistical queries against indexed fields instead of reading the raw events, which makes it very fast.  By default, these indexed fields are index, source, sourcetype, and host.  It just so happens that these are the fields we need to understand the environment.  Best of all, even on an underpowered environment or one that ingests lots of data per day, these commands will still outperform your typical searches, even over long periods of time.  OK, time to answer some questions!
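
To make the difference concrete, here is a rough side-by-side sketch.  Both searches count events by sourcetype.  The first is the conventional approach, which has to retrieve the events themselves before counting them:

```spl
index=* | stats count by sourcetype
```

The second asks the same question with tstats, which answers from the indexed fields alone:

```spl
| tstats count where index=* by sourcetype
```

How much faster the second search runs depends on your data volume and hardware, so treat this as an illustration rather than a benchmark.  Note that tstats is a generating command, which is why the search starts with a leading pipe.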

Common questions

These are common questions we ask during consulting engagements, and this is how we get answers FAST.  Most of the time, 7 days’ worth of data is enough to give us a good understanding of the environment and weed out anomalies.

How many events are we ingesting per day?
| tstats count where index=* by _time span=1d

Figure 1:  Events per day


What are my most active indexes (events per day)?
| tstats prestats=t count where index=* by index, _time span=1d | timechart span=1d count by index

Figure 2:  Most active indexes


What are my most active sourcetypes (events per day)?
| tstats prestats=t count where index=* by sourcetype, _time span=1d | timechart span=1d count by sourcetype

Figure 3:  Most active sourcetypes
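
The index and sourcetype views can also be combined to show where each kind of data is stored.  The following is a minimal sketch that is not part of the dashboard; it lists every index and sourcetype pair for the selected time range, busiest first:

```spl
| tstats count where index=* by index, sourcetype | sort - count
```

This returns a table rather than a chart, which is usually easier to read when there are many combinations.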


What are my most active sources (events per day)?
| tstats prestats=t count where index=* by source, _time span=1d | timechart span=1d count by source

Figure 4:  Most active sources


What are my noisiest hosts (events per day)?
| tstats prestats=t count where index=* by host, _time span=1d | timechart span=1d count by host

Figure 5:  Most active hosts
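
If you only need a ranked list of the loudest hosts rather than a daily chart, a simple table may be enough.  This sketch is an addition to the searches above, and the cutoff of 10 is an arbitrary choice:

```spl
| tstats count where index=* by host | sort - count | head 10
```

Keep in mind that timechart, by default, charts only the top 10 series and groups everything else into a series named OTHER.  In an environment with many hosts or sources, you may want to adjust that with the limit and useother options, for example (the value 20 is only illustrative):

```spl
| tstats prestats=t count where index=* by host, _time span=1d | timechart span=1d limit=20 useother=f count by host
```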


Dashboard Code

To make things even easier, try out this dashboard (code at the bottom), which combines the searches above and, as a bonus, adds filters for the index and time range.

Figure 6:  Data Explorer dashboard

Conclusion

Splunk is a very powerful search platform, but it can grow into a complicated beast, especially over time.  Feel free to use the searches and dashboard provided to regain control and really understand your environment.  This will allow you to trim the waste and regain efficiency.  Happy Splunking.


Dashboard XML code is below:


<form>
  <label>Data Explorer</label>
  <fieldset submitButton="true" autoRun="true">
    <input type="time" token="time">
      <label>Time Range Selector</label>
      <default>
        <earliest>-7d@h</earliest>
        <latest>now</latest>
      </default>
    </input>
    <input type="text" token="index">
      <label>Index</label>
      <default>*</default>
      <initialValue>*</initialValue>
    </input>
  </fieldset>
  <row>
    <panel>
      <chart>
        <title>Most Active Indexes</title>
        <search>
          <query>| tstats prestats=t count where index=$index$ by index, _time span=1d | timechart span=1d count by index</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">column</option>
        <option name="charting.drilldown">none</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Most Active Sourcetypes</title>
        <search>
          <query>| tstats prestats=t count where index=$index$ by sourcetype, _time span=1d | timechart span=1d count by sourcetype</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">column</option>
        <option name="charting.drilldown">none</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Most Active Sources</title>
        <search>
          <query>| tstats prestats=t count where index=$index$ by source, _time span=1d | timechart span=1d count by source</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
        </search>
        <option name="charting.chart">column</option>
        <option name="charting.drilldown">none</option>
      </chart>
    </panel>
  </row>
  <row>
    <panel>
      <chart>
        <title>Most Active Hosts</title>
        <search>
          <query>| tstats prestats=t count where index=$index$ by host, _time span=1d | timechart span=1d count by host</query>
          <earliest>$time.earliest$</earliest>
          <latest>$time.latest$</latest>
          <sampleRatio>1</sampleRatio>
        </search>
        <option name="charting.chart">column</option>
        <option name="charting.drilldown">none</option>
      </chart>
    </panel>
  </row>
</form>