
How to use Splunk SPL commands to write better queries - Part I

Introduction

As software engineers, we are quite used to dealing with logs in our daily lives, but in addition to ensuring that the necessary logs are being sent by the application itself or through a service mesh, we often have to go a little further and interact with a log tool to extract more meaningful data. This post is inspired by a problem I had to solve for a client who uses Splunk as their main data analysis tool, and it is the first in a series of articles where we will dig deeper and learn how to use different Splunk commands.

Running Splunk with Docker

To run Splunk with Docker, just run the following command:

docker run -d --rm -p 8000:8000 -e SPLUNK_START_ARGS=--accept-license -e SPLUNK_PASSWORD=SOME_PASSWORD --name splunk splunk/splunk:latest
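
Once the container is up, which can take a minute or two, you can follow the startup logs with the command below. This is a minimal sketch assuming the container name and port mapping used above, and that the splunk/splunk image creates an admin user whose password is the SPLUNK_PASSWORD value you provided:

docker logs -f splunk

When the logs show that Splunk is ready, open http://localhost:8000 in your browser and log in as admin with the password you chose.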

Sample Data

We are going to use the sample data provided by Splunk. You can find more information and download the zip file from their website.

How does it work?

In order to interact with Splunk and generate better reports and visualizations, we need to know SPL, short for Search Processing Language. The key to understanding Splunk queries written in SPL is to know that the language is based on the Unix pipeline. So what is a Unix pipeline, and how does it help us understand how Splunk queries work?

According to a definition from the Wikipedia website:

In Unix-like computer operating systems, a pipeline is a mechanism for inter-process communication using message passing. A pipeline is a set of processes chained together by their standard streams, so that the output text of each process (stdout) is passed directly as input (stdin) to the next one.
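
As a quick illustration of the idea, the following pipeline chains three commands together; access.log is just a hypothetical file name used for the example. The output of cat is filtered by grep, and wc -l then counts the matching lines:

cat access.log | grep "WC-SH-G04" | wc -l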

Creating queries with SPL

You can think of each pipe as a filter applied to your search results. Let's take the following query as an example:

index=main productId="WC-SH-G04"

As stated in the Splunk docs, it helps to visualize your indexes as if they were regular tables in a database. So the previous query would look something like this as a SQL query:

SELECT * FROM main WHERE productId = 'WC-SH-G04'

Our query will produce the following result:

Now let's apply another filter to our search results by adding the following to our previous command:

index=main productId="WC-SH-G04" | top limit=3 status

The first search returned the total number of events where productId="WC-SH-G04", but it also shows the results in Splunk's default Events view, and we don't want to see our results like that, right? We want to present our query results as a table so we can do some data analysis and extract useful information for our business from all the logs we produce. This is where things start to get interesting. Just like in Unix, we pass the result of the previous command to the next one by adding a pipe operator ( | ) between the commands, and as soon as we pipe the first search into a transforming command like top, Splunk starts to show our results in a table. So the previous command will produce the following result:

Our new command asks Splunk to take all the results from the first command, the one on the left side of the pipe, and transform them by returning the three most common values of the status field.
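
You can keep chaining commands in the same way. The following pipeline is a minimal sketch using the same sample index: it applies the same filter, counts the matching events by status with the stats command and then sorts the counts in descending order.

index=main productId="WC-SH-G04" | stats count by status | sort -count

It produces information similar to top, but splitting the pipeline into stats and sort makes it clear that each command only works with the rows handed to it by the previous one.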

Conclusion

While writing Splunk queries, it helps to think of them as independent SQL queries where the result of the first one is used as the input for the next one. Splunk has a huge number of commands and operators. You can find more information about them in the section below and in the next posts!

Additional Reference

Splunk commands

Working with pipes on the Linux command line

Splunk official Docker image
