What is Splunk and how does it work?

5 min read · Aug 31, 2022

Splunk is a log collection and analysis tool that lets users monitor servers in real time.

Characteristics of Splunk:

Collects logs from servers or instances and forwards them to a remote instance.
Formats the logs so that they are human-readable.
Sends alert notifications.
Stores historical data and logs for analysis.
Provides a web interface (Splunk Web) with the tools you need to search, report on, and analyze the data and to administer users and their roles.

Default Port (Splunk Web): 8000
Port for Data Receiving: 9997

Roles in Splunk:

Roles determine what a user is able to see, do, and interact with.

There are 3 main roles in Splunk Enterprise:

Admin: The most powerful role. It can install apps, ingest data, and create knowledge objects for all users.
Power: The power user role. It can create and share knowledge objects with all users of an app and can run real-time searches.
User: The normal user role. It can see only its own knowledge objects and those shared with it.

Splunk Components:

Forwarders:

Forwarders are instances that consume data and forward it to the indexers for processing.
Forwarders usually reside on the machines where the data originates.

For example, if we had a web server that we wanted to monitor, we would install the forwarder on that server and have it send data to our indexer.

Indexers:

Indexers process incoming data and store the results in indexes as events.
As it indexes data, an indexer creates a number of files, organized into sets of directories by age.

When we search the data, Splunk will only need to open the directories that match the time frame of our search.
Directory Path : $SPLUNK_DB/<index_name>/db
$SPLUNK_DB : /opt/splunk/var/lib/splunk/
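
The time-based pruning described above can be sketched in Python. This is a simplified illustration, not Splunk's actual implementation. It assumes bucket directories are named db_<newest>_<oldest>_<id> with epoch-second timestamps; the function names and the sample directory names are made up for the example.

```python
from typing import List, Tuple


def parse_bucket(name: str) -> Tuple[int, int]:
    """Return (oldest, newest) epoch seconds from a db_<newest>_<oldest>_<id> name."""
    _, newest, oldest, _ = name.split("_")
    return int(oldest), int(newest)


def buckets_for_range(names: List[str], earliest: int, latest: int) -> List[str]:
    """Keep only the buckets whose time span overlaps [earliest, latest]."""
    selected = []
    for name in names:
        oldest, newest = parse_bucket(name)
        if oldest <= latest and newest >= earliest:
            selected.append(name)
    return selected


# Hypothetical bucket directories under $SPLUNK_DB/<index_name>/db
buckets = [
    "db_1661990399_1661904000_12",  # events from 2022-08-31 (UTC)
    "db_1661903999_1661817600_11",  # events from 2022-08-30 (UTC)
    "db_1661817599_1661731200_10",  # events from 2022-08-29 (UTC)
]

# A search over 2022-08-30 only has to open one of the three directories.
print(buckets_for_range(buckets, 1661817600, 1661903999))
```

The narrower the time range of a search, the fewer directories have to be opened, which is why setting a time range is one of the cheapest ways to speed up a search.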

Search Heads:

Search heads allow users to search the indexed data with the Splunk Search Language.
A search head handles search requests from users and distributes them to the indexers.
It then consolidates and enriches the results before returning them to the users.
It also provides various tools, such as dashboards, reports, and visualizations, to assist the search experience.
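
The distribute-and-consolidate flow of a search head can be pictured with a small Python sketch. It is only a toy model under stated assumptions: each "indexer" is a plain list of events, the match is a simple substring test, and the added "indexer" field is a hypothetical stand-in for the enrichment step.

```python
import heapq
from typing import Dict, List

Event = Dict[str, object]


def search_indexer(events: List[Event], term: str) -> List[Event]:
    """Stand-in for one indexer: return the matching events, newest first."""
    hits = [e for e in events if term in str(e["raw"])]
    return sorted(hits, key=lambda e: e["time"], reverse=True)


def search_head(indexers: Dict[str, List[Event]], term: str) -> List[Event]:
    """Send the search to every indexer, then merge and enrich the partial results."""
    partials = []
    for name, events in indexers.items():
        hits = search_indexer(events, term)
        # Enrich each hit with the name of the indexer that returned it.
        partials.append([{**e, "indexer": name} for e in hits])
    # Every partial list is already newest-first, so a k-way merge keeps that order.
    return list(heapq.merge(*partials, key=lambda e: e["time"], reverse=True))


# Hypothetical sample data held by two indexers
indexers = {
    "idx1": [{"time": 100, "raw": "GET /index.html 200"},
             {"time": 300, "raw": "GET /missing 404"}],
    "idx2": [{"time": 200, "raw": "GET /old 404"},
             {"time": 400, "raw": "POST /login 200"}],
}

results = search_head(indexers, "404")
for event in results:
    print(event["time"], event["indexer"], event["raw"])
```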

How does Splunk work?

Buckets in Splunk:

Splunk stores indexed data in directories called buckets, which roll through several stages as they age. Each stage has its own path:

Home/Hot Path: The directory where new data is written. The most recent data (hot and warm buckets) is kept here.

Default Path: $SPLUNK_DB/INDEX_NAME/db/

Cold Path: Holds rarely searched data that has aged and been rolled out of the home path. Cold data is read-only but still searchable, and this is considered the archive tier.

Default Path: $SPLUNK_DB/INDEX_NAME/colddb

Frozen Path: Holds data that has aged out of the cold tier. Frozen data is either deleted or pushed to offline media such as tape.

Default Path: $SPLUNK_DB/INDEX_NAME/frozendb

Thawed Path: The directory where you place frozen data that you want to recover.

Default Path: $SPLUNK_DB/INDEX_NAME/thaweddb

$SPLUNK_DB is Splunk's default data directory: /opt/splunk/var/lib/splunk/

All of these bucket paths can be defined when the index is created.

How to create an index in Splunk?

To create an index in Splunk, log in to the Splunk UI with admin rights, go to
Settings > Indexes > New Index, and provide the following information:

Index Name
Index Data Type : Events/Metrics
Home/Hot Path:
Cold Path:
Thawed Path:
Data Integrity Check:
Max size of Entire Index: 500 GB (default)
Max size of Hot/Warm/Cold bucket:
Frozen Path:
App:

How to configure forwarding on nodes in Splunk?

First, tell the forwarder which indexer it should send its data to:

cd /opt/splunkforwarder/bin
sudo ./splunk add forward-server <indexer_IP>:9997

To add data for monitoring and send it to the indexers, run the command below from /opt/splunkforwarder/bin:

sudo ./splunk add monitor <logfile> -sourcetype <sourcename> -index <index_name>

Alternatively, you can define the monitor input in the inputs.conf file at /opt/splunkforwarder/etc/apps/search/local/inputs.conf:

[monitor:///var/log/httpd/access_log]
disabled = false
index = weblogs
sourcetype = accesslogs
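
If you need the same kind of stanza for many log files, you could generate it with a script instead of typing it by hand. The following Python sketch uses only the standard library; the helper name render_monitor_stanza is made up for this example, and it simply reproduces the stanza layout shown above rather than validating anything against Splunk.

```python
import configparser
import io


def render_monitor_stanza(path: str, index: str, sourcetype: str) -> str:
    """Render a [monitor://<path>] stanza in the inputs.conf layout shown above."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the key names exactly as written
    parser[f"monitor://{path}"] = {
        "disabled": "false",
        "index": index,
        "sourcetype": sourcetype,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


stanza = render_monitor_stanza("/var/log/httpd/access_log", "weblogs", "accesslogs")
print(stanza)
```

The printed text can then be appended to the inputs.conf file mentioned above.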

Splunk Search Language:

Splunk provides a wide variety of commands and options for searching events and logs, building dashboards, and analyzing results. Together these make up the Splunk Search Language.

A search in the Splunk Search Language is built from 5 components:

Search Terms: What you are searching for.
Commands: What you want to do with the search results.
Functions: How you want to chart, compute, and evaluate the results.
Arguments: The variables you want to apply to the functions.
Clauses: How you want the results to be defined or grouped.

Ex: index="weblogs" sourcetype=accesslogs | dedup host | stats list(host) as "hostname"

Breakdown of the above command:

index="weblogs" sourcetype=accesslogs > Search Term

dedup & stats > Command

list > Function

(host) > Arguments

as > Clause
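
To see what each stage of that pipeline does to the data, here is a rough Python imitation of it. This is not how Splunk executes a search; the functions search, dedup, and stats_list and the sample events are invented for illustration, and events are modelled as plain dictionaries.

```python
from typing import Dict, List

Event = Dict[str, str]


def search(events: List[Event], **terms: str) -> List[Event]:
    """Search terms: keep the events whose fields match every key=value pair."""
    return [e for e in events if all(e.get(k) == v for k, v in terms.items())]


def dedup(events: List[Event], field: str) -> List[Event]:
    """dedup <field>: keep only the first event seen for each value of the field."""
    seen = set()
    kept = []
    for e in events:
        if field not in e:
            continue  # assumption: events without the field are dropped
        if e[field] not in seen:
            seen.add(e[field])
            kept.append(e)
    return kept


def stats_list(events: List[Event], field: str, as_name: str) -> List[Dict[str, List[str]]]:
    """stats list(<field>) as <name>: collect the values into one result row."""
    return [{as_name: [e[field] for e in events]}]


# Hypothetical events
events = [
    {"index": "weblogs", "sourcetype": "accesslogs", "host": "web01"},
    {"index": "weblogs", "sourcetype": "accesslogs", "host": "web02"},
    {"index": "weblogs", "sourcetype": "accesslogs", "host": "web01"},
    {"index": "main", "sourcetype": "syslog", "host": "db01"},
]

matched = search(events, index="weblogs", sourcetype="accesslogs")
result = stats_list(dedup(matched, "host"), "host", "hostname")
print(result)  # [{'hostname': ['web01', 'web02']}]
```

Each function takes the output of the previous one, which mirrors how the pipe character passes results from one command to the next.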

List of some of the most used commands:

fields : includes or excludes fields from the search results. A minus sign (-) before the field list excludes those fields.
table : similar to the fields command, but returns the data in tabular format.
rename : renames fields in the search results. Once a field is renamed, later commands in the search can no longer refer to it by its original name.
dedup : removes duplicate events that share common values from the search results.
sort : displays the results in ascending or descending order. The default is ascending, which can also be indicated with a + sign; a - sign sorts in descending order.
top : finds the most common values of a given field in the search results. By default, the result is limited to the top 10 values.
rare : the opposite of top; it returns the least common values of a given field. By default, the result is limited to the 10 least common values.
stats : calculates statistics over the search results (count, distinct count, sum, avg, min, max, list, values).
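
As a rough mental model of top and rare, the Python sketch below counts field values with collections.Counter. It only imitates the behaviour described in the list above (including the default limit of 10); the function names and sample data are invented, and the real commands return additional columns that are not modelled here.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Dict[str, str]


def top(events: List[Event], field: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Most common values of a field, highest count first."""
    counts = Counter(e[field] for e in events if field in e)
    return counts.most_common(limit)


def rare(events: List[Event], field: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Least common values of a field, lowest count first."""
    counts = Counter(e[field] for e in events if field in e)
    return sorted(counts.items(), key=lambda item: item[1])[:limit]


# Hypothetical web events: three 200s, two 404s, one 500
status_events = [{"status": s} for s in ["200", "404", "200", "500", "404", "200"]]

print(top(status_events, "status", limit=2))   # [('200', 3), ('404', 2)]
print(rare(status_events, "status", limit=1))  # [('500', 1)]
```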

For more Splunk commands, refer to: https://docs.splunk.com/Documentation/Splunk/latest/Search/Typesofcommands

Fields in Splunk:

In Splunk, the fields sidebar displays all the fields extracted at search time, divided into "Selected Fields" and "Interesting Fields".

When searching by field, remember that field names are case-sensitive while field values are not.

The operators = and != can be used with fields that hold numerical or string values.
The operators > (greater than), >= (greater than or equal to), < (less than), and <= (less than or equal to) can be used only with fields that hold numerical values.
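
The rules above can be summed up in a short Python sketch. It is a simplified model of the behaviour described in this section, not Splunk's own matching logic; the function name matches and the sample event are hypothetical.

```python
import operator

# Ordering operators, which this sketch applies to numerical values only
ORDERING = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def matches(event: dict, field: str, op: str, value) -> bool:
    """Evaluate one field comparison against an event."""
    if field not in event:  # field names are case-sensitive
        return False
    actual = event[field]
    if op in ("=", "!="):
        # field values are compared without regard to case
        same = str(actual).lower() == str(value).lower()
        return same if op == "=" else not same
    if op not in ORDERING:
        raise ValueError(f"unsupported operator: {op}")
    try:
        return ORDERING[op](float(actual), float(value))
    except (TypeError, ValueError):
        return False  # a non-numerical value cannot be ordered in this sketch


event = {"host": "Web01", "status": "404"}

print(matches(event, "host", "=", "web01"))   # True: value case is ignored
print(matches(event, "Host", "=", "web01"))   # False: field name case matters
print(matches(event, "status", ">=", 400))    # True: numerical comparison
print(matches(event, "host", ">", 5))         # False: host is not numerical
```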

That’s all about Splunk in a nutshell for absolute beginners. Thanks for reading!
