Splunk is an extremely versatile tool when dealing with data:
- Monitor files? Check!
- Listen in on an open port? Check!
- Monitor the file system? Performance monitor? HTTP Event Collector? Check, check, aaaaand check!
But what if none of the methods above fits the data you want to ingest? Say, something like a database or a security tool’s API? Scripted inputs are the solution! Splunk can run a variety of scripts, including (but not limited to) PowerShell, shell scripts, and Python. Besides handling data sources that do not write log files and cannot send via TCP or UDP, scripted inputs offer several advantages:
- Structure (or restructure) data for easy ingestion and manipulation
  - Examples: key-value pairs and/or Common Information Model (CIM) compliance
- Discard unnecessary data before it hits your indexers and your license
- Utilize a tool’s API
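To make the first two advantages concrete, here is a minimal sketch of what a scripted input could look like. Splunk indexes whatever the script writes to standard output, so the script only has to print one event per line. Everything specific here is an assumption for illustration: the sample records stand in for a real API or database response, and the field names (user, action, status, debug_blob) are hypothetical.

```python
import sys
import time

# Hypothetical records standing in for an API or database response.
SAMPLE_RECORDS = [
    {"user": "alice", "action": "login", "status": "success", "debug_blob": "x" * 10},
    {"user": "bob", "action": "login", "status": "failure", "debug_blob": "y" * 10},
]

# Only these fields are emitted; everything else is discarded before indexing.
KEEP_FIELDS = ("user", "action", "status")


def format_event(record, timestamp):
    """Return one event line: a timestamp followed by key="value" pairs."""
    pairs = ['{0}="{1}"'.format(key, record[key]) for key in KEEP_FIELDS if key in record]
    return "{0} {1}".format(timestamp, " ".join(pairs))


def main():
    # One timestamp per run keeps the sketch simple; a real script would
    # more likely use a timestamp taken from each record.
    now = time.strftime("%Y-%m-%dT%H:%M:%S%z")
    for record in SAMPLE_RECORDS:
        sys.stdout.write(format_event(record, now) + "\n")


if __name__ == "__main__":
    main()
```

Dropping debug_blob inside the script means it never reaches the indexers, and the key="value" layout lets Splunk pick the fields up at search time without much extra configuration.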
So how do we go about implementing this magical data ingestion method? As with most methods, there are a few caveats, but these can be overcome with a little Splunk-fu and knowledge!
For this example, we’ll be using a Python script. First, let’s test our script using Splunk’s Python version (as opposed to Linux’s Python version). This will verify that any dependencies the script uses are available. (TIP: Copy any missing dependencies to /opt/splunk/lib/python2.7/site-packages/):
$SPLUNK_HOME/bin/splunk cmd python <SCRIPT_NAME>.py
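If the script fails on an import, it can help to check the dependencies on their own first. The sketch below is one way to do that, not an official Splunk utility: it tries to import each module in a list and reports the ones that are missing. The module list is a hypothetical placeholder; replace it with whatever your script actually imports, and run it with the same splunk cmd python command shown above.

```python
# Hypothetical list: replace with the modules your own script imports.
REQUIRED_MODULES = ["json", "csv"]


def missing_modules(names):
    """Return the subset of names that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies are available.")
```

The check uses only the built-in __import__, so it should behave the same way whichever Python version your Splunk installation bundles.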
Here we can validate that the output is the data we want, in the format we expect. Now we’ll copy the script to the bin directory of whichever app we want this script associated with (that is, $SPLUNK_HOME/etc/apps/<APP_NAME>/bin/).
Next, let’s add a stanza to the inputs.conf of the same app:
[script://$SPLUNK_HOME/etc/apps/<APP_NAME>/bin/<SCRIPT_NAME>.py]
disabled = 0
# If left blank, host resolves dynamically to the hostname of the machine the script runs on
host = <HOSTNAME>
index = <INDEX>
# How often the script runs: a number of seconds if an integer, or a cron schedule
interval = 86400
sourcetype = <SOURCETYPE>
If the interval is set to an integer, the script will execute when Splunk starts and the interval timer will begin. If Splunk is restarted inside that interval, the script will run again on startup and the timer will restart. Additionally, if the script has not finished executing before the timer runs out, Splunk will wait for the running instance to complete before starting another one.
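Since the interval setting accepts two different kinds of values, a quick sanity check before deploying can save a restart. The helper below is a simplified illustration and not how Splunk itself parses the setting: it assumes that anything numeric is a number of seconds and that anything with five whitespace-separated fields is a cron schedule. The function name classify_interval is hypothetical.

```python
def classify_interval(value):
    """Roughly classify an interval value as 'seconds', 'cron', or 'invalid'."""
    text = str(value).strip()
    try:
        float(text)  # numeric values are treated as a number of seconds
        return "seconds"
    except ValueError:
        pass
    # Assumption: a cron schedule has exactly five whitespace-separated fields.
    if len(text.split()) == 5:
        return "cron"
    return "invalid"


if __name__ == "__main__":
    for candidate in ("86400", "0 2 * * *", "daily"):
        print(candidate + " -> " + classify_interval(candidate))
```

With this rough check, 86400 is treated as seconds and 0 2 * * * as a cron schedule, while a value such as daily is flagged so it can be fixed before it reaches inputs.conf.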
Run a search, and behold the glory that unfolds before you! At this point, we can start to treat the data like we would any other new data source: configure props.conf and transforms.conf, create reports and dashboards, and generally be more awesome.
That’s all there is to it!
As always, Splunk has boundless additional detailed documentation, which I could not include without writing a novel!