Hi Splunk Experts--
I'm confused about the union command and am hoping you can
help. Specifically, I'm struggling to understand what causes the
"things that get unioned" to be truncated-- in my case to 50,000
records.
Here's an example of what confuses me:
Imagine three sets of data-- I've put them in three separate indexes
called union_1, union_2 and union_3. The data sets are very similar:
each has 60,000 records, each consisting of a timestamp, a color and a
hash. Each data set has exactly one event per second and each covers
the same 60,000 seconds (from 2017-01-01 00:00:01 to 2017-01-01
16:40:00). The color is random and the hash is unique across all
180,000 events (60,000 * three data sets).
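To double-check a setup like this before running the union tests, a quick tstats query should report 60,000 events per index and the same first and last timestamps for all three. This is a sketch that assumes the index names above; first and last are just field names I picked:

```spl
| tstats count min(_time) as first max(_time) as last where index=union_* by index
| convert ctime(first) ctime(last)
```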
Here's union_1:
time                      color  hash
------------------------- ------ --------------------------------
2017-01-01 00:00:01 -0800 blue   08decd051408e648b941b5dbb9b1578c
2017-01-01 00:00:02 -0800 yellow 39d98f7f9a98920ee08631c9e6a4e867
2017-01-01 00:00:03 -0800 green  2b34449aae3a941c64dd76d33a6cfc04
...
2017-01-01 16:39:58 -0800 blue   b2cc43ab839bf57711a00f8f7a622e97
2017-01-01 16:39:59 -0800 blue   e26f577b10d0fa172c122deca813d38f
2017-01-01 16:40:00 -0800 blue   c9b0b55e7513963f7b04cf3c424686f2
...and union_2:
time                      color  hash
------------------------- ------ --------------------------------
2017-01-01 00:00:01 -0800 violet c8e68d6c154fc0ca88220a299dba7c55
2017-01-01 00:00:02 -0800 blue   3e18602a1d137ea4bf9157e67c4386ed
2017-01-01 00:00:03 -0800 violet ecdf61cd34cda950bd782e3a6ba51fd6
...
2017-01-01 16:39:58 -0800 violet 5c00f68da1aa343ec0944fbcd42775fc
2017-01-01 16:39:59 -0800 green  2c3a626ff26a05f9895dc1c9ae1d074e
2017-01-01 16:40:00 -0800 red    9b796de25b072d8a48d3e9a7a716c4e9
...and union_3:
time                      color  hash
------------------------- ------ --------------------------------
2017-01-01 00:00:01 -0800 orange 772468eb812735bfa984b91477afe967
2017-01-01 00:00:02 -0800 violet 6d9ebc2ce8b1c79d42793d624daeb402
2017-01-01 00:00:03 -0800 red    a31d8811b95b4597f943f268f4068fb0
...
2017-01-01 16:39:58 -0800 yellow 17b43d58e4920f1d2044552acdad5507
2017-01-01 16:39:59 -0800 violet 12425e908448371c38a1f0fe12aedf73
2017-01-01 16:40:00 -0800 indigo ea1fb54c5c2b5fd2161856ea6937226e
You get the idea... 🙂
Now let's run some SPL:
| union maxout=10000000
    [ search index=union_1 ]
    [ search index=union_2 ]
    [ search index=union_3 ]
| stats count by index
This produces what I'd expect-- 60,000 records per "thing that got
unioned":
index   count
------- -----
union_1 60000
union_2 60000
union_3 60000
But let's make things a bit more complicated:
| union maxout=10000000
    [ search index=union_1 | head 60000 ]
    [ search index=union_2 ]
    [ search index=union_3 ]
| stats count by index
Wait, what? Adding a head command to the first search causes the
second and third to be truncated to 50000?
index   count
------- -----
union_1 60000
union_2 50000
union_3 50000
How about this one?
| union maxout=10000000
    [ search index=union_1 ]
    [ search index=union_2 | head 60000 ]
    [ search index=union_3 ]
| stats count by index
Hmmm... same result:
index count
------- -----
union_1 60000
union_2 50000
union_3 50000
What if we move the head command to the final search?
| union maxout=10000000
    [ search index=union_1 ]
    [ search index=union_2 ]
    [ search index=union_3 | head 60000 ]
| stats count by index
Wow... now only the final search gets truncated:
index count
------- -----
union_1 60000
union_2 60000
union_3 50000
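Extending that last query slightly should also show which end of the truncated dataset is being dropped (the oldest events or the newest), which might hint at how the subsearch results are being collected. An untested sketch, with first and last as arbitrary field names:

```spl
| union maxout=10000000
    [ search index=union_1 ]
    [ search index=union_2 ]
    [ search index=union_3 | head 60000 ]
| stats count min(_time) as first max(_time) as last by index
| convert ctime(first) ctime(last)
```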
Notes that may or may not be relevant:
Many commands have a similar effect (i.e. cause the same
truncations) as head-- in particular dedup and sort seem to cause
the same problems.
I suspect that these commands (and presumably many others) stop
the subsearch from qualifying as a "streaming subsearch"
(although honestly I can't imagine why head would do this), and
that this makes union behave much more like append.
I believe (but am not sure) that the 50000 truncation limit comes
from maxresultrows in limits.conf; for me, that value is currently
50000.
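One way to test both notes is to rewrite the "head on the second search" case as an explicit append. If it returns the same 60000/50000/50000 counts, that supports the idea that union is falling back to append-style subsearches; if the appended searches still stop at 50000 even with a large maxout, that would point at a separate cap (such as maxresultrows) rather than at maxout itself. A sketch I haven't run against this data:

```spl
index=union_1
| append maxout=10000000 [ search index=union_2 | head 60000 ]
| append maxout=10000000 [ search index=union_3 ]
| stats count by index
```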
For context, here's what I want to do:
In general, get a better understanding of how union works and how
it's different from append.
Specifically, union a set of three searches that each produce substantially more
than 50000 records, without any truncation.
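A possible workaround, based only on the pattern in the results above and not verified: keep every subsearch purely streaming (so union behaves as in the first, untruncated example) and apply the non-streaming command after the union instead. For example, running dedup hash inside each subsearch can be expressed as a single dedup over index and hash, assuming each subsearch reads exactly one index as it does here:

```spl
| union maxout=10000000
    [ search index=union_1 ]
    [ search index=union_2 ]
    [ search index=union_3 ]
| dedup index hash
| stats count by index
```

The same idea could apply to sort or head, although a per-index head after the union depends on the order in which union emits events, so it may not pick the same events as head inside each subsearch.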
Anybody willing to help me out with this? Would totally appreciate the
benefit of your wisdom 🙂
Thanks!