Hi everyone!
I'm trying to get some useful stats on my logged data. I have 3 attributes in each log entry, HARVEST_DATE, PUBLISH_DATE and MEDIA_TYPE. I would like to calculate the average difference between PUBLISH_DATE and HARVEST_DATE, as a timestamp (miliseconds).
To do so, I'm executing the following command:
index=sfmc-ris TYPE=PUSH_POST_ASYNC SUCCESS=true
| eval PUBLISH_DATE_TIMESTAMP=strptime(PUBLISH_DATE, "%Y-%m-%d %H:%M:%S,%3Q")
| eval HARVEST_DATE_TIMESTAMP=strptime(HARVEST_DATE, "%Y-%m-%d %H:%M:%S,%3Q")
| eval HARVEST_TO_PUBLISH=round(HARVEST_DATE_TIMESTAMP-PUBLISH_DATE_TIMESTAMP, 1)
| stats avg(HARVEST_TO_PUBLISH) as HARVEST_PUBLISH_AVG by MEDIA_TYPE
However, that outputs no results at all for HARVEST_PUBLISH_AVG. If I remove the by MEDIA_TYPE
part, it will show me the average difference without any grouping, no issues.
Can anyone help me figure out what am I doing wrong?
Thanks!
Although you surely have events for which isnotnull(HARVEST_PUBLISH_AVG)
is true and also events for which isnotnull(MEDIA_TYPE)
is true, you must not have any events for which both are true. You can join them together with the stats
command (and also many other commands) if you have another field (e.g. host
or sessionID
or DOCUMENTID
) like this and it should work:
index=sfmc-ris TYPE=PUSH_POST_ASYNC SUCCESS=true
| stats values(*) AS * by DOCUMENTID
| eval PUBLISH_DATE_TIMESTAMP=strptime(PUBLISH_DATE, "%Y-%m-%d %H:%M:%S,%3Q")
| eval HARVEST_DATE_TIMESTAMP=strptime(HARVEST_DATE, "%Y-%m-%d %H:%M:%S,%3Q")
| eval HARVEST_TO_PUBLISH=round(HARVEST_DATE_TIMESTAMP-PUBLISH_DATE_TIMESTAMP, 1)
| stats avg(HARVEST_TO_PUBLISH) as HARVEST_PUBLISH_AVG by MEDIA_TYPE
Long shot, try something like this
index=sfmc-ris TYPE=PUSH_POST_ASYNC SUCCESS=true| eval PUBLISH_DATE_TIMESTAMP=strptime(PUBLISH_DATE, "%Y-%m-%d %H:%M:%S,%3Q") | eval HARVEST_DATE_TIMESTAMP=strptime(HARVEST_DATE, "%Y-%m-%d %H:%M:%S,%3Q") | eval HARVEST_TO_PUBLISH=round(HARVEST_DATE_TIMESTAMP-PUBLISH_DATE_TIMESTAMP, 1) | stats count sum(HARVEST_TO_PUBLISH) as HARVEST_PUBLISH by MEDIA_TYPE | eval HARVEST_PUBLISH_AVG=HARVEST_PUBLISH /count | table MEDIA_TYPE HARVEST_PUBLISH_AVG
Thanks @somesoni2 for your suggestion. I tried it, along with some variations of the same formula, and nothing. The moment I do "by MEDIA_TYPE" the results get lost and splunk will start showing empty values.
Any other suggestion is much appreciated!
if you remove the stats all together, does your result set have a field called MEDIA_TYPE? All uppercase with an underscore? It would seem just off hand that the field doesn't exist, so the by clause is null, so you don't see any results.
Yes, my result set has MEDIA_TYPE. In fact, If I do a simple count(MEDIA_TYPE) it works fine. other more complex operations with it also work and the results are correct and as expected.
can you group your results by any other field?
If I use any other field, it gives me the same kind of result 😞
I don't suppose you're in fast mode (instead of say smart mode)? I mean, from everything else you said I'm guessing not, but thought it worth asking.
If I were you, I'd start narrowing down where the problem might be. For example, can you chart/timechart/table other fields. So is the problem with stats, or any command or a subset of commands etc.
Also, maybe go back to the beginning of the search after the first pipe and just try to do a stats count by MEDIA_TYPE (or whatever). Does that work? If so, add more of the search until it doesn't work again.
Kind of hard to go through all of those troubleshooting permutations here, but I think you get the idea.
@maciep, I'm not in Fast Mode, I'm afraid.
I can indeed chart any other fields, and even the derived field, unless I use a BY condition. The problem is not the MEDIA_TYPE field, but how splunk reacts when I try to use the BY statement.
Trying to debug as we speak 😞
I agree with MacieP, you must not have a field called "MEDIA_TYPE" in the data pipe to the stats command.