Splunk Search

Transaction trouble with ping events...

MHibbin
Influencer

Hi there,

I am trying to solve a problem with some ping events (not parsed, just the literal output of repeated single-count pings). The problem is as follows:

I need to create a field that states how long an interface/IP address is unavailable (i.e. all the time it has a 100% loss rate, as opposed to 0%). This would run from the first failed ping (100% loss) to the next successful ping (0% loss). I have tried to do this with the transaction command, since it outputs a duration field, but I can't seem to get anything right.

I was wondering if anyone could point me in the right direction.

My events look similar to this...

Pinging 192.168.56.101 with 1 bytes of data:
Reply from 192.168.56.101: bytes=1 time<1ms TTL=64
Ping statistics for 192.168.56.101:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

Pinging 192.168.56.101 with 1 bytes of data:
Reply from 192.168.56.101: bytes=1 time<1ms TTL=64
Ping statistics for 192.168.56.101:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

Pinging 192.168.56.101 with 1 bytes of data:
Request timed out.
Ping statistics for 192.168.56.101:
    Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),

Pinging 192.168.56.101 with 1 bytes of data:
Request timed out.
Ping statistics for 192.168.56.101:
    Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),

Pinging 192.168.56.101 with 1 bytes of data:
Request timed out.
Ping statistics for 192.168.56.101:
    Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),

Pinging 192.168.56.101 with 1 bytes of data:
Request timed out.
Ping statistics for 192.168.56.101:
    Packets: Sent = 1, Received = 0, Lost = 1 (100% loss),

Pinging 192.168.56.101 with 1 bytes of data:
Reply from 192.168.56.101: bytes=1 time<1ms TTL=64
Ping statistics for 192.168.56.101:
    Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

The time field is the time of indexing.

Thanks in advance,

MHibbin

1 Solution

lguinn2
Legend

Try this:

sourcetype=ping | 
rex "Pinging (?<ip>\S+)" |
rex "\((?<loss>\d+)% loss\)" |
sort ip _time |
streamstats current=false last(ip) as lastIP last(loss) as lastLoss |
where not (loss=100 and lastLoss=100 and lastIP=ip) | 
transaction ip startswith="(100% loss)" endswith="(0% loss)" | 
table ip duration

It may not do exactly what you want, but I think it will put you on the right track. Here is the problem with the transaction command: when you specify that it starts with 100% loss, repeated 100% losses create many transactions, which is not what you want. So what I did:

  • made sure that I had fields defined for the ip and the loss
  • captured the ip and loss from the previous event
  • used the where command to eliminate successive events that had the same ip and 100% loss
  • used the transaction command to group the events using the ip address and the loss as criteria
  • display the ip address and duration

There is more that you could do. For example, you could count the number of outages, average duration and overall down time by ip - just substitute the following for the table command:

stats count as NumberOfOutages avg(duration) as AverageOutage sum(duration) as TotalDowntime by ip
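Putting it together, the full search with the stats substitution would look like this (same pipeline as above, only the final command changed):

sourcetype=ping |
rex "Pinging (?<ip>\S+)" |
rex "\((?<loss>\d+)% loss\)" |
sort ip _time |
streamstats current=false last(ip) as lastIP last(loss) as lastLoss |
where not (loss=100 and lastLoss=100 and lastIP=ip) |
transaction ip startswith="(100% loss)" endswith="(0% loss)" |
stats count as NumberOfOutages avg(duration) as AverageOutage sum(duration) as TotalDowntime by ip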

Hope this helps. If it doesn't work, please comment on the answer - it could just be a typo. I couldn't really test this before posting.


MHibbin
Influencer

By the way, the test data I provided originally was from a Windows system, while the actual data is from Linux, hence the difference in syntax.


MHibbin
Influencer

I'm not sure if I still need to use streamstats, but this seems to deliver what I need...

sourcetype="ping" |
streamstats last(hostIP) as lasthostIP last(pcktsLst) as lastloss |
transaction keepevicted=true hostIP startswith="0% packet loss" |
stats sum(duration) by hostIP
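For the Linux-format output, a variant of the accepted answer might look like the sketch below. This is untested; the field name hostIP and the exact loss text ("100% packet loss" rather than Windows' "(100% loss)") are assumptions based on the search above. One caution: a bare endswith="0% packet loss" would also match inside the string "100% packet loss", so the leading ", " in the endswith value matters:

sourcetype="ping" |
rex "(?<loss>\d+)% packet loss" |
sort hostIP _time |
streamstats current=false last(loss) as lastLoss by hostIP |
where not (loss=100 AND lastLoss=100) |
transaction hostIP startswith="100% packet loss" endswith=", 0% packet loss" |
stats count as NumberOfOutages sum(duration) as TotalDowntime by hostIP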


MHibbin
Influencer

Thanks very much for the help.

I have modified what you have done, and I think it meets my needs. I will update this tomorrow if it works (when I will have access to the actual data rather than just my test data).

Thanks again,

Matt
