Input CSV and run search against subset of users

cewing082 · ‎05-08-2018

I have a list of accounts that I wish to monitor in a csv file, say accounts.csv.

The file looks like:
userid,username
joeuser,Joe User
jimuser,Jim User
Janeuser,Jane User

And the logs are in the format of:

operation=openedfile
UserId=joeuser@domain.com

so i need to compare the CSVs userid field to the UserId (stripping away the @ so they are the same).

I would like to produce a table showing all the accounts in the csv file and when they performed an activity (or what the activity was).

The trick is to create a table that shows all users in the CSV (not all that were found in the search - just the CSV), whether or not they performed an action. If they never did anything, then the last_activity field would be blank (or some message like "not logged in"). I can get close by doing something like:

I believe I need to do a join somehow, but I can't seem to get the syntax right, I'm stilling learning this.

If anyone can provide some pointers that would be greatly appreciated.

Thanks!

EDIT
I found this but can't seem to make the answer work as my userid from my events & csv are a bit different.

https://answers.splunk.com/answers/589551/get-last-login-time-based-upon-a-list-of-accounts-1.html

elliotproebstel · ‎05-08-2018

You don't mention where the field department comes from (the csv file or the events), so I'm going to assume it's in the csv file. This will need some slight adjustment if department is in the base events.

index=unix_security 
    [| inputlookup accounts.csv 
    | fields userid 
    | eval userid=userid."@*"] 
| stats latest(_time) AS last_login BY userid 
| rex field=userid "^(?<userid>[^@]+)"
| append [| inputlookup accounts.csv | eval last_login=0 ]
| stats max(last_login) AS last_login, latest(username) AS name, latest(department) AS department BY userid
| sort - userid

As you mentioned that you're still learning, I'll walk through it, chunk by chunk. First, we gather events relating only to the users in the csv by using a subsearch to gather the userid values and adding "@*" to the end of each one, as you mentioned that the raw events contain the full email address but the csv file contains the username (portion of the email before the @ sign). If all the userids belong to the same domain, you might replace "@*" with "@yourdomain.com" or whatnot.

index=unix_security 
    [| inputlookup accounts.csv 
    | fields userid 
    | eval userid=userid."@*"]

After that, I replaced your dedup command with this stats call:

| stats latest(_time) AS last_login BY userid

This more efficiently returns the latest value of _time for each userid, which also dedups the results. I next truncate the userid values from the raw events:

| rex field=userid "^(?<userid>[^@]+)"

Then I append in the whole list from the csv, and add last_login=0 to each user so that this call:

| stats max(last_login) AS last_login, latest(username) AS name, latest(department) AS department BY userid

can either preserve the 0 (if the user was not seen in the raw events) or carry through the last_login value from the events. This also preserves the username and department if they were in the csv. If you need to extract them from the raw events, you'll need to add latest(username) AS name, latest(department) AS department to the first stats call, as well. Finally, I preserve the sorting you intended:

| sort - userid

I hope this is helpful!

cewing082 · ‎05-08-2018

I'm trying to run this and it's not returning any results, i think the problem lies here (?)

 index=unix_security 
     [| inputlookup accounts.csv 
     | fields userid 
     | eval userid=userid."@*"]

my csv contains userids but my events actually contain full email address (userid@domain)

elliotproebstel · ‎05-09-2018

This line should be accounting for that:

| eval userid=userid."@*"

That should transform a userid from the csv file, such as jsmith and turn it into jsmith@*. So then the result of this:

index=unix_security 
    [| inputlookup accounts.csv 
    | fields userid 
    | eval userid=userid."@*"]

would be a search like this:

index=unix_security (userid="jsmith@*" OR userid="dburns@*" OR userid="alews@*" ...)

But of course I believe you when you say it's not working, so let's start with this:

| inputlookup accounts.csv 
| fields userid 
| eval userid=userid."@*"
| format

Does that return something like userid="jsmith@*" OR userid="dburns@*" OR userid="alews@*" ...?
If so, then you could manually troubleshoot by running this:

 index=unix_security 
| stats latest(_time) AS last_login BY userid

See if the list of userid values contains users from the csv - and, in particular, users whose userid values match that pattern of jsmith@*, etc. Let me know if you find the discrepancy, and I can help you adjust the search as needed.

elliotproebstel · ‎05-14-2018

Any luck on this? Do you still want some help?

cewing082 · ‎05-14-2018

You're the man - thanks so much. This got me exactly where I wanted to be!

elliotproebstel · ‎05-14-2018

Great! Glad it's working. 🙂

Input CSV and run search against subset of users

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!

Splunk Custom Visualizations App End of Life