Why doesn't the reducer get all events from the mapper function in my custom reporting command?

dcagatay
Explorer

I am trying to write a custom reporting command that finds the top words. It mostly works, but some of the data yielded by the mapper never reaches the reducer. For example, if each mapper invocation processes 10 events and yields 100 words, the reducer should receive 100 word records for every mapper invocation, but it does not. Some of the words yielded by the mapper are never seen by the reducer.

My mapper and reducer implementation is below.

@Configuration()
def map(self, records):
    self.logger.debug('TopWordsCommand.map')
    fieldname = self.field
    total = {}
    cnt = 0
    word_cnt = 0

    for record in records:
        text = record[fieldname]
        for word in text.split():
            if word in total:
                total[word] += 1
            else:
                total[word] = 1
            word_cnt += 1
        cnt += 1

    for word, count in total.items():
        yield { 'word': word, 'count': count }

    self.logger.info('Finished map. Processed {} events and {} words.'.format(cnt, word_cnt))

def reduce(self, records):
    self.logger.debug('TopWordsCommand.reduce')
    total = {}
    word_cnt = 0
    uniq_word_cnt = 0

    for record in records:
        word = record['word']
        count = record['count']
        word_cnt += 1

        if word in total:
            total[word] += int(count)
        else:
            total[word] = int(count)
            uniq_word_cnt += 1

    for word, count in total.items():
        yield { 'word': word, 'count': count }

    self.logger.info("Finished reduce. Total number of words {}, unique words {}".format(word_cnt, uniq_word_cnt))
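
One way to check the counting logic independently of Splunk is to run the same map and reduce steps over in-memory chunks. The sketch below is plain Python and does not use the Splunk SDK; `map_chunk`, `reduce_all`, `run`, and the `text` field are hypothetical names chosen for illustration, and the fixed-size chunking is only an assumption about how events might be split across mapper invocations. If the totals here are right for every chunk size, the loss is more likely happening in how records are passed between the two phases than in the counting itself.

```python
from collections import Counter


def map_chunk(records, fieldname):
    """Count words in one chunk of events, mirroring the map step."""
    total = Counter()
    for record in records:
        # Treat a missing field as empty text instead of raising KeyError.
        total.update(record.get(fieldname, "").split())
    for word, count in total.items():
        yield {"word": word, "count": count}


def reduce_all(records):
    """Sum the partial counts produced by every map_chunk call."""
    total = Counter()
    for record in records:
        # Counts may arrive as strings, so convert before adding.
        total[record["word"]] += int(record["count"])
    for word, count in total.items():
        yield {"word": word, "count": count}


def run(events, fieldname, chunk_size):
    """Split events into chunks, map each chunk, then reduce everything."""
    mapped = []
    for start in range(0, len(events), chunk_size):
        chunk = events[start:start + chunk_size]
        mapped.extend(map_chunk(chunk, fieldname))
    return {r["word"]: r["count"] for r in reduce_all(mapped)}


events = [{"text": "a b a"}, {"text": "b c"}, {"text": "a"}]
result = run(events, "text", chunk_size=2)
print(result)
```

The final totals should not depend on `chunk_size`, so comparing the result for several chunk sizes is a quick consistency check.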

DeronJensen
Explorer

I don't know if this is the issue, but look at line 16:

    word_cnt += count

I think these are two different variables, 'cnt' and 'count'. At line 16, 'count' is not defined.


dcagatay
Explorer

No, that is not the issue. I didn't post the actual code, only a very similar version to give the gist. The actual code doesn't raise that kind of error.