I am new to Hadoop Streaming. I have a few filter conditions in my reduce code, and I would like to know how many records pass these conditions. I understand this can be done with custom counters. Can somebody show me how to write a custom counter?
I am emitting three columns in my mapper code, say a, b, c.
The key is a, and the value is a list like [b, c].
For example, a record from the mapper looks like 'I'^['C','P'].
Here is my reduce code.
import sys
import pandas as pd

labels = ["a", "b", "c"]
records = []
for line in sys.stdin:
    l = line.strip().split("^")
    key = l[0]
    value = l[1:]            # the remaining fields, e.g. ['C', 'P']
    record = [key] + value
    records.append(record)
df = pd.DataFrame.from_records(records, columns=labels)
df = df[(df['a'] == 'I') & (df['b'] == 'C')]
I would like to know how many records df contains at the reducer level.
Thank you.
Counters in Hadoop are used to keep track of occurrences of events. Whenever a job is executed, the Hadoop framework initializes counters to track job statistics such as the number of bytes read, the number of records read, the number of records written, etc.
The Hadoop framework is written in Java; however, Hadoop programs can be written in other languages. With Hadoop Streaming we can write MapReduce programs in Python without having to translate the code into Java jar files.
Hadoop Streaming supports almost any programming language, such as Python, C++, Ruby, or Perl. The streaming framework itself runs on Java, but the mapper and reducer code can be written in any of these languages, as mentioned above.
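For context, a streaming job is typically launched along these lines (a sketch only: the jar path and the script names `mapper.py` and `reducer.py` are illustrative and vary by installation and distribution):

```shell
# Launch a streaming job whose mapper and reducer are Python scripts.
# The -file options ship the scripts to the cluster nodes.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -input  /user/me/input \
    -output /user/me/output \
    -mapper mapper.py \
    -reducer reducer.py \
    -file mapper.py \
    -file reducer.py
```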
You can simply print to stderr (Python 2 syntax shown; in Python 3 use sys.stderr.write):
print >> sys.stderr, "reporter:counter:CUSTOM,NbRecords,1"
This increments the counter "NbRecords" in the counter group "CUSTOM" by 1. The expected line format is reporter:counter:&lt;group&gt;,&lt;counter&gt;,&lt;amount&gt;, with no extra spaces.