
When it comes to MapReduce, how are Accumulo tablets mapped to HDFS blocks?

Suppose my environment is set up as follows:
- 64 MB HDFS block size
- 5 tablet servers
- 10 tablets of 1 GB each per tablet server

If I have a table like below:
rowA | f1 | q1 | v1
rowA | f1 | q2 | v2

rowB | f1 | q1 | v3

rowC | f1 | q1 | v4
rowC | f2 | q1 | v5
rowC | f3 | q3 | v6

From the little documentation available, I know that all data for rowA will go to one tablet, which may or may not also contain data for other rows, i.e. it's all or none. So my questions are:

How are the tablets mapped to a datanode or HDFS block? Obviously, one tablet is split into multiple HDFS blocks (8 in this case), so would they be stored on the same or different datanode(s), or does it not matter?

In the example above, would all data about RowC (or A or B) go onto the same HDFS block or different HDFS blocks?

When executing a MapReduce job, how many mappers would I get? (One per HDFS block? Per tablet? Per server?)

Thank you in advance for any and all suggestions.

chapstick asked Oct 05 '22 18:10


1 Answer

To answer your questions directly:

How are the tablets mapped to a datanode or HDFS block? Obviously, one tablet is split into multiple HDFS blocks (8 in this case), so would they be stored on the same or different datanode(s), or does it not matter?

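Tablets are stored as files in HDFS, and those files are split into blocks like any other HDFS file. You will typically find a copy of every block of a given file on at least one common data node. This isn't guaranteed, but it has mostly held true when I've looked at block locations for larger files, most likely because HDFS places the first replica of each block on the writer's local data node when the writer (here, a tablet server) runs on one. As an aside, a 1 GB file with 64 MB blocks works out to 16 blocks, not 8.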

In the example above, would all data about RowC (or A or B) go onto the same HDFS block or different HDFS blocks?

That depends on the block size used for your tablet files (dfs.block.size or, if configured, the Accumulo property table.file.blocksize). If the block size is at least as large as the tablet, then all of its data will obviously be in the same HDFS block. If the block size is smaller than the tablet, it's pot luck whether the entries for a given row land in the same block or not.
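
To make the block arithmetic concrete, here is a minimal Python sketch. It only models byte offsets within a single file; the function names and the example offsets are made up for illustration, and nothing here calls HDFS or Accumulo.

```python
MB = 1024 * 1024
GB = 1024 * MB

def block_count(file_size, block_size):
    """Number of blocks needed to hold file_size bytes (integer ceiling)."""
    return (file_size + block_size - 1) // block_size

def block_index(offset, block_size):
    """Zero-based index of the block that contains the given byte offset."""
    return offset // block_size

def same_block(start_offset, end_offset, block_size):
    """True if the byte span [start_offset, end_offset) fits inside one block."""
    return block_index(start_offset, block_size) == block_index(end_offset - 1, block_size)

# A 1 GB file stored with 64 MB blocks.
blocks = block_count(1 * GB, 64 * MB)

# Hypothetical row whose entries happen to sit across the first block boundary.
straddles = not same_block(64 * MB - 100, 64 * MB + 100, 64 * MB)

# With a block size equal to the file size, everything is in block 0.
single_block = same_block(0, 1 * GB, 1 * GB)
```

Under these assumptions `blocks` is 16, and whether a row stays in one block depends purely on where its bytes fall relative to a block boundary.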

When executing a MapReduce job, how many mappers would I get? (One per HDFS block? Per tablet? Per server?)

This depends on the ranges you pass to InputFormatBase.setRanges(Configuration, Collection<Range>).

If you scan the entire table (-inf -> +inf), you'll get one mapper per tablet, provided you haven't called disableAutoAdjustRanges. If you define specific ranges, the behavior depends on whether or not you've called InputFormatBase.disableAutoAdjustRanges(Configuration):

  1. If you have called this method, you'll get one mapper per range defined. Importantly, if a range starts in one tablet and ends in another, a single mapper will process that entire range.
  2. If you haven't called this method and a range spans several tablets, you'll get one mapper for each tablet the range covers.
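
The mapper counts described above can be sketched with a toy model in Python. This is not the Accumulo API: `count_mappers`, `tablet_index`, and the split points are hypothetical, tablets are assumed to cover (previous split, split] with the end row inclusive, and the ranges are assumed not to overlap each other.

```python
from bisect import bisect_left

def tablet_index(row, split_points):
    """Index of the tablet holding `row`, given sorted tablet end rows."""
    return bisect_left(split_points, row)

def count_mappers(ranges, split_points, auto_adjust=True):
    """Count mappers for inclusive (start_row, end_row) ranges.

    None as a start or end row stands for -inf or +inf.
    auto_adjust=False models having called disableAutoAdjustRanges.
    """
    if not auto_adjust:
        # One mapper per range, however many tablets the range crosses.
        return len(ranges)
    last_tablet = len(split_points)  # n split points give n + 1 tablets
    total = 0
    for start, end in ranges:
        lo = 0 if start is None else tablet_index(start, split_points)
        hi = last_tablet if end is None else tablet_index(end, split_points)
        total += hi - lo + 1  # one mapper per tablet the range touches
    return total

# Four tablets: (-inf, g], (g, n], (n, t], (t, +inf)
splits = ["g", "n", "t"]
full_scan = count_mappers([(None, None)], splits)
spanning = [("e", "p")]  # starts in the first tablet, ends in the third
adjusted = count_mappers(spanning, splits)
unadjusted = count_mappers(spanning, splits, auto_adjust=False)

# The question's setup, assuming all 5 x 10 = 50 tablets belong to one table.
question_splits = [f"row{i:03d}" for i in range(1, 50)]
question_full_scan = count_mappers([(None, None)], question_splits)
```

In this model the full scan over four tablets yields 4 mappers, the spanning range yields 3 with auto-adjustment and 1 without, and a full scan of the question's 50 tablets yields 50 mappers rather than one per HDFS block or per server.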

Chris White answered Oct 10 '22 02:10