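Bloom Index Block

When an HFile holds more than one Bloom filter bit array, a Bloom Index Block is used to locate the right bit array efficiently. Memory and logical structure of the Bloom Index Block: the BlockOffset of each Bloom Index Entry in the Bloom Index Block is the offset of a Bloom Block within the HFile.

In the implementation, CompoundBloomFilterBase.java handles looking up and locating the bit arrays. A two-dimensional array represents the multiple bit arrays of the Bloom filters, together with the positions of the associated blocks: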
public static int binarySearch(byte[][] arr, byte[] key, int offset, int length) {
  int low = 0;
  int high = arr.length - 1;

  while (low <= high) {
    int mid = (low + high) >>> 1;
    // we have to compare in this order, because the comparator order
    // has special logic when the 'left side' is a special key.
    int cmp = Bytes.BYTES_RAWCOMPARATOR
        .compare(key, offset, length, arr[mid], 0, arr[mid].length);
    // key lives above the midpoint
    if (cmp > 0)
      low = mid + 1;
    // key lives below the midpoint
    else if (cmp < 0)
      high = mid - 1;
    // BAM. how often does this really happen?
    else
      return mid;
  }
  return -(low + 1);
}
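The return value of the binary search above follows the usual convention: a non-negative index on an exact match, otherwise -(insertion point + 1). The following is a minimal, self-contained sketch of how such a result can be turned into a block index. The class name, the `compareRaw` stand-in for `Bytes.BYTES_RAWCOMPARATOR`, the `blockContainingKey` helper, and the sample row keys are all assumptions made for illustration, not HBase code.

```java
import java.nio.charset.StandardCharsets;

class BloomIndexSearchSketch {

    // Stand-in for Bytes.BYTES_RAWCOMPARATOR (assumption): unsigned
    // lexicographic comparison of key[offset, offset + length) with other.
    static int compareRaw(byte[] key, int offset, int length, byte[] other) {
        int n = Math.min(length, other.length);
        for (int i = 0; i < n; i++) {
            int a = key[offset + i] & 0xff;
            int b = other[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return length - other.length;
    }

    // Same control flow as the method quoted above.
    static int binarySearch(byte[][] arr, byte[] key, int offset, int length) {
        int low = 0;
        int high = arr.length - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int cmp = compareRaw(key, offset, length, arr[mid]);
            if (cmp > 0) {
                low = mid + 1;
            } else if (cmp < 0) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    // Hypothetical helper: index of the block whose first key is the largest
    // one <= key, or -1 when the key sorts before every block.
    static int blockContainingKey(byte[][] firstKeys, byte[] key) {
        int pos = binarySearch(firstKeys, key, 0, key.length);
        if (pos >= 0) {
            return pos;                 // exact match on a block's first key
        }
        int insertionPoint = -pos - 1;  // decode the negative result
        return insertionPoint - 1;      // the block just before the insertion point
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[][] firstKeys = { b("row100"), b("row200"), b("row300") };
        System.out.println(blockContainingKey(firstKeys, b("row200"))); // 1
        System.out.println(blockContainingKey(firstKeys, b("row250"))); // 1
        System.out.println(blockContainingKey(firstKeys, b("row050"))); // -1
        System.out.println(blockContainingKey(firstKeys, b("row999"))); // 2
    }
}
```

A key that falls between two first keys belongs to the earlier block, which is why the sketch subtracts one from the insertion point.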
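So a Get request that is filtered through the Bloom filter proceeds in three steps:

1. Binary-search the key against all BlockKeys in the Bloom Index Block to locate the matching Bloom Index Entry.
2. Use that Bloom Index Entry to load the corresponding bit array.
3. Hash the key and check the mapped positions in the bit array. If any of those bits is not 1, the key is definitely absent; if all of them are 1, the key may exist.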
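The three steps above can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the HBase implementation: the class `ChunkedBloomSketch`, the chunk size, the number of hash functions, and the hash derived from `String.hashCode()` are all hypothetical, and `String` keys with `java.util.Arrays.binarySearch` stand in for byte-array keys and the raw comparator.

```java
import java.util.Arrays;
import java.util.BitSet;

class ChunkedBloomSketch {
    static final int BITS_PER_CHUNK = 1024; // assumption: size of each bit array
    static final int HASH_COUNT = 3;        // assumption: hash functions per key

    final String[] firstKeys; // plays the role of the BlockKey in each Bloom Index Entry
    final BitSet[] chunks;    // plays the role of the Bloom Blocks (bit arrays)

    // sortedKeys must be in ascending String order.
    ChunkedBloomSketch(String[] sortedKeys, int keysPerChunk) {
        int n = (sortedKeys.length + keysPerChunk - 1) / keysPerChunk;
        firstKeys = new String[n];
        chunks = new BitSet[n];
        for (int c = 0; c < n; c++) {
            int start = c * keysPerChunk;
            int end = Math.min(start + keysPerChunk, sortedKeys.length);
            firstKeys[c] = sortedKeys[start];
            chunks[c] = new BitSet(BITS_PER_CHUNK);
            for (int i = start; i < end; i++) {
                for (int h = 0; h < HASH_COUNT; h++) {
                    chunks[c].set(bitIndex(sortedKeys[i], h));
                }
            }
        }
    }

    // Illustrative double hashing; a stand-in for whatever hash the real filter uses.
    static int bitIndex(String key, int h) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + h * h2, BITS_PER_CHUNK);
    }

    boolean mightContain(String key) {
        // Step 1: binary-search the index keys to find the entry for this key.
        int pos = Arrays.binarySearch(firstKeys, key);
        int chunk = pos >= 0 ? pos : -pos - 2; // insertion point minus one
        if (chunk < 0) {
            return false; // key sorts before every chunk
        }
        // Step 2: load the bit array that the entry points to.
        BitSet bits = chunks[chunk];
        // Step 3: hash the key and test each mapped bit.
        for (int h = 0; h < HASH_COUNT; h++) {
            if (!bits.get(bitIndex(key, h))) {
                return false; // a 0 bit means the key is definitely absent
            }
        }
        return true; // all bits are 1, so the key may exist
    }

    public static void main(String[] args) {
        String[] keys = { "row100", "row150", "row200", "row250", "row300" };
        ChunkedBloomSketch filter = new ChunkedBloomSketch(keys, 2);
        System.out.println(filter.mightContain("row200")); // true
        System.out.println(filter.mightContain("aaa"));    // false
    }
}
```

Only one bit array is consulted per lookup, which is the point of the index: the filter can grow to many chunks while each Get still touches a single one.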
usage: HFile [-a] [-b] [-e] [-f <arg> | -r <arg>] [-h] [-i] [-k] [-m] [-p]
       [-s] [-v] [-w <arg>]
 -a,--checkfamily         Enable family check
 -b,--printblocks         Print block index meta data
 -e,--printkey            Print keys
 -f,--file <arg>          File to scan. Pass full-path; e.g.
                          hdfs://a:9000/hbase/hbase:meta/12/34
 -h,--printblockheaders   Print block headers for each block.
 -i,--checkMobIntegrity   Print all cells whose mob files are missing
 -k,--checkrow            Enable row order check; looks for out-of-order
                          keys
 -m,--printmeta           Print meta data of file
 -p,--printkv             Print key/value pairs
 -r,--region <arg>        Region to scan. Pass region name; e.g.
                          'hbase:meta,,1'
 -s,--stats               Print statistics
 -v,--verbose             Verbose output; emits file and meta data
                          delimiters
 -w,--seekToRow <arg>     Seek to this row and print all the kvs for this
                          row only