哈希表的主要点是哈希函数和哈希冲突解决:好的哈希函数能够尽量把哈希结果分散,但是输入数量远远大于存储数量的时候,哈希冲突还是会发生。
一般哈希冲突的解决方式有开放寻址法(有冲突了遍历找下个,最糟糕的情况下查找复杂度退化为O(n))和拉链法(有冲突挂在链表末尾),
golang使用的是拉链法。
底层结构
主要是hmap,buckets是正常元素的一个数组,oldbuckets扩容时候用,extra.nextOverflow桶溢出的时候用。
// A header for a Go map. type hmap struct { // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go. // Make sure this stays in sync with the compiler's definition. count int // # live cells == size of map. Must be first (used by len() builtin) flags uint8 B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items) noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details hash0 uint32 // hash seed buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0. oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated) extra *mapextra // optional fields } // mapextra holds fields that are not present on all maps. type mapextra struct { // If both key and elem do not contain pointers and are inline, then we mark bucket // type as containing no pointers. This avoids scanning such maps. // However, bmap.overflow is a pointer. In order to keep overflow buckets // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow. // overflow and oldoverflow are only used if key and elem do not contain pointers. // overflow contains overflow buckets for hmap.buckets. // oldoverflow contains overflow buckets for hmap.oldbuckets. // The indirection allows to store a pointer to the slice in hiter. overflow *[]*bmap oldoverflow *[]*bmap // nextOverflow holds a pointer to a free overflow bucket. nextOverflow *bmap } // A bucket for a Go map. type bmap struct { // tophash generally contains the top byte of the hash value // for each key in this bucket. If tophash[0] < minTopHash, // tophash[0] is a bucket evacuation state instead. tophash [bucketCnt]uint8 // Followed by bucketCnt keys and then bucketCnt elems. // NOTE: packing all the keys together and then all the elems together makes the // code a bit more complicated than alternating key/elem/key/elem/... but it allows // us to eliminate padding which would be needed for, e.g., map[int64]int8. // Followed by an overflow pointer. }
map的查找和写入其实流程都差不多:
- 通过哈希函数和种子算出哈希值
- 算出所在的桶和高八位hash
- 遍历正常桶和溢出桶,找到桶,遍历高位hash列表查找
- 对于查找,如果高8位哈希匹配,计算实际的key的内存位置并读取,如果相等则返回值,不相等继续查找
- 对于写入,如果高8为哈希匹配,计算实际的key的内存位置并读取,如果相等则计算val的内存位置并覆写,否则新写入
写入的时候还有个非常重要的概念是扩容,一般是两种情况:
- 超过装载因子;
- 哈希使用了太多溢出桶;
扩容的过程如下:
常用情景
- 去重
package main import ( "fmt" ) func main() { fmt.Println("Hello, playground") nums := []int{1, 1, 2, 2, 2, 3, 4, 5} uniqMap := make(map[int]struct{}, len(nums)) for _, n := range nums { uniqMap[n] = struct{}{} } var uniqNums []int for k := range uniqMap { uniqNums = append(uniqNums, k) } fmt.Println(uniqNums) }
注意事项
原生的golang自带的map不是并发安全的,可以使用sync.map或者配合这sync.Mutex来用。