map

The two core concerns of a hash table are the hash function and collision resolution: a good hash function spreads its results as evenly as possible, but because the input space is far larger than the number of slots, collisions still happen.
The common ways to resolve collisions are open addressing (on a collision, probe for the next free slot; in the worst case lookups degrade to O(n)) and chaining (on a collision, the entry is appended to a list hanging off the slot).
Go's map uses chaining.
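
As a rough illustration of chaining (this is not the runtime's implementation, and every name below is made up for the example), here is a minimal chained hash table: each slot holds a linked list, and colliding keys are simply added to that list.

package main

import "fmt"

// entry is one key/value pair in a slot's chain.
type entry struct {
    key  string
    val  int
    next *entry
}

// chainedMap is a toy chained hash table: a fixed array of slots,
// each pointing at a linked list of colliding entries.
type chainedMap struct {
    slots []*entry
}

func newChainedMap(n int) *chainedMap {
    return &chainedMap{slots: make([]*entry, n)}
}

// hash is a simple FNV-1a style string hash reduced to a slot index.
func (m *chainedMap) hash(key string) int {
    h := uint32(2166136261)
    for i := 0; i < len(key); i++ {
        h ^= uint32(key[i])
        h *= 16777619
    }
    return int(h % uint32(len(m.slots)))
}

// Set overwrites an existing key or prepends a new entry to the slot's chain.
func (m *chainedMap) Set(key string, val int) {
    i := m.hash(key)
    for e := m.slots[i]; e != nil; e = e.next {
        if e.key == key {
            e.val = val
            return
        }
    }
    m.slots[i] = &entry{key: key, val: val, next: m.slots[i]}
}

// Get walks the chain of the key's slot until it finds a matching key.
func (m *chainedMap) Get(key string) (int, bool) {
    for e := m.slots[m.hash(key)]; e != nil; e = e.next {
        if e.key == key {
            return e.val, true
        }
    }
    return 0, false
}

func main() {
    m := newChainedMap(8)
    m.Set("a", 1)
    m.Set("b", 2)
    v, ok := m.Get("a")
    fmt.Println(v, ok) // 1 true
}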

Underlying structure

The core structure is hmap: buckets is the array of regular buckets holding the elements, oldbuckets holds the previous bucket array during growth, and extra.nextOverflow points to a preallocated overflow bucket used when a bucket overflows.

// A header for a Go map.
type hmap struct {
    // Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.
    // Make sure this stays in sync with the compiler's definition.
    count     int // # live cells == size of map.  Must be first (used by len() builtin)
    flags     uint8
    B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
    noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
    hash0     uint32 // hash seed

    buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
    oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
    nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

    extra *mapextra // optional fields
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
    // If both key and elem do not contain pointers and are inline, then we mark bucket
    // type as containing no pointers. This avoids scanning such maps.
    // However, bmap.overflow is a pointer. In order to keep overflow buckets
    // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
    // overflow and oldoverflow are only used if key and elem do not contain pointers.
    // overflow contains overflow buckets for hmap.buckets.
    // oldoverflow contains overflow buckets for hmap.oldbuckets.
    // The indirection allows to store a pointer to the slice in hiter.
    overflow    *[]*bmap
    oldoverflow *[]*bmap

    // nextOverflow holds a pointer to a free overflow bucket.
    nextOverflow *bmap
}

// A bucket for a Go map.
type bmap struct {
    // tophash generally contains the top byte of the hash value
    // for each key in this bucket. If tophash[0] < minTopHash,
    // tophash[0] is a bucket evacuation state instead.
    tophash [bucketCnt]uint8
    // Followed by bucketCnt keys and then bucketCnt elems.
    // NOTE: packing all the keys together and then all the elems together makes the
    // code a bit more complicated than alternating key/elem/key/elem/... but it allows
    // us to eliminate padding which would be needed for, e.g., map[int64]int8.
    // Followed by an overflow pointer.
}

Lookup and write on a map follow essentially the same flow (the bucket-index and tophash computation is sketched after this list):

  1. Compute the hash of the key with the hash function and the seed (hash0).
  2. Use the low B bits of the hash to select the bucket, and take the top 8 bits as the tophash.
  3. Walk the selected bucket and its overflow buckets, scanning each bucket's tophash array:
    1. For a lookup, when a tophash entry matches, compute the key's memory location and compare the stored key; if it is equal, return the corresponding value, otherwise keep scanning.
    2. For a write, when a tophash entry matches, compute the key's memory location and compare the stored key; if it is equal, compute the value's location and overwrite it; if no matching key is found, write a new key/value pair.
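
A rough sketch of how steps 1 and 2 might look, assuming a 64-bit hash value; the helper names are illustrative, and the constants are simplified from the runtime's (small tophash values are reserved for bucket states, which is why minTopHash exists).

package main

import "fmt"

// bucketIndex mimics bucket selection: with 2^B buckets, the low B bits
// of the hash are used as the index (hash & (2^B - 1)).
func bucketIndex(hash uint64, B uint8) uint64 {
    mask := uint64(1)<<B - 1
    return hash & mask
}

// topHash mimics the tophash byte: the top 8 bits of the hash, bumped up
// so it never collides with the values reserved for cell/bucket states.
func topHash(hash uint64) uint8 {
    const minTopHash = 5 // values 0..4 mark empty or evacuated cells
    top := uint8(hash >> 56)
    if top < minTopHash {
        top += minTopHash
    }
    return top
}

func main() {
    hash := uint64(0xDEADBEEFCAFEBABE)
    B := uint8(4) // 2^4 = 16 buckets
    fmt.Println("bucket:", bucketIndex(hash, B), "tophash:", topHash(hash))
}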

Another very important concept on the write path is growth, which is generally triggered in two situations (a sketch of both checks follows this list):

  1. the load factor is exceeded;
  2. the map has used too many overflow buckets;
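
A sketch of the two trigger checks, simplified from the runtime's overLoadFactor and tooManyOverflowBuckets: the load factor threshold is 13/2 = 6.5 entries per regular bucket on average, and "too many overflow buckets" means roughly as many overflow buckets as regular buckets.

package main

import "fmt"

const (
    bucketCnt     = 8  // entries per bucket
    loadFactorNum = 13 // load factor threshold is 13/2 = 6.5
    loadFactorDen = 2
)

// overLoadFactor reports whether count items in 2^B buckets exceed the
// load factor (simplified from the runtime's version).
func overLoadFactor(count int, B uint8) bool {
    return count > bucketCnt && uint64(count) > loadFactorNum*((uint64(1)<<B)/loadFactorDen)
}

// tooManyOverflowBuckets reports whether there are roughly as many overflow
// buckets as regular buckets, i.e. the map is sparse and should be regrown
// at the same size (simplified; the runtime caps B at 15 here).
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {
    if B > 15 {
        B = 15
    }
    return noverflow >= uint16(1)<<(B&15)
}

func main() {
    fmt.Println(overLoadFactor(53, 3))        // 53 items in 8 buckets: true (53 > 6.5*8 = 52)
    fmt.Println(tooManyOverflowBuckets(9, 3)) // 9 overflow buckets vs 2^3 regular buckets: true
}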

The growth process is as follows (a conceptual sketch of the incremental step comes after the list):

  1. Allocate the new buckets: hashGrow decides between same-size growth (too many overflow buckets) and doubling growth (load factor exceeded), allocates the new bucket array, and moves the current one into the old fields (oldbuckets / extra.oldoverflow).
  2. Incremental growth: subsequent writes and deletes on the map call growWork, which evacuates the old buckets a little at a time.
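
The incremental idea can be sketched with a toy structure that keeps both bucket arrays and moves one old bucket per write. This is only a conceptual model (it omits duplicate-key handling and lookups into the old buckets), and none of the names below come from the runtime.

package main

import "fmt"

type kv struct {
    key string
    val int
}

// toyMap sketches incremental growth: while growing, each write moves one
// old bucket into the new, doubled bucket array.
type toyMap struct {
    buckets    [][]kv // current bucket array
    oldbuckets [][]kv // previous bucket array; non-nil only while growing
    nevacuate  int    // index of the next old bucket to evacuate
}

// hashStr is a trivial placeholder hash, good enough for the sketch.
func hashStr(s string) uint32 {
    var h uint32
    for i := 0; i < len(s); i++ {
        h = h*31 + uint32(s[i])
    }
    return h
}

// grow doubles the bucket array and keeps the old one around for
// incremental evacuation, mirroring what hashGrow does conceptually.
func (m *toyMap) grow() {
    m.oldbuckets = m.buckets
    m.buckets = make([][]kv, len(m.buckets)*2)
    m.nevacuate = 0
}

// evacuateOne moves one old bucket's entries into the new array, in the
// spirit of growWork's "do a little work per write".
func (m *toyMap) evacuateOne() {
    if m.oldbuckets == nil || m.nevacuate >= len(m.oldbuckets) {
        m.oldbuckets = nil // evacuation finished
        return
    }
    for _, e := range m.oldbuckets[m.nevacuate] {
        i := hashStr(e.key) % uint32(len(m.buckets))
        m.buckets[i] = append(m.buckets[i], e)
    }
    m.oldbuckets[m.nevacuate] = nil
    m.nevacuate++
}

func (m *toyMap) Set(key string, val int) {
    m.evacuateOne() // progressive evacuation piggybacks on writes
    i := hashStr(key) % uint32(len(m.buckets))
    m.buckets[i] = append(m.buckets[i], kv{key, val})
}

func main() {
    m := &toyMap{buckets: make([][]kv, 2)}
    for i := 0; i < 16; i++ {
        m.Set(fmt.Sprintf("k%d", i), i)
        if i == 8 { // pretend a growth trigger fired here
            m.grow()
        }
    }
    fmt.Println("buckets:", len(m.buckets), "still evacuating:", m.oldbuckets != nil)
}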

Common use cases

package main

import (
    "fmt"
)

func main() {
    fmt.Println("Hello, playground")
    nums := []int{1, 1, 2, 2, 2, 3, 4, 5}
    uniqMap := make(map[int]struct{}, len(nums))
    for _, n := range nums {
        uniqMap[n] = struct{}{}
    }
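    // Collect the keys back into a slice; iteration order over a map is randomized.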
    var uniqNums []int
    for k := range uniqMap {
        uniqNums = append(uniqNums, k)
    }
    fmt.Println(uniqNums)
}

Caveats

Go's built-in map is not safe for concurrent use; use sync.Map, or protect access with a sync.Mutex (or sync.RWMutex). A minimal sketch of the mutex approach follows.
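
A minimal sketch of the mutex approach, wrapping a map in a sync.RWMutex; the counter type here is made up for illustration.

package main

import (
    "fmt"
    "sync"
)

// counter wraps a map with an RWMutex so it can be used from multiple goroutines.
type counter struct {
    mu sync.RWMutex
    m  map[string]int
}

func newCounter() *counter {
    return &counter{m: make(map[string]int)}
}

func (c *counter) Inc(key string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.m[key]++
}

func (c *counter) Get(key string) int {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.m[key]
}

func main() {
    c := newCounter()
    var wg sync.WaitGroup
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            c.Inc("hits")
        }()
    }
    wg.Wait()
    fmt.Println(c.Get("hits")) // 10
}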

References