[Python] Search value in list of dictionary benchmark

This is a test how fast we can search value in a very large list of dict by various way

Test code:

# test python search performance in a large list of dict 
# ref:
#   - https://stackoverflow.com/questions/8653516/python-list-of-dictionaries-search
#   - https://stackoverflow.com/a/72721642/2533787

from random import randint
import timeit
from ducks import Dex

duct_index = None

def create_random_data(length=1000):
    data = []
    for i in range(1, length+1):
        data.append({
            "index": i,
            "value": randint(1000, 999999999),
            "value2": randint(1000, 999999999),
            "value3": randint(1000, 999999999),
            "value4": randint(1000, 999999999),
        })
    return data

def search_basic(search_value: int, search_list: list):
    # normal iteration over all elements
    for item in search_list:
        if item["value"] == search_value:
            return True
    return False

def search_generator(search_value: int, search_list: list):
    # use generator
    for dict_ in (x for x in search_list if x["value"] == search_value):
        return True
    return False

def search_list(search_value: int, search_list: list):
    # use list
    for dict_ in [x for x in search_list if x["value"] == search_value]:
        return True
    return False

def search_filter(search_value: int, search_list: list):
    # use filter
    for dict_ in filter(lambda x: x["value"] == search_value, search_list):
        return True
    return False

def search_next(search_value: int, search_list: list):
    return next((item for item in search_list if item["value"] == search_value), None) is not None

def search_duck(search_value: int, search_list: list):
    # required install external library 
    global duct_index
    if not duct_index:
        # build index
        duct_index = Dex(search_list, {"value": int})
    return len(duct_index[{"value": search_value}]) > 0

if __name__ == "__main__":
    data_length = 10000000
    print("=" * 80)
    print(f"- Data length: {data_length}")
    t0 = timeit.default_timer()
    search_data = create_random_data(data_length)
    print("- Generate data done..., time diff: ", timeit.default_timer() - t0)
    to_search = search_data[randint(0, data_length)]["value"]
    print(f"- Search value: {to_search}")

    t0 = timeit.default_timer()
    r1 = search_basic(to_search, search_data)
    t1 = timeit.default_timer()
    print(f"- search_basic() done, result: {r1}, time diff: ", t1 - t0)

    r2 = search_generator(to_search, search_data)
    t2 = timeit.default_timer()
    print(f"- search_generator() done, result: {r2}, time diff: ", t2 - t1)

    r3 = search_list(to_search, search_data)
    t3 = timeit.default_timer()
    print(f"- search_list() done, result: {r3}, time diff: ", t3 - t2)

    r4 = search_filter(to_search, search_data)
    t4 = timeit.default_timer()
    print(f"- search_filter() done, result: {r4}, time diff: ", t4 - t3)

    r5 = search_next(to_search, search_data)
    t5 = timeit.default_timer()
    print(f"- search_next() done, result: {r5}, time diff: ", t5 - t4)

    r6 = search_duck(to_search, search_data)    # first time will required create index
    t6 = timeit.default_timer()
    print(f"- search_duck() 1st done, result: {r6}, time diff: ", t6 - t5)

    r7 = search_duck(to_search, search_data)    # very fast after index is done
    t7 = timeit.default_timer()
    print(f"- search_duck() 2nd done, result: {r7}, time diff: ", t7 - t6)

Test result (Windows 10, CPU AMD 5600G, Python 3.9.7):

Output 1:

- Data length: 10000000
- Generate data done..., time diff:  17.4976254
- Search value: 68645981
- search_basic() done, result: True, time diff:  0.25917069999999853
- search_generator() done, result: True, time diff:  0.25991790000000137
- search_list() done, result: True, time diff:  0.3885101999999989
- search_filter() done, result: True, time diff:  0.4272285999999994
- search_next() done, result: True, time diff:  0.2582653000000015
- search_duck() 1st done, result: True, time diff:  42.6337855
- search_duck() 2nd done, result: True, time diff:  0.0004143999999968173

Output 2:

- Data length: 10000000
- Generate data done..., time diff:  18.0281099
- Search value: 670718026
- search_basic() done, result: True, time diff:  0.1303572000000024
- search_generator() done, result: True, time diff:  0.12219189999999713
- search_list() done, result: True, time diff:  0.3952600000000004
- search_filter() done, result: True, time diff:  0.20178010000000057
- search_next() done, result: True, time diff:  0.1236163000000019
- search_duck() 1st done, result: True, time diff:  43.3433916
- search_duck() 2nd done, result: True, time diff:  0.0004522999999991839

Conclusion:

Tags:
#python