Python tips: reverse a dictionary

In Python, if you want to reverse a dictionary (dict), i.e. swap the keys and values, there are three methods that can help you achieve that goal depending on your needs:
  • Method 1: makes use of dict comprehension, and the dict must contain unique values
  • Method 2: makes use of dict.get(), and the dict doesn't contain unique values
  • Method 3: makes use of map(reversed, iter), and you want the reverse dict to preserve the type and order of the original dict (if for example it is an OrderedDict)
I provided code implementations for each methods that work for Python 2.7.x and 3+, and the embedded code can be executed directly from the blog post through JDoodle, an online compiler and editor for many other languages (68 in total such as C/C++, Java, Go, SQL).

I also compiled the average running times of the different methods in a table at the end ( Spoiler alert: Click to reveal winning method)

The code used for computing the average run times of the three methods can be found on my github.

Contents

    4.1 Major updates
    4.2 Results

Method 1: values are unique, solution based on dict comprehension

In the first method, the assumption we make is that your dict must contain unique values. For the first method to work, this assumption about unicity in the values must hold.

The method makes use of dict comprehension by swapping the keys and values. Here are the code implementations:

Code (Python 2.7 & 3)

For Python 2.7.x
my_dict = { 'a': 1, 'b':2, 'c': 3, 'd':4, 'e':5}
inv_dict = {v: k for k, v in my_dict.iteritems()}
Method 1 of reversing a dict with unique values
code @ jdoodle.com (Python 2.7.15)

For the Python 2 version, use dict.iteritems() which is more efficient than dict.items().Table 1 shows the average running times for the different methods in Python 2, and you can see from looking at the first two rows that by using iteritems() for method 1 you can reduce your running time significantly (especially when lots of items are involved) compared with items(). The same kind of speed up can be seen also for the other methods when using iteritems() instead of items() in Python 2.

For Python 3.x
Method 1 of reversing a dict with non-unique values
code @ jdoodle.com (Python 3.6.5)

Drawbacks of method 1

  • Method 1 breaks if the values in the original dict are not unique: all keys but one that are associated with the duplicated values will be ignored. See Method 2 on how to solve this case with duplicated values in the original dict.
  • Method 1 won't necessarily preserve the original insertion order which is to be expected since Python dicts don't have any ordering (unlike OrderedDict). However, from Python 3.7 onwards, insertion order will be a language feature unlike in Python 3.6 which was an implementation detail, i.e. as of Python 3.7, you can be guaranteed that insertion order is supported for dicts.

References

Method 2: non-unique values

If in the original dictionary, the values are not unique, then you could still do a reverse of the dictionary by saving all the keys associated with the same value in a list. Thus, your reverse dict would store as values a list of all the keys associated with the same value in the original dict.

Example: contents of the original and inverse dicts when the values in the original dict are not unique

my_dict = {'1': a, 2: 'b', 3: 'a'}
inv_dict = {'a': [1, 3],'b': [2]}

Code (Python 2.7 & 3)

For Python 2.7.x
my_dict = {1: 'a', 2:'b', 3: 'c', 4: 'a', 5: 'c'}
inv_dict = {}
for k, v in my_dict.iteritems():
    inv_dict[v] = inv_dict.get(v, [])
    inv_dict[v].append(k)
Method 2 of reversing a dict with non-unique values
code @ jdoodle.com (Python 2.7.15)

For Python 3.x
Method 2 of reversing a dict with non-unique values
code @jdoodle.com (Python 3.6.5)

Code using setdefault (Python 2.7 & 3)

If you prefer setdefault, you can set the values in the inverse dict with one line, as can be seen in the following implementations:

For Python 2.7.x
my_dict = {1: 'a', 2:'b', 3: 'c', 4: 'a', 5: 'c'}
inv_dict = {}
for key, value in my_dict.iteritems():
    inv_dict.setdefault(value, []).append(key)
Method 2 of reversing a dict with non-unique values using dict.setdefault
code @ jdoodle.com (Python 2.7.15)

For Python 3.x
my_dict = {1: 'a', 2:'b', 3: 'c', 4: 'a', 5: 'c'}
inv_dict = {}
for key, value in my_dict.items():
    inv_dict.setdefault(value, []).append(key)
Method 2 of reversing a dict with non-unique values using dict.setdefault
code @ jdoodle.com (Python 3.6.5)

Drawbacks of method 2

  • Like Method 1, Method 2 also won't necessarily preserve the original insertion in the dict. Use OrderedDict if you need this property.
  • dict.get() and dict.setdefault() both have similar very poor average run times when working in Python 3, as can be seen from Table 2 (Python 3). In Python 2, dict.setdefault() provides better code speed up than dict.get(), as can be seen from Table 1 (Python 2). Also, dict.setdefault() allows you to use more compact code than with dict.get(). Finally as stated before, if your dicts have non-unique values, then Method 2 might be a good solution.

Method 3: type and order preserved

Method 3 makes use of map(reversed, iter) to swap the keys and values of an arbitrary mapping type (e.g. OrderedDict or defaultdict). Thus, the reverse mapping will not be necessarily forced to have dict type like it was the case in the other two methods. If you start with OrderedDict as the original collection, then your reverse collection will be of OrderedDict type.

Code (Python 2.7 & 3)

For Python 2.7.x
def reverse_mapping(f):
    return f.__class__(map(reversed, f.iteritems()))

my_dict = {1: 'a', 2:'b', 3: 'c', 4: 'd', 5: 'e'}
inv_dict = reverse_mapping(my_dict)
Method 3 of reversing a dict with unique values using map(reversed, iter)
code @ jdoodle.com (Python 2.7.15)

For Python 3.x
Method 3 of reversing a dict with unique values using map(reversed, iter)
code @ jdoodle.com (Python 3.6.5)

Drawbacks of method 3

  • Like Method 1, Method 3 also breaks if the original dict doesn't have unique values. If you need to work with dictionaries with no unicity in the values, check Method 2 which takes care of this situation.
  • Method 3 might have the advantage of preserving both the type and order of the original dictionary (e.g. if working with OrderedDict), but its average run time is the worst when used in Python 2.7.x, see Table 1.
  • The one-line code might not be easily readable compared to the other methods.

References

Average running times of the three methods

Major updates

2018-09-16:
  • I updated the code base and re-ran the python commands to re-populate the following two tables with new results. The changes consisted in implementing the important missing option --use_setdefault which I thought was already implemented when I generated the results the first time. Big mistake on my part! Thus the old results for the row Method 2: setdefault were generated actually with dict.get() instead of dict.setdefault() :(
  • Also, I factorized methods.py where the different dict-reversing methods are defined by putting all the common code from Python 2 & 3 methods into base classes.

Boring stuff

The whole code for computing the average run times of each method can be found on my github where you will also find details on how to install the program and run the main script to compute the average run times of different methods.

The original dictionaries with unique values are initialized using this template:
{'ki': 'vi'} where i is any number starting from 1
e.g. {'k1': 'v1', 'k2':''v2'}.

The other dictionaries with non-unique values are initialized with the same kind of template but with the difference that each value in the original dictionary is associated with exactly two keys.

Results

Table 1 presents the average run times (µsec) of the three methods for different number of items with Python 2.7. The average time when using 1000 items was based on running the methods 100k times. For the other cases (10k and 100k items), the average run time was computed by running the code 1k times for each method. Table 2 presents the same kind of results but for methods working in Python 3.

From Table 1 and Table 2, we conclude that:
  • Method 1 offers the best average run times for both Python 2.7 and Python 3.
  • Method 2 offers very bad average run times for both Python 2.7 and Python 3.
  • Method 3 presents the second best average run time with Python 3, but when working with Python 2, Method 3 doesn't present the same kind of performance as seen with Python 3.
  • Python 2 recommendation: it is highly recommended to use dict.iteritems() instead of dict.items(), as it can be see from Table 1 where method 1 was tested with dict.items() (the row with the red highlight).

Table 1 Average running times (µsec) of different methods
of reversing a dict in Python 2.7
Click on the method name to know more about its implementation
Py2 MethodAvg time (µsec), 1k items, 100k timesAvg time (µsec), 10k items, 1k timesAvg time (µsec), 100k items, 1k times
Method 1: dict compre.,iteritems217.962665.5645301.06
Method 1: dict compre.,items256.794125.6070901.29
Method 2: dict.get,iteritems838.2710410.03127703
Method 2: setdefault,iteritems656.038539.27108347.76
Method 3: map,iteritems895.9510788.73146115.25


Table 2 Average running times (µsec) of different methods
of reversing a dict in Python 3
Click on the method name to know more about its implementation
Py3 MethodAvg time (µsec), 1k items, 100k timesAvg time (µsec), 10k items, 1k timesAvg time (µsec), 100k items, 1k times
Method 1: dict compre.89.97905.4618487.63
Method 2: dict.get417.174949.2266599.29
Method 2: setdefault333.024171.3158628.69
Method 3: map311.593366.2046960.80

Summary of results

Thus, the results can be summarized as follows:
  • if your dict has unique values, use dict comprehension (see Method 1)
  • if your dict doesn't have unique values, use dict.setdefault() when populating the dict (see Method 2) as it provides a more compact form than using dict.get()
  • no matter what method you use, if you are working in Python 2.7, use dict.iteritems() instead of dict.items(). The reason is that since dict.iteritems() makes use of generators, it is more memory efficient.

Comments

Popular posts from this blog

Deactivate conda's base environment on startup

Draw arrows with GIMP plugins

Product review: SMONET wireless security camera system