Python Sets
Working with Unique Items: Understanding Sets in Python
Imagine you have a list of items, and some items are repeated many times. You only care about having a list of the unique items, with no duplicates. Or maybe you have two lists of things, and you want to quickly find out which items are present in both lists, or which are only in one but not the other. Doing these tasks with lists can involve a lot of loops and checks to avoid duplicates or find common items.
This is where sets are useful. They are designed specifically for holding collections of unique items and for performing mathematical set operations.
What is a Set?
A set is an unordered collection of items that contains no duplicate members.
Here are the main characteristics of sets:
- Unordered: Items in a set do not have a specific position or index. The order might change, and you cannot access items using [] with a number.
- Unique Items: A set automatically removes any duplicate values. Each item in a set must be unique.
- Changeable (Mutable): You can add new items or remove existing items after the set is created.
- Accessed by Value: You check if an item is in a set, but you cannot get an item by its position or key.
- Items Must Be Immutable: Only hashable objects can be put inside a set, which in practice means immutable ones like numbers, strings, and tuples of immutable values. Mutable objects (like lists or dictionaries) cannot be set items.
Sets are written using curly braces {} with the items inside, separated by commas.
# A set of unique numbers
unique_numbers = {1, 2, 3, 4}
# A set created with duplicates - duplicates are automatically removed
numbers_with_dupes = {1, 2, 2, 3, 1, 4}
print(numbers_with_dupes) # Output might be {1, 2, 3, 4} (order is not guaranteed)
# A set of strings (unique tags)
game_tags = {"adventure", "fantasy", "RPG", "adventure"}
print(game_tags) # Output might be {'fantasy', 'RPG', 'adventure'} (order is not guaranteed)
Note: While sets use {} like dictionaries, an empty {} creates an empty dictionary, not a set.
Creating Sets
There's a special way to create an empty set.
Creating an Empty Set
To create an empty set, you must use the set() function. Using {} creates an empty dictionary.
empty_set = set() # This creates an empty set
print(type(empty_set)) # Output: <class 'set'>
empty_dict = {} # This creates an empty dictionary
print(type(empty_dict)) # Output: <class 'dict'>
Creating a Set with Initial Items
You can put the items inside curly braces {} (unless creating an empty set). If you have duplicates, they will be ignored.
players_online = {"Alice", "Bob", "Charlie"}
distinct_ids = {101, 105, 101, 203} # Duplicates are removed
print(distinct_ids) # Output might be {101, 105, 203}
You can also create a set by converting another iterable (like a list, tuple, or string) using the set() function. This is a common way to quickly get the unique items from a collection.
# Get unique items from a list
my_list = [1, 2, 2, 3, 1, 4]
unique_items = set(my_list)
print(unique_items) # Output might be {1, 2, 3, 4}
# Get unique characters from a string
my_string = "programming"
unique_chars = set(my_string)
print(unique_chars) # Output might be {'p', 'r', 'o', 'g', 'a', 'm', 'i', 'n'}
Accessing Set Items (Not Possible by Index or Key)
Because sets are unordered, you cannot access items using an index like my_set[0] or using a key like my_set["item"]. The items don't have fixed positions.
You can only interact with set items by checking if a specific value is in the set, or by looping through all the items.
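As a small illustration of the point above, the sketch below shows what happens when a set is indexed, and one possible workaround when a position is genuinely needed: build an ordered list first. The variable names are made up for this example.

```python
tags = {"adventure", "fantasy", "RPG"}

# Indexing a set raises TypeError because items have no positions.
try:
    tags[0]
    indexing_worked = True
except TypeError:
    indexing_worked = False

# If a position is really needed, build an ordered sequence first.
# sorted() returns a new list; uppercase letters sort before lowercase.
ordered_tags = sorted(tags)
first_tag = ordered_tags[0]

print(indexing_worked)  # False
print(first_tag)        # RPG
```

The set itself is unchanged; only the sorted copy has positions.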
Adding Items to a Set (.add())
You can add a single item to a set using the .add() method. If the item is already in the set, nothing happens (sets only store unique items).
game_tags = {"adventure", "fantasy"}
print(f"Set before adding: {game_tags}") # Output might be: Set before adding: {'adventure', 'fantasy'} (order is not guaranteed)
game_tags.add("RPG") # Add a new item
print(f"Set after adding RPG: {game_tags}") # Output might be {'adventure', 'fantasy', 'RPG'}
game_tags.add("fantasy") # Try adding a duplicate
print(f"Set after adding duplicate: {game_tags}") # Output might be {'adventure', 'fantasy', 'RPG'} (No change)
Removing Items from a Set
You can remove items from a set.
Remove using .remove() or .discard()
- .remove(item): Removes the specified item. If the item is not found, it causes a KeyError.
- .discard(item): Removes the specified item. If the item is not found, it does nothing and causes no error.
.discard() is generally safer if you're not sure if the item is present.
players_online = {"Alice", "Bob", "Charlie"}
print(f"Set before removing: {players_online}") # Output might be: Set before removing: {'Alice', 'Bob', 'Charlie'} (order is not guaranteed)
players_online.remove("Bob") # Remove "Bob"
print(f"Set after removing Bob: {players_online}") # Output might be {'Alice', 'Charlie'}
players_online.discard("David") # Try to remove "David" (not in set) - no error
print(f"Set after discarding David: {players_online}") # Output might be {'Alice', 'Charlie'}
# players_online.remove("David") # This would cause a KeyError!
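When you do want to know whether the item was actually there, .remove() combined with try/except is one way to find out. The sketch below assumes a hypothetical helper named log_out; it is not part of any library.

```python
players_online = {"Alice", "Charlie"}

def log_out(players, name):
    """Remove name from players and report whether it was present.

    Hypothetical helper, written only for this example.
    """
    try:
        players.remove(name)
        return True
    except KeyError:
        # .remove() raises KeyError when the item is missing
        return False

print(log_out(players_online, "Alice"))  # True
print(log_out(players_online, "David"))  # False
print(players_online)                    # {'Charlie'}
```

If the answer does not matter, .discard() does the same job without the try/except.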
Remove using .pop()
The .pop() method removes and returns an arbitrary item from the set. Since sets are unordered, you don't know which item will be removed. This is useful if you just need to get and remove an item, without caring which one.
my_set = {10, 20, 30}
print(f"Set before pop: {my_set}") # Output: Set before pop: {10, 20, 30}
arbitrary_item = my_set.pop() # Removes an item and returns it
print(f"Set after pop: {my_set}") # Output might be {20, 30} (depending on which item was popped)
print(f"Popped item: {arbitrary_item}") # Output might be 10 (or 20, or 30)
Calling .pop() on an empty set causes a KeyError.
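A common way to avoid that KeyError is to check that the set is non-empty before popping. The sketch below drains a set of made-up job names one item at a time; the names are placeholders for this example.

```python
pending_jobs = {"resize", "upload", "notify"}  # hypothetical job names
processed = []

# An empty set is falsy, so the loop ends before .pop() could raise KeyError.
while pending_jobs:
    job = pending_jobs.pop()  # which job comes out first is arbitrary
    processed.append(job)

print(sorted(processed))  # ['notify', 'resize', 'upload']
print(pending_jobs)       # set()
```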
Remove all items (.clear())
The .clear() method removes all items from the set, making it empty.
my_set = {1, 2, 3}
my_set.clear()
print(my_set) # Output: set()
Set Operations (Mathematical Set Theory)
Sets are powerful for performing common mathematical set operations.
- Union: Combines items from two sets into a new set, including all unique items present in either set. Use the | operator or the .union() method.
set1 = {1, 2, 3}
set2 = {3, 4, 5}
union_set = set1 | set2 # Using operator
# union_set = set1.union(set2) # Using method - same result
print(union_set) # Output: {1, 2, 3, 4, 5}
- Intersection: Creates a new set containing only the items that are present in both sets. Use the & operator or the .intersection() method.
set1 = {1, 2, 3}
set2 = {3, 4, 5}
intersection_set = set1 & set2 # Using operator
# intersection_set = set1.intersection(set2) # Using method - same result
print(intersection_set) # Output: {3}
- Difference: Creates a new set containing only the items that are in the first set but not in the second set. Use the - operator or the .difference() method.
set1 = {1, 2, 3}
set2 = {3, 4, 5}
difference_set = set1 - set2 # Using operator
# difference_set = set1.difference(set2) # Using method - same result
print(difference_set) # Output: {1, 2} (Items in set1 but not in set2)
difference_set_reverse = set2 - set1 # Order matters for difference!
print(difference_set_reverse) # Output: {4, 5} (Items in set2 but not in set1)
- Symmetric Difference: Creates a new set containing items that are in either set, but not in both (items unique to each set). Use the ^ operator or the .symmetric_difference() method.
set1 = {1, 2, 3}
set2 = {3, 4, 5}
sym_diff_set = set1 ^ set2 # Using operator
# sym_diff_set = set1.symmetric_difference(set2) # Using method - same result
print(sym_diff_set) # Output: {1, 2, 4, 5}
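To tie the four operations back to the problem from the introduction (two lists with repeats), here is a minimal sketch. The sign-up lists and names are invented for illustration, and sorted() is used only to make the printed order predictable.

```python
# Hypothetical sign-up lists that contain repeats
monday = ["Alice", "Bob", "Alice", "Dana"]
tuesday = ["Bob", "Eve", "Dana", "Bob"]

monday_set = set(monday)    # duplicates dropped
tuesday_set = set(tuesday)

both_days = monday_set & tuesday_set      # intersection
only_monday = monday_set - tuesday_set    # difference
one_day_only = monday_set ^ tuesday_set   # symmetric difference
anyone = monday_set | tuesday_set         # union

print(sorted(both_days))     # ['Bob', 'Dana']
print(sorted(only_monday))   # ['Alice']
print(sorted(one_day_only))  # ['Alice', 'Eve']
print(sorted(anyone))        # ['Alice', 'Bob', 'Dana', 'Eve']
```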
You can also check relationships between sets:
- Subset: Check if all items in the first set are also in the second set. Use the <= operator or .issubset().
- Superset: Check if all items in the second set are also in the first set. Use the >= operator or .issuperset().
- Disjoint: Check if the two sets have absolutely no items in common. Use .isdisjoint().
set_a = {1, 2}
set_b = {1, 2, 3, 4}
set_c = {5, 6}
print(set_a <= set_b) # Output: True (set_a is a subset of set_b)
print(set_b >= set_a) # Output: True (set_b is a superset of set_a)
print(set_a.isdisjoint(set_c)) # Output: True (set_a and set_c have no common items)
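These relationship checks tend to show up in validation code. The sketch below uses made-up permission names to show one plausible use: confirming that everything required has been granted, and that nothing granted is on a banned list.

```python
required = {"read", "write"}             # hypothetical permission names
granted = {"read", "write", "delete"}
banned = {"admin"}

can_proceed = required <= granted        # every required permission is granted
missing = required - granted             # what is still needed, if anything
touches_banned = not granted.isdisjoint(banned)

print(can_proceed)     # True
print(missing)         # set()
print(touches_banned)  # False
```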
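Checking if an Item Exists (in Operator)
Checking if a specific value is in a set is very fast: because sets look items up by hash, the check takes roughly constant time on average no matter how large the set is. Use the in operator.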
players_online = {"Alice", "Bob", "Charlie"}
print("Bob" in players_online) # Output: True
print("David" in players_online) # Output: False
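To get a rough feel for the speed difference compared with a list, you could time the same membership test on both. This is only a sketch: the sizes and repeat counts are arbitrary choices, and the actual timings will vary by machine and Python version.

```python
import timeit

n = 50_000
as_list = list(range(n))
as_set = set(as_list)
target = n - 1  # worst case for the list: the last element

# Both containers give the same answer...
found_in_list = target in as_list
found_in_set = target in as_set

# ...but the list has to scan its items, while the set does a hash lookup.
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: target in as_set, number=200)

print(found_in_list, found_in_set)  # True True
print(f"list: {list_time:.5f}s  set: {set_time:.5f}s")
```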
Getting the Number of Items (len())
Use the built-in len() function to find out how many unique items are in a set.
my_set = {10, 20, 30}
print(len(my_set)) # Output: 3
empty_set = set()
print(len(empty_set)) # Output: 0
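Because a set keeps only unique items, comparing len() of a list with len() of its set is a quick way to detect duplicates. A minimal sketch with invented ID values:

```python
ids = [101, 105, 101, 203]

unique_count = len(set(ids))
has_duplicates = unique_count != len(ids)
duplicate_count = len(ids) - unique_count

print(unique_count)     # 3
print(has_duplicates)   # True
print(duplicate_count)  # 1
```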
Iterating Through a Set (for Loop)
You can go through each item in a set using a for loop. However, since sets are unordered, the order in which the items are processed is not guaranteed. Unlike dictionaries, sets do not preserve insertion order, so write your code as if the order is arbitrary.
game_tags = {"adventure", "fantasy", "RPG"}
print("Game tags:")
for tag in game_tags:
    print(tag)
# Output: Will print each tag, but the order might be different each time or across systems.
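If you need a predictable order while looping, one option is to iterate over a sorted copy instead of the set itself. The sketch below sorts case-insensitively with key=str.lower, which is just one possible choice of ordering.

```python
game_tags = {"adventure", "fantasy", "RPG"}

ordered = []
# sorted() builds a new list, so the loop order is the same on every run.
for tag in sorted(game_tags, key=str.lower):
    ordered.append(tag)

print(ordered)  # ['adventure', 'fantasy', 'RPG']
```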
Items Must Be Immutable
An important rule for sets is that every item you put into a set must be hashable, which for built-in types means immutable. Sets decide whether two items are the same or different using a technique called hashing, and that only works if an object's hash value never changes while it is in the set. Immutable types like numbers, strings, and tuples (as long as the tuple's own elements are hashable) have fixed hash values. Mutable types like lists, dictionaries, and sets themselves can change, so Python leaves them unhashable and they cannot be set members.
my_set = {1, "hello", (1, 2)} # Numbers, strings, and tuples are immutable - OK
# my_set = {1, [1, 2]} # This would cause a TypeError!
# TypeError: unhashable type: 'list'
# my_dict = {"a": 1, "b": 2} # Key-value pairs inside {} create a dictionary, not a set
# my_set = {{'a': 1}} # A dictionary as a set member would cause a TypeError (unhashable type: 'dict')
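You can test hashability directly with the built-in hash() function. The sketch below wraps it in a hypothetical helper, is_hashable, and also shows frozenset, the immutable set type, being used as a member of another set.

```python
def is_hashable(obj):
    """Return True if obj could be a set member (hypothetical helper)."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

print(is_hashable((1, 2)))                # True
print(is_hashable((1, [2, 3])))           # False: the tuple contains a list
print(is_hashable({1, 2}))                # False: regular sets are mutable
print(is_hashable(frozenset({1, 2})))     # True

# frozensets with the same items are equal, so the duplicate is dropped
groups = {frozenset({"Alice", "Bob"}), frozenset({"Bob", "Alice"})}
print(len(groups))  # 1
```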
When to Use Sets
Use sets when:
- You need to store a collection of items where each item must be unique, and you want duplicates to be automatically handled.
- The order of items does not matter.
- You need to quickly check if an item exists in the collection (membership testing is very fast in sets).
- You need to perform mathematical set operations like union, intersection, difference, etc.
- You need a hashable collection to use as a dictionary key or as a member of another set. A regular set is mutable and cannot be used this way, but its immutable counterpart frozenset can.
If order is important, or you need key-value pairs, use lists, tuples, or dictionaries instead.
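For the case where you want unique items but the original order matters, a set alone is not enough. One common alternative, sketched below with made-up data, relies on dictionaries keeping insertion order (guaranteed since Python 3.7). Like set items, the values must be hashable.

```python
visits = ["home", "shop", "home", "cart", "shop"]  # hypothetical page visits

# dict.fromkeys keeps the first occurrence of each item, in original order
unique_in_order = list(dict.fromkeys(visits))

print(unique_in_order)  # ['home', 'shop', 'cart']
```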
Type Annotations for Sets
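For type hinting, you can use the Set type from the typing module. You specify the type expected for the items inside square brackets []. On Python 3.9 and later, the built-in set can be subscripted the same way (set[str]), and typing.Set is deprecated in favor of it.
Syntax: Set[ItemType]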
from typing import Set, Tuple
# A set expected to contain only strings
game_tags: Set[str] = {"adventure", "fantasy"}
# A set expected to contain only integers
unique_ids: Set[int] = {101, 105}
# A set expected to contain immutable tuples
coordinates_set: Set[Tuple[int, int]] = {(1, 1), (2, 2)}
This type annotation (Set[str]) helps document your code and allows type checkers to verify the expected type of items when you add to or remove from the set.
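The same annotations work on function parameters and return values. The function below, shared_tags, is a hypothetical example written for this section; on Python 3.9 or later you could write set[str] in place of Set[str].

```python
from typing import Set

def shared_tags(a: Set[str], b: Set[str]) -> Set[str]:
    """Return the tags present in both sets (hypothetical example function)."""
    return a & b

common = shared_tags({"adventure", "RPG"}, {"RPG", "puzzle"})
print(common)  # {'RPG'}
```

Annotations are not enforced at runtime; a type checker such as mypy is what would flag a call like shared_tags({1, 2}, {"RPG"}).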
Going Deeper: Sets and Abstract Base Classes
Sets implement standard interfaces defined by Abstract Base Classes (ABCs) in collections.abc, specifically Container, Iterable, Sized, Set, and MutableSet.
- Container: Defines support for the in operator (checking membership).
- Iterable: Defines support for iteration (using for loops).
- Sized: Defines support for the len() function.
- Set: This ABC defines the core requirements for any object that acts like a read-only mathematical set (unordered, unique items, support for basic set operations like |, &, -, ^, <=, >=, and .isdisjoint()). The named methods .issubset() and .issuperset() are provided by the built-in set types rather than by the ABC. Types like set and frozenset (an immutable version of set) implement Set.
- MutableSet: This ABC is for sets that can be changed after creation (they are mutable). A MutableSet supports everything a Set does, plus methods for changing the set in place. The MutableSet ABC adds requirements for things like:
  - Adding items (.add()).
  - Removing items (.remove(), .discard(), .pop()).
  - Updating in place with other sets (the |=, &=, -=, and ^= operators; the named methods .update(), .intersection_update(), etc. are extras on the built-in set).
  - Clearing (.clear()).
A Python set is a Container, Iterable, Sized, Set, and MutableSet. It fulfills the contracts defined by all these blueprints, providing the full set of mutable set behaviors.
You can check this using isinstance():
import collections.abc
my_set = {1, 2, 3}
print(isinstance(my_set, collections.abc.Set)) # Output: True (A set is a mathematical set)
print(isinstance(my_set, collections.abc.MutableSet)) # Output: True (A set can be changed)
print(isinstance(my_set, collections.abc.Container)) # Output: True (Supports 'in')
print(isinstance(my_set, collections.abc.Sized)) # Output: True (Supports 'len()')
print(isinstance(my_set, collections.abc.Iterable)) # Output: True (Supports 'for' loop)
Understanding these ABCs helps place sets within Python's collection hierarchy and understand the general capabilities associated with set-like objects.
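One practical consequence is that you can build your own set-like type by subclassing collections.abc.Set and supplying only three methods; the ABC then provides operators such as & and | as mixins. The class below, SortedTupleSet, is a hypothetical example invented for this sketch, not a standard library type.

```python
from collections.abc import Set

class SortedTupleSet(Set):
    """Hypothetical read-only set that keeps its items in a sorted tuple."""

    def __init__(self, iterable=()):
        # The constructor accepts any iterable, which the ABC's mixin
        # operators rely on when they build their result objects.
        self._items = tuple(sorted(set(iterable)))

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

low = SortedTupleSet([3, 1, 2, 3])
high = SortedTupleSet([3, 4])

print(list(low))             # [1, 2, 3]
print(list(low & high))      # [3]
print(list(low | high))      # [1, 2, 3, 4]
print(low.isdisjoint(high))  # False
print(isinstance(low, Set))  # True
```

Only __contains__, __iter__, and __len__ are written by hand here; &, |, and .isdisjoint() come from the ABC.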
Conclusion
Sets are valuable collections in Python for managing unique items and performing efficient membership tests and mathematical set operations.
Key things you learned about sets:
- They are unordered and contain only unique items.
- They are mutable (items can be added/removed).
- You create them using curly braces {} (but use set() for an empty set).
- Items must be immutable.
- You cannot access items by index or key.
- You add items with .add() and remove with .remove(), .discard(), or .pop().
- They are great for set operations (|, &, -, ^, .union(), .intersection(), etc.).
- You check for membership using the in operator (very fast).
- You get the number of items with len().
- You can go through items using a for loop, but the order is not guaranteed.
- You can add generic type hints (Set[ItemType]).
- Sets implement the Set and MutableSet ABCs.
Use sets when the uniqueness and speed of membership testing are important, and the order of items doesn't matter. They provide powerful tools for comparing and combining collections based on their contents.