Memory management means allocating and de-allocating memory resources for your data in a computer program. It is essential to software development because it affects your code or program's overall performance and efficiency.
In this article, you will learn how memory management works inside Python. You will understand concepts like the Python memory manager, garbage collection, and reference counting. Whether you're a beginner in Python or an experienced developer, this article will give you an overview of memory management in Python and help you make better decisions when optimizing your code.
Importance Of Memory Management
In programming, memory management allows your code to run effectively. Proper memory management helps prevent your program from crashing or leaking memory. Memory management is responsible for the following:
Allocating memory for newly created objects.
De-allocating memory for objects that have been used. Once your program gets executed, the memory used will be de-allocated.
Simple mistakes like forgetting to de-allocate memory, or losing track of which memory is currently in use, can cause your program to lag and have serious performance issues. This is because unused objects keep occupying memory that the program needs for new data.
The Python Approach to Memory Management
Languages like C and C++ traditionally require developers to manage memory by manually allocating and de-allocating it in their code. This method is error-prone because developers can unknowingly skip one of the steps and end up with memory leaks or crashes.
In Python, memory management is handled automatically by the Python memory manager. Similarly to other languages, the Python memory manager uses stack and heap memory.
Stack Memory: Stack memory stores temporary data, function calls, and references to objects stored in the heap memory. Read more about stack memory here.
Heap Memory: Heap memory stores objects and data that need to be in memory longer than stack memory. This article speaks more about heap memory.
This image gives a basic overview of what each memory stores in Python.
In CPython, the standard Python implementation, creating a variable does not always create a new object. For some immutable values, such as small integers (from -5 to 256) and certain strings, the interpreter reuses an existing object that already holds that value, and the new variable simply points to it. For instance, consider the code snippet below:
age = 20
score = 20
In the program above, you might expect both variables to have unique memory spaces because they serve different purposes. CPython will, however, not do this. Since both variables hold the same small integer, they both reference a single shared object. This image gives a clear view:
To confirm this, make use of the id() function in Python like so:
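A minimal sketch of that check, assuming CPython (which caches small integers); the variable names come from the snippet above:

```python
age = 20
score = 20

# id() returns an integer that identifies the object a name refers to.
print(id(age))
print(id(score))

# "is" compares identity: True in CPython, because small integers are cached.
print(age is score)
```

The exact numbers printed by id() differ from run to run; what matters is that both calls print the same number.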
From the code above, you can see that both variables have the same ID. It confirms the fact that they both reference the same object in memory. If another variable with the same value is created, it will reference the same object in memory. This approach is better than creating a new object in memory for each variable.
There are some things to note about this approach:
If one of the variables gets reassigned, its name is bound to a different object at a different memory location; the original object is not changed. If the new value already has a shared object in memory, the variable points to that object's address.
Mutable data types such as lists are assigned different objects even if they contain the same items. This is because, if they shared one memory location, a change to one list would also change the other list(s).
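Both notes can be checked with a short sketch. It assumes CPython's small-integer caching; the list names first and second are hypothetical and used only for illustration:

```python
age = 20
score = 20
shared_before = age is score   # True in CPython: one cached object

score = 21                     # rebinding: score now names a different object
shared_after = age is score    # False: age still refers to the object for 20

first = [1, 2, 3]
second = [1, 2, 3]
equal_items = first == second      # True: same contents
same_object = first is second      # False: two separate list objects

first.append(4)                # mutating one list leaves the other untouched
print(shared_before, shared_after, equal_items, same_object)
print(first, second)
```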
Garbage Collection in Python
Garbage collection is the process of removing objects that are no longer in use from memory. Python does this automatically. The two mechanisms Python uses for garbage collection are:
Reference counting
Generational garbage collection
Reference Counting in Python
Reference counting is an approach in memory management that keeps track of the number of references to each object in memory. You create a reference whenever you assign an object to a variable, and each new reference increases the object's reference count by 1. This example will shed more light:
x = "This is my house!"
y = "This is my house!"
z = x
When this snippet runs as a single script, variables x, y, and z typically refer to the same string object in CPython, so they have the same memory location. Each new assignment increases the reference count of that object. You can get an object's reference count by using the sys.getrefcount() function available in the sys module. You can verify the reference count of the above code snippet below:
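A minimal sketch of such a check. As an assumption made for reliability, it wraps the string in a freshly created list instead of counting the string literal directly, because newer CPython versions may share or specially treat literals and report misleading counts for them; the names house and alias are hypothetical:

```python
import sys

# A fresh list object that nothing else in the interpreter refers to.
house = ["This is my house!"]
baseline = sys.getrefcount(house)

alias = house                          # a second name for the same list
after_alias = sys.getrefcount(house)   # one more than baseline

del alias                              # remove the extra name again
after_del = sys.getrefcount(house)     # back to baseline

print(baseline, after_alias, after_del)
```

The absolute numbers can vary between Python versions, so the sketch compares counts relative to the baseline instead of relying on fixed values.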
Some of the things you should note are:
The sys.getrefcount() function adds an extra reference to the count, because passing the object as an argument temporarily creates one more reference. This means that if the reference count of an object is 1, sys.getrefcount() will return 2.
If one of the variables is reassigned, the reference count will decrease by 1.
When the reference count reaches 0, the object is deallocated from memory.
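The last note can be observed with a weak reference, which watches an object without adding to its reference count. This is a sketch under the assumption that the code runs on CPython, where deallocation happens as soon as the count reaches 0; the Note class is hypothetical and exists only because plain object() instances cannot be weakly referenced:

```python
import weakref


class Note:
    """Hypothetical empty class used only for this illustration."""


note = Note()
probe = weakref.ref(note)   # does not increase the reference count
alias = note                # second strong reference to the same object

del note
alive_with_alias = probe() is not None   # True: alias still keeps it alive

del alias                   # last strong reference is gone
freed = probe() is None     # True in CPython: the object was deallocated

print(alive_with_alias, freed)
```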
You should read this article and this article for more information on reference counting.
Generational Garbage Collection
Generational garbage collection was a feature added in Python 2.0. Before this, Python used only reference counting to manage memory, but reference counting alone cannot free objects caught in reference cycles.
When two objects in memory hold references to each other, it is called a reference cycle. If this happens, the reference count of the objects will not reach 0, and the memory will not be freed. The following is an example of how an object can get stuck in a reference cycle:
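superheroes = ["Captain America", "Superman", "Batman"]
sidekicks = ["Bucky", "Jimmy Olsen", "Robin"]
# Each list now holds a reference to the other, forming a cycle.
superheroes.append(sidekicks)
sidekicks.append(superheroes)
# Remove the name superheroes; the list itself survives inside sidekicks.
del superheroes
print(sidekicks[-1])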
In the code above, even after the variable superheroes is deleted, the list it referred to still has a reference in memory. You can confirm this when you print the last element of the variable sidekicks. The same thing will happen if sidekicks is deleted instead. If both variables get deleted, the two lists can no longer be reached from your code, but they still reference each other, so their reference counts never reach 0.
The garbage collector is used to fix this issue. The garbage collector is a mechanism that detects reference cycles in Python and removes them.
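The following sketch shows the garbage collector reclaiming a cycle that reference counting alone leaves behind. It assumes CPython; the Hero class and the name probe are hypothetical, and a small class is used instead of lists because lists cannot be weakly referenced:

```python
import gc
import weakref


class Hero:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.partner = None


gc.disable()  # pause automatic collection so the demo is deterministic

batman = Hero("Batman")
robin = Hero("Robin")
batman.partner = robin   # batman -> robin
robin.partner = batman   # robin -> batman, completing the cycle

probe = weakref.ref(batman)  # observes the object without keeping it alive

del batman, robin            # no names left, but the objects reference each other
alive_before = probe() is not None   # True: reference counts never reached 0

gc.collect()                 # the cycle detector frees the unreachable pair
alive_after = probe() is not None    # False: the cycle has been collected

gc.enable()
print(alive_before, alive_after)
```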
The garbage collector does not run all the time for the following reasons:
While the garbage collector is running, nothing else in the program can run until it is done. This behavior can make your code slow.
The garbage collector usually has no work to do because reference cycles are mostly observed in large projects only.
The garbage collector runs on its own, but to inspect or control it from your code, you need to import the gc module like this:
import gc
The garbage collector classifies Python objects into three categories called generations. Each of these generations has an object threshold count. The threshold count for each generation can be seen by using the gc.get_threshold() function.
import gc
print(gc.get_threshold())
After running this command, you will get three values. Each of these values represents the threshold count for each generation. In the image below, the first generation has a threshold of 700.
All objects start their lives in the first generation. The garbage collector is activated whenever the number of objects in any generation exceeds its threshold. This is the only time the garbage collector runs automatically.
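To see how close each generation is to triggering a collection, the thresholds can be compared with the collector's current counters. This is a rough sketch that uses gc.get_count() alongside gc.get_threshold(); the exact numbers depend on the Python version and on what the program has done so far:

```python
import gc

thresholds = gc.get_threshold()  # one threshold per generation
counts = gc.get_count()          # current counters for the same generations

# Pair each generation's counter with its threshold.
for generation, (count, threshold) in enumerate(zip(counts, thresholds)):
    print(f"generation {generation}: count={count}, threshold={threshold}")
```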
If objects in a particular generation are not cleaned up because they still have references, they are pushed to the next generation. To manually activate the garbage collector, use the gc.collect() function.
import gc
gc.collect()
Conclusion
You have learned about how Python handles memory management. Although Python automatically handles memory management, the garbage collector can slow down your program if you have a large script and many objects are being created.
To prevent this from happening, you should learn to optimize your code and manually call the garbage collector at intervals. You can learn more about memory management in Python with these links:
Heap memory - https://www.geeksforgeeks.org/what-is-a-memory-heap/
Stack memory - https://www.sciencedirect.com/topics/engineering/stack-memory
Garbage collector - https://docs.python.org/3/library/gc.html
Reference counting - https://betterprogramming.pub/a-guide-to-reference-counting-in-python-27334fc2e3c1
Reference counting - https://towardsdatascience.com/understanding-reference-counting-in-python-3894b71b5611