After Beazley’s PyCon talk "Understanding the Python GIL", I realized I had never done any work that released the GIL, spawned threads, did some work, and then restored the GIL. So I wanted to see if I could do something like that with Boost::Python and Boost::Thread, and what kind of performance I’d get from it, using an empty while loop as the baseline. I hacked up some quick and dirty C++ code and a quick bit of runnable Python to test out the resulting module, and away I went. Below are the code snippets, a link to the Bitbucket repo, and the results of the Python script.

#include <iostream>
#include <cmath>
#include <vector>
#include <boost/shared_ptr.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#include <boost/python.hpp>

// RAII guard: releases the GIL in the constructor and reacquires it in the
// destructor, so the GIL is restored even on early returns or exceptions.
class ScopedGILRelease {
public:
	inline ScopedGILRelease() { m_thread_state = PyEval_SaveThread(); }
	inline ~ScopedGILRelease() {
		PyEval_RestoreThread(m_thread_state);
		m_thread_state = NULL;
	}
private:
	PyThreadState* m_thread_state;
};

// Pure C++ busy loop; it never touches the Python API, so it is safe to run
// with the GIL released.
void loop(long count)
{
	while (count != 0) {
		count -= 1;
	}
	return;
}

void nogil(int threads, long count)
{
	if (threads <= 0 || count <= 0)
		return;

	// Release the GIL for the rest of this function; it is reacquired
	// automatically when release_gil goes out of scope.
	ScopedGILRelease release_gil;

	// Split the total iteration count evenly across the threads.
	long thread_count = (long)std::ceil((double)count / threads);

	std::vector<boost::shared_ptr<boost::thread> > v_threads;
	for (int i = 0; i != threads; i++) {
		boost::shared_ptr<boost::thread> m_thread(
		    new boost::thread(boost::bind(loop, thread_count)));
		v_threads.push_back(m_thread);
	}

	// Wait for every worker to finish before returning (and before the
	// GIL is reacquired).
	for (std::size_t i = 0; i != v_threads.size(); i++)
		v_threads[i]->join();

	return;
}

BOOST_PYTHON_MODULE(nogil)
{
	using namespace boost::python;
	def("nogil", nogil);
}
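
For what it's worth, ScopedGILRelease is just an RAII wrapper around the same PyEval_SaveThread / PyEval_RestoreThread pair that the Python C API's Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros use, with the bonus that the GIL gets reacquired even on an early return or an exception. The macro version of the same idea looks roughly like this (work_without_gil is just an illustrative name, not part of the module):

void work_without_gil(long count)
{
	// Py_BEGIN_ALLOW_THREADS opens a block and calls PyEval_SaveThread();
	// Py_END_ALLOW_THREADS calls PyEval_RestoreThread() and closes the block.
	Py_BEGIN_ALLOW_THREADS
	loop(count);  // pure C++ work only; no Python objects may be touched here
	Py_END_ALLOW_THREADS
}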

Then I used the following Python script to run some quick tests.

import time
import nogil

def timer(func):
	def wrapper(*arg):
		t1 = time.time()
		func(*arg)
		t2 = time.time()
		print "%s took %0.3f ms" % (func.func_name, (t2-t1)*1000.0)
	return wrapper

@timer
def loopone():
	count = 5000000
	while count != 0:
		count -= 1

@timer
def looptwo():
	count = 5000000
	nogil.nogil(1,count)

@timer
def loopthree():
	count = 5000000
	nogil.nogil(2,count)

@timer
def loopfour():
	count = 5000000
	nogil.nogil(4,count)
	
@timer
def loopfive():
	count = 5000000
	nogil.nogil(6,count)
		
def main():
	loopone()
	looptwo()
	loopthree()
	loopfour()
	loopfive()
	
if __name__ == '__main__':
	main()

The results I got were quite interesting and very consistent on my MacBook Pro. I ran the script about 1,000 times and got roughly the same results every time.

loopone took 364.159 ms (pure python)
looptwo took 15.295 ms (c++, no GIL, single thread)
loopthree took 7.763 ms (c++, no GIL, two threads)
loopfour took 8.119 ms (c++, no GIL, four threads)
loopfive took 11.102 ms (c++, no GIL, six threads)

Past two threads the times creep back up a bit, which is presumably just the thread count passing the number of cores actually doing the work. Anyway, that’s all really. Nothing profound here, no super insightful ending. Just a "hey look, stuff is faster and I might use this." All the code for this is available in my Bitbucket repo: http://bitbucket.org/wwitzel3/code/src/tip/nogil/

To build this you will need the Boost libraries, including Boost Python and Boost Thread, as well as the Python headers and libraries. For Boost, bjam --with-python --with-thread variant=release toolset=gcc is all I did on my Mac. Then I added the resulting libs as framework dependencies in Xcode, along with the Python.framework.
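
If you are not using Xcode, a plain compiler invocation should also do the trick. On Linux it usually looks something like the following, though the Python version, include path, and Boost library names (some builds add an -mt suffix) vary from system to system, so treat it as a starting point rather than a recipe:

g++ -shared -fPIC nogil.cpp -o nogil.so \
	-I/usr/include/python2.6 \
	-lboost_python -lboost_thread -lpython2.6

Once nogil.so is importable (or just sitting in the same directory), the test script above should run as-is.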