When we started with fleet management at Visual Units, one thing was really hard to get right – distance calculations. There was no end of information available, but most-to-all of it was on a level of mathematics far beyond a poor developer who feels that anything beyond discrete mathematics and basic geometry and statistics really should be somebody else’s problem. The implementations that could be found were closed-source licensed version we really could not afford at that stage.
For a while we got by using a solution that relied on having a variant of Lambert conformal conic projection coordinates – it was sufficiently exact if not perfect, and our maps used the same projection, so it worked – although there was the added burden of transforming our stored (WGS-84) coordinates to Lambert every time we needed calculations done. A couple of years ago, however, we switched to Google Maps API and so we really had no use for Lambert – and increased load and precision demands made using the current solution a worse and worse choice.
Enter Chris Veness\_. Or rather, enter his implementation of the Vincenty inverse formula (pdf). Even though the math is beyond me, porting the Javascript implementation to Python was straightforward, and some testing showed that the result was both faster and had better precision than the previous solution.
Fast-forward to a few months ago, suddenly the performance is starting to look like something that could become a problem. We have many reasons for doing distance calculations, and while the batch jobs were not a problem, any amount of time that can be shaved off user-initiated actions is welcome.
So, I thought to myself, I’ve ported it once, how hard can it be to do it again? After all, when raw speed becomes the issue, the Python programmer reaches for C. Porting it was once again straightforward, mapping the Python function
def distance(x1, y1, x2, y2): ...
into
Listing 1.
const double distance(const double x1, const double y1, const double x2, const double y2) { ...
The resulting C code is almost identical to he Python (and Javascript) implementations but runs about 6 times faster than the Python implementation. Allowing batch submission of calculations instead of calling once for every calculation, eliminating some FFI overhead, would increase the speed further.
Listing 1.
$ python2.7 -m `test`s.`test`_distance Time elapsed for 100000 calculations in Python: 1952.70 C: 300.46 Factor: 6.50
Wrapping the C and calling it was simple enough using ctypes, and I’ve added fallback to the Python implementation if the C shared library cannot be found; a small \_\_init\_\_.py in the package hooks up the correct version:
from .distance import distance as `_py`_distance try: from `c`types import `c`dll, `c`_double dll = cdll.LoadLibrary('cDistance.so') dll.distan`c`e.restype = `c`_double dll.distan````c````e.argtypes = [````c````_double, ````c````_double, ````c````_double, ````c````_double] distance = dll.distance `_c`_distance = dll.distance except OSError: #Fall back to Python implementation distance = `_py`_distance
Of course, this depends on the C code being compiled into cDistance.so and that file being available for linking – and it keeps the .so hardcoded so a windows DLL wont work. I really did intend to clean it up more before making it open source, but since I’ve been meaning to start open sourcing some of our tools for years now and never really found the time, I thought it would be better to thow it out there, and postpone making it pretty instead. I hope someone can find some use in this, and I’ll try to get it cleaned upp and packaged Real Soon Now.
0 kommentarer