Tuesday, February 10, 2015

Rusthon C++11 Backend


These are the first benchmarks of Rusthon's new C++ backend. As we would expect, it's faster than CPython and the JavaScript backend. And with one extra line of code that enables raw pointers and noexcept, Rusthon can outperform PyPy.

This benchmark is short and to the point: in less than 50 lines, it tests object allocation and deallocation, method calls, and integer addition. The outer loop iterates 1000 times and the inner loop iterates 100 times, so the inner loop body runs 100,000 times in total.

source code: add.py
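
The loop shape and timing pattern described above can be sketched in C++ as follows. This is only an illustration: seconds_now, run_loops and time_loops are hypothetical names, and std::chrono::steady_clock is an assumed stand-in for whatever clock the benchmark really uses.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: seconds on a monotonic wall clock.
// Assumption: the real benchmark may measure CPU time instead.
double seconds_now() {
    using mono = std::chrono::steady_clock;
    return std::chrono::duration<double>(mono::now().time_since_epoch()).count();
}

// Same loop shape as the benchmark: an outer loop of xsteps and an inner
// loop of ysteps. Returns how many times the inner body executed.
long run_loops(int xsteps, int ysteps) {
    assert(xsteps >= 0 && ysteps >= 0);
    long count = 0;
    for (int i = 0; i < xsteps; ++i) {
        for (int j = 0; j < ysteps; ++j) {
            ++count;  // the real benchmark allocates objects and adds here
        }
    }
    return count;
}

// Records the clock, runs the loops, and returns the elapsed seconds.
double time_loops(int xsteps, int ysteps, long* count_out) {
    assert(count_out != nullptr);
    const double start = seconds_now();
    *count_out = run_loops(xsteps, ysteps);
    return seconds_now() - start;
}
```

Calling time_loops(1000, 100, &n) leaves 100000 in n, which is the number of inner iterations each backend has to get through.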

The only surprising result in the benchmark is the big difference in performance between NodeJS and NodeWebkit. Both use Rusthon's JavaScript backend with the same options, so the JavaScript is identical. NodeJS is more than twice as slow as NodeWebkit, most likely because my version of NodeJS is older and was probably not compiled with as many optimizations as NodeWebkit. This may show that while V8 is super fast, its speed may vary a lot depending on version and compilation options.

Optimized Version

Here is the same benchmark as above, but with one extra line at the top, "with (pointers, noexcept):". This makes Rusthon use real pointers and marks functions as not throwing exceptions. Most of the performance gain comes from changing from reference-counted std::shared_ptr to real pointers. Using direct pointers requires more care in how your code is written and in how memory is freed, both automatically and manually. In the future we will provide an option for something faster than std::shared_ptr but safer than real pointers, using arena management or multi-core cleanup.

source code: add-opt.py

from time import clock

with (pointers, noexcept):
 class A:
  def __init__(self, x:int,y:int,z:int):
   self.x = x; self.y = y; self.z = z
  def add(self, other:A) ->A:
   a = A(self.x+other.x, self.y+other.y, self.z+other.z)
   return a
  def iadd(self, other:A):
   self.x += other.x
   self.y += other.y
   self.z += other.z

 class M:

  def f2(self, step:int, a:A, b:A, c:A, x:int,y:int,z:int) ->A:
   s = A(0,0,0)
   for j in range(step):
    u = A(x,y,z)
    w = a.add(u).add(b).add(c)
    s.iadd(w)
   return s

  def f1(self, x:int, y:int, a:A, b:A, c:A ) -> A:
   w = A(0,0,0)
   flip = False
   for i in range(x):
    if flip:
     flip = False
     w.iadd(self.f2(y, a,b,c, 1,2,3))
    else:
     flip = True
     w.iadd(self.f2(y, a,b,c, 4,5,6))
   return w

 def main():
  m = M()
  xsteps = 1000
  ysteps = 100
  start = clock()
  n = -1000000
  a = A(n,n+1,n)
  b = A(n,n+2,n)
  c = A(n,n+3,n)
  w = m.f1(xsteps, ysteps, a,b,c)
  print(clock()-start)
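
To make the std::shared_ptr versus real pointer trade-off more concrete, here is a small hand-written C++ sketch. It is not Rusthon output: Vec3, add_shared, add_raw, sum_shared and sum_raw are hypothetical names, and the shared_ptr half is only an assumption about what a reference-counted translation might roughly look like.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the benchmark's class A.
struct Vec3 {
    int x, y, z;
    Vec3(int x_, int y_, int z_) : x(x_), y(y_), z(z_) {}
};

// Reference-counted style: each result is a heap allocation with a control
// block, and every copy of the shared_ptr updates the reference count.
std::shared_ptr<Vec3> add_shared(const std::shared_ptr<Vec3>& a,
                                 const std::shared_ptr<Vec3>& b) {
    return std::make_shared<Vec3>(a->x + b->x, a->y + b->y, a->z + b->z);
}

// Raw-pointer style: the caller provides the storage, so there is no
// allocation and no counting, but also nothing that frees memory for you.
void add_raw(const Vec3* a, const Vec3* b, Vec3* out) noexcept {
    assert(a && b && out);
    out->x = a->x + b->x;
    out->y = a->y + b->y;
    out->z = a->z + b->z;
}

int sum_shared(int steps) {
    auto a = std::make_shared<Vec3>(1, 2, 3);
    auto total = std::make_shared<Vec3>(0, 0, 0);
    for (int i = 0; i < steps; ++i) {
        auto u = std::make_shared<Vec3>(i, i, i);  // heap allocation
        auto w = add_shared(a, u);                 // another heap allocation
        total->x += w->x;
    }
    return total->x;
}

int sum_raw(int steps) noexcept {
    Vec3 a(1, 2, 3);
    Vec3 total(0, 0, 0);
    for (int i = 0; i < steps; ++i) {
        Vec3 u(i, i, i);   // lives on the stack
        Vec3 w(0, 0, 0);   // caller-owned storage for the result
        add_raw(&a, &u, &w);
        total.x += w.x;
    }
    return total.x;
}
```

Both functions compute the same total. The difference is that sum_shared performs two heap allocations per iteration and maintains reference counts, while sum_raw keeps every object on the stack.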

Sometimes people complain that my benchmarks are unfair because Rusthon is static while Python and PyPy are dynamic, or because CPython has an entire ecosystem of 3rd party libraries. First of all, those 3rd party libraries are most often wrappers around a C or C++ library, and it is always a worry whether they are being maintained. It is better to use C/C++ libraries directly: because Rusthon is translated to C++, there is no need for FFI, and it can simply link in and call directly into another C++ library. Rusthon is not dynamic, and static is a good thing. Do you really want to ship a Python interpreter with your application? The PyPy interpreter is over 100MB. Rusthon gives you a tiny statically linked exe; this benchmark exe is just 28KB. This lets you compile and run your application anywhere C++ can be compiled, like mobile platforms where CPython and PyPy cannot be used.

C++ output (pointers and noexcept)

class A {
  public:
 std::string __class__;
 int  y;
 int  x;
 int  z;
 A* __init__(int x, int y, int z) noexcept;
 A* add(A* other) noexcept;
 void iadd(A* other) noexcept;
 A() {__class__ = std::string("A");}
 virtual std::string getclassname() {return this->__class__;}
};
A* A::__init__(int x, int y, int z) noexcept {
 this->x = x;
 this->y = y;
 this->z = z;
 return this;
}
A* A::add(A* other) noexcept {
 A  _ref_a = A{};
 _ref_a.__init__((this->x + other->x), (this->y + other->y), (this->z + other->z));
 A* a = &_ref_a;
 return a;
}

void A::iadd(A* other) noexcept {
 this->x += other->x;
 this->y += other->y;
 this->z += other->z;
}

class M {
  public:
 std::string __class__;
 A* f2(int step, A* a, A* b, A* c, int x, int y, int z) noexcept;
 A* f1(int x, int y, A* a, A* b, A* c) noexcept;
 M() {__class__ = std::string("M");}
 virtual std::string getclassname() {return this->__class__;}
};
A* M::f2(int step, A* a, A* b, A* c, int x, int y, int z) noexcept {
  A  _ref_s = A{};_ref_s.__init__(0, 0, 0);
  A* s = &_ref_s;
  auto j = 0;
  auto j__end__ = step;
  while (( j ) < j__end__) {
   A  _ref_u = A{};
   _ref_u.__init__(x, y, z);
   A* u = &_ref_u;
   auto w = a->add(u)->add(b)->add(c);
   s->iadd(w);
   j ++;
  }
  return s;
}
A* M::f1(int x, int y, A* a, A* b, A* c) noexcept {
  A  _ref_w = A{};_ref_w.__init__(0, 0, 0);
  A* w = &_ref_w;
  auto flip = false;
  auto i = 0;
  auto i__end__ = x;
  while (( i ) < i__end__) {
   if (flip==true) {
    flip = false;
    w->iadd(this->f2(y, a, b, c, 1, 2, 3));
   } else {
    flip = true;
    w->iadd(this->f2(y, a, b, c, 4, 5, 6));
   }
   i ++;
  }
  return w;
}

int main() noexcept {

 M  _ref_m = M{};M* m = &_ref_m;
 auto xsteps = 1000;
 auto ysteps = 100;
 auto start = __clock__();
 auto n = -1000000;
 A  _ref_a = A{};_ref_a.__init__(n, (n + 1), n);
 A* a = &_ref_a;
 A  _ref_b = A{};_ref_b.__init__(n, (n + 2), n);
 A* b = &_ref_b;
 A  _ref_c = A{};_ref_c.__init__(n, (n + 3), n);
 A* c = &_ref_c;
 auto w = m->f1(xsteps, ysteps, a, b, c);
 std::cout << (__clock__() - start) << std::endl;
 return 0;
}
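
One thing to keep in mind when reading the output above: add, f2 and f1 each return the address of a function-local object (for example &_ref_a). In standard C++ that object's lifetime ends when the function returns, so dereferencing the returned pointer is undefined behavior even if it appears to work in an optimized build. This is the kind of care that real pointers demand. As a hedged sketch, and not what Rusthon emits, the same arithmetic can be written by hand with value returns, which avoids both reference counting and dangling pointers:

```cpp
#include <cassert>

// Hand-written variant of class A that passes and returns objects by value.
struct A {
    int x, y, z;
    A(int x_, int y_, int z_) noexcept : x(x_), y(y_), z(z_) {}

    // Returns a new object by value; the caller receives its own copy,
    // so nothing points at a destroyed stack frame.
    A add(const A& other) const noexcept {
        return A(x + other.x, y + other.y, z + other.z);
    }

    // In-place add, same as the original iadd.
    void iadd(const A& other) noexcept {
        x += other.x;
        y += other.y;
        z += other.z;
    }
};

// Same logic as M::f2 in the generated code, written as a free function.
A f2(int step, const A& a, const A& b, const A& c,
     int x, int y, int z) noexcept {
    assert(step >= 0);
    A s(0, 0, 0);
    for (int j = 0; j < step; ++j) {
        A u(x, y, z);
        s.iadd(a.add(u).add(b).add(c));  // temporaries live until the ';'
    }
    return s;
}
```

Whether this is as fast as the pointer version would have to be measured; it is only meant to show one way to keep stack allocation while staying within defined behavior.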

2 comments:

  1. Good one. Can Rusthon run the complete set of benchmarks that PyPy runs on speed.pypy.org?

    Would be nice to see a comparison.

    Also, how compatible is it with existing libraries? For example, if I build my Tornado web application using Rusthon, will it work out of the box?

  2. Rusthon has basically zero compatibility with existing Python libraries, and therefore is not able to run many of the benchmarks on speed.pypy.org.

    The result of any benchmark produced with the Rusthon JS transpiler is likely to be almost exactly the same as the results of the handwritten JavaScript versions from the benchmarksgame. These benchmarks show that JavaScript under V8 is extremely fast.
    http://benchmarksgame.alioth.debian.org/u32/javascript.php

    The reason Rusthon is not compatible with existing Python libraries is that there are significant differences between running natively and running in a web browser with a non-blocking API. In addition, it often would not make sense to even try to port an existing Python library to JavaScript when there are already well-maintained JavaScript libraries that solve the same problem and are fully optimized for HTML5 and the way the browser works.

    So do not expect to build something in Tornado and have it work out of the box when you transpile it and run it under NodeJS. The wrapper I created around Tornado for NodeJS is a very minimal implementation of the Tornado API, and to do something complex, you will still need to learn to use the libraries and API of NodeJS.
    http://rusthon-lang.blogspot.com/2015/07/javascript-backend-nodejs-tornado-web.html
