ipython - MemoryError sending data to ipyparallel engines


We love IPython.parallel (now ipyparallel).

One thing bugs us, though. When sending a ~1.5 GB pandas DataFrame to a bunch of workers, we get a MemoryError if the cluster has many nodes. It looks like there are as many copies of the DataFrame as there are engines (or some number proportional to that). Is there a way to avoid these copies?

Example:

In []: direct_view.push({'xy': xy}, block=True)  # or direct_view['xy'] = xy
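
Here direct_view is a DirectView over all of the engines. For context, a minimal sketch of how we obtain it (assumes a running ipcluster; the profile name is illustrative):

# minimal sketch of our setup; the profile name 'aws' is illustrative
import ipyparallel as ipp

rc = ipp.Client(profile='aws')   # connect to the controller
direct_view = rc[:]              # DirectView addressing every engine
# pushing via this view sends the object to each engine in the view:
# direct_view.push({'xy': xy}, block=True)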

For a small cluster (e.g. 30 nodes), the memory grows and grows, but the data goes through and everything is fine. For a larger cluster, e.g. 80 nodes (all r3.4xlarge, with 1 engine per node, not n_core engines), htop reports the memory growing to the maximum (123 GB) and we get:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-120-f6a9a69761db> in <module>()
----> 1 get_ipython().run_cell_magic(u'time', u'', u"ipc.direct_view.push({'xy':xy}, block=True)")

/opt/anaconda/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_cell_magic(self, magic_name, line, cell)
   2291             magic_arg_s = self.var_expand(line, stack_depth)
   2292             with self.builtin_trap:
-> 2293                 result = fn(magic_arg_s, cell)
   2294             return result
   2295
(...)

Note: after looking at https://ipyparallel.readthedocs.org/en/latest/details.html, we also tried sending the underlying numpy array (xy.values) in an attempt to get a "non-copying send", but we still get the MemoryError.
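
Concretely, that attempt looked roughly like this (the variable name is ours):

# push the underlying ndarray instead of the DataFrame, hoping for a
# buffer-based ("non-copying") send; on the 80-node cluster this still
# ends in MemoryError
xy_values = xy.values   # plain numpy ndarray backing the DataFrame
direct_view.push({'xy_values': xy_values}, block=True)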

Versions:

  • Jupyter notebook v4.0.4
  • Python 2.7.10
  • ipyparallel.__version__: 4.0.2


Comments