[vistrails-dev] [vistrails-users] Running modules in parallel

Ryan Danks Ryan.Danks at rwdi.com
Tue Jan 7 10:07:19 EST 2014


Hi Tommy,

I've recently upgraded to 2.1 (64-bit Windows) and I've discovered some issues when using the Python multiprocessing module. I currently use multiprocessing.Pool in some modules within a custom package I've created. This worked fine in all the 2.0.x versions, but with 2.1 it seems the subprocesses simply crash and then hang.

After some debugging, the problem seems to be that the python.exe and pythonw.exe executables have moved out of C:\Program Files\Vistrails\vistrails. When subprocesses are launched, that folder is no longer included in sys.path, so when the new processes run my package's __init__.py file they can't find the vistrails.core module, which causes the crash. My current workaround is to wrap the first attempt at importing from vistrails.core in a try/except block. That way, if an ImportError is raised, I can manually add the folder to sys.path and retry the import (see below):
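try:
    from core.configuration import ConfigurationObject
except ImportError:
    # Spawned subprocesses do not have the VisTrails folder on sys.path,
    # so add it by hand and retry the import.
    import sys
    sys.path.append("C:\\Program Files\\VisTrails\\vistrails")
    from core.configuration import ConfigurationObject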
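
The same try/except fallback can be wrapped in a small reusable helper. This is only a sketch: import_with_fallback is a hypothetical name, not part of VisTrails, and the folder is still passed in by the caller rather than discovered automatically.

```python
import importlib
import sys


def import_with_fallback(module_name, fallback_dir):
    """Import module_name, retrying once with fallback_dir on sys.path."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The first attempt failed, so make the folder importable and retry.
        if fallback_dir not in sys.path:
            sys.path.append(fallback_dir)
        return importlib.import_module(module_name)


# Hypothetical usage inside a package's __init__.py:
# configuration = import_with_fallback(
#     "core.configuration", "C:\\Program Files\\VisTrails\\vistrails")
# ConfigurationObject = configuration.ConfigurationObject
```

The helper uses only the standard library, so it can run in a child process before anything from VisTrails has been imported.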


It's not the cleanest solution, particularly because the path to the vistrails folder needs to be hardcoded (I can't use any of the helper functions in core.system), but it works. I've tried something a little cleaner in my __init__.py file:

visTrailsPath = join(split(core.system.vistrails_root_directory())[0],'vistrails')
if not(visTrailsPath in sys.path):
    sys.path.append(visTrailsPath)

but this results in a new error when my module gets loaded from "packagemanager.py" saying:
"local variable 'deps' referenced before assignment"

The weird thing is that I only get this error when I use sys.path, i.e. both the 2nd and 3rd lines in the above snippet cause the same error but not the 1st.

So I guess I'm wondering if you or the other devs have any ideas for a cleaner fix for this issue. If there isn't one, the details of my hacky fix should probably be included in the multiprocessing section of the user guide.
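
One possibly cleaner direction, offered only as an untested sketch: child processes inherit the parent's environment, and a Python interpreter normally adds the entries of PYTHONPATH to sys.path at startup, so the parent process could set PYTHONPATH before creating the Pool. prepend_to_pythonpath is a hypothetical helper, and whether the Python bundled with VisTrails 2.1 honors PYTHONPATH is an assumption that would need checking.

```python
import os


def prepend_to_pythonpath(directory, environ=None):
    """Put directory at the front of PYTHONPATH in environ (default: os.environ)."""
    if environ is None:
        environ = os.environ
    existing = environ.get("PYTHONPATH", "")
    # Drop empty entries so an unset variable does not produce a stray separator.
    entries = [p for p in existing.split(os.pathsep) if p]
    if directory not in entries:
        entries.insert(0, directory)
    environ["PYTHONPATH"] = os.pathsep.join(entries)
    return environ["PYTHONPATH"]


# Hypothetical usage in the parent process, before Pool(...) is created:
# prepend_to_pythonpath(visTrailsPath)
```

Because this runs in the parent, where core.system is importable, the folder would not need to be hardcoded.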

Thanks

-Ryan
From: Tommy Ellqvist [mailto:tellqvis at poly.edu]
Sent: Thursday, April 04, 2013 12:30 PM
To: Ryan Danks; Tommy Ellqvist
Cc: Vis Trails development
Subject: SV: SV: [vistrails-users] Running modules in parallel


________________________________
From: Ryan Danks <Ryan.Danks at rwdi.com>
To: Tommy Ellqvist <tellqvis at duke.poly.edu>
Sent: Thursday, 4 April 2013 18:02
Subject: Re: SV: [vistrails-users] Running modules in parallel

Hi Tommy,

Looks like it's fixed!
I also made another change, so I'm not sure what exactly did it, but if this comes up again, what I did was:
1) Apply the patch you sent
2) Move the function I call with Pool.map to another file (utils.py)
3) Import utils into the file and call Pool.map with utils.eval_func_tuple
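
Steps 2 and 3 can be sketched as follows. The contents of utils.py are not shown in the thread, so the body of eval_func_tuple here is an assumption (a common pattern that unpacks a (func, args...) tuple), and map_in_pool is a hypothetical helper added for illustration.

```python
# utils.py (hypothetical contents; the original file is not shown in the thread)
from multiprocessing import Pool


def eval_func_tuple(func_and_args):
    """Call func(*args) for a (func, arg1, arg2, ...) tuple."""
    func = func_and_args[0]
    args = func_and_args[1:]
    return func(*args)


def map_in_pool(func, arg_tuples, processes=None):
    """Run func(*args) for each tuple in arg_tuples using worker processes.

    func must be defined at the top level of an importable module so that
    it can be pickled and sent to the workers.
    """
    work = [(func,) + tuple(args) for args in arg_tuples]
    pool = Pool(processes=processes)
    try:
        return pool.map(eval_func_tuple, work)
    finally:
        # Always release the worker processes, even if map() raised.
        pool.close()
        pool.join()
```

Keeping the mapped function in a separate plain module means the workers only need to import utils, not the module that defines the VisTrails classes.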

Great!

I still get an error if I reload my module and then re-run, which says:

PicklingError: Can't pickle <class 'multiprocessing.process.AuthenticationString'>: it's not the same object as multiprocessing.process.AuthenticationString.

Looks like somewhere the AuthenticationString object itself is being pickled, rather than the actual value of that string. The full stack trace is attached.

However, if I don't reload, everything works as you would expect.

Perhaps our module reloading does not work with multiprocessing? Maybe we can fix it by not reloading the multiprocessing module.
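
The "it's not the same object" message comes from pickle's identity check: a class is pickled by name, and pickle verifies that looking the name up in its module returns the very same object. The sketch below reproduces that check with a throwaway module. It simulates a reload by re-running the module body and does not use VisTrails' actual reloading code, so it illustrates the mechanism rather than confirming the cause.

```python
import pickle
import sys
import types

CLASS_SOURCE = "class Token(object):\n    pass\n"


def pickling_fails_after_rebinding():
    """Return True if pickling a stale class object raises PicklingError."""
    module = types.ModuleType("demo_reload_mod")
    sys.modules["demo_reload_mod"] = module
    try:
        exec(CLASS_SOURCE, module.__dict__)
        stale_class = module.Token
        pickle.dumps(stale_class)  # works: the name still points at this object

        # Simulate a reload: run the module body again, which creates a new
        # class object and rebinds the name Token to it.
        exec(CLASS_SOURCE, module.__dict__)
        try:
            pickle.dumps(stale_class)
        except pickle.PicklingError:
            return True
        return False
    finally:
        del sys.modules["demo_reload_mod"]
```

Any code that still holds a reference to the class from before the reload would hit this error when it tries to pickle it.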

Best,
Tommy

The stack trace:

Traceback (most recent call last):
  File "C:\Program Files\VisTrails\vistrails\core\modules\vistrails_module.py", line 328, in update
    self.compute()
  File "C:\Users\rcd\.vistrails\userpackages\..\userpackages\RwdiMetToolbox\Calculators.py", line 44, in compute
    pool = Pool(processes=numProcs)
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\multiprocessing\__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\multiprocessing\pool.py", line 134, in __init__
    self._repopulate_pool()
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\multiprocessing\pool.py", line 197, in _repopulate_pool
    w.start()
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\multiprocessing\forking.py", line 270, in __init__
    dump(prep_data, to_child, HIGHEST_PROTOCOL)
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\multiprocessing\forking.py", line 193, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\pickle.py", line 681, in _batch_setitems
    save(v)
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\pickle.py", line 400, in save_reduce
    save(func)
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Program Files\VisTrails\vistrails\Python27_64\lib\pickle.py", line 753, in save_global
    (obj, module, name))
PicklingError: Can't pickle <class 'multiprocessing.process.AuthenticationString'>: it's not the same object as multiprocessing.process.AuthenticationString


Thanks for the help, let me know if there's any more I can do to assist.

-Ryan

>>> Tommy Ellqvist <tellqvis at poly.edu<mailto:tellqvis at poly.edu>> 4/4/2013 10:25 AM >>>
Hi Ryan,

It looks like getting multiprocessing to work would be the easiest option (unless you need to scale it to multiple machines).

I am not sure if this is related to your problem, but Remi (cc'ed) recently found a bug on Windows that occurs if you have your package in the userpackages directory (patch attached). If not, are you sure you are not seeing any errors in the console? You can open the VisTrails console before executing to see messages on stderr.

Best,
Tommy

________________________________
From: Ryan Danks <Ryan.Danks at rwdi.com>
To: Tommy Ellqvist <tellqvis at duke.poly.edu>
Sent: Thursday, 4 April 2013 15:24
Subject: Re: [vistrails-users] Running modules in parallel

Hi Tommy,

My colleagues and I often have to load, analyze, manipulate and plot meteorological datasets, so I am in the process of creating a suite of blocks designed to make this process easier and help standardize the output. The idea would be to create a chain such as File -> Met Reader block (parses a file to a standard Python class I've created, fills in missing data, compiles stats about the set, etc.) -> Manipulation block(s) (filter data set, calculate additional data, etc.) -> Output block(s) (plots, write to files, etc.).

Unfortunately, since these data sets are typically at least a year of hourly data (8760 entries), performing certain calculations and reading in the sets initially can take quite a while. Since this is a trivially parallelizable problem, I was attempting to use multiprocessing.Pool in the 'compute' method of a block class. I've attached the Python module where I store all the "Calculator" type classes. There's only one class in the file, and I've added additional comments to hopefully clarify my issue. But if you have any other questions please let me know!

I'd also definitely be interested in looking at the ipython package. Thanks for the help!

Regards,

-Ryan


>>> Tommy Ellqvist <tellqvis at poly.edu<mailto:tellqvis at poly.edu>> 4/4/2013 8:46 AM >>>
Hi Ryan,

Python multiprocessing should work in a module. If you provide more information we can help you debug the problem.

We are also working on a package that uses ipython to execute modules in parallel. If you are interested, we can help you try it out.

If your goal is to run external programs in parallel (either locally or remotely), you can have a look at the JobSubmission package:
http://www.vistrails.org/index.php/FAQ#JobSubmission


If you describe your goals we can try to give you a more specific answer.

Best,
Tommy Ellqvist
VisTrails Developer

________________________________
From: Ryan Danks <Ryan.Danks at rwdi.com>
To: vistrails-users at vistrails.org
Sent: Wednesday, 3 April 2013 15:47
Subject: [vistrails-users] Running modules in parallel

I've been trying to get Vistrails to work in parallel, both by using the server instructions found in the documentation and by using the standard Python multiprocessing.Pool objects within my module. Neither of these approaches seems to work. Has anybody successfully run their Vistrails code in parallel? If so, how?




Ryan Danks, B.A.Sc., P.Eng.
Research & Development Engineer, Built Environment
Rowan, Williams Davies & Irwin Inc. (RWDI)
Consulting Engineers & Scientists

650 Woodlawn Road West
Guelph, Ontario, Canada N1K 1B8
T (519) 823-1311 x 2282
F (519) 823-1316
E ryan.danks at RWDI.com
W www.rwdi.com
RWDI - One of Canada's 50 Best Managed Companies www.rwdi.com/50_best/<http://www.rwdi.com/50_best/> This communication is intended for the sole use of the party to whom it was addressed and may contain information that is privileged and/or confidential. Any other distribution, copying or disclosure is strictly prohibited. If you received this email in error, please notify us immediately by replying to this email and delete the message without retaining any hard or electronic copies of same. Outgoing emails are scanned for viruses, but no warranty is made to their absence in this email or attachments.

_______________________________________________
vistrails-users mailing list
vistrails-users at vistrails.org<mailto:vistrails-users at vistrails.org>
http://lists.vistrails.org/mailman/listinfo/vistrails-users

-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 35706 bytes
Desc: not available
URL: <http://lists.vistrails.org/pipermail/vistrails-dev/attachments/20140107/56467a60/attachment-0001.bin>

