[![pytest](https://github.com/wimpomp/parfor/actions/workflows/pytest.yml/badge.svg)](https://github.com/wimpomp/parfor/actions/workflows/pytest.yml)

# Parfor
Used to parallelizing for-loops with parfor in MATLAB? This package lets you do the same in Python.
Take any serial but parallelizable for-loop and execute it in parallel using easy syntax.
Don't worry about the technical details of the multiprocessing module, race conditions or queues:
parfor handles all that. Now powered by [ray](https://pypi.org/project/ray/).

Tested on Linux, Windows and macOS with Python 3.10 and 3.12.

## Why is parfor better than just using multiprocessing?
- Easy to use
- Progress bars are built-in
- Retries a failed task in the main process for easy debugging
- Falls back to a modified version of dill when ray fails to serialize an object,
  so many more objects can be used when parallelizing

## How it works
[Ray](https://pypi.org/project/ray/) does all the heavy lifting. Parfor is now just a wrapper around ray, adding
some ergonomics.

## Installation
`pip install parfor`

## Usage
Parfor decorates a function and returns the results of that function evaluated in parallel for each item of
an iterable or iterator.
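
To make the decorator behaviour concrete, here is a minimal sketch of the pattern using only the standard library. It is not parfor's implementation: `toy_parfor` is a made-up name, and it uses threads instead of ray so that the snippet stays self-contained. It only shows how a decorator can replace the decorated name with a list of results.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_parfor(iterable, args=()):
    """Hypothetical stand-in for parfor, NOT the package's implementation."""
    def decorator(fun):
        # Evaluate fun once per item, passing the extra positional arguments.
        with ThreadPoolExecutor() as pool:
            # Executor.map keeps the results in the order of the input iterable.
            return list(pool.map(lambda item: fun(item, *args), iterable))
    return decorator


@toy_parfor(range(10), (3,))
def fun(i, a):
    return a * i ** 2


# After decoration, the name `fun` is bound to the list of results, not to a function.
print(fun)  # [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]
```

This is why the examples below print `fun` directly instead of calling it.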

## Requires
numpy, ray, tqdm

## Arguments
These apply to the functions `parfor.parfor`, `parfor.pmap` and `parfor.gmap`. When `parfor.parfor` is used as a
decorator, `fun` is the decorated function.

### Required:
    fun:      function taking as arguments: an item from the iterable, then the other arguments defined in args & kwargs
    iterable: iterable or iterator from which each item is given to fun as its first argument

### Optional:
    args:   tuple with other unnamed arguments to fun
    kwargs: dict with other named arguments to fun
    total:  give the length of the iterator in cases where len(iterator) results in an error
    desc:   string with description of the progress bar
    bar:    bool enable progress bar,
                or a callback function taking the number of passed iterations as an argument
    serial: execute in series instead of in parallel if True; None (default): let pmap decide
    n_processes: number of processes to use,
        the parallel pool will be restarted if the current pool does not have the right number of processes
    yield_ordered: return the result in the same order as the iterable
    yield_index: return the index of the result too
    allow_output: allow output from subprocesses
    **bar_kwargs: keyword arguments for tqdm.tqdm

### Return
    list with results from applying the function 'fun' to each item of the iterable / iterator
    (gmap returns a generator instead)
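
Going by the descriptions above, the returned list should match what a plain serial loop produces when each item is passed as the first argument, followed by `args` and `kwargs`. The sketch below spells out that reading; `serial_reference` and the `offset` keyword are made-up names for illustration and are not part of parfor.

```python
def serial_reference(fun, iterable, args=(), kwargs=None):
    """Hypothetical helper: the serial result the argument list describes."""
    kwargs = kwargs or {}
    # Each item is passed as the first argument, followed by args and kwargs.
    return [fun(item, *args, **kwargs) for item in iterable]


def fun(i, a, offset=0):
    return a * i ** 2 + offset


result = serial_reference(fun, range(4), args=(3,), kwargs={"offset": 1})
print(result)  # [1, 4, 13, 28]
```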

## Examples
### Normal serial for loop
    <<
    from time import sleep

    a = 3
    fun = []
    for i in range(10):
        sleep(1)
        fun.append(a * i ** 2)
    print(fun)

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]

### Using parfor to parallelize
    <<
    from time import sleep
    from parfor import parfor
    @parfor(range(10), (3,))
    def fun(i, a):
        sleep(1)
        return a * i ** 2
    print(fun)

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]

    <<
    @parfor(range(10), (3,), bar=False)
    def fun(i, a):
        sleep(1)
        return a * i ** 2
    print(fun)

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]

### Using parfor in a script/module/.py-file
Parfor should never be executed during the import phase of a .py-file. To prevent that from happening
use the `if __name__ == '__main__':` structure:

    <<
    from time import sleep
    from parfor import parfor

    if __name__ == '__main__':
        @parfor(range(10), (3,))
        def fun(i, a):
            sleep(1)
            return a * i ** 2
        print(fun)

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]

or:

    <<
    from time import sleep
    from parfor import parfor

    def my_fun(*args, **kwargs):
        @parfor(range(10), (3,))
        def fun(i, a):
            sleep(1)
            return a * i ** 2
        return fun

    if __name__ == '__main__':
        print(my_fun())

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]

### If you hate decorators not returning a function
`pmap` maps a function over an iterable like `map` does, but in parallel:

    <<
    from parfor import pmap
    from time import sleep
    def fun(i, a):
        sleep(1)
        return a * i ** 2
    print(pmap(fun, range(10), (3,)))

    >> [0, 3, 12, 27, 48, 75, 108, 147, 192, 243]

### Using generators
If iterables like lists and tuples are too big to fit in memory, use generators instead.
Since generators don't have a predefined length, you can optionally give parfor the length with the `total` argument.

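    <<
    import numpy as np
    from parfor import parfor

    # imagereader is a placeholder for any sized, iterable source of images
    c = (im for im in imagereader)
    @parfor(c, total=len(imagereader))
    def fun(im):
        return np.mean(im)
    print(fun)

    >> [list with means of the images]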
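
The reason `total` helps is that a generator has no `len()`. The following standard-library sketch shows this; `frames` is a hypothetical stand-in for an image reader, and no parfor call is made.

```python
def frames(n):
    """Hypothetical stand-in for an image reader: yields n tiny 'images'."""
    for k in range(n):
        yield [k, k + 1, k + 2]


n_frames = 5
gen = frames(n_frames)

try:
    len(gen)
    has_len = True
except TypeError:
    # Generators do not implement __len__, so the length must be supplied separately.
    has_len = False

total = n_frames  # the value one would pass as total=
means = [sum(frame) / len(frame) for frame in gen]
print(has_len, total, means)  # False 5 [1.0, 2.0, 3.0, 4.0, 5.0]
```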

# Extras
## `pmap`
The function that `parfor` wraps. It is used like `map` and returns a list with the results.

## `gmap`
Same as `pmap`, but returns a generator, so each result can be used as soon as it is available.
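
As an illustration of the idea only, the sketch below builds a generator that yields results as tasks finish, using the standard library. `toy_gmap` is a made-up name and is not how `gmap` is implemented. It yields `(index, result)` pairs because results may arrive out of order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def toy_gmap(fun, iterable, args=()):
    """Hypothetical generator: yields (index, result) pairs as tasks finish."""
    with ThreadPoolExecutor() as pool:
        # Remember which input position each submitted task belongs to.
        futures = {pool.submit(fun, item, *args): n for n, item in enumerate(iterable)}
        for future in as_completed(futures):
            yield futures[future], future.result()


def fun(i, a):
    return a * i ** 2


for index, value in toy_gmap(fun, range(5), (3,)):
    # Each pair can be processed immediately; arrival order may differ between runs.
    print(index, value)
```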

## `Chunks`
Splits a long iterator into bite-sized chunks to parallelize.
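
The general idea of chunking can be sketched with the standard library. The helper `chunked` below is hypothetical and does not reflect the signature or behaviour of parfor's `Chunks`. It only shows how a long iterator can be cut into pieces that are each handled by one task.

```python
from itertools import islice


def chunked(iterable, size):
    """Hypothetical helper (not parfor's Chunks): yield lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


chunks = list(chunked(range(10), 4))
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# Each chunk can then be handled by one task, and the partial results combined.
print(sum(sum(chunk) for chunk in chunks))  # 45
```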

## `ParPool`
Lower-level access to parallel execution. Submit tasks and request their results at any time
(to avoid breaking causality, submit first, then request). Different tasks can use different functions and
function arguments.
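
The submit-then-request pattern can be illustrated with the standard library's `concurrent.futures`. This is an analogy only: it does not use `ParPool` and makes no claim about its API. The functions `square` and `add` are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x ** 2


def add(x, y):
    return x + y


with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit first: each task may use a different function and different arguments.
    handles = {
        "square": pool.submit(square, 7),
        "add": pool.submit(add, 2, 3),
    }
    # Request afterwards, in any order; result() blocks until that task is done.
    total = handles["add"].result()
    squared = handles["square"].result()

print(total, squared)  # 5 49
```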

## `SharedArray`
A numpy array that can be shared among processes.