Describe the bug

Python's multiprocessing stops working in a Jupyter notebook once any haystack module has been imported.

If I import the multiprocessing library and don't import any haystack modules, I can run the following code without any error:

from multiprocessing import Pool, cpu_count
from tqdm.notebook import tqdm


def func(x):
    return x * x


# Square 10,000 numbers across all CPU cores, with a notebook progress bar.
with Pool(cpu_count()) as pool:
    _ = list(tqdm(pool.imap(func, list(range(10000))), total=10000))
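To check whether the failure depends on the Jupyter kernel at all, the same cell can be turned into a standalone script. This is a sketch: the run helper is hypothetical, and tqdm is left out so that multiprocessing is the only moving part. The if __name__ == "__main__": guard is what the multiprocessing docs require under the "spawn" and "forkserver" start methods, so the script stays valid whichever start method is in effect.

```python
from multiprocessing import Pool, cpu_count


def func(x):
    """Square a number; defined at module level so worker processes can unpickle it."""
    return x * x


def run(n=10000):
    # Same Pool/imap pattern as the notebook cell, minus the progress bar.
    with Pool(cpu_count()) as pool:
        return list(pool.imap(func, range(n)))


if __name__ == "__main__":
    # Uncomment to test whether importing haystack also breaks a plain script:
    # from haystack.schema import Document
    results = run()
    print(len(results), results[-1])  # prints: 10000 99980001
```

Running it with python in the same haystack environment, once as is and once with the haystack import uncommented, would show whether the problem also appears outside the notebook.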

If, however, I import anything from haystack at any point in my notebook (for example, the Document class), like so:

from haystack.schema import Document
from multiprocessing import Pool, cpu_count
from tqdm.notebook import tqdm

and then run the same Pool code, the kernel dies with the error message "ZMQError: Address already in use".
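One way to narrow this down is to check whether the import changes a process-wide multiprocessing setting. The sketch below compares the global start method before and after importing a module; that the start method is involved at all is an unconfirmed guess, and start_method_around_import is a hypothetical helper. It must run in a freshly restarted kernel, because a module that is already imported is not executed again.

```python
import importlib
import multiprocessing


def start_method_around_import(module_name="haystack.schema"):
    """Return (before, after): the global start method around importing module_name.

    allow_none=True returns None when no start method has been fixed yet, so the
    check itself does not lock in the platform default.
    """
    before = multiprocessing.get_start_method(allow_none=True)
    try:
        importlib.import_module(module_name)
    except ImportError:
        pass  # module not installed; the comparison is then uninformative
    after = multiprocessing.get_start_method(allow_none=True)
    return before, after


if __name__ == "__main__":
    before, after = start_method_around_import()
    print(f"before import: {before!r}, after import: {after!r}")
```

If the two values match, the start method can be ruled out; if they differ, that would point at the import changing how Pool creates its workers.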

Error message

Traceback (most recent call last):
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel_launcher.py", line 17, in <module>
    app.launch_new_instance()
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/traitlets/config/application.py", line 981, in launch_instance
    app.initialize(argv)
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/traitlets/config/application.py", line 110, in inner
    return method(app, *args, **kwargs)
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 666, in initialize
    self.init_sockets()
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 307, in init_sockets
    self.shell_port = self._bind_socket(self.shell_socket, self.shell_port)
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 244, in _bind_socket
    return self._try_bind_socket(s, port)
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/ipykernel/kernelapp.py", line 220, in _try_bind_socket
    s.bind("tcp://%s:%i" % (self.ip, port))
  File "/home/ross/anaconda3/envs/haystack/lib/python3.10/site-packages/zmq/sugar/socket.py", line 232, in bind
    super().bind(addr)
  File "zmq/backend/cython/socket.pyx", line 568, in zmq.backend.cython.socket.Socket.bind
  File "zmq/backend/cython/checkrc.pxd", line 28, in zmq.backend.cython.checkrc._check_rc
zmq.error.ZMQError: Address already in use
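The traceback appears to come from a second copy of the kernel: ipykernel_launcher is started again and fails while binding its shell socket to a TCP port that the running kernel already holds. The underlying OS error can be reproduced with the standard library alone. The sketch below (second_bind_errno is a hypothetical name) binds one socket to a free local port and then tries to bind a second socket to the same port, which Linux rejects with EADDRINUSE, the errno behind "Address already in use". It only illustrates the error; it does not model haystack or ipykernel.

```python
import errno
import socket


def second_bind_errno(host="127.0.0.1"):
    """Bind two TCP sockets to the same port and return the errno of the second bind.

    Returns None if, unexpectedly, the second bind succeeds.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # port 0: the OS picks a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same port, still held by `first`
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print(errno.errorcode.get(code, code))  # on Linux: EADDRINUSE
```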

Expected behavior

I want to be able to use multiprocessing for other tasks outside the context of haystack, so I expect that importing haystack does not break Python's multiprocessing library.
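Until the root cause is known, one possible workaround (untested against haystack) is to stop relying on the process-wide default and request a start method explicitly through multiprocessing.get_context. On Linux, the "fork" context creates workers by forking the current process, so they do not run any interpreter or module start-up code. Whether this avoids the crash here is an assumption, and squares_with_fork is a hypothetical helper. Forking a process that has already initialised CUDA is not supported, so a pool like this should only run CPU-side helpers such as func.

```python
import multiprocessing


def func(x):
    return x * x


def squares_with_fork(n=10000, processes=None):
    """Same Pool/imap pattern as the notebook cell, with an explicit 'fork' context.

    A context returned by get_context() keeps its own start method, so a global
    set_start_method() call made elsewhere does not affect it.
    "fork" is only available on POSIX systems such as Ubuntu.
    processes=None lets Pool choose the worker count from the number of CPUs.
    """
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes) as pool:
        return list(pool.imap(func, range(n)))


if __name__ == "__main__":
    results = squares_with_fork()
    print(len(results), results[-1])
```

In the notebook, the progress bar can be kept by replacing the list(...) line with list(tqdm(pool.imap(func, range(n)), total=n)).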

Additional context

This was run in JupyterLab v3.5.0.

I created a separate virtual environment called haystack for my Jupyter kernel and installed haystack with the command:

pip install farm-haystack[gpu-all]

I had some issues with the install around FAISS and PyTorch:

  • I had to downgrade faiss-cpu to version 1.7.2, as per issue #3600
  • My GPU is relatively new and uses compute capability sm_86, so I had to upgrade PyTorch manually

Pip freeze of my environment:

aiohttp==3.8.3 aiorwlock==1.3.0 aiosignal==1.3.1 alembic==1.8.1 appdirs==1.4.4 astroid==2.12.13 asttokens==2.1.0 async-generator==1.10 async-timeout==4.0.2 attrs==22.1.0 audioread==3.0.0 azure-ai-formrecognizer==3.2.0 azure-common==1.1.28 azure-core==1.26.1 backcall==0.2.0 backoff==1.11.1 beautifulsoup4==4.11.1 beir==1.0.1 black==22.6.0 bleach==5.0.1 cattrs==22.2.0 certifi @ file:///croot/certifi_1665076670883/work/certifi cffi==1.15.1 cfgv==3.3.1 charset-normalizer==2.1.1 ci-sdr==0.0.2 click==8.0.4 cloudpickle==2.2.0 coloredlogs==15.0.1 ConfigArgParse==1.5.3 contourpy==1.0.6 coverage==6.5.0 ctc-segmentation==1.7.4 cycler==0.11.0 Cython==0.29.32 databind==1.5.3 databind.core==1.5.3 databind.json==1.5.3 databricks-cli==0.17.3 datasets==2.7.0 debugpy==1.6.3 decorator==5.1.1 defusedxml==0.7.1 Deprecated==1.2.13 dill==0.3.6 Distance==0.1.3 distlib==0.3.6 dnspython==2.2.1 docker==6.0.1 docopt==0.6.2 docspec==2.0.2 docspec-python==2.0.2 docstring-parser==0.11 einops==0.6.0 elasticsearch==7.9.1 entrypoints==0.4 espnet==202209 espnet-model-zoo==0.1.7 espnet-tts-frontend==0.0.3 exceptiongroup==1.0.4 executing==1.2.0 faiss-cpu==1.7.2 faiss-gpu==1.7.2 farm-haystack==1.11.0 fast-bss-eval==0.1.3 fastjsonschema==2.16.2 filelock==3.8.0 Flask==2.2.2 flatbuffers==22.10.26 fonttools==4.38.0 frozenlist==1.3.3 fsspec==2022.11.0 g2p-en==2.1.0 ghp-import==2.1.0 gitdb==4.0.9 GitPython==3.1.29 greenlet==2.0.1 grpcio==1.37.1 grpcio-tools==1.37.1 gunicorn==20.1.0 h11==0.14.0 h5py==3.7.0 huggingface-hub==0.11.0 humanfriendly==10.0 identify==2.5.9 idna==3.4 importlib-metadata==4.13.0 inflect==6.0.2 iniconfig==1.1.1 ipykernel==6.17.1 ipython==8.6.0 ipywidgets==8.0.2 isodate==0.6.1 isort==5.10.1 itsdangerous==2.1.2 jaconv==0.3 jamo==0.4.1 jarowinkler==1.2.3 jedi==0.18.2 Jinja2==3.1.2 joblib==1.2.0 jsonschema==4.17.0 jupyter_client==7.4.7 jupyter_core==5.0.0 jupytercontrib==0.0.7 jupyterlab-pygments==0.2.2 jupyterlab-widgets==3.0.3 kaldiio==2.17.2 kiwisolver==1.4.4 langdetect==1.0.9 
lazy-object-proxy==1.8.0 librosa==0.9.2 llvmlite==0.39.1 loguru==0.6.0 lxml==4.9.1 Mako==1.2.4 Markdown==3.3.7 MarkupSafe==2.1.1 matplotlib==3.6.2 matplotlib-inline==0.1.6 mccabe==0.7.0 mergedeep==1.3.4 mistune==2.0.4 mkdocs==1.4.2 mlflow==2.0.1 mmh3==3.0.0 monotonic==1.6 more-itertools==9.0.0 mpmath==1.2.1 msgpack==1.0.4 msrest==0.7.1 multidict==6.0.2 multiprocess==0.70.14 mypy==0.991 mypy-extensions==0.4.3 nbclient==0.7.0 nbconvert==7.2.5 nbformat==5.7.0 nest-asyncio==1.5.6 networkx==2.8.8 nltk==3.7 nodeenv==1.7.0 nr.util==0.8.12 num2words==0.5.12 numba==0.56.4 numpy==1.23.5 oauthlib==3.2.2 onnx==1.12.0 onnxruntime-gpu==1.13.1 onnxruntime-tools==1.7.0 opensearch-py==2.0.0 outcome==1.2.0 packaging==21.3 pandas==1.5.1 pandocfilters==1.5.0 parso==0.8.3 pathspec==0.10.2 pdf2image==1.16.0 pexpect==4.8.0 pickleshare==0.7.5 Pillow==9.3.0 pinecone-client==2.0.13 platformdirs==2.5.4 pluggy==1.0.0 pooch==1.6.0 posthog==2.2.0 pre-commit==2.20.0 prompt-toolkit==3.0.33 protobuf==3.20.1 psutil==5.9.4 psycopg2-binary==2.9.5 ptyprocess==0.7.0 pure-eval==0.2.2 py==1.11.0 py-cpuinfo==9.0.0 py3nvml==0.2.7 pyarrow==10.0.0 pycparser==2.21 pydantic==1.10.2 pydoc-markdown==4.6.4 pydub==0.25.1 Pygments==2.13.0 PyJWT==2.6.0 pylint==2.15.6 pymilvus==2.0.2 pyparsing==3.0.9 pypinyin==0.44.0 pyrsistent==0.19.2 PySocks==1.7.1 pytesseract==0.3.10 pytest==7.2.0 pytest-custom-exit-code==0.3.0 python-dateutil==2.8.2 python-docx==0.8.11 python-dotenv==0.21.0 python-magic==0.4.27 python-multipart==0.0.5 pytorch-wpe==0.0.1 pytrec-eval==0.5 pytz==2022.6 pyworld==0.3.2 PyYAML==5.4.1 pyyaml_env_tag==0.1 pyzmq==24.0.1 quantulum3==0.7.11 querystring-parser==1.2.4 rapidfuzz==2.7.0 ray==1.13.0 rdflib==6.2.0 regex==2022.10.31 requests==2.28.1 requests-cache==0.9.7 requests-oauthlib==1.3.1 resampy==0.4.2 responses==0.18.0 s3cmd==2.3.0 scikit-learn==1.1.3 scipy==1.9.3 selenium==4.6.0 sentence-transformers==2.2.2 sentencepiece==0.1.97 seqeval==1.2.2 shap==0.41.0 six==1.16.0 slicer==0.0.7 smmap==5.0.0 
sniffio==1.3.0 sortedcontainers==2.4.0 soundfile==0.11.0 soupsieve==2.3.2.post1 SPARQLWrapper==2.0.0 SQLAlchemy==1.4.44 SQLAlchemy-Utils==0.38.3 sqlparse==0.4.3 stack-data==0.6.1 sympy==1.11.1 tabulate==0.9.0 threadpoolctl==3.1.0 tika==1.24 tinycss2==1.2.1 tokenize-rt==5.0.0 tokenizers==0.12.1 toml==0.10.2 tomli==2.0.1 tomli_w==1.0.0 tomlkit==0.11.6 torch==1.13.0+cu116 torch-complex==0.4.3 torchaudio==0.13.0+cu116 torchvision==0.14.0+cu116 tornado==6.2 tox==3.27.1 tqdm==4.64.1 traitlets==5.5.0 transformers==4.21.2 trio==0.22.0 trio-websocket==0.9.2 typeguard==2.13.3 typing_extensions==4.4.0 ujson==5.1.0 Unidecode==1.3.6 url-normalize==1.4.3 urllib3==1.26.12 validators==0.18.2 virtualenv==20.16.7 watchdog==2.1.9 wcwidth==0.2.5 weaviate-client==3.9.0 webdriver-manager==3.8.5 webencodings==0.5.1 websocket-client==1.4.2 Werkzeug==2.2.2 widgetsnbextension==4.0.3 wrapt==1.14.1 wsproto==1.2.0 xmltodict==0.13.0 xxhash==3.1.0 yapf==0.32.0 yarl==1.8.1 zipp==3.10.0

System:

  • OS: Ubuntu 20.04
  • GPU/CPU: Nvidia GeForce RTX 3060 (CUDA 11.6) / Intel i7-12700KF
  • Haystack version (commit or version number): 1.11.0
  • DocumentStore: N/A
  • Reader: N/A
  • Retriever: N/A

I also tried reinstalling haystack from source, following the instructions on GitHub (cloning the repo and installing it), but I still get the same error when I run list(tqdm(pool.imap(func, list(range(10000))), total=10000)) after importing anything from haystack.
