Link to issue in GPTQ-for-LLaMa repo: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59#issue-1630614442
When running python setup_cuda.py install in GPTQ-for-LLaMa, I'm now getting this error.
Traceback (most recent call last):
File "~/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 6, in <module>
ext_modules=[cpp_extension.CUDAExtension(
File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1048, in CUDAExtension
library_dirs += library_paths(cuda=True)
File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1179, in library_paths
if (not os.path.exists(_join_cuda_home(lib_dir)) and
File "~/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2223, in _join_cuda_home
raise EnvironmentError('CUDA_HOME environment variable is not set. '
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
conda create -n textgen python=3.10.9
conda activate textgen
pip3 install torch torchvision torchaudio
pip install -r requirements.txt
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install
System info: Linux with an Nvidia GPU
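For anyone else hitting this, a quick way to see what torch can find before running setup_cuda.py (a minimal sketch, assuming only that torch is installed in the active env):
import os
from torch.utils.cpp_extension import CUDA_HOME  # the CUDA root torch resolved at import time
print("CUDA_HOME env var:", os.environ.get("CUDA_HOME"))
print("CUDA_HOME detected by torch:", CUDA_HOME)  # None here means setup_cuda.py will raise the error above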
I just installed using this method, setup.py didn't work for me https://github.com/oobabooga/text-generation-webui/issues/177#issuecomment-1464844721 — it's pre-assembled
That may work for Windows but my issue is in Linux
See comment here for possible workaround: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59#issuecomment-1475041809
I have managed to install nvcc with
conda install -c conda-forge cudatoolkit-dev
The command above takes some 10 minutes to run and shows no progress bar or updates along the way.
This allows me to run
python setup_cuda.py install
for GPTQ-for-LLaMa installation, but then python server.py --listen --model llama-7b --gptq-bits 4
fails with
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
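As a quick sanity check at this point, the following (plain torch, nothing specific to the webui) shows whether the installed torch build can see the GPU at all; if is_available() prints False, the deserialization error above is expected:
import torch
print(torch.__version__)           # a '+cpu' suffix means a CPU-only build got installed
print(torch.version.cuda)          # CUDA version torch was built against, or None for CPU-only builds
print(torch.cuda.is_available())   # must be True before loading the 4-bit checkpoint onto the GPU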
I have managed to install nvcc with
conda install -c conda-forge cudatoolkit-dev
So the solution is simple: after running that line, restart WSL. If you have already fixed the CUDA symbolic links, then running that command and restarting is the last step.
Thanks @LTSarc, restarting the computer indeed worked. For better reproducibility, here is what I did to get 4-bit working again:
Create the textgen environment following https://github.com/oobabooga/text-generation-webui/issues/400#issuecomment-1474876859
conda activate textgen
conda install -c conda-forge cudatoolkit-dev
Remove the existing GPTQ-for-LLaMa folder:
rm -rf repositories/GPTQ-for-LLaMa/
Clone GPTQ-for-LLaMa again:
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
Install GPTQ-for-LLaMa:
cd GPTQ-for-LLaMa
python setup_cuda.py install
Start the server:
python server.py --listen --model llama-7b --gptq-bits 4
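Once python setup_cuda.py install succeeds, a minimal way to confirm the extension really built and installed (quant_cuda is the module name the GPTQ-for-LLaMa build produces):
import quant_cuda  # run this from outside the GPTQ-for-LLaMa source folder so the installed copy is picked up
print("quant_cuda loaded from:", quant_cuda.__file__)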
Last night I did a 7+ hour binge getting both 4-bit Llama and Deepspeed (for Pygmalion) running on WSL2. It was... an experience. WSL has a lot of bugs.
It also didn't help that this was my first ever time using Linux (though not my first time in CLIs; I used to write Win32 CLI programs).
I had to fix this as well and did it on Windows (no WSL). Here are my steps. Hopefully this saves someone else hours of work.
conda create -n textgen python=3.10.9
conda activate textgen
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
pip install -r requirements.txt
Run these commands:
conda install -c conda-forge cudatoolkit-dev
mkdir repositories
cd repositories
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
python setup_cuda.py install
Note: The last command caused me a lot of problems until I found the first command, which installs the cudatoolkit. If it still fails, installing Build Tools for Visual Studio 2019 (it has to be 2019), checking "Desktop development with C++" during installation, and adding the cl compiler to the environment may help. The last command needs a C++ compiler and an Nvidia CUDA compiler.
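A quick way to check that both compilers are actually visible from the textgen environment before retrying setup_cuda.py (stdlib only, nothing project-specific):
import shutil
print("cl.exe:", shutil.which("cl"))    # MSVC compiler from the VS 2019 Build Tools; None means it is not on PATH
print("nvcc:", shutil.which("nvcc"))    # CUDA compiler from cudatoolkit-dev; None means the CUDA build step will fail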
Download a model:
python download-model.py decapoda-research/llama-Xb-hf
where X is the size of the model you want to download, like 7 or 13.
Edit models/llama-Xb-hf/tokenizer_config.json and change LLaMATokenizer to LlamaTokenizer (a scripted version of this edit is sketched after these steps).
Place the .pt file into models/llama-Xb-hf and you should be done.
Start the server with one of:
python server.py --model llama-Xb-hf
python server.py --model llama-Xb-hf --load-in-8bit
python server.py --model llama-Xb-hf --gptq-bits 4
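If you would rather script the tokenizer_config.json edit above, something like this works (a hypothetical helper, assuming the field is the usual tokenizer_class key; adjust the path for your model size):
import json, pathlib
path = pathlib.Path("models/llama-7b-hf/tokenizer_config.json")  # change 7b to your model size
cfg = json.loads(path.read_text())
if cfg.get("tokenizer_class") == "LLaMATokenizer":
    cfg["tokenizer_class"] = "LlamaTokenizer"
    path.write_text(json.dumps(cfg, indent=2))
    print("patched tokenizer_class")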
I would recommend changing the pytorch install instructions to:
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia
This will install pytorch and cuda-toolkit, which comes with nvcc, whilst overriding all of the 12.0 cuda packages that pytorch tries to install. You could even combine it with the environment creation:
conda create -n textgen pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia
It's also worth noting that conda-forge is a community-operated organization and that you can get the cuda-toolkit directly from NVIDIA with cuda-toolkit -c 'nvidia/label/cuda-11.7.0'
or cuda-toolkit -c 'nvidia/label/cuda-11.7.1'
I haven't tried it yet, but it is possible to install just nvcc with: cuda-nvcc -c 'nvidia/label/cuda-11.7.0'
When doing python setup_cuda.py install I get:
(textgen) E:\oobabooga\text-generation-webui\repositories\GPTQ-for-LLaMa>python setup_cuda.py install
running install
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running bdist_egg
running egg_info
writing quant_cuda.egg-info\PKG-INFO
writing dependency_links to quant_cuda.egg-info\dependency_links.txt
writing top-level names to quant_cuda.egg-info\top_level.txt
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file 'quant_cuda.egg-info\SOURCES.txt'
writing manifest file 'quant_cuda.egg-info\SOURCES.txt'
installing library code to build\bdist.win-amd64\egg
running install_lib
running build_ext
C:\Users\cyper\miniconda3\envs\textgen\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] Det går inte att hitta filen
warnings.warn(f'Error checking compiler version for {compiler}: {error}')
error: [WinError 2] Det går inte att hitta filen
("Det går inte att hitta filen" is just Swedish for "cannot find the file".) I have set the environment path to the folder where cl.exe is located and have followed all the steps exactly.
I'm going to try manually installing CUDA instead using jllllll's advice; if that fails I'm probably done with trying to install the 4-bit functionality until an easier way is made. I've tried for several days now and it's just not worth the frustration.
I just installed using this method, setup.py didn't work for me #177 (comment) — it's pre-assembled
Got it to work using this method.
@oobabooga could you distribute the .whl file so we do not have to follow the whole process? This is for WSL on Windows, which would be the official method you are recommending.
You can build the wheel yourself for future use with: python setup_cuda.py bdist_wheel
This will place the wheel in a dist folder next to setup_cuda.py.
Thanks, but I am hoping to use other people's .whls, as it takes me a while to gather everything and follow the build process.
Also, if anyone using WSL starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in WSL where Windows-level GPU drivers are not linked properly within WSL. The workaround is to run this before running server.py:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
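A minimal way to tell whether that export is still needed in the current WSL session (if the CDLL call raises OSError, the loader cannot see libcuda.so yet):
import ctypes
ctypes.CDLL("libcuda.so")   # raises OSError when libcuda.so is not on the loader path
print("libcuda.so found")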
@jllllll do you have a .whl file?
I'm stuck on certain issues which I'm unsure about.
I followed through on a regular installation process on WSL, hoping the GPU could be detected. When I ran the build process, no GPU was detected, so I ran conda install pytorch torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit -c 'nvidia/label/cuda-11.7.0' -c pytorch -c nvidia
and restarted WSL.
I did this too. export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
sorry about the weird paste in advance, I don't know what it's doing
(textgen) [email protected]:~/text-generation-webui/repositories/GPTQ-for-LLaMa$ python setup_cuda.py bdist_wheel
No CUDA runtime is found, using CUDA_HOME='/home/ubuntu/miniconda3/envs/textgen'
running bdist_wheel
/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_ext
Traceback (most recent call last):
File "/home/ubuntu/text-generation-webui/repositories/GPTQ-for-LLaMa/setup_cuda.py", line 4, in <module>
setup(
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
return distutils.core.setup(**attrs)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 325, in run
self.run_command("build")
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
self.run_command(cmd_name)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/dist.py", line 1208, in run_command
super().run_command(command)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
self.build_extensions()
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 485, in build_extensions
compiler_name, compiler_version = self._check_abi()
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 869, in _check_abi
_, version = get_compiler_abi_compatibility_and_version(compiler)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 337, in get_compiler_abi_compatibility_and_version
if not check_compiler_ok_for_platform(compiler):
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 291, in check_compiler_ok_for_platform
which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['which', 'g++']' returned non-zero exit status 1.
(textgen) [email protected]:~/text-generation-webui/repositories/GPTQ-for-LLaMa$
Normal inference with just server.py won't run for me either, on commit 4bafe45a517bbe561e4a39a2582fa9af80487194
(textgen) [email protected]:~/text-generation-webui$ python server.py
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 11, in <module>
import chardet
ModuleNotFoundError: No module named 'chardet'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/text-generation-webui/server.py", line 10, in <module>
import gradio as gr
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/__init__.py", line 3, in <module>
import gradio.components as components
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/components.py", line 34, in <module>
from gradio import media_data, processing_utils, utils
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/gradio/processing_utils.py", line 19, in <module>
import requests
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/__init__.py", line 45, in <module>
from .exceptions import RequestsDependencyWarning
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/exceptions.py", line 9, in <module>
from .compat import JSONDecodeError as CompatJSONDecodeError
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/compat.py", line 13, in <module>
import charset_normalizer as chardet
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/__init__.py", line 23, in <module>
from charset_normalizer.api import from_fp, from_path, from_bytes, normalize
File "/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/api.py", line 10, in <module>
from charset_normalizer.md import mess_ratio
File "charset_normalizer/md.py", line 5, in <module>
ImportError: cannot import name 'COMMON_SAFE_ASCII_CHARACTERS' from 'charset_normalizer.constant' (/home/ubuntu/miniconda3/envs/textgen/lib/python3.10/site-packages/charset_normalizer/constant.py)
Here is a freshly compiled wheel:
quant_cuda-0.0.0-cp310-cp310-linux_x86_64.whl.zip
Make sure that you performed both of the pip install -r requirements.txt steps. You may need to install CUDA into WSL using these commands:
wget https://developer.download.nvidia.com/compute/cuda/11.7.1/local_installers/cuda_11.7.1_515.65.01_linux.run
sudo sh cuda_11.7.1_515.65.01_linux.run
Make sure not to use the driver installation option. That isn't for WSL.
It also wouldn't hurt to try restarting WSL manually with wsl --shutdown in PowerShell or cmd.
Thanks for the solution, now setup_cuda.py works, but when I try to load the model I get this error:
Loading llama-7b-hf...
Traceback (most recent call last):
File "C:\Users\X\text-generation-webui\server.py", line 199, in <module>
shared.model, shared.tokenizer = load_model(shared.model_name)
File "C:\Users\X\text-generation-webui\modules\models.py", line 94, in load_model
model = load_quantized(model_name)
File "C:\Users\X\text-generation-webui\modules\GPTQ_loader.py", line 55, in load_quantized
model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
TypeError: load_quant() missing 1 required positional argument: 'groupsize'
I am also getting the same error
Also, if anyone using WSL starts having issues with bitsandbytes not finding libcuda.so, this is because of a bug in WSL where Windows-level GPU drivers are not linked properly within WSL. The workaround is to run this before running server.py:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
Thanks, this helped me a lot. I had been stuck on this problem for a day now.
Got the same load_quant() groupsize error. I added a '-1' argument to the load_quant() call for the group size; I don't know exactly what it does.
But then you get this error:
Error(s) in loading state_dict for LlamaForCausalLM:
Missing key(s) in state_dict: "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", "model.layers.1.self_attn.q_proj.qzeros", "model.layers.1.self_attn.k_proj.qzeros", "model.layers.1.self_attn.v_proj.qzeros", "model.layers.1.self_attn.o_proj.qzeros",
...
Looks like we're running the wrong version of GPTQ for the data we have.
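For reference, the change described a few comments up would look roughly like this in modules/GPTQ_loader.py (a sketch based on this thread, not an official fix; -1 is the value GPTQ-for-LLaMa uses for checkpoints quantized without a group size):
# old call, which fails with "missing 1 required positional argument: 'groupsize'":
# model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
# patched call, passing -1 (no grouping) for groupsize:
model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits, -1)
As noted above, the missing qzeros keys suggest the .pt checkpoint was made with an older GPTQ-for-LLaMa version, so matching the repo version to the checkpoint may also be necessary.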