Error upon installing #42

Open
ViralBShah opened this issue Nov 29, 2020 · 8 comments

@ViralBShah
Member

Using Julia 1.5.3 on a computer with a GPU:

julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/home/viralbshah/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
libcublas.so.10: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(::String, ::UInt32; throw_error::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
 [2] dlopen at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)
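`dlopen` typically reports only the first dependency it fails to resolve. To see every missing dependency of the prebuilt library at once, `ldd` can be run on it. A minimal sketch, assuming a glibc-based Linux system with `ldd` on the `PATH`; the helper name `missing_deps` is hypothetical.

```shell
# Print the name of every shared-library dependency of "$1" that the dynamic
# loader cannot resolve. ldd reports these as lines like
# "libcublas.so.10 => not found"; awk keeps only the library name.
missing_deps() {
    ldd "$1" | awk '/not found/ { print $1 }'
}
```

For example, `missing_deps ~/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so`. Any CUDA 10 sonames it prints (such as the `libcublas.so.10` and `libcufft.so.10` reported in this thread) are not on the loader's search path.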
@ViralBShah
Member Author

Perhaps the same as #32?

@k8lion

k8lion commented Feb 3, 2021

I am having the same issue both locally and on a cluster. Both have Julia 1.5.3 and CUDA 11.0.

julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/tmpdir/maile/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
libcufft.so.10: cannot open shared object file: No such file or directory
Stacktrace:
 [1] dlopen(::String, ::UInt32; throw_error::Bool) at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
 [2] dlopen at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)

Trying on CUDA 10.1 yields a similar error:

julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/tmpdir/maile/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
/lib64/libm.so.6: version `GLIBC_2.23' not found (required by /tmpdir/maile/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libtorch.so)
Stacktrace:
 [1] dlopen(::String, ::UInt32; throw_error::Bool) at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109
 [2] dlopen at /usr/local/julia/julia-1.5.3/share/julia/stdlib/v1.5/Libdl/src/Libdl.jl:109 [inlined] (repeats 2 times)
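This second error comes from glibc rather than CUDA: `libtorch.so` needs the `GLIBC_2.23` symbol version, which the cluster's `/lib64/libm.so.6` does not provide, meaning the system glibc is older than 2.23. A minimal sketch for checking this up front, assuming a glibc system where `getconf GNU_LIBC_VERSION` works and a `sort` that supports `-V` (GNU coreutils); `glibc_version` and `version_ge` are hypothetical helpers.

```shell
# Print the running glibc version, e.g. "2.17" (getconf prints "glibc 2.17").
glibc_version() {
    getconf GNU_LIBC_VERSION | awk '{ print $2 }'
}

# Succeed when version $1 >= version $2. sort -V compares each dotted
# component numerically (so 2.9 sorts before 2.23) and puts the smaller first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: `version_ge "$(glibc_version)" 2.23 || echo "glibc is older than the GLIBC_2.23 required by libtorch.so"`.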

@DhairyaLGandhi
Member

What is the output of versioninfo()?

@k8lion

k8lion commented Feb 4, 2021

Locally:

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1* (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-7440HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-10.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = atom  -a
  JULIA_NUM_THREADS = 4

On the cluster:

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)
Environment:
  JULIA_DEPOT_PATH = /tmpdir/maile/.julia

@k8lion

k8lion commented Feb 4, 2021

My first errors were produced with the latest release. On master, locally, I get:

libcublas.so.10: cannot open shared object file: No such file or directory

On master on the cluster, the errors are the same.

@DhairyaLGandhi
Member

That looks like an issue with the local CUDA setup. We should really just set up lazy artifacts to make these errors go away entirely.

@PerezHz

PerezHz commented Jul 12, 2021

Hit the same issue (I think) on a GPU machine with Julia 1.6.1 and a fresh environment:

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)

(@v1.6) pkg> st
      Status `~/.julia/environments/v1.6/Project.toml`
  [052768ef] CUDA v3.3.3
  [587475ba] Flux v0.12.4
  [7073ff75] IJulia v1.23.2
  [6a2ea274] Torch v0.1.2

julia> using CUDA; CUDA.versioninfo()
CUDA toolkit 11.3.1, artifact installation
CUDA driver 11.2.0
NVIDIA driver 460.73.1

Libraries: 
- CUBLAS: 11.5.1
- CURAND: 10.2.4
- CUFFT: 10.4.2
- CUSOLVER: 11.1.2
- CUSPARSE: 11.6.0
- CUPTI: 14.0.0
- NVML: 11.0.0+460.73.1
- CUDNN: 8.20.0 (for CUDA 11.3.0)
- CUTENSOR: 1.3.0 (for CUDA 11.2.0)

Toolchain:
- Julia: 1.6.1
- LLVM: 11.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80

1 device:
  0: Tesla T4 (sm_75, 14.414 GiB / 14.756 GiB available)

julia> using Torch
[ Info: Precompiling Torch [6a2ea274-3061-11ea-0d63-ff850051a295]
ERROR: LoadError: InitError: could not load library "/home/jupyter/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libdoeye_caml.so"
libcublas.so.10: cannot open shared object file: No such file or directory
Stacktrace:
  [1] dlopen(s::String, flags::UInt32; throw_error::Bool)
    @ Base.Libc.Libdl ./libdl.jl:114
  [2] dlopen (repeats 2 times)
    @ ./libdl.jl:114 [inlined]
  [3] __init__()
    @ Torch_jll ~/.julia/packages/Torch_jll/sFQc0/src/wrappers/x86_64-linux-gnu-cxx11.jl:57
  [4] _include_from_serialized(path::String, depmods::Vector{Any})
    @ Base ./loading.jl:674
  [5] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String)
    @ Base ./loading.jl:760
  [6] _require(pkg::Base.PkgId)
    @ Base ./loading.jl:998
  [7] require(uuidkey::Base.PkgId)
    @ Base ./loading.jl:914
  [8] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:901
  [9] include
    @ ./Base.jl:386 [inlined]
 [10] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt64}}, source::Nothing)
    @ Base ./loading.jl:1213
 [11] top-level scope
    @ none:1
 [12] eval
    @ ./boot.jl:360 [inlined]
 [13] eval(x::Expr)
    @ Base.MainInclude ./client.jl:446
 [14] top-level scope
    @ none:1
during initialization of module Torch_jll
in expression starting at /home/jupyter/.julia/packages/Torch/fIKJf/src/Torch.jl:1
ERROR: Failed to precompile Torch [6a2ea274-3061-11ea-0d63-ff850051a295] to /home/jupyter/.julia/compiled/v1.6/Torch/jl_Yw2dNx.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] compilecache(pkg::Base.PkgId, path::String, internal_stderr::Base.TTY, internal_stdout::Base.TTY)
   @ Base ./loading.jl:1360
 [3] compilecache(pkg::Base.PkgId, path::String)
   @ Base ./loading.jl:1306
 [4] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1021
 [5] require(uuidkey::Base.PkgId)
   @ Base ./loading.jl:914
 [6] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:901
 [7] top-level scope
   @ ~/.julia/packages/CUDA/02Kjq/src/initialization.jl:52

Is there any recommended workaround?

@LeeLizuoLiu

One workaround you could try is to symlink the missing CUDA library into the artifact's lib directory:
ln -s ~/path/to/libcublas.so.10 /home/jupyter/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib/libcublas.so.10
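Plain `ln -s` will happily create a dangling link if the source path is wrong, and a relative source path would be resolved relative to the artifact directory. A minimal sketch that guards against both; `link_into_artifact` is a hypothetical helper, and it assumes, as the command above does, that the loader picks up libraries placed next to `libdoeye_caml.so` in the artifact's `lib` directory.

```shell
# Symlink an existing library (e.g. a CUDA 10 libcublas.so.10) into a
# destination directory, keeping its file name so the soname still matches.
# $1: path to the library   $2: destination directory
link_into_artifact() {
    if [ ! -e "$1" ]; then
        echo "not found: $1" >&2
        return 1
    fi
    # Resolve to an absolute path so the link works from inside $2.
    src_dir=$(cd "$(dirname "$1")" && pwd) || return 1
    src="$src_dir/$(basename "$1")"
    # -f replaces a stale link left over from an earlier attempt.
    ln -sf "$src" "$2/$(basename "$1")"
}
```

For example, with `~/path/to/` standing for wherever the CUDA 10 libraries actually live: `link_into_artifact ~/path/to/libcublas.so.10 ~/.julia/artifacts/d6ce2ca09ab00964151aaeae71179deb8f9800d1/lib`, repeated for any other missing library such as `libcufft.so.10`.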
