It appears that something around backedge tracking broke somewhat recently (I'm assuming with pkgimages, but haven't bisected yet). In particular, changing Core.Compiler no longer gets picked up by Cthulhu. To reproduce, apply a patch to base/compiler, e.g.

diff --git a/base/compiler/abstractinterpretation.jl b/base/compiler/abstractinterpretation.jl
index b3e18a6ee7..fc8e2d1aee 100644
--- a/base/compiler/abstractinterpretation.jl
+++ b/base/compiler/abstractinterpretation.jl
@@ -547,6 +547,8 @@ function abstract_call_method(interp::AbstractInterpreter, method::Method, @nosp
         return MethodCallResult(Any, false, false, nothing, Effects())
+    @show sig
     # Limit argument type tuple growth of functions:
     # look through the parents list to see if there's a call to the same method
     # and from the same method.

The `@show` output does not appear when descending with Cthulhu:

julia> using Revise

julia> Revise.track(Core.Compiler)

julia> using Cthulhu

julia> @descend optimize=false sin(1.0)
sin(x::T) where T<:Union{Float32, Float64} @ Base.Math special/trig.jl:29

However, if I force-reevaluate do_typeinf!, it does show up:

julia> Core.eval(Cthulhu, quote
       function do_typeinf!(interp::AbstractInterpreter, mi::MethodInstance)
           result = InferenceResult(mi)
           # we may want to handle the case when `InferenceState(...)` returns `nothing`,
           # which indicates that code generation for a `@generated` function has failed,
           # and show it in the UI in some way ?
           # branch on
           frame = @static hasmethod(InferenceState, (InferenceResult,Symbol,AbstractInterpreter)) ?
                   InferenceState(result, #=cache=# :global, interp)::InferenceState :
                   InferenceState(result, #=cached=# true, interp)::InferenceState
           CC.typeinf(interp, frame)
           return nothing
       end
       end)

julia> @descend optimize=false sin(1.);
:sig = Tuple{typeof(Base.abs), Float64}
:sig = Tuple{Type{Float64}, Base.Irrational{:π}}
:sig = Tuple{typeof(Base.:(/)), Float64, Int64}
:sig = Tuple{typeof(Base.promote), Float64, Int64}
:sig = Tuple{typeof(Base._promote), Float64, Int64}

This suggests that there is some cached version of do_typeinf! that fails to notice the revision of the Core.Compiler method. @timholy any ideas - is this known?
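For reference, the behaviour I would expect is the ordinary backedge invalidation you get in a plain session. A minimal sketch, with `callee` and `caller` as made-up names standing in for the Core.Compiler method and `do_typeinf!`:

```julia
# Hypothetical stand-ins: `callee` plays the revised Core.Compiler method,
# `caller` plays Cthulhu.do_typeinf!.
callee() = 1
caller() = callee()

before = caller()   # compiles `caller` and records a backedge callee -> caller

callee() = 2        # redefinition should invalidate `caller` through that backedge

after = caller()    # recompiled against the new `callee`
println((before, after))
```

When the caller comes out of a cache file without that backedge recorded, the second call would keep using the stale code, which matches what I see above.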


If I do info = info_cachefile("Cthulhu") and then look at info.edges, I don't see an entry for Cthulhu.do_typeinf! - shouldn't there be one?

julia> using Revise

julia> Revise.track(Core.Compiler)

julia> using Cthulhu

julia> mi = Cthulhu.get_specialization(Tuple{typeof(gcd), Int, Int})
MethodInstance for gcd(::Int64, ::Int64)

julia> interp = Cthulhu.CthulhuInterpreter();

julia> Cthulhu.do_typeinf!(interp, mi)

julia> Core.Compiler.typeinf(interp, Core.Compiler.InferenceState(Core.Compiler.InferenceResult(mi), :global, interp))
:sig = Tuple{typeof(Base.:(==)), Int64, Int64}
:sig = Tuple{typeof(Base.Checked.checked_abs), Int64}
:sig = Tuple{typeof(Base.:(==)), Int64, Int64}
:sig = Tuple{typeof(Base.Checked.checked_abs), Int64}
:sig = Tuple{typeof(Base._gcd), Int64, Int64}
:sig = Tuple{typeof(Base.signbit), Int64}
:sig = Tuple{typeof(Base.__throw_gcd_overflow), Int64, Int64}

I think one piece of the puzzle is that Cthulhu sets compile=min for its methods.
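For anyone unfamiliar with that setting, this is roughly what it looks like at the module level. It is a sketch only: `MinCompiled` and `double` are made-up names, and I'm assuming Cthulhu uses `Base.Experimental.@compiler_options` for this rather than some other mechanism.

```julia
module MinCompiled
# Ask the compiler to avoid compiling this module's methods where it can.
# (Hypothetical module; only the macro itself is real.)
Base.Experimental.@compiler_options compile=min

double(x) = 2x
end

println(MinCompiled.double(3))
```

Methods in such a module still run normally, but whether they have native code at precompile time differs from regular modules.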


Aha, I think I have the smoking gun. We collect the edges in

        jl_prepare_serialization_data(mod_array, newly_inferred, jl_worklist_key(worklist), &extext_methods, &new_specializations, &method_roots_list, &ext_targets, &edges);

but then only call

            *_native_data = jl_precompile_worklist(worklist, extext_methods, new_specializations);

afterwards, so if we do extra inference there, those edges are missed.
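A toy model of the ordering problem, in case it helps. None of these names exist in the runtime; `snapshot` just stands in for what `jl_prepare_serialization_data` collects, and `infer!` for any inference that records new edges.

```julia
# Toy model only: a global table of edges, keyed by caller.
recorded_edges = Dict{Symbol,Vector{Symbol}}()

# Pretend "inference": record that `caller` depends on each of `callees`.
infer!(caller, callees) = (recorded_edges[caller] = collect(callees); nothing)

infer!(:foo, [:bar])                   # inferred before serialization prep

snapshot = deepcopy(recorded_edges)    # edges are collected at this point

infer!(:do_typeinf!, [:typeinf])       # extra inference during the native-code step

# Callers whose edges exist now but never made it into the snapshot.
missed = [k for k in keys(recorded_edges) if !haskey(snapshot, k)]
println("edges missing from the snapshot: ", missed)
```

Anything inferred after the snapshot is taken has its code serialized but not its edges, so later invalidations cannot reach it.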


Extra inference seems forbidden there, since we should have already prepped the method list for serialization and decided the total content to include. What changed?


By what mechanism is it supposed to be forbidden?

#0  jl_type_infer (mi=mi@entry=0x7f96486a8380, world=33942, force=force@entry=0) at /home/keno/julia/src/gf.c:278
#1  0x00007f9651e5305c in jl_ci_cache_lookup (cgparams=..., src_out=0x7ffd4ab60c08, ci_out=<synthetic pointer>, 
    world=<optimized out>, mi=0x7f96486a8380) at /home/keno/julia/src/aotcompile.cpp:250
#2  jl_create_native_impl (methods=0x7f964a99f4c0, llvmmod=<optimized out>, 
    cgparams=0x7f9651f07e60 <jl_default_cgparams>, _policy=0, _imaging_mode=<optimized out>, 
    _external_linkage=<optimized out>) at /home/keno/julia/src/aotcompile.cpp:335
#3  0x00007f96523d631c in jl_precompile_ (external_linkage=external_linkage@entry=1, m=<optimized out>, m=<optimized out>)
    at /home/keno/julia/src/precompile_utils.c:254
#4  0x00007f96523e3b55 in jl_precompile_worklist (extext_methods=<optimized out>, extext_methods=<optimized out>, 
    new_specializations=<optimized out>, new_specializations=<optimized out>, worklist=0x7f9648c268f0)
    at /home/keno/julia/src/precompile_utils.c:303
#5  ijl_create_system_image (_native_data=<optimized out>, worklist=<optimized out>, emit_split=<optimized out>, 
    s=<optimized out>, z=<optimized out>, udeps=<optimized out>, srctextpos=<optimized out>)
    at /home/keno/julia/src/staticdata.c:2624

It is not supposed to be calling anything Julia related (finalizers are also disabled here, and threads are supposed to be stopped), since it will corrupt the state, though nothing checks it to detect that sort of error. Sounds like someone broke it recently then?


This diff might work?

diff --git a/src/precompile_utils.c b/src/precompile_utils.c
index f251d00f76..6154a9ced7 100644
--- a/src/precompile_utils.c
+++ b/src/precompile_utils.c
@@ -188,7 +188,7 @@ static int precompile_enq_specialization_(jl_method_instance_t *mi, void *closur
                 (jl_ir_inlining_cost((jl_array_t*)inferred) == UINT16_MAX)) {
                 do_compile = 1;
-            else if (jl_atomic_load_relaxed(&codeinst->invoke) != NULL || jl_atomic_load_relaxed(&codeinst->precompile)) {
+            else if (jl_atomic_load_relaxed(&codeinst->precompile)) {
                 do_compile = 1;

do_compile seems likely to corrupt the precompile state prepared by jl_prepare_serialization_data, if it gets called at all


I've PR'd that diff as #48054 since it fixes my immediate issue. If there's a larger fix to be had, you might need to discuss with @vchuravy.


xref analysis in
