Reproducer:
julia> using DataFrames, Test, Random
julia> const ≅ = isequal
isequal (generic function with 44 methods)
julia> y = Any[1, missing, missing, 2, 4]
5-element Vector{Any}:
1
missing
missing
2
4
julia> x = 1:length(y)
1:5
julia> df = DataFrame(x=x, y1=y, y2=reverse(y))
5×3 DataFrame
 Row │ x      y1       y2
     │ Int64  Any      Any
─────┼─────────────────────────
   1 │     1  1        4
   2 │     2  missing  2
   3 │     3  missing  missing
   4 │     4  2        missing
   5 │     5  4        1
julia> gd = groupby(df, :x)
GroupedDataFrame with 5 groups based on key: x
First Group (1 row): x = 1
 Row │ x      y1   y2
     │ Int64  Any  Any
─────┼─────────────────
   1 │     1  1    4
⋮
Last Group (1 row): x = 5
 Row │ x      y1   y2
     │ Int64  Any  Any
─────┼─────────────────
   1 │     5  4    1
julia> combine(gd, [:x, :y1] => ((x, y) -> (sleep((x == [5])/10); y[1])) => :y1,
       [:x, :y2] => ((x, y) -> (sleep((x == [5])/10); y[end])) => :y2)
5×3 DataFrame
 Row │ x      y1       y2
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     1        1        4
   2 │     2  missing        2
   3 │     3  missing  missing
   4 │     4        2  missing
   5 │     5        4        1
julia> combine(gd, [:x, :y1] => ((x, y) -> (sleep((x == [5])/10); y[1])) => :y1,
       [:x, :y2] => ((x, y) -> (sleep((x == [5])/10); y[end])) => :y2)
5×3 DataFrame
 Row │ x      y1       y2
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     1        1        4
   2 │     2  missing        2
   3 │     3  missing  missing
   4 │     4        2  missing
   5 │     5        4        1
julia> combine(gd, [:x, :y1] => ((x, y) -> (sleep((x == [5])/10); y[1])) => :y1,
       [:x, :y2] => ((x, y) -> (sleep((x == [5])/10); y[end])) => :y2)
And this third call does not terminate (note that I ran the same operation three times; the first two completed). Most likely there is a race condition in the handling of multi-threading that got exposed on Julia nightly.
For the issue to show up we need to pass two operations to combine. If only one is passed, things work OK.
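For comparison, here is a minimal sketch of the single-operation variant as a plain script. It reuses the same data and the same anonymous function as the reproducer above; the variable name `res` is mine, and I am only assuming that this variant completes because that is what was observed above.

```julia
using DataFrames

# Same data as in the reproducer above.
y = Any[1, missing, missing, 2, 4]
df = DataFrame(x=1:length(y), y1=y, y2=reverse(y))
gd = groupby(df, :x)

# Only ONE operation is passed to combine here. `res` is just an
# illustrative name for the result.
res = combine(gd, [:x, :y1] => ((x, y) -> (sleep((x == [5])/10); y[1])) => :y1)
println(res)
```

Adding the second `[:x, :y2] => ...` operation back is what turns this into the hanging case.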
OK - I have narrowed it down. The issue is with sleep:
julia> @sync begin
       Threads.@spawn sleep(0.01)
       end
Task (done) @0x000001e7a9a71750
julia> @sync begin
       Threads.@spawn sleep(0.01)
       end
Task (done) @0x000001e7fa5c45e0
julia> @sync begin
       Threads.@spawn sleep(0.01)
       end # hangs
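To make the hang easier to catch without freezing the REPL, something like the following sketch may help. `spawn_sleep_finishes` is a hypothetical helper name (not part of Base or DataFrames), and the 5-second limit is an arbitrary assumption. One caveat: `timedwait` relies on timers itself, so if the underlying problem affects timers in general, this watchdog could stall as well.

```julia
# Hypothetical helper: run the spawn-and-sleep step in a background task and
# report whether it finished within `limit` seconds.
function spawn_sleep_finishes(; duration=0.01, limit=5.0)
    t = @async begin
        @sync begin
            Threads.@spawn sleep(duration)
        end
    end
    # timedwait polls the predicate and returns :ok once it is true,
    # or :timed_out after `limit` seconds.
    return timedwait(() -> istaskdone(t), limit) === :ok
end

# Repeat a few times, since the hang only showed up on the third attempt above.
for i in 1:3
    println("attempt ", i, ": ", spawn_sleep_finishes() ? "finished" : "timed out")
end
```

On a Julia build without the problem, every attempt should report "finished"; a "timed out" line would point at the same hang as above.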