@nalimilan We have the following related problems in the core design of DataFrames.jl. What do you think we should do about them?

Problem 1. view with no columns

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> sdf = @view df[:, 2:1]
0×0 SubDataFrame

julia> sdf[[true, true, true], :]
0×0 DataFrame

julia> sdf[Bool[], :]
ERROR: BoundsError: attempt to access 3-element Base.OneTo{Int64} at index [0-element Vector{Bool}]

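The issue: view allows its rows field to be set to whatever the parent allows, but a view with 0 columns should have 0 rows. As a result, sdf[[true, true, true], :] succeeds even though sdf reports 0 rows, while sdf[Bool[], :], whose mask matches the reported size, throws.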
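To make the intended invariant concrete, here is a minimal sketch assuming a hypothetical ToyView type and helpers view_nrow and select_rows (none of these are DataFrames.jl API). The toy view keeps the rows it was given but reports 0 rows when no columns are selected, and it validates a Boolean row mask against that effective row count. Under these assumptions Bool[] is accepted and a 3-element mask is rejected, the reverse of the REPL behavior shown above.

```julia
# Hypothetical toy model of a row/column view; NOT the DataFrames.jl SubDataFrame.
struct ToyView
    rows::Vector{Int}   # row indices into the parent, stored as given
    cols::Vector{Int}   # column indices into the parent
end

# A view with no columns reports no rows, whatever `rows` contains.
view_nrow(v::ToyView) = isempty(v.cols) ? 0 : length(v.rows)

# Boolean row selection checked against the effective row count,
# not against length(v.rows).
function select_rows(v::ToyView, mask::AbstractVector{Bool})
    n = view_nrow(v)
    length(mask) == n || throw(BoundsError(1:n, mask))
    # With zero rows nothing can be selected; avoid indexing `rows` with `mask`.
    n == 0 && return ToyView(Int[], copy(v.cols))
    return ToyView(v.rows[mask], copy(v.cols))
end

v = ToyView([1, 2, 3], Int[])          # like @view df[:, 2:1]
view_nrow(v)                           # 0
view_nrow(select_rows(v, Bool[]))      # 0: an empty mask is accepted
# select_rows(v, [true, true, true])   # would throw a BoundsError
```

The key design choice in this sketch is that bounds checks use the reported size, so the stored rows field never leaks into user-visible behavior.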

Problem 2. dropping columns with view

julia> df = DataFrame(a=1:3, b=4:6)
3×2 DataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> sdf = @view df[:, :]
3×2 SubDataFrame
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4
   2 │     2      5
   3 │     3      6

julia> dfr = df[1, :]
DataFrameRow
 Row │ a      b
     │ Int64  Int64
─────┼──────────────
   1 │     1      4

julia> select!(df)
0×0 DataFrame

julia> sdf[:, :]
ERROR: BoundsError: attempt to access 0×0 DataFrame at index [1:3, 1:0]

julia> dfr
Error showing value of type DataFrameRow{DataFrame, DataFrames.Index}:
ERROR: BoundsError: attempt to access 0×0 DataFrame at index [[1], 1:0]

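The issue here is that when : is used as the column selector, adding or dropping columns in the parent data frame is a valid operation, and the view picks up the change. However, if the parent passes through the 0-column case, its number of rows changes as well (here from 3 to 0), so the row indices stored in sdf and dfr no longer point to existing rows.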
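The mechanism can be illustrated with a minimal sketch, assuming a hypothetical toy model in which a frame is just a vector of equal-length columns and a view only stores row indices; frame_nrow and rows_valid are illustrative names, not DataFrames.jl functions. Because the row count is derived from the columns, dropping every column silently turns a 3-row parent into a 0-row one, and the stored indices go out of bounds.

```julia
# Hypothetical toy model, not DataFrames.jl: a frame is a vector of equal-length columns.
# As in DataFrames.jl, the row count is derived from the columns, so 0 columns means 0 rows.
frame_nrow(cols::AbstractVector) = isempty(cols) ? 0 : length(first(cols))

# Row indices stored by a view stay valid only while they fit the parent's current rows.
rows_valid(cols, rows) = all(r -> 1 <= r <= frame_nrow(cols), rows)

pcols = [[1, 2, 3], [4, 5, 6]]   # like DataFrame(a=1:3, b=4:6)
view_rows = [1, 2, 3]            # rows captured by @view df[:, :]
rows_valid(pcols, view_rows)     # true
empty!(pcols)                    # like select!(df): drop every column
rows_valid(pcols, view_rows)     # false: the parent now has 0 rows
```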

If something is not clear, please let me know. I was not sure how best to handle this: there are several possible options, but these cases expose a hole in the design, so I was not sure which one is best.


Results of some additional thinking:

Regarding the second issue (changing the set of columns in the parent of a view that uses : as the column selector): while it is irritating, strictly speaking it is not a bug. Dropping all columns in the parent changes its number of rows, and changing the number of rows of a parent is already disallowed when working with views. So we can leave things as they are.

Regarding the first issue (a 0-column view of a data frame that has some rows), I was considering two solutions:

  • making the rows field empty when the SubDataFrame is created;
  • changing the implementation of view and getindex so that they handle this case correctly.

The first option (fixing rows) is easier to implement, but it makes parentindices return a misleading result: you can no longer recover the row indices that were used to create the SubDataFrame (though, strictly speaking, parentindices is not required to allow this). Fixing rows would also technically be a breaking change.

So I am leaning towards changing the view and getindex implementations while keeping the rows field untouched. What do you think?
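The trade-off between the two options can be sketched as follows, assuming a hypothetical ToySubView type and illustrative helpers view_truncating, view_keeping, subview_nrow and toy_parentindices (a stand-in for parentindices); none of these are DataFrames.jl API. Both options report 0 rows for a 0-column view, but only the second one still lets you recover the original row selection.

```julia
# Hypothetical toy type for illustration only; not the DataFrames.jl SubDataFrame.
struct ToySubView
    rows::Vector{Int}   # row indices into the parent
    cols::Vector{Int}   # column indices into the parent
end

# Option 1 (hypothetical): normalize at construction, dropping rows when no columns are selected.
view_truncating(rows, cols) =
    ToySubView(isempty(cols) ? Int[] : collect(rows), collect(cols))

# Option 2 (hypothetical): store rows untouched; size-dependent code must consult cols.
view_keeping(rows, cols) = ToySubView(collect(rows), collect(cols))

# Both options agree on the reported size: 0 columns means 0 rows.
subview_nrow(v::ToySubView) = isempty(v.cols) ? 0 : length(v.rows)

# Toy analogue of parentindices: the indices used to build the view.
toy_parentindices(v::ToySubView) = (v.rows, v.cols)

a = view_truncating(1:3, 2:1)      # like @view df[:, 2:1] under option 1
b = view_keeping(1:3, 2:1)         # like @view df[:, 2:1] under option 2
subview_nrow(a), subview_nrow(b)   # (0, 0)
toy_parentindices(a)[1]            # empty: the original rows are lost
toy_parentindices(b)[1]            # [1, 2, 3]: the original rows are recoverable
```

In this sketch the cost of option 2 is that every code path depending on the row count has to apply the zero-column rule instead of reading rows directly.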


Makes sense. Is there any drawback to that solution?


I do not see any (except for complexity). See https://github.com/JuliaData/DataFrames.jl/pull/3273 for an implementation.

© 2022 pullanswer.com - All rights reserved.