When running learning_curve() with parallel processing (n_jobs > 1), the returned fit_times and score_times are wrongly aggregated: they come back as sums of the durations across all parallel _fit_and_score() jobs rather than something meaningful such as an average.
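
For reference, a minimal sketch of the kind of call that exhibits this (the synthetic dataset and LogisticRegression are arbitrary choices; any estimator with return_times=True should do):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, random_state=0)

# Identical calls except for n_jobs; only the timing outputs should change.
for n_jobs in (1, 4):
    _, _, _, fit_times, score_times = learning_curve(
        LogisticRegression(max_iter=1000), X, y,
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=5, n_jobs=n_jobs, return_times=True,
    )
    print(f"n_jobs={n_jobs}: fit_times shape {fit_times.shape}")
    print(fit_times)  # shape (n_train_sizes, n_cv_folds)
```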

This wrong aggregation seems to be caused by _aggregate_score_dicts(), which unpacks the times of all parallel jobs into a single flat vector (see the simplified sketch after the link below):

https://github.com/scikit-learn/scikit-learn/blob/36958fb240fbe435673a9e3c52e769f01f36bec0/sklearn/model_selection/_validation.py#L1575
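
For illustration, here is a simplified re-implementation of what that helper does (the real private function also handles non-numeric entries, and the exact reshape that learning_curve applies afterwards may differ in detail):

```python
import numpy as np

# Simplified sketch of _aggregate_score_dicts: one result dict per parallel
# _fit_and_score job is collapsed into a dict of flat arrays, so the
# (train_size, cv fold) structure of the timings is lost at this point.
def aggregate_score_dicts(results):
    return {key: np.asarray([r[key] for r in results]) for key in results[0]}

# 3 train sizes x 2 CV folds -> 6 parallel jobs, each reporting a fit_time.
jobs = [{"fit_time": t} for t in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6)]
flat = aggregate_score_dicts(jobs)["fit_time"]
print(flat.shape)  # (6,) -- a single vector that still has to be regrouped
```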

One solution could be to average the fit and score times per train_size.
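
As a sketch of that idea (average_times_per_train_size is a hypothetical helper, not an existing scikit-learn function):

```python
import numpy as np

# Hypothetical post-processing: collapse the per-fold timing columns
# returned by learning_curve into one mean per train_size.
def average_times_per_train_size(fit_times, score_times):
    # both arrays have shape (n_train_sizes, n_cv_folds)
    return fit_times.mean(axis=1), score_times.mean(axis=1)

# mean_fit, mean_score = average_times_per_train_size(fit_times, score_times)
```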
