problem: Re-run all jobs button is visible, but clicking it leads to errors #12434
Does your problem still exist on the latest Forgejo version?
Yes, the problem still exists (tested locally with the latest development version)
About your usage of Forgejo
Forgejo development
Problem description
After job one (see workflow below) has failed, and while job two is still running, the Re-run all jobs button is visible because the failed job causes Forgejo to mark the workflow as failed. However, clicking Re-run all jobs leads to errors: job two is still running, and the code that triggers the rerun tries to restart it, which isn't possible.
Potential workarounds
No response
Forgejo Version
No response
Other details about your environment (software names and versions)
Fedora 44
Solutions
Accepted solutions to address this problem will go here
I'm unsure what the correct behaviour should be. Should Re-run all jobs be hidden until all jobs have completed, or should the running job be cancelled? I think cancelling the running job would make sense.
In the state where one job has failed and other jobs are still running, what I'd typically want is "Re-run failed jobs" rather than "Re-run all jobs". 🤔 That would leave the currently running jobs alone and behave the same as clicking the Rerun button on each individual failed job.
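A "Re-run failed jobs" action as described above boils down to selecting only the failed jobs and leaving everything else untouched. The sketch below is a hypothetical illustration in Go: `Job`, `JobStatus` and `failedJobsToRerun` are made-up names with a simplified status set, not Forgejo's actual models or API.

```go
package main

import "fmt"

// JobStatus is a hypothetical, simplified job state; Forgejo's real
// status type has more values and different names.
type JobStatus int

const (
	StatusWaiting JobStatus = iota
	StatusRunning
	StatusSuccess
	StatusFailure
)

// Job is a hypothetical stand-in for a job within a workflow run.
type Job struct {
	Name   string
	Status JobStatus
}

// failedJobsToRerun returns only the jobs that ended in failure.
// Waiting and running jobs are never selected, so the action cannot
// try to restart a job that is still in progress.
func failedJobsToRerun(jobs []Job) []Job {
	var out []Job
	for _, j := range jobs {
		if j.Status == StatusFailure {
			out = append(out, j)
		}
	}
	return out
}

func main() {
	jobs := []Job{
		{Name: "one", Status: StatusFailure},
		{Name: "two", Status: StatusRunning},
	}
	// Only job "one" is selected; job "two" keeps running.
	for _, j := range failedJobsToRerun(jobs) {
		fmt.Println("rerun:", j.Name)
	}
}
```

With job one failed and job two running, only job one is selected, which matches clicking Rerun on each failed job individually.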
If the button remains "Re-run all jobs", hiding it in this case would make more sense to me than cancelling the running job. I'd only click a "Re-run all jobs" button that throws out in-progress work in pretty rare scenarios, e.g. if the first job failed because of a permission problem and I wanted to restart the work after correcting it. It's plausible, but pretty niche. If the button cancels current work, I think clicking it is more likely to be a mistake.
@mfenniak wrote in https://codeberg.org/forgejo/forgejo/issues/12434#issuecomment-14394218:
Forgejo doesn't have a Re-run failed jobs button, does it? Do we want one?
Re-run all jobs is bound to the state of the run. It's visible as long as the run is done and invisible otherwise. As soon as a job fails, the run is marked as failed, i.e., it's done. I'm not sure about that mechanic. I wouldn't be surprised if it triggered notifications, which would be too early in my book. I think the correct behaviour would be to delay that decision until all jobs have completed. That by itself would make the issue disappear.
Otherwise, the button would have to be bound to an aggregated state that is built from the states of all jobs.
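Both options above amount to deriving the run's state from all of its jobs, so that a run only counts as done once no job is waiting or running. The Go sketch below illustrates that aggregation; `Job`, `JobStatus`, `runStatus` and `canRerunAll` are hypothetical names, and the model assumes that jobs which will never run (e.g. blocked by a failed dependency) are marked `StatusSkipped` rather than left waiting. It is not how Forgejo currently computes run status.

```go
package main

import "fmt"

// JobStatus is a hypothetical, simplified job state.
type JobStatus int

const (
	StatusWaiting JobStatus = iota
	StatusRunning
	StatusSuccess
	StatusFailure
	StatusSkipped // assumed: jobs that will never run end up here
)

// Job is a hypothetical stand-in for a job within a workflow run.
type Job struct {
	Name   string
	Status JobStatus
}

// runStatus aggregates job statuses: the run stays "running" while any
// job is waiting or running, and only becomes failure or success once
// every job has finished.
func runStatus(jobs []Job) JobStatus {
	anyFailed := false
	for _, j := range jobs {
		switch j.Status {
		case StatusWaiting, StatusRunning:
			return StatusRunning
		case StatusFailure:
			anyFailed = true
		}
	}
	if anyFailed {
		return StatusFailure
	}
	return StatusSuccess
}

// canRerunAll reports whether "Re-run all jobs" should be shown:
// only when the aggregated run status is a finished one.
func canRerunAll(jobs []Job) bool {
	return runStatus(jobs) != StatusRunning
}

func main() {
	inProgress := []Job{
		{Name: "one", Status: StatusFailure},
		{Name: "two", Status: StatusRunning},
	}
	finished := []Job{
		{Name: "one", Status: StatusFailure},
		{Name: "two", Status: StatusSuccess},
	}
	fmt.Println(canRerunAll(inProgress)) // false: job two is still running
	fmt.Println(canRerunAll(finished))   // true: every job has finished
}
```

Because the run only reports failure after the last job finishes, any notification keyed to the run's final state would also be delayed until then.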
@aahlenst wrote in https://codeberg.org/forgejo/forgejo/issues/12434#issuecomment-14397635:
You're correct, it doesn't currently. I'd want one more than I want "Re-run all jobs"... 🤷 I like the idea. But I'm not sure whether you'd have both or dynamically choose one or the other.
Yes, for "Re-run all jobs", this logic makes sense to me.