Skip to content

Fix task progress percentage with failed tasks #5868

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

bentsherman
Copy link
Member

Spun off from #4588 to provide a more focused change

Currently, when a task fails, it is included in the "complete" and "total" task count per process. As a result, the percentage actually increases when tasks fail, even though no progress has been made. It also distorts the total number of "real tasks" vs "task attempts" in the final log.

This PR removes the "failed" count from the complete/total task counts, and adds the "ignored" count in its place. The rationale is that every failed task will either be retried or ignored, or the run will terminate and it won't matter. Of these failures, only the ignored tasks should count towards progress. Retries that succeed will be marked as succeeded and count towards progress, otherwise they shouldn't.

To see the difference, run this pipeline with and without these changes:

process foo {
  errorStrategy { sleep(5000) ; task.exitStatus==1 && task.attempt==1 ? 'retry' : 'ignore' }

  input:
  val x

  script:
  """
  sleep ${x * 3}
  exit 1
  """
}

workflow {
  Channel.of(1, 2, 3) | foo
}

This pipeline executes three tasks that fail, are retried, fail again, and ignored, with some delays to help you see the progress.

Without this PR, the process will appear to make progress as tasks fail and are retried. The final task count will be 6, even though there were only 3 "real tasks" to complete.

With this PR, the progress will remain at 0 until the tasks are ignored towards the end. The final task count will be 3, with the retried and ignored count on the side.

@jorgee if you agree with these changes, please approve so that I can merge. Let me know if you have any questions

@bentsherman bentsherman requested a review from jorgee March 10, 2025 17:06
Copy link

netlify bot commented Mar 10, 2025

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit a74a20f
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/67dd8d2b0997060009c1778c
😎 Deploy Preview https://deploy-preview-5868--nextflow-docs-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@pditommaso
Copy link
Member

We need to make sure this is not going to impact Platform progress (also considering current redesign)

@bentsherman
Copy link
Member Author

It looks like getTotalCount() and getCompleteCount() are not sent to platform, in fact they are explicitly excluded:

@Override
protected Map<?, ?> getObjectProperties(Object object) {
final result = super.getObjectProperties(object)
if( object instanceof ProgressRecord ) {
result.remove('hash')
result.remove('errored')
result.remove('completedCount')
result.remove('totalCount')
result.remove('taskName')
}
return result
}

Which makes sense because platform needs the raw numbers anyway

@jorgee
Copy link
Contributor

jorgee commented Mar 11, 2025

Reviewing the PR I have seen something similar is happening in the ReportObserver. I was testing if there was some side effect when running -with-report and I have seen it is reporting failed+ignored 9 tasks (3 ignored and 6 failed).

image

To be coherent with platform report it should be reporting the failed. It is not a side effect of the PR, it is also happening in previous versions. Can we fix it in this PR or should I open a new issue?

@bentsherman
Copy link
Member Author

Let's fix it in this PR

@jorgee
Copy link
Contributor

jorgee commented Mar 13, 2025

@bentsherman, I have pushed the fix for the report stats

@bentsherman bentsherman added this to the 25.04.0 milestone Mar 20, 2025
@bentsherman
Copy link
Member Author

To fix the progress bar, I replaced the failed count with an "effective" failed count, which is just the failed minus the ignored/retried. So the percentages for succeeded + cached + ignored + "effective" failed should be correct

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants