Debugging Workflows
LittleHorse provides debugging tools that help you identify and resolve issues in your workflows. In this guide, we'll explore them by intentionally creating an error and then resolving it.
This tutorial assumes you have completed the Your First WfSpec guide.
Understanding Task Failures
When a TaskRun fails in LittleHorse, the WfRun execution pauses at that point. This gives you time to:
- Investigate what went wrong
- Fix the underlying issue
- Retry the failed TaskRun where it stopped, or fix the issue and rerun the entire WfRun
Let's see this in action by modifying our greeting TaskDef to fail under specific conditions.
Creating a Test Failure
Let's update our GreetingWorker to throw an exception when greeting a specific name:
package io.littlehorse.quickstart;

import io.littlehorse.sdk.worker.LHTaskMethod;

public class Greeter {

    @LHTaskMethod("greet")
    public String greet(String name) {
        // Deliberately fail for one specific name (case-insensitive, null-safe)
        // so that we have a failed TaskRun to debug.
        if ("anakin".equalsIgnoreCase(name)) {
            throw new RuntimeException("I don't like sand!");
        }
        return "Hello, " + name + "!";
    }
}
In real applications, task failures might occur due to network issues, database errors, or other system problems. We're using a contrived example here to demonstrate the debugging process.
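Before involving the workflow engine, it can help to confirm the failure condition in isolation. The following is a minimal plain-Java sketch of the same check, not part of the quickstart itself: the class name GreeterFailureSketch is hypothetical, and the @LHTaskMethod annotation is left out so the file runs without the LittleHorse SDK on the classpath.

```java
// Plain-Java sketch (hypothetical class name); no LittleHorse dependency is needed.
class GreeterFailureSketch {

    // Same check as the task method above, minus the @LHTaskMethod annotation.
    static String greet(String name) {
        if ("anakin".equalsIgnoreCase(name)) {
            throw new RuntimeException("I don't like sand!");
        }
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        // A normal input returns a greeting.
        System.out.println(greet("Obi-Wan"));

        // The problematic input throws, whatever its letter case.
        try {
            greet("Anakin");
        } catch (RuntimeException e) {
            System.out.println("Task method failed: " + e.getMessage());
        }
    }
}
```

Because the comparison ignores case, the input anakin used in the next step hits the same failure path as Anakin does here.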
Triggering and Observing the Failure
Let's run our workflow with the problematic input:
lhctl run quickstart name anakin
Using the Dashboard
The LittleHorse Dashboard provides a visual way to inspect failures:
- Open http://localhost:8080
- Click on the quickstart WfSpec
- Click on the WfRun that was most recently created
- Click on the failed TaskRun
- Click inspect TaskRun to see:
  - The full stack trace
  - Input variables
  - Failure timestamp
  - Node execution history
Figure: A failed TaskNode in the LH Dashboard
Using the CLI
You can also inspect failures using lhctl:
# Get the workflow run status
lhctl get wfRun <WORKFLOW_RUN_ID>
# Get detailed information about the failed node
lhctl get nodeRun <WORKFLOW_RUN_ID> <NODE_RUN_ID>
Rescuing Failed Workflows
Once you've fixed the underlying issue (in our case, maybe we decide that we do like sand after all), you can rescue the workflow:
# Retry the failed node
lhctl rescue <WORKFLOW_RUN_ID> <THREAD_RUN_NUMBER>
# Verify the workflow is now proceeding
lhctl get wfRun <WORKFLOW_RUN_ID>
The rescue command tells LittleHorse to retry the failed node. This is particularly useful when the failure was due to a temporary issue or after you've fixed a bug in your task worker.
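As an illustration of "fixing the bug" before a rescue, here is one hypothetical version of the corrected greeting logic. It is a plain-Java sketch with an assumed class name (FixedGreeterSketch) and without the @LHTaskMethod annotation, so it runs on its own. In a real setup you would presumably apply the change to your actual worker class and restart the worker before rescuing.

```java
// Hypothetical fixed worker logic; plain Java, so no LittleHorse SDK is required.
class FixedGreeterSketch {

    // The special-case check is removed, so every name gets a greeting.
    static String greet(String name) {
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        // The input that previously threw now produces a normal result.
        System.out.println(greet("anakin")); // prints "Hello, anakin!"
    }
}
```

With a fix like this in place, a retry of the failed node using the same input would be expected to complete instead of throwing.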
Wrapping Up
You've learned how to:
- Identify failed WfRuns in both the dashboard and CLI
- Inspect failure details
- Rescue failed WfRuns after fixing issues
If you haven't already:
- Join the LittleHorse Slack Community
- Give us a star on GitHub
- Check out our documentation