Debugging Workflows

LittleHorse lets you inspect a failed workflow run, see exactly which task failed and why, and retry it once the problem is fixed. In this guide, we'll explore these debugging features by intentionally creating an error and then resolving it.

warning

This tutorial assumes you have completed the Your First WfSpec guide.

Understanding Task Failures

When a TaskRun fails in LittleHorse, the WfRun execution pauses at that point. This gives you time to:

Investigate what went wrong
Fix the underlying issue
Retry the failed TaskRun from the point where it failed, or rerun the entire WfRun once the issue is fixed.

Let's see this in action by modifying the task worker behind our greet TaskDef so that it fails under specific conditions.

Creating a Test Failure

Let's update our Greeter task worker to throw an exception when asked to greet a specific name:

src/main/java/io/littlehorse/quickstart/Greeter.java
package io.littlehorse.quickstart;

import io.littlehorse.sdk.worker.LHTaskMethod;

public class Greeter {
    @LHTaskMethod("greet")
    public String greet(String name) {
        // Case-insensitive, null-safe, and locale-independent comparison.
        if ("anakin".equalsIgnoreCase(name)) {
            throw new RuntimeException("I don't like sand!");
        }
        return "Hello, " + name + "!";
    }
}

info

In real applications, task failures might occur due to network issues, database errors, or other system problems. We're using a contrived example here to demonstrate the debugging process.
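
Because the failure condition is plain Java, it can be checked in isolation before the worker is restarted. The sketch below is a hypothetical, SDK-free copy of the greet logic: the class name GreeterSketch is an assumption made for illustration, and the @LHTaskMethod annotation is left out so the snippet compiles without the LittleHorse SDK on the classpath. It is not part of the quickstart project.

```java
public class GreeterSketch {
    // Mirrors the greet logic above without the @LHTaskMethod annotation.
    public static String greet(String name) {
        // Case-insensitive match, so "Anakin" and "ANAKIN" also trigger the failure.
        if ("anakin".equalsIgnoreCase(name)) {
            throw new RuntimeException("I don't like sand!");
        }
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        // A normal input returns a greeting.
        System.out.println(greet("Obi-Wan"));

        // The problematic input throws; in a real worker this exception
        // is what causes the TaskRun to fail.
        try {
            greet("Anakin");
        } catch (RuntimeException e) {
            System.out.println("Task would fail with: " + e.getMessage());
        }
    }
}
```

Running main prints the greeting for Obi-Wan, followed by the message carried by the exception for Anakin.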

Triggering and Observing the Failure

Let's run our workflow with the problematic input:

lhctl run quickstart name anakin

Using the Dashboard

The LittleHorse Dashboard provides a visual way to inspect failures:

Open http://localhost:8080
Click on the quickstart WfSpec
Click on the WfRun that was most recently created
Click on the failed TaskRun
Click inspect TaskRun to see:
- The full stack trace
- Input variables
- Failure timestamp
- Node execution history

[Figure: a failed TaskNode in the LittleHorse Dashboard]

Using the CLI

You can also inspect failures using lhctl:

# Get the workflow run status
lhctl get wfRun <WORKFLOW_RUN_ID>

# Get detailed information about the failed node
lhctl get nodeRun <WORKFLOW_RUN_ID> <NODE_RUN_ID>

Rescuing Failed Workflows

Once you've fixed the underlying issue (in our case, maybe we decide that we do like sand after all), you can rescue the workflow:

# Retry the failed node
lhctl rescue <WORKFLOW_RUN_ID> <THREAD_RUN_NUMBER>

# Verify the workflow is now proceeding
lhctl get wfRun <WORKFLOW_RUN_ID>

info

The rescue command tells LittleHorse to retry the failed node. This is particularly useful when the failure was due to a temporary issue or after you've fixed a bug in your task worker.
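
To make the fail, fix, and rescue sequence concrete, here is a toy model in plain Java. It is only a sketch of the behavior described in this guide, not LittleHorse's implementation or API: ToyWfRun, the Status enum, and the buggyGreet and fixedGreet methods are all hypothetical names invented for this example.

```java
import java.util.function.UnaryOperator;

public class RescueSketch {
    // Illustrative statuses for the toy model only.
    enum Status { RUNNING, ERROR, COMPLETED }

    /** Toy stand-in for a WfRun with a single task node; not a LittleHorse class. */
    static final class ToyWfRun {
        private final String input;
        private Status status = Status.RUNNING;
        private String output;
        private String errorMessage;

        ToyWfRun(String input) {
            this.input = input;
        }

        /** Executes the single task node; a thrown exception halts the run. */
        void execute(UnaryOperator<String> task) {
            try {
                output = task.apply(input);
                errorMessage = null;
                status = Status.COMPLETED;
            } catch (RuntimeException e) {
                errorMessage = e.getMessage();
                status = Status.ERROR;
            }
        }

        /** Retries the failed node with the original input. */
        void rescue(UnaryOperator<String> task) {
            if (status != Status.ERROR) {
                throw new IllegalStateException("only a failed run can be rescued");
            }
            execute(task);
        }

        Status status() { return status; }
        String output() { return output; }
        String errorMessage() { return errorMessage; }
    }

    // The worker logic before the fix: fails for one specific name.
    static String buggyGreet(String name) {
        if ("anakin".equalsIgnoreCase(name)) {
            throw new RuntimeException("I don't like sand!");
        }
        return "Hello, " + name + "!";
    }

    // The worker logic after the fix: greets everyone.
    static String fixedGreet(String name) {
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        ToyWfRun run = new ToyWfRun("anakin");

        run.execute(RescueSketch::buggyGreet);
        System.out.println(run.status() + ": " + run.errorMessage());

        // After "redeploying" the fixed worker, the same run is retried.
        run.rescue(RescueSketch::fixedGreet);
        System.out.println(run.status() + ": " + run.output());
    }
}
```

The point of the model is that a rescue reuses the original input and resumes at the failed node, so the run completes without being started again from scratch.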

Wrapping Up

You've learned how to:

Identify failed WfRuns in both the dashboard and CLI
Inspect failure details
Rescue failed WfRuns after fixing issues

If you haven't already:

Join the LittleHorse Slack Community
Give us a star on GitHub
Check out our documentation
