Data Driven Workflows - Workflow Specification #147

Tyler-Keith-Thompson · 2021-09-29T03:01:35Z

Tyler-Keith-Thompson
Sep 29, 2021
Maintainer

Our milestone calls out the desire for data-driven workflows. This proposal walks through what the specification can be/should be.

What is the Specification Describing?

Ultimately, data-driven workflows describe several important things:

The order of items in a workflow
Optionally, the persistence of every item
Optionally, the presentation type of every item
Optionally, codable arguments that can be used to launch the workflow

Out of Scope

A description of a view (like HTML would be)
A factory method to use to create a workflow item (for example, if a view can be created from codable data)
Data to pass between items in a workflow
Logic to decide whether a FlowRepresentable should load
Specific identifiers for which workflow is being described; consumers can handle this for now
Any ability to create custom FlowRepresentables from data
Additional modifiers to apply to a workflow
Artificial Sentience
Quantum Singularities

Valid Specs

A specification (in any format) should be considered valid if:

The specification matches any of the valid formats SwiftCurrent accepts
The specification has a valid version number compatible with the version of SwiftCurrent in use
The FlowRepresentables listed actually exist in the codebase
The FlowRepresentable.WorkflowOutput of a given item matches the FlowRepresentable.WorkflowInput of the next item
The data used to launch the workflow matches the FlowRepresentable.WorkflowInput of the first item
The persistence of each FlowRepresentable is one of the valid FlowPersistence types
The presentation type (or launch style) of each FlowRepresentable is one of the valid LaunchStyle types

This has several implications:
In order to validate a given specification, we need access to the codebase that contains the FlowRepresentables defined. I can see 2 paths forward.
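The validation rules listed above could be sketched roughly as follows. This is a minimal sketch, not SwiftCurrent API: `FRDescriptor`, `SpecValidationError`, and `validate` are hypothetical names, and types are compared by name only, assuming some earlier step has already collected each FlowRepresentable's input and output type names.

```swift
// Hypothetical description of a FlowRepresentable. These names are
// illustrative assumptions, not SwiftCurrent API.
struct FRDescriptor {
    let name: String
    let input: String   // name of the WorkflowInput type
    let output: String  // name of the WorkflowOutput type
}

enum SpecValidationError: Error, Equatable {
    case unknownFlowRepresentable(String)
    case launchDataMismatch(expected: String, actual: String)
    case typeMismatch(from: String, to: String)
}

func validate(sequence: [String],
              launchDataType: String,
              known: [String: FRDescriptor]) throws {
    // Every listed FlowRepresentable must exist in the codebase.
    let items = try sequence.map { name -> FRDescriptor in
        guard let item = known[name] else {
            throw SpecValidationError.unknownFlowRepresentable(name)
        }
        return item
    }
    // The launch data must match the input of the first item.
    if let first = items.first, first.input != launchDataType {
        throw SpecValidationError.launchDataMismatch(expected: first.input,
                                                     actual: launchDataType)
    }
    // Each item's output must match the next item's input.
    for (current, next) in zip(items, items.dropFirst()) where current.output != next.input {
        throw SpecValidationError.typeMismatch(from: current.name, to: next.name)
    }
}

// Usage: FR2 outputs Never but FR1 expects String, so this is rejected.
let known = [
    "FR1": FRDescriptor(name: "FR1", input: "String", output: "Int"),
    "FR2": FRDescriptor(name: "FR2", input: "Int", output: "Never"),
]
do {
    try validate(sequence: ["FR2", "FR1"], launchDataType: "Int", known: known)
} catch {
    print("Invalid specification: \(error)")
}
```

Version, persistence, and launch style checks are left out here to keep the sketch short; they would follow the same pattern.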

Run-time validation

We could simply perform validation of a workflow specification at runtime. This is necessary regardless of any other choices we make in this proposal. The most reasonable way to handle this is probably something like a throwing initializer for Workflow and/or WorkflowLauncher that takes in some decodable data. This means consumers are forced to deal with the possibility that a workflow cannot be decoded. It falls in line nicely with other codable/decodable concepts and shouldn't cause too much consternation.
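To make the throwing-initializer idea concrete, here is a minimal sketch. `DataDrivenWorkflow` is a hypothetical stand-in for Workflow/WorkflowLauncher, and the `version`/`sequence` keys and the set of supported versions are assumptions for illustration only.

```swift
import Foundation

// Hypothetical spec model; the keys are assumptions, not a confirmed format.
struct WorkflowSpec: Decodable {
    let version: String
    let sequence: [String]
}

enum WorkflowDecodingError: Error {
    case unsupportedVersion(String)
}

// Stand-in for Workflow/WorkflowLauncher, not the real SwiftCurrent type.
struct DataDrivenWorkflow {
    let itemNames: [String]

    // Throws DecodingError for malformed data and
    // WorkflowDecodingError for an incompatible version.
    init(decoding data: Data, supportedVersions: Set<String> = ["0.0.1"]) throws {
        let spec = try JSONDecoder().decode(WorkflowSpec.self, from: data)
        guard supportedVersions.contains(spec.version) else {
            throw WorkflowDecodingError.unsupportedVersion(spec.version)
        }
        itemNames = spec.sequence
    }
}

// Usage: consumers have to handle the failure case explicitly.
let json = Data(#"{"version": "0.0.1", "sequence": ["FR1", "FR2"]}"#.utf8)
do {
    let workflow = try DataDrivenWorkflow(decoding: json)
    print(workflow.itemNames)
} catch {
    print("Could not decode workflow: \(error)")
}
```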

Pre-compile validation

This may be a later feature that we add. The idea here is that if someone is designing workflows from data they want to validate their specification. This is achievable if we create some kind of intermediary representation of FlowRepresentables in the codebase. This idea is hopelessly inspired by contract testing with pact.io. Imagine 2 responsible parties:

Consumers: A consumer is ultimately the codebase that'll consume a workflow specification and execute it
Providers: A provider provides the specification to consume

Consumers

Consumers can start by running a SwiftCurrent-provided tool. This probably makes the most sense as a CLI utility that could use SwiftSyntax to look through the AST of a given Swift application and find all FlowRepresentable types in the codebase. Because it works from source, it may not be able to find statically linked symbols. That limitation needs investigation. Even if true, it's not that much of a concern, as the vast majority of SwiftCurrent users will have access to the code such that the AST explorer could do its thing.

After exploring the AST and discovering all FlowRepresentables in the codebase the CLI could output some intermediary format. Bonus points if that format is not something we create, for example, it might be able to just give literal AST output in JSON form. Whenever a provider needs to validate their data they'd need that JSON file.
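As an illustration of what the CLI might write out, here is a sketch of one possible intermediary file. The proposal would prefer a format we do not invent ourselves, so treat `ConsumerManifest`, `FlowRepresentableEntry`, and every key below as hypothetical. The AST discovery step is not shown.

```swift
import Foundation

// Hypothetical entry for one discovered FlowRepresentable.
struct FlowRepresentableEntry: Codable, Equatable {
    let name: String
    let workflowInput: String
    let workflowOutput: String
}

// Hypothetical file the consumer hands to providers.
struct ConsumerManifest: Codable, Equatable {
    let manifestVersion: String
    let flowRepresentables: [FlowRepresentableEntry]
}

func encodeManifest(_ manifest: ConsumerManifest) throws -> Data {
    let encoder = JSONEncoder()
    // Sorted keys keep the output stable, which makes diffs easier to review.
    encoder.outputFormatting = [.prettyPrinted, .sortedKeys]
    return try encoder.encode(manifest)
}

// Usage: the entries would come from the AST exploration step.
let manifest = ConsumerManifest(
    manifestVersion: "1",
    flowRepresentables: [
        FlowRepresentableEntry(name: "FR1", workflowInput: "Never", workflowOutput: "String"),
        FlowRepresentableEntry(name: "FR2", workflowInput: "String", workflowOutput: "Never"),
    ]
)
if let data = try? encodeManifest(manifest),
   let text = String(data: data, encoding: .utf8) {
    print(text)
}
```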

Providers

Providers would require some intermediary representation of available FlowRepresentables from consumers. They would also create one specification per workflow, not one specification for all workflows. Those don't necessarily have to be different documents but the specification, as noted above, is just for one workflow.

The provider could run a linter on their data using the intermediary representation from consumers. The linter would follow the validation rules listed above. Providers would need updated documents from consumers any time new FlowRepresentables became available for use. It's then crucial that consumer representations are versioned, so providers have all the information they need.
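A provider-side linter could be sketched like this. Unlike a throwing initializer, it collects every problem so the provider sees the full list at once. All names here are hypothetical, and the set of valid persistence values is assumed to travel with the consumer's versioned representation rather than being hard-coded.

```swift
// Hypothetical, already-decoded consumer representation.
struct ConsumerManifest {
    let version: String
    let flowRepresentables: Set<String>
    let persistenceStyles: Set<String>
}

// Hypothetical, already-decoded item from the provider's specification.
struct SpecItem {
    let flowRepresentableName: String
    var flowPersistence: String? = nil
}

func lint(_ items: [SpecItem], against manifest: ConsumerManifest) -> [String] {
    var problems: [String] = []
    for (index, item) in items.enumerated() {
        if !manifest.flowRepresentables.contains(item.flowRepresentableName) {
            problems.append("item \(index): unknown FlowRepresentable '\(item.flowRepresentableName)' in manifest \(manifest.version)")
        }
        if let persistence = item.flowPersistence,
           !manifest.persistenceStyles.contains(persistence) {
            problems.append("item \(index): invalid persistence '\(persistence)'")
        }
    }
    return problems
}

// Usage
let consumerManifest = ConsumerManifest(
    version: "1",
    flowRepresentables: ["FR1", "FR2"],
    persistenceStyles: ["removedAfterProceeding"]
)
let problems = lint(
    [
        SpecItem(flowRepresentableName: "FR1", flowPersistence: "removedAfterProceeding"),
        SpecItem(flowRepresentableName: "FR9", flowPersistence: "forever"),
    ],
    against: consumerManifest
)
problems.forEach { print($0) }
```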

Formats

It's first worth noting that there does not have to be a single format. It seems prudent to limit the number because each format is something that we have to maintain, but abstractions in Swift (and most languages, for that matter) mean that it makes little difference if the format is JSON, YAML, TOML, XML, Protobuf, UML, INI, or some other insane format we want to throw out. This proposal is going to walk through common formats, pros and cons, and make a recommendation on a per-format basis on whether we should adopt it for the workflow specification.

JSON

This is certainly an industry standard and the go-to format for most engineers.

Pros

Everybody and their dog knows how to serialize and deserialize JSON
Modern languages all have support for easy deserialization
It's not hard for developers to understand, nor is it hard for developers to edit
It's easy to add a version key

Cons

It is not great for configuration. It's certainly usable, but JSON wasn't designed for great human readability, it was designed to match JavaScript objects.
With sufficiently complex data JSON can become downright unreadable. By the time you're 4+ objects deep, it can be impossible to tell what you're describing without code folding

Adopt: Despite its flaws for workflow description, JSON is such an industry standard that we'd be foolish not to adopt it.

PlantUML

Okay, I know how insane this sounds. However, hear me out here. UML is great at describing sequences. Workflows are sequences. What if your documentation for describing flows in your apps could literally be used to create flows in your apps? I think this is worth investigating.

Pros

Standardized modeling language
Docs and code are bound together, meaning there's a really unique opportunity for documentation driven development
Your workflow specification is also a visualization

Cons

Not what UML is designed for
Describing persistence, launch styles, and launch data will require some creative thinking. NOTE: I believe we can accomplish this within the UML specification, if nothing else, with comments
Swift does not come with a built-in UML parser, it's a strange use case. What packages do exist are all about displaying visualizations, so really we would have to write a parser

Adopt: Look, I recognize how insane the idea is. However, right now the out-of-the-box thinking is really speaking to me. Documentation Driven Development is a great idea, it's just that manual efforts make it unreasonable sometimes. We can automate that in really interesting ways.

Note: I struggled with this for a while. I thought about writing Consider instead of Adopt. However, I'm asking @wwt/workflow-developers to consider the entire proposal, so that'd be an equivalent recommendation to Adopt. I'm not gonna die on the hill of "We need to do this." However, I'm not ready to dismiss it as an idea.

YAML

YAML is quickly becoming the go-to format for configuration. It focuses more on human readability and has more readily available tooling for non-developers than JSON.

Pros

Great for configuration
Scales better than JSON (You can have greater levels of nesting and it's more apparent what is being nested)
Human readable
Probably a no-brainer for folks that want to white-label an application

Cons

No standard library YAML parsing exists, so we either have to introduce a dependency or roll our own. YAMS looks like a great candidate
Not a very friendly format for servers to send to clients.
It's not very clear exactly where we'd add a version specifier (maybe using comments?)

Adopt: YAML feels like a given. The white-labeling crowd will probably be appreciative. This recommendation comes with an asterisk. If you can only adopt one format it should not be YAML. This is because it is not friendly for servers to send to clients.

Protobuf

Google's protobuf implementation is fantastic. You get safe, performant, and code-friendly serialization and deserialization.

Pros

Highly performant
Fantastic for sharing data models between different programming languages
1st party Apple support
Type-safe

Cons

Workflows are doubly linked lists, protobuf only has the concept of a "collection". That's true with all of the above formats too, but the key difference is that protobuf code-gens concrete models. So consumers would end up with models in code that we only want them to use for one very specific purpose.
Consumers would have to integrate the protobuf library into their projects as well as SwiftCurrent.

Reject: Protobuf is frankly amazing. I would welcome community contributions but I don't see enough benefit for us to provide 1st party support for it out of the box.

XML

XML may be an older format but it's perfectly positioned to describe workflows. It has the concept of nodes and it can self-reference, so you could easily describe a doubly linked list in XML.

Pros

Composition-based format that's close to perfect for describing virtually any kind of workflow, even if we changed the underlying type to a graph instead of a linked list
It's got versioning built right in, it's trivial to add our own version specifier
There is a library in every language imaginable for parsing XML since it's been around forever

Cons

It's got negative ramifications of being the "old" way of doing things
While every language has a parser, those parsers aren't actually very machine-friendly. Traversing nodes in XML can be cumbersome
It's not friendly for developers or non-developers to write

Reject: XML has some surprising advantages here, but once again it's probably not worth 1st party support. If for some reason, we get community interest I could be persuaded otherwise. Any community contributions would be welcomed.

TOML

TOML was designed to be an even easier YAML. It shares similarities with INI files but has a more standardized format and supports nesting. Ultimately it's just a big hashmap.

Pros

Human readable
Capable of creating collections to represent workflows and define all the details we need

Cons

No built-in standard library decoding
It's not very good with arbitrary data structures
We'd have to push the spec a little to describe workflow launch args
No clear place to put a version specifier
It's not as widespread as the other format options mentioned

Reject: I see no real benefits over YAML for workflow specifications. Just like the other rejections, community contributions would be welcomed. However, first-party support seems unnecessary.

Richard-Gist · 2021-09-29T14:07:27Z

Richard-Gist
Sep 29, 2021
Maintainer

SHORT VERSION
I like the idea of using UML or some other existing specification used for class specification/relationship diagrams. I also agree that JSON is a given.

LONG VERSION
I like the UML approach. Something I've heard from consulting is a request at some random period for an architectural diagram and eventually a UML diagram. Having a UML diagram that defines the workflow gives groups in that position a springboard to start from.

Additionally, I think a crappy way to define those persistence/presentation types is simply defining those selections as individual enumerations and then defining those enums on the diagram. Example:

<<enumeration>>
Persistence.RemovedAfterProceeding
----
removedAfterProceeding

FR1
persistence: Persistence.RemovedAfterProceeding

That gets us to a point where you can define it on the diagram and potentially process it in the code. BUT I'm sure we'll find a better way.

4 replies

Richard-Gist Nov 19, 2021
Maintainer

I'm following up on this with an example of what the PlantUML would look like for one possible way of diagramming/processing. This uses a State Diagram to generate a workflow. That could be appropriate, since a Workflow is somewhat an example of moving through different states. I'm going to do a simple example below that can be generated at plantuml.com

First example crosses Core and UIKit based FlowRepresentables and can be seen here. Here is the code:

state CoreSwiftCurrentFR1 {
    state "FlowPersistence" as CoreSwiftCurrentFR1.FlowPersistence : removedAfterProceeding
}

state CoreSwiftCurrentFR2 {
    state "FlowPersistence" as CoreSwiftCurrentFR2.FlowPersistence : persistWhenSkipped
}

state UIKitFR3 {
    state "FlowPersistence" as UIKitFR3.FlowPersistence : removedAfterProceeding
    state "LaunchStyle" as UIKitFR3.LaunchStyle : navigationStack
}

state UIKitFR4 {
    state "FlowPersistence" as UIKitFR4.FlowPersistence : hiddenInitially
    state "LaunchStyle" as UIKitFR4.LaunchStyle {
        state "ModalStyle" as UIKitFR4.ModalStyle : fullScreen
    }
    UIKitFR4.LaunchStyle : modal
}

[*] --> CoreSwiftCurrentFR1
CoreSwiftCurrentFR1 --> CoreSwiftCurrentFR2
CoreSwiftCurrentFR2 --> UIKitFR3
UIKitFR3 --> UIKitFR4
UIKitFR4 --> DefaultedFR6
DefaultedFR6 --> [*]

Of note is that DefaultedFR6 is using the default settings, and as such does not need to be specifically defined. The assumption here is that every state name matches an existing FR in the system. Notice that these are not objects or classes, but instead varying levels of state. Additionally, this example is oriented vertically which should not be a problem.
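To show how the order could be recovered from the shorthand form above, here is a minimal sketch that only reads the `-->` transition lines and follows them from the initial `[*]` state. It is not a PlantUML parser: `workflowOrder(fromPlantUML:)` is a hypothetical helper, it ignores the nested persistence and launch style states, and it assumes each state has a single outgoing transition.

```swift
import Foundation

// Hypothetical helper: returns state names in workflow order.
func workflowOrder(fromPlantUML source: String) -> [String] {
    var successor: [String: String] = [:]
    for line in source.components(separatedBy: .newlines) {
        let parts = line.components(separatedBy: "-->")
        guard parts.count == 2 else { continue }  // not a transition line
        let from = parts[0].trimmingCharacters(in: .whitespaces)
        // Drop an optional " : label" suffix on the target.
        let to = parts[1].components(separatedBy: ":")[0]
            .trimmingCharacters(in: .whitespaces)
        successor[from] = to
    }
    var order: [String] = []
    var visited: Set<String> = []
    var current = successor["[*]"]
    // Stop at the final state, at a dead end, or if a cycle is detected.
    while let name = current, name != "[*]", visited.insert(name).inserted {
        order.append(name)
        current = successor[name]
    }
    return order
}

// Usage with the transitions from the first example.
let diagram = """
[*] --> CoreSwiftCurrentFR1
CoreSwiftCurrentFR1 --> CoreSwiftCurrentFR2
CoreSwiftCurrentFR2 --> UIKitFR3
UIKitFR3 --> UIKitFR4
UIKitFR4 --> DefaultedFR6
DefaultedFR6 --> [*]
"""
print(workflowOrder(fromPlantUML: diagram))
```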

Now for the second example. It can be seen here.

state CoreSwiftCurrentFR1.shouldLoad <<choice>>
CoreSwiftCurrentFR1.shouldLoad -> CoreSwiftCurrentFR1 : true
state CoreSwiftCurrentFR1 {
    state "FlowPersistence" as CoreSwiftCurrentFR1.FlowPersistence : removedAfterProceeding
}

state CoreSwiftCurrentFR2.shouldLoad <<choice>>
CoreSwiftCurrentFR2.shouldLoad -> CoreSwiftCurrentFR2 : true
state CoreSwiftCurrentFR2 {
    state "FlowPersistence" as CoreSwiftCurrentFR2.FlowPersistence : persistWhenSkipped
}

state UIKitFR3.shouldLoad <<choice>>
UIKitFR3.shouldLoad -> UIKitFR3 : true
state UIKitFR3 {
    state "FlowPersistence" as UIKitFR3.FlowPersistence : removedAfterProceeding
    state "LaunchStyle" as UIKitFR3.LaunchStyle : navigationStack
}

state UIKitFR4.shouldLoad <<choice>>
UIKitFR4.shouldLoad -> UIKitFR4 : true
state UIKitFR4 {
    state "FlowPersistence" as UIKitFR4.FlowPersistence : hiddenInitially
    state "LaunchStyle" as UIKitFR4.LaunchStyle {
        state "ModalStyle" as UIKitFR4.ModalStyle : fullScreen
    }
    UIKitFR4.LaunchStyle : modal
}

state DefaultedFR6.shouldLoad <<choice>>
DefaultedFR6.shouldLoad -> DefaultedFR6 : true

[*] --> CoreSwiftCurrentFR1.shouldLoad

CoreSwiftCurrentFR1.shouldLoad --> CoreSwiftCurrentFR2.shouldLoad : false
CoreSwiftCurrentFR1 --> CoreSwiftCurrentFR2.shouldLoad : proceedInWorkflow()

CoreSwiftCurrentFR2.shouldLoad --> UIKitFR3.shouldLoad : false
CoreSwiftCurrentFR2 --> UIKitFR3.shouldLoad : proceedInWorkflow()

UIKitFR3.shouldLoad --> UIKitFR4.shouldLoad : false
UIKitFR3 --> UIKitFR4.shouldLoad : proceedInWorkflow()

UIKitFR4.shouldLoad --> DefaultedFR6.shouldLoad : false
UIKitFR4 --> DefaultedFR6.shouldLoad : proceedInWorkflow()

DefaultedFR6.shouldLoad --> [*] : false
DefaultedFR6 --> [*] : proceedInWorkflow()

This one takes the first example but more faithfully adapts the flow of state to include the shouldLoad. This doesn't particularly make sense from a consumption point of view but it helps with a diagramming view. Also note that I opted for a vertical connection into the shouldLoad and a horizontal path from shouldLoad to the FR.

I had also looked into other diagrams (seen in data-spike5) but I think this one holds as valid and readable. If I'm mistaken please let me know.

What I will say for my evaluation is that the shorthand version is doable (though prone to typos), but for diagramming and validity, the second example creates a better diagram of what is occurring and makes it clear that you are not guaranteed to visit an FR. You could update the graph to exclude the false line but there's no guarantee from the data being sent and the implementation of the FR that you will not hit the false scenario.

Richard-Gist Nov 19, 2021
Maintainer

For the first implementation on this, I would say we force conformance to the first example and then generate the second example when someone is looking for a diagram.
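Generating the second form from the first could look something like the sketch below. `expandedDiagram(for:)` is a hypothetical helper that takes the ordered FR names and emits the shouldLoad choice states and transitions in the same shape as the second example. It leaves out the nested persistence and launch style states for brevity.

```swift
// Hypothetical generator for the expanded (shouldLoad) diagram text.
func expandedDiagram(for names: [String]) -> String {
    guard let first = names.first else { return "" }
    var lines: [String] = []
    // One choice state per FR, with the "true" branch entering the FR.
    for name in names {
        lines.append("state \(name).shouldLoad <<choice>>")
        lines.append("\(name).shouldLoad -> \(name) : true")
    }
    lines.append("[*] --> \(first).shouldLoad")
    // Skipping (false) and proceeding both lead to the next choice,
    // or to the final state after the last FR.
    for (index, name) in names.enumerated() {
        let target = index + 1 < names.count ? "\(names[index + 1]).shouldLoad" : "[*]"
        lines.append("\(name).shouldLoad --> \(target) : false")
        lines.append("\(name) --> \(target) : proceedInWorkflow()")
    }
    return lines.joined(separator: "\n")
}

// Usage
print(expandedDiagram(for: ["CoreSwiftCurrentFR1", "CoreSwiftCurrentFR2"]))
```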

Tyler-Keith-Thompson Nov 19, 2021
Maintainer Author

This specification needs a version identifier. (I'll respond to the other comments later)

Richard-Gist Nov 19, 2021
Maintainer

We could use note as a way to provide overrides to a given part of the diagram. That feels UML-ish. As for the content of the comment, I'm not sure yet, but it could be JSON.

As for Versioning, we could simply have a floating state for Versioning. That would allow for later adding a minimum or maximum supported version for the workflow.

Tyler-Keith-Thompson · 2021-09-29T18:11:43Z

Tyler-Keith-Thompson
Sep 29, 2021
Maintainer Author

@wwt/workflow-developers The proposal for Data-Driven Workflow specifications is ready for review. Feel free to critique, add thoughts, features ideas, ask questions that the proposal didn't cover, and suggest changes.


morganzellers · 2021-09-29T19:35:20Z

morganzellers
Sep 29, 2021

I like the idea of UML and also agree on JSON and YAML - those two feel like must-haves in the industry today.
Dropping SwiftPlantUML as a possible jumping-off point for our parser. They go all the way to a visual, but maybe their parsing can enlighten us. My biggest concern with data-driven, in general, is versioning, but that is a farther off problem to solve.


Richard-Gist · 2021-11-22T20:25:20Z

Richard-Gist
Nov 22, 2021
Maintainer

I am starting a new answer to cover possible JSON schemas.

Here is my suggested schema:

{
  "$schema": "http://json-schema.org/draft-04/schema#",
  "type": "object",
  "properties": {
    "version": {
      "type": "string"
    },
    "sequence": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "flowRepresentableName": {
            "type": "string"
          },
          "flowPersistence": {
            "type": "object",
            "properties": {
              "style": {
                "type": "string"
              }
            },
            "required": [ "style" ]
          },
          "launchStyle": {
            "type": "object",
            "properties": {
              "style": {
                "type": "string"
              },
              "substyle": {
                "type": "string"
              }
            },
            "required": [ "style" ]
          }
        },
        "required": [ "flowRepresentableName" ]
      }
    }
  },
  "required": [
    "version",
    "sequence"
  ]
}

This one defines the minimum amount of information needed to define a workflow. It has a version and a sequence of FlowRepresentables. It can be as simple as this:

{
    "version": "0.0.1",
    "sequence": [
        {
            "flowRepresentableName": "FR1"
        }
    ]
}

While simple, it allows for growth without breaking the existing schema. For example, because FlowPersistence is an object and not just a string value, you can define additional properties in FlowPersistence down the road. Perhaps you do something like this:

{
    "version": "0.0.1",
    "sequence": [
        {
            "flowRepresentableName": "FR1",
            "flowPersistence": {
                "style": "conditional",
                "condition": "args == 10"
            }
        }
    ]
}

While that condition may not be what we ultimately want, it allows us to figure that out without a breaking change to the schema.
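On the Swift side, the non-breaking growth described here falls out of how synthesized Decodable conformances work: keys the model does not declare are simply ignored. The sketch below assumes model types that mirror the suggested schema (the type names are hypothetical) and a flowPersistence object that keeps its required style key alongside a new condition key.

```swift
import Foundation

// Hypothetical models mirroring the suggested schema.
struct FlowPersistenceSpec: Decodable {
    let style: String
}

struct LaunchStyleSpec: Decodable {
    let style: String
    let substyle: String?
}

struct ItemSpec: Decodable {
    let flowRepresentableName: String
    let flowPersistence: FlowPersistenceSpec?  // optional in the schema
    let launchStyle: LaunchStyleSpec?          // optional in the schema
}

struct WorkflowSpec: Decodable {
    let version: String
    let sequence: [ItemSpec]
}

// A newer document with a "condition" key this model knows nothing about.
let newerDocument = """
{
    "version": "0.0.1",
    "sequence": [
        {
            "flowRepresentableName": "FR1",
            "flowPersistence": { "style": "conditional", "condition": "args == 10" }
        }
    ]
}
"""

// Decoding still succeeds; the unknown key is ignored by the older model.
let spec = try? JSONDecoder().decode(WorkflowSpec.self, from: Data(newerDocument.utf8))
print(spec?.sequence.first?.flowPersistence?.style ?? "failed to decode")
```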

That leads to a critical consideration for this schema. I wanted it so that this schema would continue to make sense across a potential Android or Web consumer and not be confusing in those contexts. Hopefully, it will make sense in any context on any platform.

1 reply

Richard-Gist Nov 22, 2021
Maintainer

Something I'm realizing is that after we create a JSON specification, to enable PlantUML as well, we may want to just have a converter between PlantUML and JSON. Not sure why that didn't occur to me earlier.

nickkaczmarek · 2022-02-07T20:25:45Z

nickkaczmarek
Feb 7, 2022

Capturing some discussion between @Tyler-Keith-Thompson and me: we could potentially use a build/run phase to validate that the provided JSON would actually work with the code that you have. Not sure on specifics just yet, but having some ability to fail earlier than runtime would be nice.


nickkaczmarek · 2022-02-11T18:57:02Z

nickkaczmarek
Feb 11, 2022

PlantUML seems very interesting. I am wondering how much we're willing to take on with regard to allowing it to be a data definition. For JSON and YAML we get kinda lucky and have a schema we can use to validate. We also get parsers pretty cheap. For PlantUML we'd need to write both a schema and a validator/parser. PlantUML seems to also enable a ton of features and I'm wondering how we might decide what to accept and what not to. Should we make a parser that is spec compliant or only enable the features we really need? Just noodling here before diving into trying to make a schema and parser.

1 reply

Tyler-Keith-Thompson Feb 14, 2022
Maintainer Author

It's a lower priority, especially since I don't think many will take advantage of it any time soon. That said, I don't think we write our own schema validator. I think we merely document (well) what diagram type we chose, how it renders, and how it parses. Then we write a decoder and parser for it.

We may be able to use our JSON schema to ensure whatever our PlantUML spec looks like matches the functionality of JSON and YAML.

nickkaczmarek · 2022-02-15T15:16:32Z

nickkaczmarek
Feb 15, 2022

At the moment we validate this object with our schema with no issues:

{
      "schemaVersion": "v0.0.1",
      "sequence": [
        {
          "flowRepresentableName": "FR1"
        },
        {
          "flowRepresentableName": "FR2",
          "launchStyle": "modal",
          "flowPersistence": "removedAfterProceeding"
        },
        {
          "flowRepresentableName": {
            "watchOS": "FR3",
            "macOS": "FR3",
            "iOS": "FR3",
            "iPadOS": "FR3",
            "tvOS": "FR3",
            "android": "FRA3"
          },
          "launchStyle": {
            "watchOS": "modal",
            "macOS": "modal",
            "iOS": "modal",
            "iPadOS": "popover",
            "tvOS": "modal",
            "android": "widget"
          },
          "flowPersistence": {
            "watchOS": "removedAfterProceeding",
            "macOS": "removedAfterProceeding",
            "iOS": "removedAfterProceeding",
            "iPadOS": "removedAfterProceeding",
            "tvOS": "removedAfterProceeding",
            "android": "somethingElse"
          }
        },
        {
          "flowRepresentableName": {
            "*": "FR3",
            "android": "FRA3"
          },
          "launchStyle": {
            "*": "modal",
            "iPadOS": "popover",
            "android": "widget"
          },
          "flowPersistence": {
            "watchOS": "removedAfterProceeding",
            "macOS": "removedAfterProceeding",
            "iOS": "removedAfterProceeding",
            "iPadOS": "removedAfterProceeding",
            "tvOS": "removedAfterProceeding",
            "android": "somethingElse"
          }
        }
      ]
}

But should we also enable something like this where we mix strings and objects:

In other words, should we enable something that allows mixing of strings and objects, or should we enforce that a user chooses one or the other? I'm thinking we'd allow mixing, but would like to know what the others think.
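For what mixing might look like on the parsing side, here is a minimal sketch of a value that decodes from either a plain string or a per-platform object, with `*` treated as the fallback key as in the validated example above. `PlatformValue`, `Item`, and `resolved(for:)` are hypothetical names, not the actual parser.

```swift
import Foundation

// Hypothetical value that is either one string or a per-platform map.
enum PlatformValue: Decodable, Equatable {
    case single(String)
    case perPlatform([String: String])

    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        if let value = try? container.decode(String.self) {
            self = .single(value)
        } else {
            self = .perPlatform(try container.decode([String: String].self))
        }
    }

    // Exact platform match wins, then the "*" wildcard.
    func resolved(for platform: String) -> String? {
        switch self {
        case .single(let value):
            return value
        case .perPlatform(let values):
            return values[platform] ?? values["*"]
        }
    }
}

struct Item: Decodable {
    let flowRepresentableName: PlatformValue
    let launchStyle: PlatformValue?
}

// One item uses a string for the name and an object for the launch style.
let mixed = """
[
    { "flowRepresentableName": "FR1" },
    { "flowRepresentableName": "FR2",
      "launchStyle": { "*": "modal", "iPadOS": "popover" } }
]
"""
let items = (try? JSONDecoder().decode([Item].self, from: Data(mixed.utf8))) ?? []
for item in items {
    print(item.flowRepresentableName.resolved(for: "iPadOS") ?? "none",
          item.launchStyle?.resolved(for: "iPadOS") ?? "default")
}
```

A schema that accepts the same shapes would need to allow either type for each of these properties, which is the extra schema work mentioned in the replies.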

9 replies

nickkaczmarek Feb 15, 2022

At the moment we aren't allowing mixing, based on the schema. I am wondering if we should. It is more work to get the schema to accept that style, but if that's something we want to do I can work on it.

morganzellers Feb 15, 2022

Flexibility was my main line of thinking

Tyler-Keith-Thompson Feb 15, 2022
Maintainer Author

Oh so the parser does it...but the schema validator doesn't? Yeah, that flexibility is a must. Otherwise we weirdly punish people for getting specific on a given flow representable by forcing the whole document that way.

nickkaczmarek Feb 15, 2022

@Tyler-Keith-Thompson @morganzellers I think those are both good points. I'll give it a shot. I was hoping to be able to validate the inner bits and we might lose that, but I think the flexibility in the greater structure is definitely doable. Hopefully we can get all of the above.

nickkaczmarek Feb 21, 2022

This latest version of the schema handles this mixing that we discussed above.
