add protocol methods to protocol source #17

Open
wants to merge 2 commits into
base: main

cgardens
Contributor

@cgardens cgardens commented Mar 31, 2023

Here's a first pass at adding the protocol methods to the protocol repo. I could not find a way to make JSONSchema work nicely. I'm definitely open to other ways of expressing this. If we like this approach, we can clean it up and move forward.

@github-actions
Contributor

github-actions bot commented Mar 31, 2023

Hey there and thank you for opening this pull request! 👋🏼

We require pull request titles to follow the Conventional Commits specification and it looks like your proposed title needs to be adjusted.

Details:

No release type found in pull request title "add protocol methods to protocol source". Add a prefix to indicate what kind of release this pull request corresponds to. For reference, see https://www.conventionalcommits.org/

Available types:
 - feat: A new feature
 - fix: A bug fix
 - docs: Documentation only changes
 - style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
 - refactor: A code change that neither fixes a bug nor adds a feature
 - perf: A code change that improves performance
 - test: Adding missing tests or correcting existing tests
 - build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
 - ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
 - chore: Other changes that don't modify src or test files
 - revert: Reverts a previous commit


**interface of both source and destination**
```
spec() -> Stream<AirbyteConnectorSpecification>
```
Contributor

Is it worth making these objects links into the protocol.yaml file so it is easy to look up the AirbyteConnectorSpecification, etc., or adding an appendix at the bottom with links to them?

Contributor

@jdpgrailsdev jdpgrailsdev left a comment

:shipit:

Contributor

@evantahler evantahler left a comment

I like this, but I think the miss in the current implementation is that you can't determine the actual --flag names from the information provided. Some notes on that below.

**source only**
```
discover(Config) -> AirbyteCatalog
read(Config, ConfiguredAirbyteCatalog, State) -> Stream<AirbyteRecordMessage | AirbyteStateMessage | AirbyteControlMessage>
```
Contributor

I think this method-style representation is a bit misleading. The flag argument isn't --ConfiguredAirbyteCatalog, it is --catalog. We also don't explain whether we are passing the object itself (e.g. stringified JSON) or a file path.

Contributor

Perhaps a way to keep this method-like signature could be:

read(config -> File<Config>, catalog -> File<ConfiguredAirbyteCatalog>, state -> File<State>) -> Stream<...>

... but now I'm just making things up.

I think JSONSchema might work better for this:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "additionalProperties": true, // this is important now
  "required": ["config", "catalog"], // showing that state is optional
  "arguments": {
    "config": { "type": "file_path", "$ref": "Config.yaml" },
    "catalog": { "type": "file_path", "$ref": "ConfiguredAirbyteCatalog.yaml" },
    "state": { "type": "file_path", "$ref": "State.yaml" }
  }
}
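
To make the concern about flag names concrete: a descriptor like the one above could be treated as data from which the real `--flag` names are derived. The following is only a sketch in Python under stated assumptions. `READ_DESCRIPTOR` is a simplified, hypothetical form of the proposal (argument name mapped to protocol type, every value a file path), and `parser_from_descriptor` is an illustrative helper, not something in the protocol repo.

```python
import argparse

# Hypothetical, simplified form of the descriptor proposed above:
# argument name -> protocol type it refers to; every value is a file path.
READ_DESCRIPTOR = {
    "required": ["config", "catalog"],  # state is optional
    "arguments": {
        "config": "Config",
        "catalog": "ConfiguredAirbyteCatalog",
        "state": "State",
    },
}


def parser_from_descriptor(command: str, descriptor: dict) -> argparse.ArgumentParser:
    """Build a CLI parser whose flag names come from the descriptor keys."""
    parser = argparse.ArgumentParser(prog=command)
    for name, protocol_type in descriptor["arguments"].items():
        parser.add_argument(
            f"--{name}",
            required=name in descriptor["required"],
            metavar="PATH",
            help=f"path to a file containing a {protocol_type}",
        )
    return parser


parser = parser_from_descriptor("read", READ_DESCRIPTOR)
args = parser.parse_args(["--config", "config.json", "--catalog", "catalog.json"])
```

The point of the sketch is that the flag is named after the descriptor key (`--catalog`), while the type name (`ConfiguredAirbyteCatalog`) only appears in the help text.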

Contributor

Here's a real READ command for reference:

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-faker:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json
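
In that command, `--config` and `--catalog` carry file paths rather than stringified JSON. A minimal Python sketch of what the receiving side might do with them, assuming the files hold plain JSON documents; the helper name `load_json_file` and the file contents are made up for illustration:

```python
import argparse
import json
import tempfile
from pathlib import Path


def load_json_file(path: str) -> dict:
    """Read and parse the JSON document stored at `path`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


parser = argparse.ArgumentParser(prog="read")
parser.add_argument("--config", required=True)   # a path, not inline JSON
parser.add_argument("--catalog", required=True)  # a path, not inline JSON

# Stand-in files so the sketch runs anywhere; their contents are invented.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    catalog_path = Path(tmp) / "configured_catalog.json"
    config_path.write_text(json.dumps({"count": 10}), encoding="utf-8")
    catalog_path.write_text(json.dumps({"streams": []}), encoding="utf-8")

    args = parser.parse_args(
        ["--config", str(config_path), "--catalog", str(catalog_path)]
    )
    config = load_json_file(args.config)
    catalog = load_json_file(args.catalog)
```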

Contributor

I do think the JSON schema version is a little clearer. Maybe there's a way to distinguish STDIN/STDOUT parameters from arguments to the method call?

Comment on lines +12 to +13

In addition to the return types mentioned below, all methods can return the following message types: `AirbyteLogMessage | AirbyteTraceMessage`.
Contributor

👍 I missed this note at first
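
Since the note is easy to miss, here is one way to spell it out: the effective output of each method is its declared return types plus the two shared message types. This is a Python sketch only; the type names are taken from the signatures quoted in this review, while `DECLARED_OUTPUTS`, `COMMON_OUTPUTS`, and `allowed_outputs` are hypothetical names.

```python
# Message types every method may emit, per the note under review.
COMMON_OUTPUTS = frozenset({"AirbyteLogMessage", "AirbyteTraceMessage"})

# Declared return types, copied from the pseudocode signatures in the diff.
DECLARED_OUTPUTS = {
    "spec": frozenset({"AirbyteConnectorSpecification"}),
    "discover": frozenset({"AirbyteCatalog"}),
    "read": frozenset(
        {"AirbyteRecordMessage", "AirbyteStateMessage", "AirbyteControlMessage"}
    ),
}


def allowed_outputs(method: str) -> frozenset:
    """Return every message type the given method may emit."""
    return DECLARED_OUTPUTS[method] | COMMON_OUTPUTS
```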

* If an input parameter has no name, then it is passed via STDIN.

In addition to the return types mentioned below, all methods can return the following message types: `AirbyteLogMessage | AirbyteTraceMessage`.

Contributor

A note that additional arguments should be ignored and not validated should be here somewhere.
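
For illustration, this is roughly what "ignored and not validated" could look like in a connector entrypoint written in Python. It is a sketch, not how any existing connector is implemented; `--some-future-flag` is an invented flag used only to show the behaviour.

```python
import argparse

parser = argparse.ArgumentParser(prog="read")
parser.add_argument("--config", required=True)
parser.add_argument("--catalog", required=True)

# parse_known_args() returns unrecognised arguments instead of failing on
# them, so flags this connector does not know about are simply set aside.
known, ignored = parser.parse_known_args(
    [
        "--config", "config.json",
        "--catalog", "catalog.json",
        "--some-future-flag", "x",  # invented flag, unknown to this parser
    ]
)
```

With `parse_args()` the same input would exit with an error, which is the behaviour the note would rule out.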

## Method Interfaces

We describe these interfaces in pseudocode for clarity. Clarifications on the pseudocode semantics:
* Any `Stream~ that is mentioned as input arg, is passed to the docker contained via STDIN.
Contributor

I feel like the ~ is throwing me off here; I was expecting it to mean something. Maybe empty brackets, or brackets filled with a generic type, would be clearer?

Stream<>
Stream<T>
Stream<...>

* Any `Stream~ that is mentioned as input arg, is passed to the docker contained via STDIN.
* All other parameters are passed in as command line args (e.g. --config <path to config file>).
* Each input parameter is described as its type (as defined in airbyte_protocol.yml and the name of the parameter).
* If an input parameter has no name, then it is passed via STDIN.
Contributor

I don't think this holds up, or I am misunderstanding it.

Reading this I expected all signatures to be like

methodName(argName: ArgType)
// or
methodName(ArgType argName)

And then args passed via STDIN would be distinguished by listing only their type.
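
A small Python sketch of the convention described in this comment, under the assumption that named parameters become `--name` flags and unnamed ones arrive on STDIN. `Param`, `render_signature`, and `cli_flags` are hypothetical helpers; the `example` method and the `state` flag name are illustrative assumptions, not confirmed parts of the protocol.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Param:
    protocol_type: str          # type name from the protocol definition
    name: Optional[str] = None  # flag name; None means "passed via STDIN"


def render_param(param: Param) -> str:
    """Named params render as `name: Type`, STDIN params as the bare type."""
    if param.name:
        return f"{param.name}: {param.protocol_type}"
    return param.protocol_type


def render_signature(method: str, params: List[Param]) -> str:
    """Render a method signature in the `methodName(argName: ArgType)` style."""
    return f"{method}({', '.join(render_param(p) for p in params)})"


def cli_flags(params: List[Param]) -> List[str]:
    """Only named params turn into command-line flags."""
    return [f"--{p.name}" for p in params if p.name]


READ = [
    Param("Config", "config"),
    Param("ConfiguredAirbyteCatalog", "catalog"),
    Param("State", "state"),  # flag name assumed from the proposals in this thread
]
# Hypothetical method with one STDIN parameter, for contrast.
EXAMPLE = [Param("Config", "config"), Param("Stream<T>")]

read_signature = render_signature("read", READ)
example_signature = render_signature("example", EXAMPLE)
```

Written this way, the rule "no name means STDIN" is visible in every signature instead of living only in a bullet point.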


We describe these interfaces in pseudocode for clarity. Clarifications on the pseudocode semantics:
* Any `Stream~ that is mentioned as input arg, is passed to the docker contained via STDIN.
* All other parameters are passed in as command line args (e.g. --config <path to config file>).
Contributor

Maybe a controversial opinion - I think trying to represent both the stdin/stdout values and the method parameters in the same step is confusing.

Maybe state that the method returns an I/O stream, and define what that I/O accepts and returns as a secondary step?
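
One possible shape for that two-step description, sketched in Python. `Method` and `IOContract` are hypothetical names, and the `--state` flag is assumed from the proposals earlier in this review; the message types are the ones quoted in the diff.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class IOContract:
    accepts: Optional[str]   # type read from STDIN, or None if nothing is read
    emits: Tuple[str, ...]   # types written to STDOUT


@dataclass(frozen=True)
class Method:
    name: str
    flags: Tuple[str, ...]   # step 1: command-line arguments only
    io: IOContract           # step 2: what flows over STDIN/STDOUT


READ = Method(
    name="read",
    flags=("--config", "--catalog", "--state"),
    io=IOContract(
        accepts=None,
        emits=(
            "AirbyteRecordMessage",
            "AirbyteStateMessage",
            "AirbyteControlMessage",
        ),
    ),
)
```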

Contributor

FWIW, I also think this is a problem in our other docs describing these methods

4 participants