Destinations CDK: better integration tests #45113

Open · @edgao wants to merge 1 commit into base: edgao/buffering_output_consumer_tracks_message_consumption

@edgao edgao commented Sep 3, 2024

Implement the test framework, write a minimal set of base tests, and implement those tests for destination-e2e-test. I stuck with our existing abstract class + concrete per-destination implementation strategy, because:

  1. tests are opt-out
    1. like I mentioned in last (?) week's meeting - micronaut makes running the base class kind of wonky, so I think it's still best practice to have the override testFoo() { super.testFoo() } declarations. But even without that, we still get the test case.
  2. it gives us an ok way to run individual test cases from inside intellij
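
A minimal sketch of the abstract base class plus concrete per-destination class pattern described above. The class and method names are hypothetical, and the JUnit annotations are left out so the snippet runs on its own; in a real suite each base method would carry `@Test`.

```kotlin
// Hypothetical base class that would live in the CDK.
// In a real JUnit suite each open method would be annotated with @Test.
abstract class BaseIntegrationTest(private val destinationName: String) {
    open fun testCheck(): String = "check passed for $destinationName"
    open fun testBasicWrite(): String = "basic write passed for $destinationName"
}

// Concrete per-destination class with the explicit re-declarations,
// which give the IDE a concrete method to run individually.
class E2eIntegrationTest : BaseIntegrationTest("destination-e2e-test") {
    override fun testCheck(): String = super.testCheck()
    override fun testBasicWrite(): String = super.testBasicWrite()
}

// A destination that declares nothing still inherits every base test,
// which is what makes the tests opt-out rather than opt-in.
class MinimalIntegrationTest : BaseIntegrationTest("hypothetical-destination")

fun main() {
    println(E2eIntegrationTest().testBasicWrite())
    println(MinimalIntegrationTest().testCheck())
}
```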

general review guide:

  1. DestinationMessage.kt has some simple convenience constructors
  2. Start with IntegrationTest.kt, which is where the meat of the PR lives
  3. Take a look at DestinationDataDumper, DestinationCleaner, ExpectedRecordMapper, and NameMapper - these are interfaces that destinations can/should implement. DestinationDataDumper doesn't have a noop implementation because every destination must implement it; everything else is optional.
  4. Check out DestinationProcess, which is how we launch the connector via micronaut (and eventually Docker). It's basically just a wrapper around CliRunner.
  5. CheckIntegrationTest + BasicFunctionalityIntegrationTest show how to write some basic tests.
  6. Then skim through all the destination-e2e-test code - it's hopefully pretty self-explanatory after everything else in this PR
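
To make item 3 more concrete, here is a rough sketch of how those four extension points could be shaped. Only the interface names (and `fun interface DestinationCleaner`) come from this PR; every method signature, the `OutputRecord` type, and the `Noop*` objects are assumptions for illustration, not the actual CDK API.

```kotlin
// Hypothetical record type; the real CDK type is not shown in this conversation.
data class OutputRecord(val data: Map<String, Any?>)

// Required: no noop version, since every destination has to read back what it wrote.
fun interface DestinationDataDumper {
    fun dumpRecords(streamName: String, streamNamespace: String?): List<OutputRecord>
}

// Optional: clean up leftover test data.
fun interface DestinationCleaner {
    fun cleanup()
}

object NoopDestinationCleaner : DestinationCleaner {
    override fun cleanup() {}
}

// Optional: adjust expected records to match destination-specific behavior.
fun interface ExpectedRecordMapper {
    fun mapRecord(expectedRecord: OutputRecord): OutputRecord
}

object NoopExpectedRecordMapper : ExpectedRecordMapper {
    override fun mapRecord(expectedRecord: OutputRecord): OutputRecord = expectedRecord
}

// Optional: translate field names the way the destination would.
fun interface NameMapper {
    fun mapFieldName(path: List<String>): List<String>
}

object NoopNameMapper : NameMapper {
    override fun mapFieldName(path: List<String>): List<String> = path
}

fun main() {
    // An in-memory "destination" standing in for a real warehouse.
    val store = mapOf("users" to listOf(OutputRecord(mapOf("ID" to 1))))
    val dumper = DestinationDataDumper { streamName, _ -> store[streamName].orEmpty() }
    val upperCaseNames = NameMapper { path -> path.map { it.uppercase() } }
    println(dumper.dumpRecords("users", null))
    println(upperCaseNames.mapFieldName(listOf("id")))
}
```

Declaring these as `fun interface` lets a destination supply a lambda for the simple cases, while the noop objects serve as defaults for the optional ones.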


@@ -55,16 +55,16 @@ allprojects {
sourceCompatibility = JavaVersion.VERSION_21
targetCompatibility = JavaVersion.VERSION_21
compileJava {
options.compilerArgs += ["-Werror", "-Xlint:all,-serial,-processing"]
Contributor Author

I'll revert this before merging, this was just driving me nuts locally

import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.Test;

public class TestingSilentDestinationAcceptanceTest extends DestinationAcceptanceTest {
Contributor Author

I didn't check what these two classes are doing, but will revive these tests before merging this PR

@edgao edgao force-pushed the edgao/cdk_integration_tests branch from 70b9a9c to 6384e77 Compare September 5, 2024 23:18
command,
config = config,
catalog = catalog,
// TODO is this really the right way to achieve this?
Contributor Author

@edgao edgao Sep 5, 2024

@johnny-schmidt before I go too far down this path: this (plus the related plumbing code in CliRunner and AirbyteConnectorRunner) seems like a nicer way to inject custom beans into the destination? I.e. this way, test authors don't need to define a new InputStreamFactory and do property dancing

but lmk if you have a different idea for doing this

(this does have the slightly unfortunate result that in E2EDestination, we have to do .run(args = args) instead of .run(*args), b/c of varargs being a whiny child)
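
For readers unfamiliar with the varargs restriction: Kotlin allows only one `vararg` parameter per function, so one way this situation can arise is that a second variadic parameter is added and the original one has to become a plain array. The sketch below assumes that shape; `runBefore`, `runAfter`, and `testBeans` are hypothetical names, not the real `CliRunner` or `AirbyteConnectorRunner` signatures.

```kotlin
// Hypothetical "before": a single vararg, so callers can spread an array with *args.
fun runBefore(vararg args: String): List<String> = args.toList()

// Hypothetical "after": a second variadic parameter is wanted for injected test beans.
// Only one vararg is allowed per function, so args becomes an ordinary array parameter.
fun runAfter(args: Array<out String>, vararg testBeans: Any): String =
    "args=${args.toList()} beans=${testBeans.size}"

fun main() {
    val args = arrayOf("--write", "--config", "config.json")
    println(runBefore(*args))              // spread operator works with a vararg parameter
    println(runAfter(args = args))         // the array is passed as-is, no spread
    println(runAfter(args, "beanA", "beanB"))
}
```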

Contributor

If the idea is that we pass in config & catalog explicitly, would it make more sense to provide the input data explicitly also? Just wrap it in a serializing InputStream?
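
One possible reading of that suggestion, as a minimal sketch: serialize the whole message list up front and hand the destination a single in-memory stream. The helper name and the one-message-per-line framing are assumptions, and the messages are taken as already-serialized strings.

```kotlin
import java.io.InputStream

// Hypothetical helper: each element is one already-serialized message,
// written as its own line into an in-memory stream.
fun messagesAsInputStream(serializedMessages: List<String>): InputStream =
    serializedMessages.joinToString(separator = "") { "$it\n" }.byteInputStream()

fun main() {
    val input = messagesAsInputStream(listOf("""{"type":"RECORD"}""", """{"type":"STATE"}"""))
    input.bufferedReader().forEachLine { println(it) }
}
```

Because the full input is fixed before the destination starts, this form cannot interleave test logic between messages.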

Contributor Author

maaaybe, though I'm leaning towards no. There are tests that want to interleave messages and test logic, a la

sendSomeRecords()
verifySomething()
sendMoreRecords()
endSync()
verifySomethingElse()

At which point I'd rather wrap it in the destination process via the sendMessage function, rather than exposing the pipes to callers? I'm imagining something along these lines, which doesn't feel great?

val pipe = PipedOutputStream()
val pipeInputStream = PipedInputStream(pipe)
val dest = DestinationProcess(pipeInputStream)
messages1.forEach { pipe.write(serialize(it)) }
verifySomething()
messages2.forEach { pipe.write(serialize(it)) }
// flush all messages
pipe.close()
// this will magically cause the destination to shutdown
pipeInputStream.close()
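
A runnable version of the piped-stream idea, as a sketch under assumptions: a background thread that collects lines stands in for the destination, messages are plain strings, and `pipeBatches` is a hypothetical name. It shows that test logic can run between batches while the consumer keeps reading from the same pipe.

```kotlin
import java.io.PipedInputStream
import java.io.PipedOutputStream
import java.util.concurrent.CopyOnWriteArrayList
import kotlin.concurrent.thread

// Writes each batch into the pipe, runs betweenBatches() after each one,
// and returns everything the consumer thread read once the pipe is closed.
fun pipeBatches(batches: List<List<String>>, betweenBatches: () -> Unit = {}): List<String> {
    val pipe = PipedOutputStream()
    val pipeInputStream = PipedInputStream(pipe)
    val received = CopyOnWriteArrayList<String>()

    // Stand-in for the destination: read lines until the writer closes the pipe.
    val consumer = thread {
        pipeInputStream.bufferedReader().useLines { lines ->
            lines.forEach { received.add(it) }
        }
    }

    for (batch in batches) {
        batch.forEach { pipe.write("$it\n".toByteArray()) }
        pipe.flush()
        betweenBatches() // e.g. verifySomething() in the comment above
    }

    // Closing the write side signals end of input to the reader.
    pipe.close()
    consumer.join()
    return received.toList()
}

fun main() {
    val out = pipeBatches(listOf(listOf("r1", "r2"), listOf("r3"))) {
        println("test logic runs between batches")
    }
    println(out)
}
```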

plus I think we get most of the benefit by providing a utility method wrapping the destination process?

fun runSync(factory, config, catalog, messages) {
  val dest = factory.start(config, catalog)
  messages.forEach { dest.sendMessage(it) }
  return dest.readMessages()
}
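
A typed, self-contained sketch of that utility. The fake process, the factory, and the use of plain strings and maps for messages, config, and catalog are all assumptions made so the example runs on its own; the real `DestinationProcess` launches the connector instead of echoing.

```kotlin
// Hypothetical in-memory stand-in for the destination process.
class FakeDestinationProcess(
    private val config: Map<String, String>,
    private val catalog: List<String>,
) {
    private val received = mutableListOf<String>()
    private var finished = false

    fun sendMessage(message: String) {
        check(!finished) { "cannot send after the sync has ended" }
        received += message
    }

    // Ends the sync and returns what the destination emitted; here, one ack per input.
    fun readMessages(): List<String> {
        finished = true
        return received.map { "ACK[$it]" }
    }
}

// Hypothetical factory mirroring factory.start(config, catalog) in the comment above.
class FakeDestinationProcessFactory {
    fun start(config: Map<String, String>, catalog: List<String>) =
        FakeDestinationProcess(config, catalog)
}

fun runSync(
    factory: FakeDestinationProcessFactory,
    config: Map<String, String>,
    catalog: List<String>,
    messages: List<String>,
): List<String> {
    val dest = factory.start(config, catalog)
    messages.forEach { dest.sendMessage(it) }
    return dest.readMessages()
}

fun main() {
    val output = runSync(
        FakeDestinationProcessFactory(),
        config = mapOf("mode" to "test"),
        catalog = listOf("users"),
        messages = listOf("record-1", "record-2"),
    )
    println(output)
}
```

Tests that need to interleave verification with message sending can still call `sendMessage` directly on the process, while the common case stays a one-liner.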

@edgao edgao force-pushed the edgao/buffering_output_consumer_tracks_message_consumption branch from 0250ca4 to 50234f4 Compare September 16, 2024 23:48
@edgao edgao force-pushed the edgao/buffering_output_consumer_tracks_message_consumption branch from 50234f4 to 28207e7 Compare September 17, 2024 16:15
@edgao edgao force-pushed the edgao/cdk_integration_tests branch 3 times, most recently from f8e49cf to 4cdec03 Compare September 17, 2024 16:35
}

@Test
open fun testBasicWrite() {
Contributor Author

this test case is mostly here to (a) be a quick smoke test on our state handling, and (b) show the typical interaction with DestinationProcess. Most tests we'll actually write don't need any of the state stuff, and would look more like:

runSync(config, stream, listOf(records...))
dumpAndDiffRecords(...)
// (potentially more syncs, if we're e.g. testing refreshes)
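
The "dump and diff" step could look roughly like the sketch below. It is not the PR's `dumpAndDiffRecords`; the function name, the map-based record shape, and matching records by an `id` field are assumptions chosen to keep the example short.

```kotlin
// Hypothetical diff helper: compares expected and actual records keyed by an id field
// and returns human-readable differences (an empty list means the sync matched).
fun diffRecords(
    expected: List<Map<String, Any?>>,
    actual: List<Map<String, Any?>>,
    idField: String = "id",
): List<String> {
    val expectedById = expected.associateBy { it[idField] }
    val actualById = actual.associateBy { it[idField] }
    val diffs = mutableListOf<String>()

    for ((id, expectedRecord) in expectedById) {
        val actualRecord = actualById[id]
        when {
            actualRecord == null -> diffs += "missing record $idField=$id"
            actualRecord != expectedRecord ->
                diffs += "mismatch for $idField=$id: expected $expectedRecord but got $actualRecord"
        }
    }
    for (id in actualById.keys - expectedById.keys) {
        diffs += "unexpected record $idField=$id"
    }
    return diffs
}

fun main() {
    val expected = listOf(mapOf("id" to 1, "name" to "a"), mapOf("id" to 2, "name" to "b"))
    val actual = listOf(mapOf("id" to 1, "name" to "a"), mapOf("id" to 2, "name" to "B"))
    diffRecords(expected, actual).forEach { println(it) }
}
```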

@edgao edgao marked this pull request as ready for review September 17, 2024 16:45
@edgao edgao requested a review from a team as a code owner September 17, 2024 16:45
@edgao edgao force-pushed the edgao/buffering_output_consumer_tracks_message_consumption branch from 28207e7 to 1508fa6 Compare September 17, 2024 17:09
@edgao edgao requested a review from a team as a code owner September 17, 2024 17:09
@edgao edgao force-pushed the edgao/cdk_integration_tests branch 2 times, most recently from c2f4b23 to 30b24fb Compare September 17, 2024 21:45
@@ -1,4 +1,5 @@
plugins {
id 'airbyte-java-connector'
Contributor Author

this is a hack. We get the integrationTestJava task from this plugin. https://github.com/airbytehq/airbyte-internal-issues/issues/9864 will fix.

@edgao edgao force-pushed the edgao/buffering_output_consumer_tracks_message_consumption branch from 1508fa6 to a30248e Compare September 19, 2024 21:49
@edgao edgao force-pushed the edgao/cdk_integration_tests branch 2 times, most recently from c36924e to cd7a833 Compare September 19, 2024 21:53
@edgao edgao force-pushed the edgao/buffering_output_consumer_tracks_message_consumption branch from a30248e to 2b8def0 Compare September 19, 2024 21:56
@edgao edgao force-pushed the edgao/buffering_output_consumer_tracks_message_consumption branch from 2b8def0 to e905aed Compare September 19, 2024 21:59
LocalDateTime.ofInstant(Instant.now(), ZoneOffset.UTC)
.format(DateTimeFormatter.ofPattern("yyyyMMdd"))
// stream name doesn't need to be randomized, only the namespace.
val randomizedNamespace = "test$timestampString$randomSuffix"
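
A self-contained sketch of this namespace generation. The function signature and the five-letter suffix are assumptions, since the surrounding code is not shown here. The sketch uses the pattern `yyyyMMdd`, because in `DateTimeFormatter` lowercase `y` and `d` are the calendar year and day of month, while uppercase `Y` and `D` are the week-based year and day of year.

```kotlin
import java.time.Instant
import java.time.LocalDateTime
import java.time.ZoneOffset
import java.time.format.DateTimeFormatter
import kotlin.random.Random

// Hypothetical helper: "test" + UTC date + a short random suffix.
// The clock and the random source are parameters so the result can be tested.
fun randomizedNamespace(now: Instant = Instant.now(), random: Random = Random.Default): String {
    val timestampString =
        LocalDateTime.ofInstant(now, ZoneOffset.UTC)
            .format(DateTimeFormatter.ofPattern("yyyyMMdd"))
    val randomSuffix = (1..5).map { ('a'..'z').random(random) }.joinToString("")
    // stream name doesn't need to be randomized, only the namespace.
    return "test$timestampString$randomSuffix"
}

fun main() {
    println(randomizedNamespace())
}
```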
Contributor

Does this mean that we never test the null namespace case?


package io.airbyte.cdk.test.util

fun interface DestinationCleaner {
Contributor

Dq, why is the cleaner its own thing and not an entry point into the process?

Contributor

@johnny-schmidt johnny-schmidt left a comment

This looks solid to me. If you want to merge it down into e2e I'll start working on the final productionalization.

I think overall we'll want to iterate on these interfaces (mine and yours) and work out how we want to handle configs, injection in general, mapping to and from schemas and records, all that. But we can learn as we go.

Labels: area/connectors (Connector related issues), CDK (Connector Development Kit), connectors/destination/e2e-test