Distribute "slim" version of the connector #792

Open
enzo-cappa opened this issue Feb 27, 2024 · 5 comments

Comments

@enzo-cappa
Contributor

enzo-cappa commented Feb 27, 2024

The connector is currently distributed as an uber JAR that bundles all of its dependencies. However, some of those dependencies are not needed in all cases, especially in production systems. For example:

  • Native binaries for JNI libraries (zstd) are bundled for environments that production systems rarely use (e.g. Darwin and very uncommon Linux architectures).
  • Libraries that Kafka Connect may already distribute, such as commons-compress, are bundled as well. This is especially problematic because some of these dependencies end up outdated and vulnerable (e.g. commons-compress is affected by SNYK-JAVA-ORGAPACHECOMMONS-6254297).

Would it be possible to distribute a slim version of the connector alongside the current one, i.e. a JAR with only the fundamental dependencies? Better still, it could be distributed as a zip/tar.gz with the individual JAR files inside, as Debezium does (see the different types and classifiers at https://repo1.maven.org/maven2/io/debezium/debezium-connector-postgres/2.5.1.Final/). That would make it easier to exclude or replace individual JARs when needed (for example, to force a version bump if a 0-day vulnerability is discovered in a dependency).
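
To see which native binaries the current JAR actually ships, something like the following Gradle sketch could help. It is only an illustration under assumptions: the `connectorJar` configuration and the `listBundledNatives` task are hypothetical names, the version is the one quoted later in this thread, and the file extensions are a guess at how the JNI libraries are packaged.

```groovy
// build.gradle (Groovy DSL): a minimal sketch, not part of the connector's build.
repositories {
    mavenCentral()
}

configurations {
    // Hypothetical configuration that holds only the connector JAR.
    connectorJar
}

dependencies {
    // Version taken from a later comment in this thread; adjust as needed.
    connectorJar('com.snowflake:snowflake-kafka-connector:2.4.0') {
        transitive = false
    }
}

// Hypothetical task: prints every native library found inside the JAR.
tasks.register('listBundledNatives') {
    doLast {
        configurations.connectorJar.files.each { jar ->
            zipTree(jar).matching {
                // Assumed extensions for Linux, macOS and Windows JNI binaries.
                include '**/*.so', '**/*.dylib', '**/*.dll', '**/*.jnilib'
            }.visit { details ->
                if (!details.directory) {
                    println details.relativePath
                }
            }
        }
    }
}
```

Running `gradle listBundledNatives` should print one path per bundled native library, which gives a rough idea of how much a slim distribution could drop.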

@enzo-cappa
Contributor Author

I just realized that this repo depends on the shaded snowflake-ingest artifact, which means that several dependencies are being duplicated. See the JAR versions described at https://github.com/snowflakedb/snowflake-ingest-java/tree/master?tab=readme-ov-file#jar-versions

@enzo-cappa
Contributor Author

Another finding: the JDBC driver is also distributed as a fat JAR. Furthermore, both the JDBC driver and the Ingest SDK require different distributions to be FIPS compliant, and this connector does not use them. That makes me assume that this connector is not FIPS compliant, and cannot be as long as the uber/shaded JARs are used.

@sfc-gh-gjachimko
Contributor

@enzo-cappa I'm very sorry for the late reply. I'll add an internal ticket to track this issue and discuss it. We shall see whether we have some room for improvement here.

@simonepm

Kudos for raising this. At the moment, Connector v2.3.0 supports Kafka 3.7 and Confluent 7.6.

I do not know if it is overkill, but to make this version run with Kafka 3.5 and Confluent 7.5 we had to rebuild the JAR from source; otherwise a 'NoSuchMethodError' pops up.

It would be great to be able to include just the stripped-down JAR as a dependency and put different versions of its Kafka and Confluent dependencies on the classpath, for broader compatibility.

@simonepm

simonepm commented Aug 29, 2024

I found a quick-and-dirty solution that does not require recompiling the whole JAR and still allows a slim import to be used with a different set of dependency versions:

dependencies {
    implementation('com.snowflake:snowflake-kafka-connector:2.4.0') {
        transitive = false
        // exclude all original dependency groups:
        exclude group: 'org.bouncycastle'
        exclude group: 'org.apache.kafka'
        exclude group: 'net.snowflake'
        exclude group: 'org.apache.avro'
        exclude group: 'org.apache.commons'
        exclude group: 'com.fasterxml.jackson.core'
        exclude group: 'io.confluent'
        exclude group: 'io.dropwizard.metrics'
        exclude group: 'com.google.guava'
        exclude group: 'com.google.protobuf'
        exclude group: 'dev.failsafe'
        exclude group: 'org.slf4j'
    }
    // re-import all original dependencies at the required versions:
    implementation 'org.bouncycastle:bcpkix-fips:1.0.7'
    implementation 'org.apache.kafka:connect-api:3.5.2'
    implementation 'org.apache.kafka:kafka-clients:3.5.2'
    implementation 'net.snowflake:snowflake-jdbc:3.18.0'
    implementation 'net.snowflake:snowflake-ingest-sdk:2.2.0'
    implementation 'org.apache.avro:avro:1.11.3'
    implementation 'org.apache.commons:commons-compress:1.26.2'
    implementation 'com.fasterxml.jackson.core:jackson-core:2.17.2'
    implementation 'com.fasterxml.jackson.core:jackson-databind:2.17.2'
    implementation 'io.confluent:kafka-schema-registry-client:7.5.5'
    implementation 'io.confluent:kafka-avro-serializer:7.5.5'
    implementation 'io.confluent:kafka-connect-avro-converter:7.5.5'
    implementation 'io.confluent:kafka-schema-rules:7.5.5'
    implementation 'io.confluent:kafka-schema-registry-client-encryption:7.5.5'
    implementation 'io.dropwizard.metrics:metrics-core:4.2.26'
    implementation 'io.dropwizard.metrics:metrics-jmx:4.2.3'
    implementation 'com.google.guava:guava:32.0.1-jre'
    implementation 'com.google.protobuf:protobuf-java:3.25.4'
    implementation 'com.google.protobuf:protobuf-java-util:3.25.4'
    implementation 'dev.failsafe:failsafe:3.3.2'
    implementation 'org.slf4j:slf4j-api:1.7.36'
}

We take the dependency imports from the original pom.xml file and change the incompatible versions (e.g. the Confluent and Kafka libraries) by excluding all the dependency groups and re-importing them at the versions needed.

Using transitive = false alone did not work for me, I guess because we are dealing with a fat JAR.
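
One way to confirm which versions Gradle ends up resolving after the exclusions is to print the runtime classpath. This is a minimal sketch, assuming the `java` plugin is applied so that the `runtimeClasspath` configuration exists; the task name `printRuntimeClasspath` is hypothetical.

```groovy
// build.gradle (Groovy DSL): hypothetical helper task, added next to the
// dependencies block shown above.
tasks.register('printRuntimeClasspath') {
    doLast {
        // Walk every artifact resolved for the runtime classpath and print
        // its coordinates, e.g. org.apache.kafka:kafka-clients:<version>.
        configurations.runtimeClasspath
            .resolvedConfiguration
            .resolvedArtifacts
            .collect { artifact ->
                def id = artifact.moduleVersion.id
                "${id.group}:${id.name}:${id.version}"
            }
            .sort()
            .each { println it }
    }
}
```

Note that this only reports declared modules. Classes that are physically bundled inside a fat JAR would not appear in this listing, so it cannot rule out duplicates shipped within the connector JAR itself.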
