Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further S3 Support Umbrella Issue #114

Open
4 of 8 tasks
sjperkins opened this issue Sep 21, 2023 · 8 comments
Open
4 of 8 tasks

Further S3 Support Umbrella Issue #114

sjperkins opened this issue Sep 21, 2023 · 8 comments

Comments

@sjperkins
Copy link
Contributor

sjperkins commented Sep 21, 2023

This issue tracks remaining issues from the TODO list here: #91 (comment)

Related:

KVStore

Testing

Authorisation

Logging

@laramiel
Copy link
Collaborator

laramiel commented Nov 1, 2023

As of 81aef76 we should be in a place where the "//tensorstore/kvstore/s3:localstack_test" will run using a hermetic moto (https://github.com/getmoto/moto) server.

We choose to use moto over localstack since it appears to have a smaller and somewhat less conflicting dependency tree.

@sjperkins
Copy link
Contributor Author

sjperkins commented Nov 2, 2023

Thanks, have merged master into #119.

In terms of retry functionality, the following looks like a suitable reference implementation:

I might have time to look at this in the second half of November, or, early December, unless there's something else that might be useful to look at first?

@laramiel
Copy link
Collaborator

I have linked "s3" into the default kvstore set; no special changes are required now.

@sjperkins
Copy link
Contributor Author

Hello, just dropping a note to say that my next window of opportunity for s3 work will be after March 2024,

@laramiel
Copy link
Collaborator

laramiel commented Feb 8, 2024

There was, at least, one issue related to auth on S3. I think that we may want to investigate using the AWS sdk (https://github.com/aws/aws-sdk-cpp/tree/main) for, at a minimum, auth.

  • For auth, we'd need to check that it could be integrated into our requests flow.
  • We'd need bazel and bazel_to_cmake configs to work; it would be possible to start with the tensorflow-io (https://github.com/tensorflow/io) bazel files; there appears to be ~5 dependencies required.
  • There does appear to be a mismatch in apis regarding cancellation; that seems less important for now.
  • Verify that it is not "too large".

@sjperkins
Copy link
Contributor Author

sjperkins commented Feb 9, 2024

There was, at least, one issue related to auth on S3. I think that we may want to investigate using the AWS sdk (https://github.com/aws/aws-sdk-cpp/tree/main) for, at a minimum, auth.

Thanks for the headsup. I assume #135 is the issue? I just want to check, from scanning the issue there was a obvious, but easily fixed argument order bug (apologies looks like that was on me). Has this resolved the problem or is there an orthogonal issue related to incorporating KMS keys in the AWS4 signature?

If so, I can see why deferring to the SDK would be appealing -- the auth surface area is probably quite large.

Perhaps, one way to think about the heaviness of aws-sdk-cpp is to install the appropriate parts using vcpkg:

$ ./vcpkg install aws-sdk-cpp[core,s3,s3-encryption,kms]
...
$ cd installed/x64-linunx/lib
$ du -hsc lib*
424K	libaws-c-auth.a
124K	libaws-c-cal.a
496K	libaws-c-common.a
16K	libaws-c-compression.a
152K	libaws-c-event-stream.a
44K	libaws-checksums.a
668K	libaws-c-http.a
484K	libaws-c-io.a
540K	libaws-c-mqtt.a
3.4M	libaws-cpp-sdk-core.a
1.5M	libaws-cpp-sdk-kms.a
4.2M	libaws-cpp-sdk-s3.a
576K	libaws-cpp-sdk-s3-encryption.a
1.3M	libaws-crt-cpp.a
312K	libaws-c-s3.a
164K	libaws-c-sdkutils.a
11M	libcrypto.a
1.4M	libcurl.a
2.4M	libs2n.a
1.3M	libssl.a
144K	libz.a
30M	total

It may be possible to build the aws-sdk-cpp with boringssl instead of openssl as a dependency. Then, purely AWS code seems to be about 15M in total

$ du -hsc libaws-c*
424K	libaws-c-auth.a
124K	libaws-c-cal.a
496K	libaws-c-common.a
16K	libaws-c-compression.a
152K	libaws-c-event-stream.a
44K	libaws-checksums.a
668K	libaws-c-http.a
484K	libaws-c-io.a
540K	libaws-c-mqtt.a
3.4M	libaws-cpp-sdk-core.a
1.5M	libaws-cpp-sdk-kms.a
4.2M	libaws-cpp-sdk-s3.a
576K	libaws-cpp-sdk-s3-encryption.a
1.3M	libaws-crt-cpp.a
312K	libaws-c-s3.a
164K	libaws-c-sdkutils.a
15M	total

Maybe a more accurate estimate could be obtained by building a test binary that produces an Authorization header using AWSAuthV4Signer (for instance).

I may not get to this before end March/start April due to other commitments.

@sjperkins
Copy link
Contributor Author

If you think a videocon would be useful my contact details are here https://github.com/ratt-ru/arcae/blob/8c323116105d811f9718f258a09b52b68e7a8150/pyproject.toml#L5.

@laramiel
Copy link
Collaborator

laramiel commented Feb 9, 2024

I think cribbing from this is a good starting point.

The dependencies are

That would be enough to get auth, probably. Then we could theoretically use any of the AWS auth services.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants