
Objects show as "updated" and are redownloaded even when unchanged #53

Open
noelforte opened this issue Jun 2, 2023 · 6 comments
@noelforte commented Jun 2, 2023

Environment
NodeJS v18.16.0
macOS 13.4 Ventura

Steps to reproduce
Sample code for how I'm invoking s3-sync-client (with sensitive values stripped):

// Initialize env
import 'dotenv/config';

// Load internal modules
import path from 'node:path';
import { env, exit } from 'node:process';

// Load external modules
import { S3SyncClient } from 's3-sync-client';

// Initialize client
const { sync } = new S3SyncClient({
	region: ***,
	endpoint: ***,
	forcePathStyle: false,
	credentials: {
		accessKeyId: ***,
		secretAccessKey: ***,
	},
});

const results = await sync(
	`s3://my-bucket/path/to/directory`,
	'output',
	{
		del: true,
	}
);

console.log(results);

Expected result
Items that are unchanged between the remote and the local system should not be recopied.

Actual result
Even after an initial successful sync to the local filesystem, s3-sync-client continues to redownload files that haven't changed, incurring additional bandwidth charges.

Here's a screen capture of the network requests going across:

[Screen Recording 2023-06-01 at 8 14 17 PM]

And the resulting output:

{
  created: [],
  updated: [
    BucketObject {
      id: 'dim-gunger-UO2hOHLq9Y0-unsplash.jpg',
      size: 1667804,
      lastModified: 1685662029901,
      isExcluded: false,
      bucket: '***',
      key: 'test/dim-gunger-UO2hOHLq9Y0-unsplash.jpg'
    },
    BucketObject {
      id: 'luka-verc-D-ChPtXJhXA-unsplash.jpg',
      size: 2448935,
      lastModified: 1685662029901,
      isExcluded: false,
      bucket: '***',
      key: 'test/luka-verc-D-ChPtXJhXA-unsplash.jpg'
    },
    BucketObject {
      id: 'planet-volumes-6tI9Fk5p4bo-unsplash.jpg',
      size: 385869,
      lastModified: 1685662029923,
      isExcluded: false,
      bucket: '***',
      key: 'test/planet-volumes-6tI9Fk5p4bo-unsplash.jpg'
    },
    BucketObject {
      id: 'the-cleveland-museum-of-art-AiD3Pkwmtt0-unsplash.jpg',
      size: 3881833,
      lastModified: 1685662030480,
      isExcluded: false,
      bucket: '***',
      key: 'test/the-cleveland-museum-of-art-AiD3Pkwmtt0-unsplash.jpg'
    },
    BucketObject {
      id: 'yannick-apollon-rYXkqDZxfaw-unsplash.jpg',
      size: 13356953,
      lastModified: 1685662030513,
      isExcluded: false,
      bucket: '***',
      key: 'test/yannick-apollon-rYXkqDZxfaw-unsplash.jpg'
    }
  ],
  deleted: []
}

Other items of note:

  • I thought this could be related to using iDrive e2 (which is S3-compatible), but the same issue occurs on Backblaze too.
  • The time reported by s3-sync-client when performing the diff matches both the filesystem timestamp and the one shown in the web UI of the S3 storage services, so any drift, if it exists, isn't visible to me as an end user.

Happy to provide any other relevant details!

@jeanbmar (Owner) commented Jun 2, 2023

Thank you for the very detailed report.

Can you please run the following code and paste the outputs of the two calls here:

import { S3SyncClient, ListLocalObjectsCommand, ListBucketObjectsCommand } from 's3-sync-client';
const client = new S3SyncClient({ /* your config */ });

console.log(
  await client.send(
    new ListLocalObjectsCommand({
      directory: 'output',
    })
  )
);

console.log(
  await client.send(
    new ListBucketObjectsCommand({
      bucket: 'my-bucket',
      prefix: 'path/to/directory',
    })
  )
);

The diff code for updates is pretty simple:

if (
  sourceObject.size !== targetObject.size ||
  (options?.sizeOnly !== true &&
    sourceObject.lastModified > targetObject.lastModified)
) {
  updated.push(sourceObject);
}

Let's see if the issue comes from values or maybe value types.
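For illustration, the same comparison can be run standalone with hypothetical values in which only the millisecond component differs (this is a sketch of the logic above, not library code):

```javascript
// Hypothetical objects mirroring the diff inputs: the bucket listing keeps
// millisecond precision while the local filesystem reports whole seconds.
const sourceObject = { size: 1667804, lastModified: 1685735624935 }; // bucket
const targetObject = { size: 1667804, lastModified: 1685735624000 }; // local

const sizeOnly = false;

// Same condition as the diff code quoted above.
const isUpdated =
  sourceObject.size !== targetObject.size ||
  (sizeOnly !== true && sourceObject.lastModified > targetObject.lastModified);

console.log(isUpdated); // true: 935 ms of drift marks an unchanged file as updated
```

Even with identical sizes and timestamps that agree to the second, the strict `>` comparison on raw millisecond values flags the object as updated.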

@noelforte (Author)

Sure thing, here's the local object output, truncated for brevity:

[
  LocalObject {
    id: 'test-obj-a.jpg',
    size: 1667804,
    lastModified: 1685735624000,
    isExcluded: false,
    path: 'output/test-obj-a.jpg'
  },
  LocalObject {
    id: 'test-obj.b.jpg',
    size: 385869,
    lastModified: 1685735634000,
    isExcluded: false,
    path: 'output/test-obj.b.jpg'
  }
]

and the bucket object output:

[
  BucketObject {
    id: 'test/test-obj-a.jpg',
    size: 1667804,
    lastModified: 1685735624935,
    isExcluded: false,
    bucket: '...',
    key: 'test/test-obj-a.jpg'
  },
  BucketObject {
    id: 'test/test-obj-b.jpg',
    size: 385869,
    lastModified: 1685735766762,
    isExcluded: false,
    bucket: '...',
    key: 'test/test-obj-b.jpg'
  }
]

Looks like the lastModified values of the local files are truncated to whole seconds (the millisecond component is dropped).
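That truncation can be checked directly: dropping the millisecond component of the bucket timestamp reproduces the local value exactly (a standalone check using the values from the listings above, not library code):

```javascript
// Bucket timestamp (millisecond precision) vs. what the local filesystem reports.
const bucketMs = 1685735624935;
const localMs = 1685735624000;

// Truncating to whole seconds reconciles the two values.
const truncated = Math.floor(bucketMs / 1000) * 1000;

console.log(truncated === localMs); // true
```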

@jeanbmar (Owner) commented Jun 2, 2023

I've run tests on S3, and it seems that AWS doesn't store milliseconds in the LastModified field.

Ref: aws/aws-cli#5369

My test with official AWS SDK commands:

await s3Client.send(
  new PutObjectCommand({
    Bucket: BUCKET_2,
    Key: 'def/jkl/xmoj',
    Body: Buffer.from('0x1234', 'hex'),
  })
);

console.log(
  (
    await s3Client.send(
      new ListObjectsV2Command({
        Bucket: BUCKET_2,
        Prefix: 'def/jkl/xmoj',
      })
    )
  ).Contents.map(({ LastModified }) => LastModified.getTime())
);

// => [ 1685740748000 ]

console.log(
  (
    await s3Client.send(
      new GetObjectCommand({
        Bucket: BUCKET_2,
        Key: 'def/jkl/xmoj',
      })
    )
  ).LastModified.getTime()
);

// => 1685740748000

Can you run the last two commands on test/test-obj-a.jpg? s3Client here is an S3Client instance from the official SDK.
I have a feeling your provider (or the official AWS SDK) might return inconsistent timestamps between ListObjectsV2Command and GetObjectCommand, which would explain the issue.

@noelforte (Author) commented Jun 3, 2023

You are correct, that is the case. Here's the output:

console.log(
  (
    await clientS3.send(
      new ListObjectsV2Command({
        Bucket: env.S3_BUCKET,
        Prefix: 'test/test-obj-a.jpg',
      })
    )
  ).Contents.map(({ LastModified }) => LastModified.getTime())
);

// => [ 1685735624935 ]

console.log(
  (
    await clientS3.send(
      new GetObjectCommand({
        Bucket: env.S3_BUCKET,
        Key: 'test/test-obj-a.jpg',
      })
    )
  ).LastModified.getTime()
);

// => 1685735624000

Is there anything that can be done to work around that by disregarding the milliseconds if they are returned?
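One possible workaround, sketched below, would be to compare both timestamps at second granularity. Note that isNewer is a hypothetical helper for illustration, not part of the s3-sync-client API:

```javascript
// Hypothetical helper: treat a source object as newer only when it is newer
// by at least a full second, sidestepping millisecond drift between providers.
function isNewer(sourceMs, targetMs) {
  return Math.floor(sourceMs / 1000) > Math.floor(targetMs / 1000);
}

console.log(isNewer(1685735624935, 1685735624000)); // false: same second
console.log(isNewer(1685735700000, 1685735624000)); // true: genuinely newer
```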

@jeanbmar (Owner) commented Jun 3, 2023

I'm not sure we can safely round or truncate values. If you look at test-obj.b.jpg in #53 (comment), the timestamps are 1685735634000 and 1685735766762 even though the sizes are the same.

I would suggest opening a ticket with the providers and, in the meantime, using the sizeOnly: true option when syncing. Size comparison should be good enough.
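The effect of sizeOnly: true can be simulated against the diff logic quoted earlier. The isUpdated function below is a standalone re-implementation for illustration (not the library's export), using values from the listings above:

```javascript
// Re-implementation of the diff branch quoted earlier, for illustration only.
function isUpdated(source, target, options) {
  return (
    source.size !== target.size ||
    (options?.sizeOnly !== true && source.lastModified > target.lastModified)
  );
}

const bucketObj = { size: 1667804, lastModified: 1685735624935 };
const localObj = { size: 1667804, lastModified: 1685735624000 };

console.log(isUpdated(bucketObj, localObj, {})); // true: ms drift triggers a redownload
console.log(isUpdated(bucketObj, localObj, { sizeOnly: true })); // false: sizes match
```

With sizeOnly enabled, the lastModified branch is skipped entirely, so millisecond drift no longer causes spurious redownloads; the trade-off is that a changed file with an identical byte size would be missed.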

@noelforte (Author)

Whoops! That was a mistake on my part. I think I changed something in #53 (comment) that caused the times to shift (test-obj-a vs test-obj.a), which is where the inconsistency came from. After rerunning the test from #53 (comment) and making sure the local and remote objects were identical, this is the output:

[
  LocalObject {
    id: 'test-obj-a.jpg',
    size: 1667804,
    lastModified: 1685735624000,
    isExcluded: false,
    path: 'output/test-obj-a.jpg'
  },
  LocalObject {
    id: 'test-obj-b.jpg',
    size: 385869,
    lastModified: 1685735766000,
    isExcluded: false,
    path: 'output/test-obj-b.jpg'
  }
]
[
  BucketObject {
    id: 'test/test-obj-a.jpg',
    size: 1667804,
    lastModified: 1685735624935,
    isExcluded: false,
    bucket: 'my-bucket',
    key: 'test/test-obj-a.jpg'
  },
  BucketObject {
    id: 'test/test-obj-b.jpg',
    size: 385869,
    lastModified: 1685735766762,
    isExcluded: false,
    bucket: 'my-bucket',
    key: 'test/test-obj-b.jpg'
  }
]

As you can see, the timestamps for each object are exactly the same apart from the milliseconds, so it doesn't appear to be an issue with the provider.
