Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pull grant and grantee data from HSES on a schedule #248

Merged
merged 31 commits into from
Jan 13, 2021
Merged

Conversation

kryswisnaskas
Copy link
Collaborator

@kryswisnaskas kryswisnaskas commented Jan 8, 2021

Description of change
Adding a cron job, which is part of the server and runs everyday at midnights.
The job downloads data from HSES (in production env), extracts the zip archive into "temp" directory. Parses the xml data and populates or updates the SmartHub db with downloaded data.

How to test
Make sure the tests pass.
The imported data can be seen on sandbox in the database using the following:

  • open SSH tunnel: cf connect-to-service -no-client tta-smarthub-sandbox ttahub-sandbox
  • use your favorite postgres client with the parameters listed (if can't connect with localhost, use 127.0.0.1)
  • verify that there are grantees and grants in tables

(bonus test) additionally, to verify the download:

  • modify one of the HSES data download environment variables:
  • HSES_DATA_FILE_URL, HSES_DATA_USERNAME, HSES_DATA_PASSWORD
  • modify the cron's schedule to '*/5 * * * *' in the app.js (run every 5 minutes instead of at midnight)
  • deploy a branch to sandbox
  • observe that there will be an error message

Issue(s)

Checklist

  • Meets issue criteria
  • Code tested
  • [n/a] Meets accessibility standards (WCAG 2.1 Levels A, AA)
  • [n/a] Documentation updated

Copy link
Contributor

@rahearn rahearn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kryswisnaskas, good stuff. My biggest concern is the mixing of explicit IDs with auto-increment columns. As we saw with the user seeders in dev and sandbox doing that will likely result in duplicate IDs where we didn't mean to have them.

.circleci/config.yml Show resolved Hide resolved
src/app.js Outdated
const timezone = 'America/New_York';

// tmp schedule for testing
const schedule = '*/25 * * * *';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update to the real schedule. I'm also curious how this will work in production. Production will have multiple instances running (2 at first), but we want to make sure that this is only running on one of them.

See the last section on CF_INSTANCE_INDEX of cloud.gov's docs for information on limiting to the first instance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, this slipped by. I'll take a look at the link. Thanks.

src/lib/updateGrantsGrantees.js Show resolved Hide resolved
src/lib/updateGrantsGrantees.js Show resolved Hide resolved
src/lib/updateGrantsGrantees.js Show resolved Hide resolved
tools/importPlanGoals.test.js Show resolved Hide resolved
});

it('should import or update grants', async () => {
const grantsBefore = await Grantee.findAll();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to check the count of Grant before the test?

Suggested change
const grantsBefore = await Grantee.findAll();
const grantsBefore = await Grant.findAll();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Update README and .env.example
Copy link
Contributor

@rahearn rahearn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff, Krys. There are just two minor things I'd like to change before merge, both with suggestions below.

Comment on lines 390 to 392
hses_data_file_url: HSES_DATA_FILE_URL
hses_data_username: HSES_DATA_USERNAME
hses_data_password: HSES_DATA_PASSWORD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wish I had caught this earlier. We have one endpoint for dev/sandbox/staging, and another for production, so the prod variables need to be different than what we use for staging.

Suggested change
hses_data_file_url: HSES_DATA_FILE_URL
hses_data_username: HSES_DATA_USERNAME
hses_data_password: HSES_DATA_PASSWORD
hses_data_file_url: PROD_HSES_DATA_FILE_URL
hses_data_username: PROD_HSES_DATA_USERNAME
hses_data_password: PROD_HSES_DATA_PASSWORD

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. Cut and paste omission.

grantId = dbGrant.id;
currentGranteeId = dbGrant.granteeId;
// }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create job to pull Grantee Name and Grant ID data from HSES into Smart Hub database
5 participants