Skip to content

Database Infrastructure

Richard Pardo-Figueroa edited this page Oct 3, 2022 · 6 revisions

Why PostgreSQL?

When considering the ideal database platform to be used for FAM, the topics of database systems, cloud agnostic, high availability, technical debt, technology maturity and maintainability came up.

For FAM, the ideal infrastructure that would support the database had to meet the following requirements:

  • Database Systems: The infrastructure should support relational databases and, with that, and Spatial datasets. Some RDBMS that were taken into consideration were Oracle, MSSQL Server and PostgreSQL. Some RDBMS as MySQL, Sybase and Couch did not/partially meet those requirements.
  • Cloud Agnostic: At the time when this page was created, Azure MSSQL was not supported by the OCIO's Pathfinder team, which practically ruled out the use of MSSQL. Implementing a Windows server cluster in AWS was a complex and onerous task and perhaps impossible to do in that platform without extensive research.
  • High availability: Systems such Oracle, MSSQL and PostgreSQL have a proven record of providing the database of high tolerance and availability. With the exception of MSSQL Server, -which relies on the Operating System for HA and provides less complex setup-, Oracle and PostgreSQL require a great knowledge for setting up and maintaining a high availability configuration.
  • Technology maturity: The chosen database system must have an ample support base and a high degree of adoption. It should also be atop of most other database systems in terms of features, with many years of existence in the market.
  • Maintainability: The database system should be maintainable by technical development teams as well as database administrators without considerable effort in training.
  • Technical debt: Development teams as well as the FAM technical architects expressed that reducing technical debt was paramount. With that, exploring other, less common database systems was discarded. With the high cost of Oracle licensing across NRS and the very few transactional databases in MSSQL within the sector, PostgreSQL quickly became the database system of choice.

Several teams within the sector have been deploying applications in Openshift using the PostgreSQL RDBMS. Its great popularity, similarities with the Oracle database and the wide support available within the open source community added to the reasons to use PostgreSQL.

Last but not least, the PostGIS -the PostgreSQL library to handle GIS data and store it in the database-, it is also widely supported, mature and also used by some teams within the BC Government. With the extensive amount of spatial data within the NRS, using PostgreSQL for FAM would allow future teams and the FSA program to administer one single database platform, with considerably less upfront and future costs.

Why Aurora RDS for PostgreSQL?

Over the years, database administration has become an onerous task within the NRS Ministries, due to technology changes, staff retirements and the amount of data hosted in many, many databases.

As part of the adoption of PostgreSQL, the team came across with Amazon Aurora RDS for PostgreSQL. Amazon Aurora offers a server-less service that allows teams to deploy highly available MSSQL, Oracle or PostgreSQL databases, with a minimal intervention by a Database Administrator, which alleviates the pressure on that team. Some benefits of this platform (as implemented within FAM):

  • The server-less implementation, provides a "vanilla" PostgreSQL database so there is no vendor lock-in. Should the team decide to move the FAM's data assets to another PostgreSQL database an export of the database can be obtained using the traditional PostgreSQL binaries.
  • A server-less RDS Postgres database supports the addition of the PostGIS library via Amazon Web Services Cloud Development Kit (Amazon CDK). Although a complex implementation, it is possible and fully supported by AWS.
  • Database backups ("snapshots" in the AWS lingo) are created automatically for the database, with a retention of seven days, which can be customized.
  • Automatic Database snapshots older than 7 days are automatically removed.
  • Snapshots are stored within the RDS service, so there is no need for a Database Administrator to grant (or be granted) access to a S3 bucket or any other type of storage, which accelerates the velocity in which a backup can be accessed.
  • It is possible to create a manual snapshot at any time and the retention policies do not apply to them.
  • An Amazon Aurora RDS database uses a Pay-as-you-use model, so the FSA program is charged only when the database is used.
  • An Amazon Aurora RDS database is highly available by default, reducing the need for setting up any sort of high availability mechanism (eliminates the need of Patroni) and its respective indirect maintenance and monitoring costs.
  • By definition, an Aurora RDS database is auto-scalable
  • An Aurora RDS Database can be setup as a server-based database service or server-less.

Challenges of managing a server-less Aurora RDS Database service

  • There is no longer access to an "OS command prompt nor a SQL prompt" that allows to monitor and inspect a server along with the performance of a database in the traditional way. This is perhaps that most difficult cultural change to adapt to.
  • The administration console is very simplistic and basic, which may initially be a challenge for experienced system and database administrators.
  • Exporting a database to a format that can be moved or imported to another set of PostgreSQL binaries can be challenging. A virtual machine with PostgreSQL binaries must be setup on AWS and export the data via remote connection. Both, the VM and the Aurora RDS database must belong to the same "Data" security group within the AWS nameplate.
  • An alternative to move the data from an Aurora RDS into a PostgreSQL database is to use the Database Migration Service (Amazon DMS) but if the target database is a regular PostgreSQL in RDS (not Aurora RDS) the snapshot taken can only be used by RDS.
  • Using the Query Editor within the RDS console will save the credentials into the Secrets Manager service. Careful consideration should be taken when assigning access to the AWS nameplate.
  • Due to security policies, Terraform is unable to create additional accounts, as it requires a SSL connection to the database instance. A workaround could be to run Terraform Cloud, but that function was not part of this evaluation as it was decided to handle all database migrations (including the creation of user accounts) via Flyway.

Advantages of using a server-less Aurora RDS Database service

  • No need to patch or upgrade the database binaries (ever!)
  • Under the hood, it is still a standard implementation of PostgreSQL
  • Additional libraries can be installed (using the Amazon CDK)
  • The database can be exported (with some effort) to a different PostgreSQL database
  • There is minimum database administration overhead
  • Allow for the provisioning of databases in minutes, via Terraform
  • Automatic backups and retention, that can be customized

References

Amazon RDS Postgres or Amazon Aurora Postgres ?.

Clone this wiki locally