Here at DoltHub some of us have been working with PostgreSQL extensions
recently. This is an introductory tutorial on how to get started building a
PostgreSQL foreign data wrapper. We introduce the basics around setting up a
project for building and installing the extension and how to implement a very
basic read only scan.
PostgreSQL is a powerful open-source SQL engine
with a ton of compelling features and great stability. It’s also very
extensible. It’s extension architecture allows shared libraries developed in C
to add new data types and functions, index and table storage formats, and query
planning and execution functionality.
Featured Content Ads
add advertising hereOne common type of postgres extension is a foreign data wrapper. Originally
designed for exposing tables and relations in a foreign SQL server so that they
could be queried and processed on a different postgres server, the architecture
allows for developing extensions to expose many foreign data sources as tables
to the postgres backend. They can then be queried and processed using the full
power of the postgres query engine, and can be easily combined with local data
or other remote data using powerful query capabilities like joins and
aggregations.
Postgres itself ships with two foreign data wrappers:
-
postgres_fdw
— which will connect to a remote PostgreSQL server and expose tables from it. -
file_fdw
— which will allow for accessing local files and programs as if they were tables, exposing functionality similar to the COPY statement, but without importing the data into postgres tables first.
There are a multitude of third-party foreign data wrappers for connecting to
other SQL databases, like MySQL,
Oracle and
MSSQL, and also for connecting to other
data sources entirely, such as
Redis and
DynamoDB.
The postgres documentation is fantastic, but the surface area of the of the API
is decently large. This is a guided tour for getting started with a new foreign
data wrapper implementation.
To get started, you will need the following installed:
- GNU Make
- GCC or Clang
- Postgres
The instructions here have been tested on macOS and Linux. On Windows, they
will definitely require Cygwin or MinGW, and may require further changes as
well.
Featured Content Ads
add advertising hereWe’re going to create a foreign data wrapper named tutorial_fdw
. To start
with, we will create a directory for our project and a Makefile
for building
it.
$ mkdir tutorial_fdw
$ cd tutorial_fdw
And we create the following Makefile
:
MODULE_big = tutorial_fdw
OBJS = tutorial_fdw.o
EXTENSION = tutorial_fdw
DATA = tutorial_fdw--1.0.sql
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
This Makefile
uses a build system that ships with Postgres called
PGXS. To use it, we
set some variables describing what artifacts we need to build and install, and
we include build system makefile which we find using pg_config
.
If we now run make
, it will complain that there is no rule to make
tutorial_fdw.o
. We can
to make a bit of progress. Eventually that file will contain our implementation.
Running make
again complains about a missing tutorial_fdw.control
file.
The control
file
describes a proper Postgres extension, and includes metadata about the
extension itself.
$ cat > tutorial_fdw.control <<EOF
comment = 'Tutorial FDW.'
default_version = '1.0'
module_pathname = '$libdir/tutorial_fdw'
relocatable = true
Now our extension will build sucessfully. make install
will fail with:
install: .//tutorial_fdw--1.0.sql: No such file or directory
Our Makefile
references this file as a DATA
component, and PostgreSQL’s
extension mechanism will attempt to load it when someone runs CREATE EXTENSION
. Its purpose is to run the lower-level SQL commands that will
tutorial_fdw;
load our shared library and point PostgreSQL at its entry points. A foreign
data wrapper in postgres has one mandatory and one optional entry point:
-
A
handler
entry point, which returns a struct of function pointers that will implement the foreign data wrapper API. These function pointers will be called by postgres to participate in query planning and execution. They will do things like estimate the number of rows that match a given restrict clause, enumerate and estimate the costs of the different ways of accessing a given set of rows (through an index, table scan, covering index scan, etc.), and implement the table access itself where each row is returned to the execution engine. This is the meat of a FDW implementation and what we are going to build. -
A
validator
entry point, which is an optional entry point that will be called with the options which have been set on the foreign server, table, user or wrapper itself.
We will leave the validator as a future addition and just implement handler
for now.
$ cat > tutorial_fdw--1.0.sql <<EOF
CREATE FUNCTION tutorial_fdw_handler()
RETURNS fdw_handler
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;
CREATE FOREIGN DATA WRAPPER tutorial_fdw
HANDLER tutorial_fdw_handler;
EOF
Now our extension can build and install as a proper extension accessible from
our postgres installation. We will now be iterating on an implementation to
actually return result tuples as part of a table scan. Let’s define exactly
what we want our tutorial foreign data wrapper to do, and how people will
interface with it.
Our FDW will be quite silly. We will implement it to support single column
tables where the column is an integer. By default, we will return single
element tuples sequentially from 0 to 63 inclusive. Then we will add the
ability for options
on the table to control where we start and where we end.
Interacting with our extension should look like:
> CREATE EXTENSION tutorial_fdw;
CREATE EXTENSION
> CREATE SERVER tutorial_server FOREIGN DATA WRAPPER tutorial_fdw;
CREATE SERVER
> CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server;
CREATE FOREGIN TABLE
> SELECT * FROM sequential_ints;
val
0
1
2
3
4
5
...
(64 rows)
As we iterate on our implementation, we will want to test it out. So we create
a small smoke test which does the above in a standalone (and newly created)
postgres database. smoke_test.sh
looks like:
#!/bin/bash
set -eo pipefail
make install
PGDATA=`mktemp -d -t tfdw-XXXXXXXXXXX`
trap "PGDATA="$PGDATA" pg_ctl stop >/dev/null || true; rm -rf "$PGDATA"" EXIT
PGDATA="$PGDATA" pg_ctl initdb > /dev/null
PGDATA="$PGDATA" pg_ctl start
psql postgres -f smoke_test.sql
And smoke_test.sql
is:
CREATE EXTENSION tutorial_fdw;
CREATE SERVER tutorial_server FOREIGN DATA WRAPPER tutorial_fdw;
CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server;
SELECT * FROM sequential_ints;
Now after ./smoke_test.sh
, we will see the server logs, which will include error log lines similar to:
ERROR: incompatible library "/usr/local/lib/postgresql/tutorial_fdw.so": missing magic block
HINT: Extension libraries are required to use the PG_MODULE_MAGIC macro.
That means our extension is installed and postgres attempted to load it. Great
progress. Now we just need to write the implementation.
Let’s continue on our path of addressing the immediate failures we see.
PG_MODULE_MAGIC
is documented
here and we
will need to include it in our shared library. From now on, we will be making
changes to tutorial_fdw.c
, which is currently empty.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -0,0 +1,3 @@
+#include "postgres.h"
+#include "fmgr.h"
+PG_MODULE_MAGIC;
Changes our first error upon running smoke_test.sh
to:
psql:smoke_test.sql:1: ERROR: could not find function "tutorial_fdw_handler" in file "/usr/local/lib/postgresql/tutorial_fdw.so"
So we will need to implement the function we told postgres would exist.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -1,3 +1,14 @@
#include "postgres.h"
#include "fmgr.h"
+#include "foreign/fdwapi.h"
+Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
+
+Datum
+tutorial_fdw_handler(PG_FUNCTION_ARGS)
+{
+ FdwRoutine *fdwroutine = makeNode(FdwRoutine);
+ PG_RETURN_POINTER(fdwroutine);
+}
+
PG_MODULE_MAGIC;
This is an extension function that is callable from SQL, so it has a particular
calling convention it needs to follow. We use preprocessor macros to declare
its arguments, generate its return value and expose it in our shared library.
You can read about these calling
conventions and
how to work with them further. For the purposes of a foreign data wrapper, we
have a parameter-less function that always returns a pointer to a struct
.
FdwRoutine
When we run our smoke_test.sh
now, you will see that our client gets a closed
connection error and the postgres server logs that the child process died with
a segmentation fault. Our implementation returned a pointer to a newly
allocated and zeroed FdwRoutine struct, but that struct is meant to contain
function pointers to our implementation. Our handler function did not populate
any. A small number of the function pointers are necessary, but most are
optional. You can find documentation on the
members and see the
definition of the
struct.
For now, let’s populate the necessary members with some stubs.
--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -4,11 +4,51 @@
Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
+void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid);
+void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid);
+ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
+ ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan);
+void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags);
+TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node);
+void tutorial_fdw_ReScanForeignScan(ForeignScanState *node);
+void tutorial_fdw_EndForeignScan(ForeignScanState *node);
+
Datum
tutorial_fdw_handler(PG_FUNCTION_ARGS)
{
FdwRoutine *fdwroutine = makeNode(FdwRoutine);
+ fdwroutine->GetForeignRelSize = tutorial_fdw_GetForeignRelSize;
+ fdwroutine->GetForeignPaths = tutorial_fdw_GetForeignPaths;
+ fdwroutine->GetForeignPlan = tutorial_fdw_GetForeignPlan;
+ fdwroutine->BeginForeignScan = tutorial_fdw_BeginForeignScan;
+ fdwroutine->IterateForeignScan = tutorial_fdw_IterateForeignScan;
+ fdwroutine->ReScanForeignScan = tutorial_fdw_ReScanForeignScan;
+ fdwroutine->EndForeignScan = tutorial_fdw_EndForeignScan;
PG_RETURN_POINTER(fdwroutine);
}
+void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+}
+
+void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+}
+
+ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
+ ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan) {
+ return NULL;
+}
+
+void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {
+}
+
+TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
+ return NULL;
+}
+
+void tutorial_fdw_ReScanForeignScan(ForeignScanState *node) {
+}
+
+void tutorial_fdw_EndForeignScan(ForeignScanState *node) {
+}
+
Our stubs all do absolutely nothing and, when they have to return a pointer, return NULL
. Now if we run smoke_test, we get:
ERROR: could not devise a query plan for the given query
It’s time to flesh out the implementation and actually participate in query
planning.
Postgres documentation describes the path of a
query. After a query has
been parsed and rewritten, our FDW will participate in planning and optimizing
to come up with a final query plan that can be executed. The postgres query
planner is going to call GetForeignRelSize
, GetForeignPaths
and
GetForeignPlan
, in that order, for each table access in a query. The purpose
is as follows:
-
GetForeignRelSize
— updatebaserel->rows
, and possiblybaserel->width
andbaserel->tuples
, with an estimated result set size for a scan ofbaserel
, accounting for the filtering done by restriction qualifiers. Planning can proceed without an accurate implementation here, but the potential for misoptimizations is high. -
GetForeignPaths
— updatebaserel->pathlist
to include ways of accessingbaserel
. Typically addsForeignPath *
s created withcreate_foreignscan_path
tobaserel
usingadd_path
. Each added path will include a co