Creating a Postgres Foreign Data Wrapper

78
Creating a Postgres Foreign Data Wrapper

Here at DoltHub some of us have been working with PostgreSQL extensions
recently. This is an introductory tutorial on how to get started building a
PostgreSQL foreign data wrapper. We introduce the basics around setting up a
project for building and installing the extension and how to implement a very
basic read only scan.

PostgreSQL is a powerful open-source SQL engine
with a ton of compelling features and great stability. It’s also very
extensible. It’s extension architecture allows shared libraries developed in C
to add new data types and functions, index and table storage formats, and query
planning and execution functionality.

One common type of postgres extension is a foreign data wrapper. Originally
designed for exposing tables and relations in a foreign SQL server so that they
could be queried and processed on a different postgres server, the architecture
allows for developing extensions to expose many foreign data sources as tables
to the postgres backend. They can then be queried and processed using the full
power of the postgres query engine, and can be easily combined with local data
or other remote data using powerful query capabilities like joins and
aggregations.

Postgres itself ships with two foreign data wrappers:

  • postgres_fdw — which will connect to a remote PostgreSQL server and expose tables from it.

  • file_fdw — which will allow for accessing local files and programs as if they were tables, exposing functionality similar to the COPY statement, but without importing the data into postgres tables first.

There are a multitude of third-party foreign data wrappers for connecting to
other SQL databases, like MySQL,
Oracle and
MSSQL, and also for connecting to other
data sources entirely, such as
Redis and
DynamoDB.

The postgres documentation is fantastic, but the surface area of the of the API
is decently large. This is a guided tour for getting started with a new foreign
data wrapper implementation.

To get started, you will need the following installed:

  • GNU Make
  • GCC or Clang
  • Postgres

The instructions here have been tested on macOS and Linux. On Windows, they
will definitely require Cygwin or MinGW, and may require further changes as
well.

We’re going to create a foreign data wrapper named tutorial_fdw. To start
with, we will create a directory for our project and a Makefile for building
it.

$ mkdir tutorial_fdw
$ cd tutorial_fdw

And we create the following Makefile:

MODULE_big = tutorial_fdw
OBJS = tutorial_fdw.o

EXTENSION = tutorial_fdw
DATA = tutorial_fdw--1.0.sql

PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)

This Makefile uses a build system that ships with Postgres called
PGXS. To use it, we
set some variables describing what artifacts we need to build and install, and
we include build system makefile which we find using pg_config.

If we now run make, it will complain that there is no rule to make
tutorial_fdw.o. We can

to make a bit of progress. Eventually that file will contain our implementation.

Running make again complains about a missing tutorial_fdw.control file.
The control
file

describes a proper Postgres extension, and includes metadata about the
extension itself.

$ cat > tutorial_fdw.control <<EOF
comment = 'Tutorial FDW.'
default_version = '1.0'
module_pathname = '$libdir/tutorial_fdw'
relocatable = true

Now our extension will build sucessfully. make install will fail with:

install: .//tutorial_fdw--1.0.sql: No such file or directory

Our Makefile references this file as a DATA component, and PostgreSQL’s
extension mechanism will attempt to load it when someone runs CREATE EXTENSION
tutorial_fdw;
. Its purpose is to run the lower-level SQL commands that will
load our shared library and point PostgreSQL at its entry points. A foreign
data wrapper in postgres has one mandatory and one optional entry point:

  • A handler entry point, which returns a struct of function pointers that will implement the foreign data wrapper API. These function pointers will be called by postgres to participate in query planning and execution. They will do things like estimate the number of rows that match a given restrict clause, enumerate and estimate the costs of the different ways of accessing a given set of rows (through an index, table scan, covering index scan, etc.), and implement the table access itself where each row is returned to the execution engine. This is the meat of a FDW implementation and what we are going to build.

  • A validator entry point, which is an optional entry point that will be called with the options which have been set on the foreign server, table, user or wrapper itself.

We will leave the validator as a future addition and just implement handler
for now.

$ cat > tutorial_fdw--1.0.sql <<EOF
CREATE FUNCTION tutorial_fdw_handler()
RETURNS fdw_handler
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

CREATE FOREIGN DATA WRAPPER tutorial_fdw
  HANDLER tutorial_fdw_handler;
EOF

Now our extension can build and install as a proper extension accessible from
our postgres installation. We will now be iterating on an implementation to
actually return result tuples as part of a table scan. Let’s define exactly
what we want our tutorial foreign data wrapper to do, and how people will
interface with it.

Our FDW will be quite silly. We will implement it to support single column
tables where the column is an integer. By default, we will return single
element tuples sequentially from 0 to 63 inclusive. Then we will add the
ability for options on the table to control where we start and where we end.
Interacting with our extension should look like:

> CREATE EXTENSION tutorial_fdw;
CREATE EXTENSION
> CREATE SERVER tutorial_server FOREIGN DATA WRAPPER tutorial_fdw;
CREATE SERVER
> CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server;
CREATE FOREGIN TABLE
> SELECT * FROM sequential_ints;
 val

0
1
2
3
4
5
...
(64 rows)

As we iterate on our implementation, we will want to test it out. So we create
a small smoke test which does the above in a standalone (and newly created)
postgres database. smoke_test.sh looks like:

#!/bin/bash

set -eo pipefail

make install

PGDATA=`mktemp -d -t tfdw-XXXXXXXXXXX`

trap "PGDATA="$PGDATA" pg_ctl stop >/dev/null || true; rm -rf "$PGDATA"" EXIT

PGDATA="$PGDATA" pg_ctl initdb > /dev/null
PGDATA="$PGDATA" pg_ctl start
psql postgres -f smoke_test.sql

And smoke_test.sql is:

CREATE EXTENSION tutorial_fdw;
CREATE SERVER tutorial_server FOREIGN DATA WRAPPER tutorial_fdw;
CREATE FOREIGN TABLE sequential_ints ( val int ) SERVER tutorial_server;
SELECT * FROM sequential_ints;

Now after ./smoke_test.sh, we will see the server logs, which will include error log lines similar to:

ERROR:  incompatible library "/usr/local/lib/postgresql/tutorial_fdw.so": missing magic block
HINT:  Extension libraries are required to use the PG_MODULE_MAGIC macro.

That means our extension is installed and postgres attempted to load it. Great
progress. Now we just need to write the implementation.

Let’s continue on our path of addressing the immediate failures we see.
PG_MODULE_MAGIC is documented
here
and we
will need to include it in our shared library. From now on, we will be making
changes to tutorial_fdw.c, which is currently empty.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -0,0 +1,3 @@
+#include "postgres.h"
+#include "fmgr.h"
+PG_MODULE_MAGIC;

Changes our first error upon running smoke_test.sh to:

psql:smoke_test.sql:1: ERROR:  could not find function "tutorial_fdw_handler" in file "/usr/local/lib/postgresql/tutorial_fdw.so"

So we will need to implement the function we told postgres would exist.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -1,3 +1,14 @@
 #include "postgres.h"
 #include "fmgr.h"
+#include "foreign/fdwapi.h"
+Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
+PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
+
+Datum
+tutorial_fdw_handler(PG_FUNCTION_ARGS)
+{
+        FdwRoutine *fdwroutine = makeNode(FdwRoutine);
+        PG_RETURN_POINTER(fdwroutine);
+}
+
 PG_MODULE_MAGIC;

This is an extension function that is callable from SQL, so it has a particular
calling convention it needs to follow. We use preprocessor macros to declare
its arguments, generate its return value and expose it in our shared library.
You can read about these calling
conventions
and
how to work with them further. For the purposes of a foreign data wrapper, we
have a parameter-less function that always returns a pointer to a struct
FdwRoutine
.

When we run our smoke_test.sh now, you will see that our client gets a closed
connection error and the postgres server logs that the child process died with
a segmentation fault. Our implementation returned a pointer to a newly
allocated and zeroed FdwRoutine struct, but that struct is meant to contain
function pointers to our implementation. Our handler function did not populate
any. A small number of the function pointers are necessary, but most are
optional. You can find documentation on the
members
and see the
definition of the
struct
.

For now, let’s populate the necessary members with some stubs.

--- a/tutorial_fdw.c
+++ b/tutorial_fdw.c
@@ -4,11 +4,51 @@
 Datum tutorial_fdw_handler(PG_FUNCTION_ARGS);
 PG_FUNCTION_INFO_V1(tutorial_fdw_handler);
 
+void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid);
+void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid);
+ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
+    ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan);
+void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags);
+TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node);
+void tutorial_fdw_ReScanForeignScan(ForeignScanState *node);
+void tutorial_fdw_EndForeignScan(ForeignScanState *node);
+
 Datum
 tutorial_fdw_handler(PG_FUNCTION_ARGS)
 {
         FdwRoutine *fdwroutine = makeNode(FdwRoutine);
+        fdwroutine->GetForeignRelSize = tutorial_fdw_GetForeignRelSize;
+        fdwroutine->GetForeignPaths = tutorial_fdw_GetForeignPaths;
+        fdwroutine->GetForeignPlan = tutorial_fdw_GetForeignPlan;
+        fdwroutine->BeginForeignScan = tutorial_fdw_BeginForeignScan;
+        fdwroutine->IterateForeignScan = tutorial_fdw_IterateForeignScan;
+        fdwroutine->ReScanForeignScan = tutorial_fdw_ReScanForeignScan;
+        fdwroutine->EndForeignScan = tutorial_fdw_EndForeignScan;
         PG_RETURN_POINTER(fdwroutine);
 }
 
+void tutorial_fdw_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+}
+
+void tutorial_fdw_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid) {
+}
+
+ForeignScan *tutorial_fdw_GetForeignPlan(PlannerInfo *root, RelOptInfo *baserel, Oid foreigntableid,
+    ForeignPath *best_path, List *tlist, List *scan_clauses, Plan *outer_plan) {
+        return NULL;
+}
+
+void tutorial_fdw_BeginForeignScan(ForeignScanState *node, int eflags) {
+}
+
+TupleTableSlot *tutorial_fdw_IterateForeignScan(ForeignScanState *node) {
+        return NULL;
+}
+
+void tutorial_fdw_ReScanForeignScan(ForeignScanState *node) {
+}
+
+void tutorial_fdw_EndForeignScan(ForeignScanState *node) {
+}
+

Our stubs all do absolutely nothing and, when they have to return a pointer, return NULL. Now if we run smoke_test, we get:

ERROR:  could not devise a query plan for the given query

It’s time to flesh out the implementation and actually participate in query
planning.

Postgres documentation describes the path of a
query
. After a query has
been parsed and rewritten, our FDW will participate in planning and optimizing
to come up with a final query plan that can be executed. The postgres query
planner is going to call GetForeignRelSize, GetForeignPaths and
GetForeignPlan, in that order, for each table access in a query. The purpose
is as follows:

  • GetForeignRelSize — update baserel->rows, and possibly baserel->width and baserel->tuples, with an estimated result set size for a scan of baserel, accounting for the filtering done by restriction qualifiers. Planning can proceed without an accurate implementation here, but the potential for misoptimizations is high.

  • GetForeignPaths — update baserel->pathlist to include ways of accessing baserel. Typically adds ForeignPath *s created with create_foreignscan_path to baserel using add_path. Each added path will include a co

NOW WITH OVER +8500 USERS. people can Join Knowasiak for free. Sign up on Knowasiak.com
Read More

Vanic
WRITTEN BY

Vanic

“Simplicity, patience, compassion.
These three are your greatest treasures.
Simple in actions and thoughts, you return to the source of being.
Patient with both friends and enemies,
you accord with the way things are.
Compassionate toward yourself,
you reconcile all beings in the world.”
― Lao Tzu, Tao Te Ching