Registering custom VFS for SQLite3 Link to heading

SQLite3 uses virtual file systems in order to be portable across operating systems. There are multiple built-in VFSes, supporting OS operations on Unix-like systems, Windows systems etc.

It is possible to register a custom VFS, thus allowing us to run custom code at the time the SQLite driver does file operations (not unlike custom streams in GNU/Linux). In this blog post series, I’ll explore how this feature of SQLite can be used in different ways.

Registering a SQLite3 VFS in Python using apsw Link to heading

In this first blog post I’ll show how we can register the most basic, read-only VFS using the Python package apsw. To give it some kind of purpose, we’ll add functionality to count the number of reads to the underlying file system, along with the number of bytes read.

First we’ll need to subclass apsw.VFS:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
import uuid
import apsw


class CountingVFS(apsw.VFS):
    def __init__(self, name: str = "counting_vfs", basevfs: str = ""):
        self.name = f'{name}_{str(uuid.uuid4())}'
        self.base_vfs = basevfs

        super().__init__(self.name, self.base_vfs)

        self.num_reads = 0
        self.num_read_bytes = 0

    def xOpen(self, name: apsw.URIFilename, flags: list[int]) -> 'CountingVFSFile':
        return CountingVFSFile(self, name, flags)

Next comes the definition of CountingVFSFile (we can optionally inherit from apsw.VFSFile), where we need to implement at least xRead and xFileSize:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
import os


class CountingVFSFile(apsw.VFSFile):
    def __init__(self, vfs: CountingVFS, filename: apsw.URIFilename, flags: list[int]):
        super().__init__(vfs.base_vfs, filename, flags)

        self.vfs = vfs
        self.filename = filename.filename()
        self.file = open(self.filename, 'rb')

    def xRead(self, amount: int, offset: int) -> bytes:
        self.vfs.num_reads += 1
        self.vfs.num_read_bytes += amount

        self.file.seek(offset, 0)

        return self.file.read(amount)

    def xFileSize(self) -> int:
        return os.stat(self.filename).st_size

    def xClose(self):
        self.file.close()

In the constructor we simply open a file descriptor at the requested location1. To support reading from the file, we implement xRead. This function takes in the amount of bytes + the offset (from the beginning of the file) requested by the SQLite driver, and must return that amount of bytes. Since we’re simply reading from a local file, we can just read from the already opened file.

To implement the byte counting functionality, we just sum of the amount of bytes requested in the function call of xRead. Note that xWrite is not implemented, making this VFS purely read-only.

With the above implementations, we can go ahead and register the VFS with the SQLite driver, and use it when creating a new connection:

1
2
vfs = CountingVFS()
conn = apsw.Connection(path_to_db, flags=apsw.SQLITE_OPEN_READONLY, vfs=vfs.name)

In a more complete example, we can use the counting VFS in a command line interface, outputting the number of reads/number of bytes read while performing a query:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
from operator import itemgetter

import click
from humanize import naturalsize  # To get output bytes in a human-readable format
from tabulate import tabulate  # To get output rows in a neat table format


@click.command()
@click.argument('db_path', type=click.Path(exists=True, readable=True, resolve_path=True))
@click.argument('query', type=click.STRING)
def query_sqlite(db_path: str, query: str, ):
    vfs = CountingVFS()

    with apsw.Connection(db_path, flags=apsw.SQLITE_OPEN_READONLY, vfs=vfs.name) as conn:
        cursor = conn.execute(query)

        click.echo(tabulate(cursor, headers=map(itemgetter(0), cursor.description), tablefmt="github"))

    click.echo(f"Number of reads: {vfs.num_reads} ({naturalsize(vfs.num_read_bytes, gnu=True)} bytes)")


if __name__ == '__main__':
    query_sqlite()

I happen to have a database extract of the danish parliament data (extracted late ‘23), stored in the file ft.db (~ 200MB):

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
python counting_vfs.py ft.db "SELECT navn, startdato FROM aktor WHERE typeid = 9 ORDER BY startdato DESC LIMIT 10;"
| navn                                         | startdato           |
|----------------------------------------------|---------------------|
| Helsefonden                                  | 2023-04-01T00:00:00 |
| Udvalget vedrørende Efterretningstjenesterne | 2023-01-05T00:00:00 |
| Det Udenrigspolitiske Nævn                   | 2022-11-01T00:00:00 |
| Sydslesvigudvalget                           | 2022-11-01T00:00:00 |
| Kontaktudvalget for Det Tyske Mindretal      | 2022-11-01T00:00:00 |
| OSCEs Parlamentariske Forsamling             | 2022-11-01T00:00:00 |
| Europarådet                                  | 2022-11-01T00:00:00 |
| NATO's Parlamentariske Forsamling            | 2022-11-01T00:00:00 |
| Dansk Interparlamentarisk Gruppes bestyrelse | 2022-11-01T00:00:00 |
| Udvalget vedrørende Det Etiske Råd           | 2022-11-01T00:00:00 |
Number of reads: 914 (3.6M bytes)

There it is. To extract these 10 rows, SQLite makes 914 reads to ft.db, reading a total of 3.6M bytes. This is quite a lot of bytes to trawl through, but makes sense given a largely un-indexed database.

Conclusion Link to heading

While the example given here is not particularly useful in itself, having the option to interject code at the time of OS operations turns out to be quite useful.

In part 2 of this series, I show how we can use another custom VFS to allow querying data stored in compressed files.


  1. A proper implementation should check the file exists, that we have sufficient permissions etc. ↩︎