FEB
7
2010

Deploying SQLite

"There's nothing easier" -- you say -- about packaging and deploying SQLite. "Just take the software with default settings and package as a shared lib plus SQLite shell".

It's not that simple.

The SQLite project is developed at impressive speed assuming complexity of the software. It's already part of many operating systems like OS X and Symbian. Linuxes use it somewhat at system level. Browsers use SQLite for storage via HTML 5, earlier via Google Gears.

The fact is that all of the uses we can spot are for a specific cases. For each case slightly different configuration is beneficial. SQLite has two kinds of configurable options: runtime and compile-time. The latter includes configurable limits. SQLite deployment is typical to embedded software, which is efficient but requires developers to be aware the specifics.

Google Gears expected SQLite to have certain features enabled. This is also the case with Kexi or in general any app that uses SQLite for desktop databases. As you can read in the backlog of Kexi development for January, I have switched to system SQLite. I even proposed update to FindSqlite.cmake to make sure minimum version with enough features is in place. That lasted just one day.

Then I have immediately switched back. There were a few subtle and one main reason - security. SQLite provides one nice compile-time option:

SQLITE_SECURE_DELETE

    This compile-time option causes SQLite to overwrite deleted information
    with zeros in addition to marking the space as available for reuse.
    Without this option, deleted data might be recoverable from a database
    using a binary editor. However, there is a performance penalty for using this option.

I have assumed that we want this flag to be on, so one important workflow in Kexi is more secure. When you delete tables or even just table rows (records) from a database, and send the .kexi file (which is based on SQLite database) to others, you basically expect not to have the deleted information in the file. Unless SQLITE_SECURE_DELETE is enabled, this is not the case. Databases, includeing SQLite, like to just mark the deleted records are deleted without removing the empty space or cleaning up the bytes.

Enabling the feature at the cost efficiency is the current design decision. An alternative to SQLITE_SECURE_DELETE would be to vacuum the database on closing. But what if the application was terminated uncleanly (application or system crash)? And what if user sends the database by email while Kexi is still running? These question do not exist when SQLITE_SECURE_DELETE is on, and that's why I like the flag.

SQLITE_SECURE_DELETE is not always needed however, for example when you share the data through a web server or remote connections. But these use cases are not yet supported by a stable implementation in Kexi.

All in all, the current set of options for Kexi's copy of SQLite is as follows (kexidb/drivers/sqlite/CMakeLists.txt) -- look to just get an idea of possible future features of Kexi and Predicate library (KexiDB 2):

ADD_DEFINITIONS(
    # sqlite compile-time options, http://sqlite.org/compile.html
    -DSQLITE_SECURE_DELETE
    -DSQLITE_ENABLE_COLUMN_METADATA # Some additional APIs that provide convenient access to meta-data
                                    # about tables and queries
    -DSQLITE_ENABLE_FTS3 # Version 3 of the full-text search engine
    -DSQLITE_ENABLE_FTS3_PARENTHESIS # Modifies the query pattern parser in FTS3 such that it supports
                                     # operators AND and NOT (in addition to the usual OR and NEAR)
                                     # and also allows query expressions to contain nested parenthesesis.
    -DSQLITE_ENABLE_MEMORY_MANAGEMENT # Extra logic to SQLite that allows it to release unused memory upon request
    -DSQLITE_ENABLE_RTREE # Support for the R*Tree index extension
    -DSQLITE_ENABLE_STAT2 # Additional logic to the ANALYZE command and to the query planner that can help SQLite
                          # to chose a better query plan under certain situations
    -DSQLITE_ENABLE_UPDATE_DELETE_LIMIT # Optional ORDER BY and LIMIT clause on UPDATE and DELETE statements
    -DSQLITE_ENABLE_UNLOCK_NOTIFY # Enables the sqlite3_unlock_notify() interface and its associated functionality
                                  # (http://sqlite.org/unlock_notify.html)
    -DSQLITE_SOUNDEX # Enables the soundex() SQL function (http://sqlite.org/lang_corefunc.html#soundex)
)

As mentioned in the reviewboard comment even while SQLite is not packable for general use as a shared library, this is by design. Because there are many compile-time switches, so many not-fully compatible versions of SQLite can be found in particular distros.

By having own copy of regularly updated SQLite, whas has been started in 2004, we can also patch SQLite to add some esoteric features, e.g. provide progress information of the .dump operation, so we can have display the progress in the GUI, which is good for large files.

I am also thinking about related proposal: having a copy of SQLite moved from Kexi into some place like kdesupport, with sane build defaults. With the lib name altered to something like libksqlite to avoid clashes with distro-packaged SQLite. Then something like simple FindKSqlite, would be used within KDE.

Do you have any opinions on the matter? Please share it in the comments below.

Comments

I don't think the user should send/copy the file while using the database because it could be in an inconsistent state. That's why one of the ways to backup is to acquire a lock, copy the file and then release the lock (the other way is using the online backup API, more info here: http://www.sqlite.org/backup.html). I think the user should be informed how the deletion process takes place and that he has to close the database before sending/copying it.


By bugmenot at Mon, 02/08/2010 - 00:49

I would just like to note that Fedora is against it: http://fedoraproject.org/wiki/Packaging:No_Bundled_Libraries


By bugmenot at Mon, 02/08/2010 - 01:07

Not just against, it's outright banned and we will have to patch Kexi because of this.


By Kevin Kofler at Mon, 02/08/2010 - 10:10

Kexi bundles SQLite since 2004 and Fedora does not patch it. If it start to do it, then Fedora-distributed Kexi can break the .kexi files or just do not open them, and we'll have to make it clear for our users. So please work openly with authors of applications before doing any steps, because the fact that your packages build does nto mean it runs properly.

I proposed KDE-wide libksqlite as a solution.


By Jarosław Staniek at Mon, 02/08/2010 - 10:14

Enforcement of this policy has been lax in the past, but blocker bugs get filed against packages with bundled libraries when they get caught and reviewers are also supposed to check for bundled libraries when a new package gets submitted for review and reject the package if there are any. And by the way, right now Rawhide has no Kexi at all because we have KOffice 2.1.1 in there (and no KOffice 1) and Kexi is not part of that.


By Kevin Kofler at Mon, 02/08/2010 - 10:20

It's known and good policy. As I tried to demonstrate, it does not work with SQLite - there are many ways to build it. It could be even possible to have two builds of version x.y.z, one with feature FOO, and one without. Now imagine an accident when the feature FOO introduces a security hazard. Fedora cannot fix it if the feature is not enabled in the package delivered by Fedora.


By Jarosław Staniek at Mon, 02/08/2010 - 10:11

If feature FOO is not enabled in the Fedora SQLite package, no package in Fedora can use feature FOO, it's as simple as that.


By Kevin Kofler at Mon, 02/08/2010 - 10:12

I guess most of the current databases do not by default overwrite deleted data. The performance impact would simply be too big for some (most?) use cases. I would go for the "vacuum-on-close" option, even though it's not "secure" if the app crashes (but then you want to re-launch it again in any case, to see if any of your recent commits have been lost). But you can put the "vacuum()" call inside your crash handler so that it at least still works if your application crashes (woudln't help if the crash is in SQLite but then you're screwed anyway...)

just my 2ct...
-Darkstar


By dark-star at Mon, 02/08/2010 - 07:33

VACUUM takes O(N+M) time where N is the size of the old database, M is the size of the new database. Plus VACUUM re-reads all the structures, and recreates from scratch. Plus it needs M bytes on the storage for the temp file.

SQLITE_SECURE_DELETE is O(W) where W is the number of bytes written (we just write zeros into mem-mapped blocks, so that's quite efficient).

Summing up: VACUUM it's way too big cost for those users that open and close these databases often.

This does not mean VACUUM is evil: an option to run it from time to time (or propose to user to do so) is planned.

As I understand you would use VACUUM, I have one more TODO: having a configuration option. What needs even deeper patching of SQLite of course.


By Jarosław Staniek at Mon, 02/08/2010 - 10:25

Sorry, but this is just not going to fly for distributions like Fedora and Debian. We have strict guidelines banning bundled libraries in Fedora. So we'll have 2 options, either we patch Kexi to build with the system SQLite (even if functionality suffers as a result) or we don't ship Kexi at all. The third option would be asking our steering committee for an exception, but it's very unlikely to get approved. The prevailing opinion there is that applications need to be fixed to work with system libraries no matter what.


By Kevin Kofler at Mon, 02/08/2010 - 10:09