These are the words of a madman, not necessarily true nor possible.
1. Useful SPARQL concepts
1.0.1. Endpoint
An individual service able to reply to SPARQL queries (e.g. tracker-store, or https://query.wikidata.org/)
1.0.2. GRAPH
Individual collections of RDF triples
https://www.w3.org/TR/sparql11-query/#rdfDataset
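For illustration, a query scoped to one named graph might look like this (the graph name is made up):
SELECT ?song { GRAPH <urn:graph:music> { ?song a nmm:MusicPiece } }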
1.0.3. DESCRIBE/CONSTRUCT
Query syntax to generate RDF data out of a dataset
https://www.w3.org/TR/sparql11-query/#describe
https://www.w3.org/TR/sparql11-query/#construct
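Both return RDF triples rather than variable bindings. Minimal sketches:
DESCRIBE ?u WHERE { ?u a nmm:Photo }
CONSTRUCT { ?u nie:title ?t } WHERE { ?u a nmm:Photo ; nie:title ?t }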
1.0.4. LOAD
Update syntax to incorporate external resources (e.g. RDF files) into a graph in the dataset
https://www.w3.org/TR/sparql11-update/#load
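A minimal sketch (the file URI is hypothetical):
LOAD <file:///path/to/data.ttl>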
1.0.5. SERVICE
Syntax to distribute queries across SPARQL endpoints and merge the results
https://www.w3.org/TR/2013/REC-sparql11-federated-query-20130321/#introduction
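A minimal sketch against the wikidata endpoint mentioned above (prefix declarations included so the query is self-contained):
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?cat { SERVICE <https://query.wikidata.org/sparql> { ?cat wdt:P31 wd:Q146 } }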
2. Concepts to explore
2.0.1. DESCRIBE/CONSTRUCT
DESCRIBE/CONSTRUCT at large scale are reasonably easy now that tracker supports unrestricted queries
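For instance, dumping every resource in the store might boil down to something like:
DESCRIBE ?u WHERE { ?u a rdfs:Resource }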
2.0.2. GRAPH
Tracker has very rudimentary support for graphs:
- No two graphs may have the same triple (cardinality is global)
- Unique indices are global too
- FROM/FROM NAMED/GRAPH syntax isn't entirely right
At the heart of all this is the approach used to store graph data in the database: every property has an additional *:graph column, but data from all graphs is actually merged into the same tables, under the same restrictions.
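To illustrate the cardinality issue, consider inserting the same triple into two graphs (resource and graph names are made up):
INSERT DATA { GRAPH <urn:graph:a> { <urn:res> a nie:InformationElement ; nie:title 'Example' } }
INSERT DATA { GRAPH <urn:graph:b> { <urn:res> nie:title 'Example' } }
Since all graphs share the same table rows, both inserts compete for the same row and its single *:graph column; the triple cannot exist independently in both graphs.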
Graphs may generally be considered isolated units, so a more 1:1 approach would consist of storing each graph in an individual database, to be merged together later by the engine (e.g. through https://www.sqlite.org/unionvtab.html). The additional CLEAR/CREATE/DROP/COPY/MOVE/ADD graph management syntax from SPARQL 1.1 might quickly fall into place with this, as sketched below.
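The management operations would map quite directly onto per-graph databases, e.g. (graph names are made up):
CREATE GRAPH <urn:graph:staging>              # a new, empty database
COPY <urn:graph:music> TO <urn:graph:staging> # clone one database into another
DROP GRAPH <urn:graph:staging>                # delete the database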
2.0.2.1. Caveats/pitfalls
Resource IDs do need to be global, and tracker reset must take this into account.
SQLite limitations apply; point 11 in https://sqlite.org/limits.html (maximum number of attached databases) is relevant to this approach.
2.0.3. LOAD
We have most of the pieces to implement LOAD, as we already have a tracker-store DBus method that does pretty much this; basically, it then turns into a language feature. However, it might benefit from graphs as described above.
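For instance, loading external data into a dedicated graph (URIs are hypothetical):
LOAD <file:///path/to/data.ttl> INTO GRAPH <urn:graph:imported>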
2.0.4. SERVICE
SERVICE might be possible to implement through a virtual table (https://sqlite.org/vtab.html). Tracker roughly provides this functionality through tracker_sparql_connection_remote_new(), although that connects to a specific endpoint instead of blending it into the query.
2.0.4.1. Caveats/pitfalls
- Virtual tables have a fixed set of columns, defined at construction time; this might require some JIT/dynamic management of tables in TEMP/MEMORY
- Partially resolving the local query in order to produce the most optimized remote query (e.g. providing values/ranges) seems hard. Just not doing that and letting SQLite handle it all through the virtual table sounds feasible, but slow.
3. Piecing it together
3.0.1. Backups
An application might be able to do:
DESCRIBE ?u WHERE { ?u a nmm:Photo ; nfo:belongsToContainer/nie:url 'file:///run/media...' }
and serialize the results into a file, which might then be loaded through:
LOAD SILENT <file:///...>
This essentially supersedes tracker_sparql_connection_load().
3.0.2. Sandboxing (Option 1)
Built upon graphs as individual databases, which can be selectively exposed into the sandbox filesystem.
3.0.2.1. Pros
- Allows direct readonly access within the sandbox
- Single tracker-store, outside the sandbox
- Minimal SPARQL changes involved
3.0.2.2. Cons
- All updates still have to happen through DBus
- Beware of limits on the number of attached databases
3.0.2.3. ???
- Miners stay in the host
- Data isolation comes from the miners, e.g. music and photos would get distinct graphs, and applications would request access to those.
3.0.3. Sandboxing (Option 1.5)
On top of the previous option, we could make a TrackerSparqlConnection that has a private writable store (like tracker_sparql_connection_local_new()), but can get readonly access to the global store.
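A hypothetical query under this option, joining miner data from the global store with tags kept in the private one (graph names are made up):
SELECT ?song ?tag { GRAPH <urn:miner:music> { ?song a nmm:MusicPiece } GRAPH <urn:app:private> { ?song nao:hasTag ?tag } }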
3.0.3.1. Pros
- Allows direct readonly access within the sandbox
- Updates happen to the local private store, within the sandbox. The host data cannot be changed.
- Minimal SPARQL changes involved
- tracker-extract might move into the sandbox
3.0.3.2. Cons
- Every graph must still follow the same ontology
- If host data is deleted (e.g. tracker reset), the private database cannot be expected to remain coherent.
- Beware of limits on the number of attached databases
3.0.3.3. ???
- Data isolation comes from the miners, e.g. music and photos would get distinct graphs, and applications would request access to those.
3.0.4. Sandboxing (Option 2)
Built upon SERVICE. Tracker clients get a local store, and queries across endpoints are done through SERVICE, e.g.:
SELECT ?a ?url ?d { SERVICE <dbus://org.freedesktop.Tracker.Miner.FS> { ?u a nmm:Photo ; nie:url ?url } . ?a foo:url ?url ; foo:data ?d }
Optionally, clients might export themselves over DBus as a SPARQL endpoint, able to be queried from the outside; e.g. a hypothetical global search might do:
SELECT ?url { { SERVICE <dbus://org.gnome.Music> { ?song nie:url ?url ; fts:match "term" } } UNION { SERVICE <dbus://org.gnome.Photos> { ?photo nie:url ?url ; fts:match "term" } } }
Data becomes fully distributed (SPARQL's vision).
3.0.4.1. Pros
- Full freedom wrt ontologies: the sandboxed application might have custom ontologies and data, meshed together with the Tracker miners' Nepomuk data
- Updates are all kept within the sandbox; remote endpoints being readonly follows naturally from the SPARQL syntax.
3.0.4.2. Cons
- Settles on DBus for IPC with any other endpoint. Direct access is not as straightforward.
- Heavier sparql changes involved
- Although graphs might still be used to split data, access control might be left up to the DBus layer
- Needs some care to avoid breaking out into other endpoints from an authorized one, e.g.:
SELECT * { SERVICE <dbus://org.freedesktop.Tracker.Miner.FS> { SERVICE <dbus://org.gnome.Photos> { } } }
3.0.4.3. ???
- Although tracker-extract data might be within the sandbox, that would effectively lock the client into the Nepomuk ontology.