Precise code intelligence

Precise code intelligence relies on LSIF (Language Server Index Format) data to deliver precomputed code intelligence. It provides fast and highly accurate code intelligence, but the LSIF data must be periodically generated and uploaded to your Sourcegraph instance. Precise code intelligence is an opt-in feature: repositories for which you have not uploaded LSIF data will continue to use search-based code intelligence.

Getting started

See the how-to guides to get started with precise code intelligence.

Cross-repository code intelligence

Cross-repository code intelligence works out-of-the-box when both the dependent repository and the dependency repository have LSIF data at the correct commits or versions. We are working on relaxing this constraint so that nearest-commit functionality works on a cross-repository basis as well.

When the current repository has LSIF data and a dependent doesn’t, the missing precise results will be supplemented with imprecise search-based code intelligence. This also applies when both repositories have LSIF data, but for a different set of versions. For example, if repository A@v1 depends on B@v2, then we will get precise cross-repository intelligence when we have LSIF data for both A@v1 and B@v2, but we would not get a precise result if we instead have LSIF data for A@v1 and B@v1.
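The version-matching rule in the A/B example can be sketched as follows. This is a minimal illustration under stated assumptions, not Sourcegraph's implementation: the `uploads` set and the `resolution_kind` function are hypothetical names introduced here.

```python
# Hypothetical model: `uploads` is the set of (repository, version) pairs
# for which LSIF data has been uploaded.
def resolution_kind(uploads, source, dependency):
    """Return "precise" only when both ends of the cross-repository
    reference have LSIF data at the required versions."""
    if source in uploads and dependency in uploads:
        return "precise"
    return "search-based"


# A@v1 depends on B@v2.
source, dependency = ("A", "v1"), ("B", "v2")

# LSIF data for A@v1 and B@v2: the versions line up.
print(resolution_kind({("A", "v1"), ("B", "v2")}, source, dependency))

# LSIF data for A@v1 and B@v1: B has data, but for the wrong version.
print(resolution_kind({("A", "v1"), ("B", "v1")}, source, dependency))
```

The first call reports a precise result and the second falls back to search-based, because B@v2 itself has no uploaded data.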

Why are my results sometimes incorrect?

If LSIF data is not found for a particular file in a repository, Sourcegraph will fall back to search-based code intelligence. You may occasionally see results from search-based code intelligence even when you have uploaded LSIF data. Such results are indicated with a tooltip. This can happen in the following scenarios:

The symbol has LSIF data, but it is defined in a repository which does not have LSIF data.
The nearest commit that has LSIF data is too far away from your browsing commit. The limit is 100 commits ahead/behind.
The line containing the symbol was created or edited between the nearest indexed commit and the commit being browsed.

The Find references panel will always include search-based results, but only after all of the precise results have been displayed. This ensures every symbol has code intelligence.
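The three fallback scenarios above can be summarized as a single check. This is a rough sketch, not Sourcegraph's actual logic: the function name and its parameters are hypothetical, and treating a distance of exactly 100 commits as still within the limit is an assumption.

```python
MAX_COMMIT_DISTANCE = 100  # limit stated above, in commits ahead/behind


def uses_search_based(definition_repo_has_lsif, commit_distance, line_changed_since_index):
    """Return True when a result would come from search-based code intelligence.

    commit_distance is the number of commits between the browsed commit and
    the nearest commit with LSIF data, or None when no such commit exists.
    """
    # Scenario 1: the symbol is defined in a repository without LSIF data.
    if not definition_repo_has_lsif:
        return True
    # Scenario 2: the nearest indexed commit is missing or too far away.
    # Assumption: a distance of exactly 100 is still acceptable.
    if commit_distance is None or commit_distance > MAX_COMMIT_DISTANCE:
        return True
    # Scenario 3: the line was created or edited after the indexed commit.
    return line_changed_since_index


print(uses_search_based(True, 3, False))    # precise data is usable
print(uses_search_based(True, 250, False))  # nearest index is too far away
```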

Size of upload data

The following table gives a rough estimate of the space and time requirements for indexing and processing. These repositories are a representative sample of public Go repositories available on GitHub. The working tree size column gives the size of the clone at the given commit (without git history), the number of files indexed, and the number of lines of Go code in the repository. The index size is the size of the uncompressed LSIF output of the indexer. The post-processing size is the total amount of disk space occupied after uploading the dump to a Sourcegraph instance.

Repository	Working tree size	Index time	Index size	Processing time	Post-processing size
bigcache	216KB, 32 files, 2.585k loc	1.18s	3.5MB	0.45s	0.6MB
sqlc	396KB, 24 files, 7.041k loc	1.53s	7.2MB	1.62s	1.6MB
nebula	700KB, 71 files, 10.704k loc	2.48s	16MB	1.63s	2.9MB
cayley	5.6MB, 226 files, 36.346k loc	5.58s	51MB	4.68s	11MB
go-ethereum	27MB, 945 files, 317.664k loc	20.53s	255MB	77.40s	50MB
kubernetes	301MB, 4577 files, 1.550m loc	1.21m	910MB	80.06s	162MB
aws-sdk-go	119MB, 1759 files, 1.067m loc	8.20m	1.3GB	155.82s	358MB
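To get a feel for how much disk space an upload occupies relative to the raw index, the ratio of post-processing size to index size can be computed from the table. The sketch below copies the figures from the table and assumes that 1.3GB means 1300MB; the variable and function names are illustrative only.

```python
# (index size in MB, post-processing size in MB), copied from the table above.
# Assumption: the 1.3GB index for aws-sdk-go is treated as 1300MB.
SIZES_MB = {
    "bigcache": (3.5, 0.6),
    "sqlc": (7.2, 1.6),
    "nebula": (16, 2.9),
    "cayley": (51, 11),
    "go-ethereum": (255, 50),
    "kubernetes": (910, 162),
    "aws-sdk-go": (1300, 358),
}


def post_processing_ratio(index_mb, post_mb):
    """Fraction of the uncompressed index size that remains on disk."""
    return post_mb / index_mb


for repo, (index_mb, post_mb) in SIZES_MB.items():
    print(f"{repo}: {post_processing_ratio(index_mb, post_mb):.2f}")
```

For this sample, the stored data is between roughly 0.17 and 0.28 times the size of the uncompressed index. Other languages and indexers may behave differently.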

Data retention policy

The bulk of LSIF data is stored on disk, and code intelligence data for a commit becomes less useful as it ages. Sourcegraph will automatically remove the least recently uploaded data if the amount of used disk space exceeds a configurable threshold. This value defaults to 10 GiB (10 × 2^30 = 10737418240 bytes), and can be changed via the DBS_DIR_MAXIMUM_SIZE_BYTES environment variable.
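Because the environment variable takes a value in bytes, it can help to compute the number rather than type it by hand. The sketch below also models the retention policy described above in its simplest form. Both helper functions are hypothetical illustrations, and the eviction loop is an assumption about the behavior, not Sourcegraph's implementation.

```python
GIB = 2**30  # bytes in one gibibyte


def max_size_bytes(gib):
    """Value to use for DBS_DIR_MAXIMUM_SIZE_BYTES for a limit given in GiB."""
    return gib * GIB


def evict_oldest(uploads, limit_bytes):
    """Drop the least recently uploaded entries until the total fits.

    uploads is a list of (uploaded_at, size_bytes) tuples; uploaded_at can be
    any sortable timestamp. Returns the entries that are kept, oldest first.
    """
    kept = sorted(uploads, key=lambda upload: upload[0])
    total = sum(size for _, size in kept)
    while kept and total > limit_bytes:
        _, size = kept.pop(0)  # remove the oldest upload
        total -= size
    return kept


print(max_size_bytes(10))  # the default threshold in bytes

uploads = [(1, 6 * GIB), (2, 5 * GIB), (3, 4 * GIB)]
print(evict_oldest(uploads, max_size_bytes(10)))
```

With a 10 GiB limit, the three uploads total 15 GiB, so the oldest upload is removed and the remaining 9 GiB is kept.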

More about LSIF

To learn more, check out our lightning talk about LSIF from GopherCon 2019 or the introductory blog post.