OCR and Indexing is part of Filevine’s Docs+ Features, which also include Docs+ PDF. If you are interested in Docs+, contact your Filevine rep to learn more.
This article discusses Optical Character Recognition (OCR) and document indexing in Filevine. OCR visually recognizes the characters in documents like PDFs and JPGs, making the text in those documents searchable. Filevine’s document indexing also allows you to search the text of document types like DOCX and RTF.
OCR and document indexing is a paid feature within Filevine. If you are interested in this feature, contact your Filevine representative to learn more and to have the feature enabled for your Org.
OCR and indexing can be applied either historically, where all applicable documents already in the Org are indexed and become searchable by text, or moving forward, where all applicable new documents in that Org are indexed.
If you are using Filevine’s legacy 2-way integration with Google Drive or Dropbox, the OCR and indexing feature may not currently be available.
File Properties Section
Once the OCR and document indexing feature is turned on, in your Doc sections you will see an additional section at the bottom of the properties flyout for any applicable file type. This new section is called “File Properties.” If you are uploading new documents, this section may take a moment to appear.
Once the section appears, you will see “Indexing” followed by a status for the document, like “Queued for OCR.” To update the status, refresh your screen. The document indexing may take a moment to complete.
If the document is an image file, like PDF or JPG, the document will be OCRed. When the OCR process is completed, the File Properties section in the properties flyout will provide the number of pages, the type of document, and the assigned confidence rating for the OCR.
The confidence rating tells you how confident the OCR is that the character recognition is correct for this document. Some documents, for instance a scanned .pdf file, may not be recognizable when running an index. If the confidence rating is below 50%, the document will not be indexed and its text will not be searchable.
If the document is a text file, like DOCX or RTF, the document will be indexed. When the indexing is completed, the File Properties section in the properties flyout will provide the number of pages and the type of document. It will also provide additional metadata for the document, depending on the type of file and what metadata is available from that document.
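The rules above can be summarized in a short sketch. This is only an illustration of the behavior described in this article, not Filevine's actual implementation: the function names are hypothetical, and the extension sets contain only the formats named here, not the full supported lists.

```python
from typing import Optional

# Only the formats named in this article; the full supported lists may be longer.
OCR_EXTENSIONS = {"pdf", "jpg"}
INDEX_EXTENSIONS = {"docx", "rtf"}
MIN_CONFIDENCE = 50.0  # percent; OCR ratings below this are not indexed


def processing_path(filename: str) -> Optional[str]:
    """Return "ocr", "index", or None for file types not covered in this sketch."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if ext in OCR_EXTENSIONS:
        return "ocr"
    if ext in INDEX_EXTENSIONS:
        return "index"
    return None


def is_searchable(path: Optional[str], confidence: Optional[float] = None) -> bool:
    """Indexed text files are searchable; OCRed files need a rating of at least 50%."""
    if path == "index":
        return True
    if path == "ocr":
        return confidence is not None and confidence >= MIN_CONFIDENCE
    return False
```

For example, under these assumptions a scanned PDF with a 42% confidence rating would follow the OCR path but would not become searchable, while a DOCX file would be indexed without a confidence rating.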
|OCR File Formats|
|Document Indexing File Formats|
OCRed and indexed documents can be searched using the intra-project search or the global search. The text in these documents is searchable using advanced search queries. For more information on how and where to search and how to use advanced search terms, read the Search Deeper article.
Search results for OCRed and indexed documents will contain all normal search information, with additional information about the number and placement of the searched word(s) in the document.
The top of the search card, after the document title, will list the page(s) on which the search term has been found. If the search term has been found on more than five pages in the document, then the card will instead list the number of pages on which the search term has been found.
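The page-listing rule can be illustrated with a short sketch. The function name and the exact wording of the output are assumptions made for illustration; only the five-page cutoff comes from the description above.

```python
from typing import Iterable


def page_summary(pages: Iterable[int]) -> str:
    """List the matching pages, or only a page count when there are more than five."""
    unique_pages = sorted(set(pages))
    if len(unique_pages) > 5:
        # Hypothetical wording; the real card may phrase the count differently.
        return f"{len(unique_pages)} pages"
    return ", ".join(str(page) for page in unique_pages)
```

With this sketch, a term found on pages 2, 7, and 9 would be summarized as a list of those three pages, while a term found on six or more pages would be summarized as a count.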
Directly under the title, the search card will show the search term within the document, with the surrounding words for context. If the search term appears multiple times, each instance of the search term will be separated by an interpunct, or center dot.
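As a rough illustration of how such a preview line could be assembled, the sketch below collects each occurrence of a single-word search term with a few surrounding words and joins the occurrences with an interpunct. The function name, the number of context words, and the matching rules are all assumptions; Filevine's own search may select context differently.

```python
import string

INTERPUNCT = "\u00b7"  # the center dot used to separate instances


def context_snippets(text: str, term: str, context_words: int = 3) -> str:
    """Join each occurrence of a single-word term, with nearby words, using an interpunct."""
    words = text.split()
    needle = term.lower()
    snippets = []
    for i, word in enumerate(words):
        # Ignore leading and trailing punctuation and letter case when comparing.
        if word.strip(string.punctuation).lower() == needle:
            start = max(0, i - context_words)
            snippets.append(" ".join(words[start:i + context_words + 1]))
    return f" {INTERPUNCT} ".join(snippets)
```

For instance, searching "The lease was signed. A second lease followed later." for "lease" with one word of context would produce two snippets separated by a center dot.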