OCR and Indexing

This article discusses Optical Character Recognition (OCR) and document indexing in Filevine. OCR visually recognizes the characters in documents like PDFs and JPGs, making the text in those documents searchable. Filevine’s document indexing also allows you to search the text of document types like DOCX and RTF.

There are no page, word, or character limits to Filevine’s OCR and indexing.

File Properties Section
Searching
OCR Overview (Video)

File Properties Section

In the project Docs section, OCR and Indexing information appears in the document’s properties flyout, at the bottom of the details tab. If you are uploading new documents, this section may take a moment to appear.

File Properties

Once the section appears, you will see “Indexing” followed by a status for the document, like “Queued for OCR.” To update the status, refresh your screen. The document indexing may take a moment to complete.

OCR

If the document is an image-based file, like a PDF or JPG, Filevine runs OCR on it. When the OCR process is complete, the File Properties section in the properties flyout shows the number of pages and the type of document.

Indexed: 8 PDF Pages. OCR: Completed.

Indexing

If the document is a text file, like DOCX or RTF, the document will be indexed. When indexing is complete, the File Properties section in the properties flyout shows the number of pages and the type of document. It also shows additional metadata for the document, depending on the type of file and what metadata is available from that document.

Author: Patricia Runbacker. Last Saved By: Jeremy Hows. Application: MS Word. Indexed: 2 DOCX Pages.

File Types

OCR File Formats
.pdf
.png
.jpg
.pjp
.pjpeg
.jpeg
.bmp
.jfif

Document Indexing File Formats
.csv
.docx
.eml
.html
.msg
.ods
.pptx
.rtf
.txt
.xlsx
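
The two lists above can be read as a lookup from file extension to processing type. The following is a minimal sketch of that lookup, not Filevine's implementation: the function name `processing_type` and the return labels are hypothetical, and only the extensions themselves come from this article.

```python
from pathlib import Path

# Extensions copied from the two lists in this article.
OCR_EXTENSIONS = {".pdf", ".png", ".jpg", ".pjp", ".pjpeg", ".jpeg", ".bmp", ".jfif"}
INDEXING_EXTENSIONS = {
    ".csv", ".docx", ".eml", ".html", ".msg",
    ".ods", ".pptx", ".rtf", ".txt", ".xlsx",
}


def processing_type(filename: str) -> str:
    """Return "ocr", "indexing", or "not_listed" for a filename.

    Hypothetical helper: the labels are illustrative, not Filevine statuses.
    """
    # Compare case-insensitively so "Scan.PDF" matches ".pdf".
    extension = Path(filename).suffix.lower()
    if extension in OCR_EXTENSIONS:
        return "ocr"
    if extension in INDEXING_EXTENSIONS:
        return "indexing"
    return "not_listed"
```

For example, `processing_type("Deposition.PDF")` falls in the OCR list, while `processing_type("letter.docx")` falls in the indexing list.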

Searching

OCRed and indexed documents can be searched using the intra-project search or the global search. The text in these documents is searchable using advanced search queries.

If you are having trouble finding the search results you’re looking for, try putting a backtick (`) before your search term to change the weighting of the search results, or putting quotation marks around your search term to get only the results with that exact phrase. Learn more about using advanced search queries.

Search results for OCRed and indexed documents contain all the normal search information, plus additional information about the number and placement of the searched word(s) in the document.

The top of the search card, after the document title, will list the page(s) on which the search term has been found. If the search term has been found on more than five pages in the document, then the card will instead list the number of pages on which the search term has been found.
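
The rule above (list the pages when there are five or fewer, otherwise show a count) can be sketched as follows. This only illustrates the rule. The function name `page_summary` and the exact wording of the output strings are assumptions, not what Filevine displays.

```python
def page_summary(pages: list[int], max_listed: int = 5) -> str:
    """Summarize the pages on which a search term was found.

    Hypothetical helper: the output wording is illustrative only.
    """
    # A term can match several times on one page, so count each page once.
    unique_pages = sorted(set(pages))
    if len(unique_pages) > max_listed:
        # More than five pages: report how many pages matched.
        return f"Found on {len(unique_pages)} pages"
    # Five pages or fewer: list the page numbers themselves.
    return "Found on page(s): " + ", ".join(str(page) for page in unique_pages)
```

With this sketch, matches on pages 1, 3, and 3 are listed individually, while matches spread over six or more distinct pages collapse to a page count.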


Directly under the title, the search card shows the search term within the document, with the surrounding words for context. If the search term appears multiple times, each instance is separated by an interpunct, or center dot (·).
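
As a rough illustration of the context line described above, the sketch below collects each occurrence of a single-word search term with a few surrounding words and joins the occurrences with an interpunct. It is a simplified assumption of how such a line could be built, not Filevine's search logic. The name `context_snippets` and the `context_words` parameter are hypothetical.

```python
def context_snippets(text: str, term: str, context_words: int = 3) -> str:
    """Join each occurrence of a single-word term, with context, by an interpunct.

    Hypothetical helper: handles single-word terms only.
    """
    words = text.split()
    target = term.lower()
    snippets = []
    for index, word in enumerate(words):
        # Ignore surrounding punctuation and case when comparing.
        if word.strip(".,;:!?\"'()").lower() == target:
            start = max(0, index - context_words)
            end = index + context_words + 1
            snippets.append(" ".join(words[start:end]))
    # Separate the occurrences with an interpunct (center dot).
    return " · ".join(snippets)
```

A term that appears twice in a document therefore yields two short excerpts separated by a center dot, and a term that does not appear yields an empty string.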


OCR Video

Watch the webinar below to get an overview of what OCR is, how it appears in Filevine, and how to use it.
