Today we are proud to announce commercial availability of the 2.0 release of Analyse Documents (formerly ContraxSuite)! This is our thirty-first open-source release and one of the largest and most comprehensive to date. It is the culmination of our longest development cycle yet, and includes an Enhanced Annotator View that displays documents in their native format, many improvements to our UI, a brand-new text extraction system, enhanced machine learning techniques, support for Conceptual Search, and much more. Version 2.0 also incorporates APIs and support for integration with the other software products of the Elevate ELM.

The new Enhanced Annotator View, displaying documents in their native format, and the new Enhanced Notes feature

In future posts, we will explore in detail the Enhanced Annotator View, Enhanced Notes, Conceptual Search, the new Text Extraction System, and many more new and improved features. So, stay tuned over the coming weeks and months!

End-User Interface Release Notes

The following areas show the scope and depth of the updates included in Release 2.0:

Enhanced Annotator View: The platform can process documents, extract data from them, and then display those documents in the Annotator in their original format. Annotation highlights are now created and stored via pdf.js and a new text extraction system (TES). The TES is a standalone service working within Analyse Documents, with the following improvements:

  • PDF format is the main output, greatly improving our user-friendly interface
  • Plain-text output is included for natural language processing
  • Ability to match each character in plain text to the character’s PDF page coordinates
  • Capability to convert any document format to PDF for universal processing
  • Automatic detection of pages that require optical character recognition (OCR)
  • Parallel OCR of multiple pages, with the speed-up depending on the number of available workers in a cluster
  • Automatic page orientation detection and correction, and basic skew correction
  • Sentence-, paragraph-, and section-level language detectors using LexNLP and other open-source libraries
  • Table extraction from PDFs
  • Synchronous and asynchronous APIs
  • Retention of original text-based formatting of any documents uploaded prior to upgrading the ContraxSuite instance to 2.0
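
The character-to-coordinate matching in the list above can be illustrated with a small sketch. This is not the TES data model or API: the `CharBox` record, the `chars_in_rect` helper, and the "box centre inside the rectangle" rule are all assumptions made for illustration. It shows how a highlight drawn as a PDF rectangle could be mapped back to a span of the plain-text output once every character carries its page coordinates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CharBox:
    """Hypothetical record: where one plain-text character sits on a PDF page."""
    page: int
    x: float
    y: float
    width: float
    height: float


def chars_in_rect(boxes: List[CharBox], page: int,
                  rect: Tuple[float, float, float, float]) -> Optional[Tuple[int, int]]:
    """Return the (start, end) plain-text span covered by a PDF rectangle.

    rect is (x0, y0, x1, y1) in page coordinates. A character counts as
    covered when the centre of its box falls inside the rectangle
    (an assumed rule, chosen to keep the sketch short).
    """
    x0, y0, x1, y1 = rect
    hits = [
        i for i, b in enumerate(boxes)
        if b.page == page
        and x0 <= b.x + b.width / 2 <= x1
        and y0 <= b.y + b.height / 2 <= y1
    ]
    if not hits:
        return None
    return hits[0], hits[-1] + 1  # end is exclusive, like a Python slice


text = "Term: 5 years"
# One box per character, laid out left to right on a single line of page 0.
boxes = [CharBox(page=0, x=10.0 * i, y=100.0, width=10.0, height=12.0)
         for i in range(len(text))]

span = chars_in_rect(boxes, page=0, rect=(60.0, 95.0, 130.0, 115.0))
if span is not None:
    print(text[span[0]:span[1]])
```

The same index, read in the other direction (`boxes[i]` for character `i`), gives the page position needed to draw a highlight for a value found in the plain text.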

Enhanced Notes: Along with optimizing document viewing with the Enhanced Annotator View, we have also upgraded the Notes system in the Annotator.

  • Notes assigned to specific highlighted text units and/or annotations will now appear in the document with a corresponding dotted yellow underline. Clicking the underline displays the text of that note in a pop-up, so that users can read a note without having to open it in the right pane.

Enhanced Project Settings: Extended project-level settings by adding three new tabs to the “Settings” page:

  • Processing Options tab:
    • New “Run OCR on New Documents” checkbox. This box is checked by default. If the documents in a project have already been processed by OCR (Optical Character Recognition), you may uncheck this box.
    • Run a new “Detect Field Values” task on the whole project or specific documents in the project.
    • Configure document-level and/or text unit-level transformers to process document text and create similarity objects/Conceptual Search results.
  • Customize LexNLP tab:
    • Add additional Custom Term Sets and/or Company Type Sets to a project.
    • Run “Locate Terms” and/or “Locate Companies” tasks after changing which Term Sets and Company Type Sets to use for the project.
  • Processing Status tab:
    • See progress of project-level tasks like Load Documents, Detect Field Values, Locate Terms/Companies, etc., with configurable filters and the ability to export the Task Grid.

Conceptual Search: After running Document and/or Text Unit Transformers, users can see document similarity and text unit similarity in their projects in a Grid View, by doing one of the following:

  • Document Similarity: From the Project Grid, right-click a document title and select “Find Documents Like This” from the right-click menu.
  • Text Unit Similarity: Open a document in the Annotator, highlight a text unit with the cursor, right-click the text unit, and select “Find Text Like This” from the right-click menu.
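
As a rough illustration of what a "Find Documents Like This" lookup involves, the sketch below ranks documents by how close their vectors are to a chosen document. The release notes do not state which similarity measure or vector format the Transformers use, so cosine similarity, the tiny hand-written vectors, and the `find_like_this` helper are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_like_this(query_id: str, vectors: Dict[str, Sequence[float]],
                   top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank every other item by similarity to the query item, best first."""
    query = vectors[query_id]
    scored = [(other_id, cosine(query, vec))
              for other_id, vec in vectors.items() if other_id != query_id]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical vectors standing in for the output of a document Transformer.
vectors = {
    "lease_a.pdf": [0.9, 0.1, 0.0],
    "lease_b.pdf": [0.8, 0.2, 0.1],
    "nda.pdf": [0.0, 0.2, 0.9],
}
results = find_like_this("lease_a.pdf", vectors, top_k=2)
for name, score in results:
    print(f"{name}: {score:.2f}")
```

The same ranking applies to text units: replace the document vectors with one vector per text unit.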

Project Advanced Settings: This is a new panel on the project creation pop-up modal. Now when creating a new Batch or Contract Analysis project, users have access to additional options, many of which can be modified later on via the project’s “Settings” (see above):

  • LexNLP Locale: Choose the geographical locale for LexNLP to use. Currently, all major dialects of English and German are available. This covers differences in date and currency formats between countries.
  • Run OCR on New Documents: This box is checked by default. If the documents in a project have already been processed by OCR (Optical Character Recognition), you may uncheck this box.
  • Use Default Term Set: Leave this box checked to use the default legal terms included with LexNLP. Uncheck this box to add any additional Custom Term Sets that have been uploaded to any instance.
  • Use Default Company Types: Leave this box checked to use the default company types included with LexNLP. Uncheck this box to add any additional Custom Company Types that have been uploaded to any instance.

Document Field Advanced Settings: The new “Advanced” panel for Document Fields contains the following functionality:

  • Confidence: Indicates the user’s level of confidence in the success rate of the Field Detectors (or machine-learning, if the Field is ML-based).
  • Read Only: Checking this box makes this Field’s value read-only.
  • Default Value: For Choice, Multi Choice, or String Fields. Populate a Field with a specific value by default. This value will appear if the Field’s Detectors extracted nothing from a document after it was uploaded and parsed.
  • Hidden Always: This Field will not appear in the Field Values tab of the Annotator right pane.
  • Hide Until Python: A form for writing a Python formula; the Field stays hidden until Field contents meet the formula’s criteria.
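
To make the "Hide Until Python" idea concrete, here is a minimal sketch of a formula and the check it implies. The field names (`lease_type`, `rent_amount`), the formula, and the `is_hidden` helper are hypothetical, and the way Analyse Documents actually evaluates these formulas is not described here; the sketch only shows the general pattern of a Field staying hidden until a Python expression over other Field values becomes true.

```python
from typing import Any, Dict


def is_hidden(hide_until: str, field_values: Dict[str, Any]) -> bool:
    """A Field stays hidden until its "Hide Until" formula evaluates truthy."""
    # No builtins are exposed; the formula sees only the field values.
    # This is an illustration, not a safe sandbox for untrusted input.
    visible = eval(hide_until, {"__builtins__": {}}, dict(field_values))
    return not visible


# Hypothetical formula: reveal the Field only for commercial leases with a rent value.
formula = "lease_type == 'Commercial' and rent_amount is not None"

before = is_hidden(formula, {"lease_type": "Residential", "rent_amount": None})
after = is_hidden(formula, {"lease_type": "Commercial", "rent_amount": 2500})
print(before, after)
```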

Version/Audit History: Implemented tracking user activity to create basic logs and audit history for many user actions:

  • Changing project Team Members
  • Project creation
  • Changing project names, adding/deleting documents
  • Changing document assignees and statuses
  • Changing Field Values, assignees, and statuses of documents
  • Adding/deleting/changing Document Types, Fields, and Field Detectors
  • Running clustering tasks
  • Project Settings: “Created by name/date/time” and “Modified by name/date/time” are now displayed on projects’ “Settings” pages, as well as edit pages for Document Types, Document Fields, and Field Detectors

Miscellaneous:

  • Checking the “Run OCR on New Documents” checkbox in the “Processing Options” tab of a project’s “Settings” now automatically performs the function of the previous “Make Searchable PDFs” task; checking this box will process all PDFs uploaded to a project and make them downloadable as machine-readable and searchable PDFs.
  • Added in-app Zoom functionality to the Annotator, located in the lower right of the main pane.
  • Changed “Detect Limit Unit” and “Detect Limit Count” functionality to be Field Detector-level parameters, to provide users more flexibility in designing Document Fields.
  • We have ended support for Internet Explorer 11, in favor of better-supported browsers like Google Chrome, Microsoft Edge, and Mozilla Firefox.
  • Various bug fixes and improvements.

Detailed Changelog for End User Interface

New in Release 2.0

  • Batch Analysis – Grid View: Added the following options to Column Visibility:
    • Load Date
    • Load By
  • Contract Analysis – Grid View: Added the following options to Column Visibility:
    • Modified Date
    • Modified By
    • Load Date
    • Loaded By
  • Contract Analysis – Grid View: Column Visibility settings now allow users to display the Value for a Field, and/or the annotation Text associated with that Field.
  • Implemented auto-deletion for Document and Text Unit Similarity results to avoid running out of space in an instance’s database. A scheduled task removes excessive similarity records from a database, based on certain application variables that system admins can adjust as necessary.
  • Implemented tracking user activity to create basic logs and audit history for many user actions:
    • Changing project Team Members
    • Project creation
    • Changing project names, adding/deleting documents
    • Changing document assignees and statuses
    • Changing Field Values, assignees, and statuses of documents
    • Adding/deleting/changing Document Types, Fields, and Field Detectors
    • Running clustering tasks
    • Project Settings: “Created by name/date/time” and “Modified by name/date/time” are now displayed on projects’ “Settings” pages, as well as edit pages for Document Types, Document Fields, and Field Detectors.
  • Added in-app Zoom functionality to the Annotator, located in the lower right of the main pane. The following Zoom options are now available:
    • Automatic Zoom (default)
    • Actual Size
    • Page Fit
    • Page Width
    • Page Height
    • 50%
    • 75%
    • 100%
    • 125%
    • 150%
    • 200%
  • Annotator: Functionality for underlined defined terms (“Definitions”) has been updated; clicking a definition underline now displays a pop-up of the defined term. The same functionality has been added to the “Definitions” tab of the right pane.
  • Added an additional pre-loader to the Uppy.io document loader, so that large sets of documents are easier for users to upload.
  • Changed “Detect Limit Unit” and “Detect Limit Count” functionality to be Field Detector-level parameters, to provide users more flexibility in designing Document Fields.
  • Analyse Documents now uses our new text extraction system (TES) instead of Apache Tika and Textract. Tesseract is still used by this new text extraction system for OCR tasks.
  • The Analyse Documents document editor now displays and works with formatted PDF documents instead of plain-text documents. Document annotations are converted from PDF rectangles to plain-text locations based on the character coordinates provided by the text extraction system.
  • Implemented extraction from text for different locales. Analyse Documents 2.0 supports both auto-detecting locales and selecting the locale at the system and/or project level.
  • We have ended support for Internet Explorer 11, in favor of better-supported browsers like Google Chrome, Microsoft Edge, and Mozilla Firefox.
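
The auto-deletion of similarity results mentioned in the list above can be pictured with a short sketch. The actual retention criteria and application variable names are not given in these notes, so the rule used here (keep only the highest-scoring records per source item) and the `max_per_source` parameter are assumptions standing in for whatever limits a system admin configures.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class SimilarityRecord(NamedTuple):
    """Hypothetical stored result: how similar target_id is to source_id."""
    source_id: int
    target_id: int
    score: float


def prune(records: List[SimilarityRecord], max_per_source: int) -> List[SimilarityRecord]:
    """Keep only the highest-scoring max_per_source records for each source."""
    by_source: Dict[int, List[SimilarityRecord]] = defaultdict(list)
    for rec in records:
        by_source[rec.source_id].append(rec)
    kept: List[SimilarityRecord] = []
    for recs in by_source.values():
        recs.sort(key=lambda r: r.score, reverse=True)
        kept.extend(recs[:max_per_source])
    return kept


records = [
    SimilarityRecord(1, 2, 0.91),
    SimilarityRecord(1, 3, 0.40),
    SimilarityRecord(1, 4, 0.75),
    SimilarityRecord(2, 1, 0.91),
]
kept = prune(records, max_per_source=2)
print(len(records) - len(kept), "record(s) removed")
```

A scheduled task would run a step like this periodically, so the similarity tables stay bounded as more searches are run.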

Improved Features

  • Annotator:
    • Users can now right-click a breadcrumb to copy a document name.
    • “Quick Data” tab Data Entities now have clearer highlights.
    • Section Navigation improved through the implementation of the new Text Extraction System (TES).
  • Contract Analysis – Annotator: Improved calendar and date selector. Users can now manually type a date into a Date Field or use a calendar-based selector to choose a year, month, and day.
  • Document Explorer – Text Unit Similarity: Clicking a text unit in the two-column Grid will display a pop-up with the differing text struck through, for easier comparison of text units.
  • Grid Views:
    • Improved functionality and wording of the counters shown when Sorting and Filters are or aren’t engaged.
    • Changed default order of Field columns for better readability.
    • Users can now engage Filters on Date Fields by a range of date values.
    • Users can now search the available columns in Column Visibility for faster selection.
  • Management – Document Fields:
    • Streamlined the section subheadings on the Field creation/edit pages, including the creation of the “Advanced” settings panel (see above).
    • The “Value regexp” form has been changed to say “Dependent Regexp”.
    • Re-positioned the “Exclude regexps” form underneath the “Include regexps” and “Definitions” forms.
    • Updated Value Detection Strategies, cleaning up and merging functionality.
  • Management – Document Fields Grid: Added “Formula” to “Column Visibility” options.
  • Management – Document Field Detectors:
    • Changed “Detect Limit Unit” and “Detect Limit Count” functionality to be Field Detector-level parameters, to provide users more flexibility.
    • The “Detected Value” form has been changed to say “Display Value”.
  • Management – Document Types:
    • Clicking the “X” button next to a Document Field in the Grid on a Document Type’s edit page will display the same warning message seen when users attempt to delete a Document Field on its own edit page.
    • Streamlined the Editor Type choices on the creation/edit pages.
    • Improved Document Type JSON migration schemas.
  • Project Settings: Project Owners cannot be removed from a project (“Basic Settings” tab) if they are the sole Owner of that project.
  • Improvements to Contract Type Classification, with new “Contract Type” Field now available in Project Grid Views.
  • Improved help text for App Vars in the Django Admin interface.
  • Various improvements to the Annotator UI.
  • Various improvements to “Warning” messages and UI workflows.

Bug Fixes

  • Annotator:
    • Annotations and values would sometimes not correspond to the highlights visible in the main pane.
    • Clicking through multiple annotations for the same text unit would sometimes show the wrong tabs in blue (selected) or grey (not selected).
    • Assignees and statuses sometimes could not be changed in the “Status and Notes” tab.
    • Highlighted text would sometimes lose its highlight when a user scrolled.
    • Hot keys sometimes did not work.
  • Batch Analysis – Annotator: Clicking a document’s cluster on the right pane would sometimes redirect the user to a blank Grid instead of the cluster-filtered Documents Grid for that cluster.
  • Batch Analysis – Clusters: Reassignment of document clusters would sometimes fail.
  • Contract Analysis – Annotator:
    • Annotations wouldn’t be focused when a user clicked on a Field.
    • Data was not being correctly displayed when users scrolled through numbered Related Info and Multi Choice annotations.
    • Moving between right-pane tabs would deselect highlighted text.
    • Moving between Search tab and other right pane tabs would lose the highlighted search term.
    • Page margins would sometimes adjust and make filling out forms on the Field Values tab difficult.
    • “Set to” functionality was sometimes not working correctly, and users would sometimes get a “Warning” message when trying to input new/changed values.
    • It was possible to save a value without an annotation on Fields set to requires_annotations=true.
    • Date values sometimes could not be deleted from Date Fields that didn’t require annotations.
    • Users deleting an annotation value would see that value return after reloading the page, if the Field was set to requires_annotations=false.
    • Users sometimes could not assign annotations to Multi Choice Fields.
    • Using arrow keys in Field forms would incorrectly navigate to adjacent documents.
  • Contract Analysis – Clause Review:
    • Clauses could sometimes not be assigned or unassigned to users.
    • “Next” and “Previous” buttons in the Clause Workflow would sometimes break if a user navigated too quickly.
    • Setting statuses on clauses would sometimes still lead to previous statuses being retained.
  • Grid Views:
    • Bulk assignment modal would sometimes display incorrect information on the “Review Team” section under the selection drop-down.
    • “Export” function for Grid data was sometimes not functioning correctly.
    • Leaving Filters engaged, exiting a project, and then re-opening that project would lead to columns not being visible in the Grid.
    • Reloading a Grid page while Filters were engaged would sometimes cause data entered into Filters to jump to a different column’s Filter form.
    • Scroll bars sometimes appeared incorrectly on Grids, and too much white space would appear.
    • Using “Select All” with Filters engaged would lead to all documents/clauses being selected even if user disengaged Filters.
  • Management – Document Fields:
    • The “Depends on” Fields list for a formula-based Field now correctly shows only those Fields that are within the same Document Type.
    • The “Depends on” Fields two-column selector had incorrect labels on each form box.
    • Saving a new Document Field with the “Save” button at the top of the creation/edit page would not redirect the user back to the Grid as expected.
  • Project Grid Views:
    • Document counters at the lower left would sometimes not update correctly.
    • Fixed a bug where documents with assignees wouldn’t display their “Review Team” when those documents were multi-selected for bulk action.
    • Fixed a bug where checking “Other Fields” in Column Visibility would sometimes not function correctly.
  • Stats Pages:
    • Screen would sometimes go blank when users opened the “Field and Detector Details” page, or the “Document Type Diagnostics” page after choosing a Document Type to display.
    • Some tables/charts were showing empty values, even when there was applicable data on the instance.
    • On “Document Type Diagnostics” page, Field data sometimes would not load when user clicked the arrow next to a Field.
  • Upload Modal:
    • Sometimes clicking “Add Contracts” after an upload would break the Uppy modal’s “Upload” button functionality.
    • Uppy window would sometimes freeze if a user uploaded duplicate documents.
  • Miscellaneous:
    • Annotations could be saved individually, even for a Document Type with its Editor Type set to “Save All Fields At Once”.
    • Navigating via breadcrumbs would sometimes lead to errors or a blank screen.
    • “My Open Tasks” grid on Home page would sometimes not appear.
    • “Recent Projects” blocks on Home page would sometimes not update correctly.
    • Password reset workflow was not redirecting users to the correct page(s).
    • Some screen resolutions would lead to the bottom row of the Grid being cut off.
    • Scroll bars would sometimes shift while document upload tasks were running.
    • Technical Admins who were not listed as project Owners would sometimes be unable to assign documents to users.
    • Users would sometimes get a “You do not have permissions” message when saving incomplete new Document Fields.
    • Trying to log out of Analyse Documents/ContraxSuite from a project’s “Settings” page displays a warning modal if changes are not saved. Clicking “Cancel” on the warning modal would still log the user out and incorrectly redirect them to the login page.
    • Various small bug fixes related to Permissions, including access to Document Explorer.
    • Various small bug fixes related to Grids and Column Visibility.

For more information, contact us here.