Corpus redundancy manager



Publisher Description



Redundancy caused by cut-and-paste operations in text creates bias in machine learning for NLP.
This module takes a directory and produces a list containing a subset of the files in that directory, chosen so that the similarity between any two files in the subset stays below an upper bound.
Features
  • Identify copy-paste redundancy in a document corpus
  • Input: a folder of text documents and a similarity threshold
  • Output (a): a list of non-redundant documents (a non-redundant subset of the corpus)
  • Output (b): a list of document pairs found to be redundant, with the amount of redundancy for each pair
  • Python script (2.6), tested on various Linux flavours and Windows XP/7
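
The listing does not say how the script measures similarity or chooses which files to keep, so the following is only a minimal sketch of the input/output behaviour described above, not the module's actual code. It assumes Python 3 (the published script targets 2.6), uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, and applies a simple greedy pass. The names `similarity`, `select_non_redundant`, and `read_folder` are hypothetical.

```python
import os
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio in [0, 1] for two strings.

    SequenceMatcher is an assumed stand-in; the real module may use
    a different measure.
    """
    return SequenceMatcher(None, a, b).ratio()


def select_non_redundant(docs, threshold):
    """Greedily pick documents whose pairwise similarity stays at or below threshold.

    docs maps a document name to its text. Returns (kept, redundant_pairs),
    where redundant_pairs holds (kept_name, rejected_name, score) tuples.
    """
    kept = []
    redundant_pairs = []
    for name in sorted(docs):
        clash = None
        for other in kept:
            score = similarity(docs[name], docs[other])
            if score > threshold:
                clash = (other, name, score)
                break
        if clash is None:
            kept.append(name)
        else:
            redundant_pairs.append(clash)
    return kept, redundant_pairs


def read_folder(folder):
    """Read every regular file in folder into a name -> text dict."""
    docs = {}
    for entry in sorted(os.listdir(folder)):
        path = os.path.join(folder, entry)
        if os.path.isfile(path):
            with open(path, encoding="utf-8", errors="replace") as handle:
                docs[entry] = handle.read()
    return docs
```

Under these assumptions, `select_non_redundant(read_folder("corpus"), 0.8)` would return the kept file names and the rejected pairs with their scores, where `"corpus"` is a placeholder folder name. Comparing whole files with `SequenceMatcher` is slow on large corpora, so a real tool would likely use a cheaper fingerprinting scheme.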


About Corpus redundancy manager

Corpus redundancy manager is free software listed under Languages, part of the Education category.

This Languages program is available in English. It was last updated on 22 April, 2024. Corpus redundancy manager is compatible with the following operating systems: Linux, Mac, Windows.

The developer of Corpus redundancy manager is cohenrap. The latest version released by the developer is 1.0. This version was rated by 1 user of our site and has an average rating of 5.0.

The download we have available for Corpus redundancy manager has a file size of 2.81 KB. Just click the green Download button above to start the download. The program has been listed on our website since 2011-05-09 and has been downloaded 103 times. We have already checked that the download link is safe; however, for your own protection we recommend that you scan the downloaded software with your antivirus. Your antivirus may detect Corpus redundancy manager as malware if the download link is broken.

How to install Corpus redundancy manager on your Windows device:

  • Click on the Download button on our website. This will start the download from the website of the developer.
  • Once Corpus redundancy manager has downloaded, click on it to start the setup process (assuming you are on a desktop computer).
  • When the installation is finished you should be able to see and run the program.




Users Rating: 5.0/5 (1 rating)
Downloads: 103
Updated At: 2024-04-22
Publisher: cohenrap
Operating System: Linux, Mac, Windows
License Type: Free