07 Aug 13:22

Xunius

2df9a39

Menotexport v1.5.1 Latest


Remove pandas dependency

Also fix an issue with fields fetched as a sequence (e.g. first names): if the sequence
contained duplicates, they were removed, which could result in a wrong author list.
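To illustrate why removing duplicates from a sequence field is harmful, here is a minimal sketch with made-up author data. The names and the `pair_authors` helper are hypothetical and not taken from Menotexport's code. It assumes first and last names are matched by position, so dropping a repeated first name truncates the pairing:

```python
def pair_authors(first_names, last_names):
    """Pair first and last names by position (hypothetical helper)."""
    return ["%s %s" % (f, l) for f, l in zip(first_names, last_names)]

# Made-up data: two authors share the first name "Wei"
first_names = ["Wei", "Anna", "Wei"]
last_names = ["Zhang", "Smith", "Li"]

# Order-preserving de-duplication, the kind of step that loses an author
seen = set()
deduped = [n for n in first_names if not (n in seen or seen.add(n))]

correct = pair_authors(first_names, last_names)
broken = pair_authors(deduped, last_names)

print(correct)
print(broken)
```

With the duplicates kept, all three authors are paired; after de-duplication the third author is lost.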


07 Aug 02:51

Xunius

v1.5.0

e9abd4f

Menotexport v1.5.0

Fix multiple attachment issue:
When a document has more than one attachment (e.g. supplementary materials that go along with a paper), the export now puts the respective highlights and notes into each attached PDF, while summarizing annotations from all attachments in the exported txt file.
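The behaviour described above can be sketched as follows. This is only an illustration under assumed data structures (a list of dicts with hypothetical `file` and `text` keys), not the actual implementation:

```python
from collections import defaultdict

def group_by_attachment(annotations):
    """Map each attachment to its own list of annotation texts."""
    grouped = defaultdict(list)
    for anno in annotations:
        grouped[anno["file"]].append(anno["text"])
    return dict(grouped)

# Made-up annotations for one document with two attached PDFs
annotations = [
    {"file": "paper.pdf", "text": "main result"},
    {"file": "supplement.pdf", "text": "extra figure"},
    {"file": "paper.pdf", "text": "method detail"},
]

# Each PDF receives only its own annotations
per_file = group_by_attachment(annotations)

# The text summary still covers every attachment of the document
summary = [a["text"] for a in annotations]
```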

This required quite a few changes under the hood, resulting in cleaner code but unfortunately not much of a speed-up.

Fetch side-bar notes (aka "General notes") correctly.

Some minor fixes, including excluding the colon ":" from output folder names, as it is an illegal character on Windows and Mac systems.
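A minimal sketch of this kind of folder-name clean-up is shown below. The function name and its `illegal` parameter are assumptions made for illustration; the project's own code may handle this differently:

```python
def sanitize_folder_name(name, illegal=":"):
    """Drop characters that are not allowed in folder names (hypothetical helper)."""
    return "".join(ch for ch in name if ch not in illegal).strip()

cleaned = sanitize_folder_name("Review: ocean heat uptake")
print(cleaned)
```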


10 Nov 13:40

Xunius

v1.4.4

349cb1d

Menotexport v1.4.4

add custom template
fix autorename function


09 Sep 08:32

Xunius

v1.4.3

a7b0ee0

Menotexport v1.4.3

Fix the tag list issue in anaconda build


22 Nov 06:36

Xunius

v1.4.2

15b35e8

Menotexport v1.4.2

Fix parent id 0 bug.
Fix tags as nested list bug.


31 Oct 11:21

Xunius

v1.4.1

58827e8

Menotexport v1.4.1

Add conda install option


31 Oct 09:30

Xunius

v1.4

b318822

Menotexport version1.4

Added:

Export to .bib and .ris files.
New colors in the highlighted texts in exported PDFs.
Support for nested Mendeley folders.
Bug fixes.


01 Apr 13:32

Xunius

v1.1

92c224d

Menotexport version1.1

Supports both command line and gui usage.
Allows specifying a folder to work on.
Improved line break detection. Fewer erroneous '......' are inserted.
Various minor fixes.


27 Feb 22:04

Xunius

v0.1

a306399

Menotexport command line version

Menotexport

Menotexport (Mendeley-Note-Export) extracts highlights and notes from your Mendeley database

What does this do?

Menotexport.py is a simple Python solution to help extract the annotations (highlighted
texts and sticky notes) you made in the built-in PDF reader of Mendeley.

Mendeley is a desktop and web program for managing and sharing research
papers. It offers free desktop clients for Windows, OSX and Linux. But the
software is not open source, and its support team has been very slow in responding
to customers' feature requests, some of which have been proposed by many users for YEARS.
This tool aims to solve the following:

1. Bulk export annotated PDFs.

Annotations (highlights and notes) made inside Mendeley are not saved directly onto
the relevant PDFs, but to a separate database file. Therefore these annotations cannot
be viewed in any PDF reader other than Mendeley itself.
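Since the database file is an SQLite file, it can be inspected with Python's standard `sqlite3` module. The sketch below only lists the table names; it makes no assumption about Mendeley's actual schema, and `list_tables` is a hypothetical helper name:

```python
import sqlite3

def list_tables(dbfile):
    """Return the names of all tables in an SQLite database file."""
    conn = sqlite3.connect(dbfile)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Calling `list_tables()` with the path to your Mendeley database file shows which tables are available. Work on a copy of the file rather than the live database.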

The native but awkward way to export a PDF with its annotations is to open
that PDF in the Mendeley PDF reader and go to Files -> Export PDF
with annotations. However, to export all your collections, this has to be repeated
manually for each individual PDF in your library.

This script can bulk export all PDFs with annotations to a given folder, and
the annotations are readable by other PDF software. NOTE that PDFs with no annotations
are not exported.

2. Extract annotation texts.

To extract texts from the highlights and sticky notes in a PDF without
copying and pasting them one by one, some software offers an automated solution.

Skim on OSX has the functionality to produce a summary of all annotations.

Some versions of Foxit Reader can do that (on Windows, not on the Linux version, not sure about Mac).

Pro versions of Adobe Reader may have that too.

Most of the PDF readers on Linux do not have that functionality. (Let me know if you find one.)

This tool can extract the texts from the highlights and notes in the PDFs in Mendeley
to a plain text file, and format the information in a sensible structure.

Usage

python menotexport.py [-h] [-e] [-m] [-n] [-w] [-s] dbfile outputdir

where

-h: Show help messages.
-e: Bulk export PDFs to the folder given by outputdir.
-m: Extract markups (highlighted texts).
-n: Extract notes.
-w: Do not overwrite existing files in outputdir. The default is to overwrite.
-s: Save extracted texts to a separate txt file for each PDF. The default is to
    save all texts to a single file.
dbfile: Absolute path to the Mendeley database file. On Linux systems the default location is
    ~/.local/share/data/Mendeley\ Ltd./Mendeley\ Desktop/your_email@www.mendeley.com.sqlite
outputdir: Folder to save outputs to.
    If -s is given, texts for each PDF are saved to Anno_PDFTITLE.txt (if both -m and
    -n are given), or to Highlights_PDFTITLE.txt or Notes_PDFTITLE.txt (if
    only -m or only -n is given).
    If -s is not given, extracted texts from all PDFs are saved to Mendeley_annotations.txt
    (if both -m and -n are given), or to Mendeley_highlights.txt or
    Mendeley_notes.txt (if only -m or only -n is given).
    If -s is not given, another txt file, Mendeley_annotations_by_tags.txt, is also
    generated, in which information is grouped by tags.
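The file-naming rules above can be summarized in a short sketch. The function `output_filename` and its parameter names are hypothetical and only restate the rules as listed; this is not the script's own code:

```python
def output_filename(markups, notes, separate, pdf_title=None):
    """Return the txt file name implied by the -m, -n and -s options."""
    if not (markups or notes):
        raise ValueError("at least one of -m and -n is needed to extract texts")
    if separate:
        # -s given: one file per PDF, prefixed by what was extracted
        if markups and notes:
            prefix = "Anno"
        elif markups:
            prefix = "Highlights"
        else:
            prefix = "Notes"
        return "%s_%s.txt" % (prefix, pdf_title)
    # -s not given: a single file for all PDFs
    if markups and notes:
        return "Mendeley_annotations.txt"
    return "Mendeley_highlights.txt" if markups else "Mendeley_notes.txt"
```

For example, `output_filename(True, True, True, "PDFTITLE")` gives `Anno_PDFTITLE.txt`, while `output_filename(False, True, False)` gives `Mendeley_notes.txt`.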

Example:

To bulk export, extract and save to separate txt files:

python menotexport.py -emns dbfile outputdir
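For readers curious how a combined flag such as -emns is interpreted, here is a sketch of an equivalent command line parser built with Python's standard `argparse` module. The `dest` names are assumptions chosen for readability, and the script's real parser may be written differently:

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="menotexport.py")
    parser.add_argument("-e", dest="export", action="store_true",
                        help="Bulk export PDFs to outputdir.")
    parser.add_argument("-m", dest="markups", action="store_true",
                        help="Extract markups (highlighted texts).")
    parser.add_argument("-n", dest="notes", action="store_true",
                        help="Extract notes.")
    parser.add_argument("-w", dest="no_overwrite", action="store_true",
                        help="Do not overwrite existing files in outputdir.")
    parser.add_argument("-s", dest="separate", action="store_true",
                        help="Save texts to a separate txt file for each PDF.")
    parser.add_argument("dbfile", help="Path to the Mendeley database file.")
    parser.add_argument("outputdir", help="Folder to save outputs.")
    return parser

# Short flags can be joined, so -emns switches on -e, -m, -n and -s at once
args = build_parser().parse_args(["-emns", "dbfile", "outputdir"])
```

The -h option does not need to be declared, because `argparse` adds it automatically.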

Caveats and further notes

The bulk PDF export works with quite good accuracy; most highlights and notes are
reproduced as they should be.

Note extraction works with quite good accuracy.

Highlight extraction accuracy is compromised, due to the inherent nature of the PDF
format. Not all texts are correctly extracted, and the order in which they appear in the output
may not be exactly the same as in the PDFs (top-down, left-right). DO proofread afterwards.

Highlighted texts from a single "block" of text are treated as one record/entry. A "block" of
text is a continuous chunk of text in the PDF, which could be a whole paragraph, a single
line separated from others, or a single isolated word. This ambiguity is again due to the inherent
nature of the PDF format. Again, proofread the results.

Citation keys and tags are added to the extracted texts to facilitate further information
processing; both can be edited in Mendeley.

If you choose to save all annotations to a single file, the program also restructures the extracted texts
and organizes them by their tags before saving to a separate file. Pieces of text from a PDF that isn't tagged are given
a tag of @None.

Possible follow-ups: re-format the extracted texts to PDFs or docs, or sync them into
your Evernote account. These will probably be implemented in a later version.
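The grouping by tags described above could look roughly like the sketch below. The record layout (dicts with hypothetical `text` and `tags` keys) and the function name are assumptions; only the @None fallback for untagged texts comes from the description:

```python
from collections import defaultdict

def group_by_tags(records):
    """Group annotation texts by tag; untagged texts go under '@None'."""
    grouped = defaultdict(list)
    for rec in records:
        # A text with several tags is listed under each of them
        for tag in (rec.get("tags") or ["@None"]):
            grouped[tag].append(rec["text"])
    return dict(grouped)

# Made-up records for illustration
records = [
    {"text": "sea ice is declining", "tags": ["climate", "arctic"]},
    {"text": "model resolution matters", "tags": ["climate"]},
    {"text": "untagged remark", "tags": []},
]
by_tag = group_by_tags(records)
```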

Dependencies

Developed in Python 2.7. Not tested in Python 3.

It requires the following packages:

PyPDF2
sqlite3
pandas
pdfminer
numpy

It further incorporates (with minor adjustments) the pdfannotation.py file from
the Menextract2pdf project.

Platform/OS

The software is tested on Linux and should also run on Mac.
A Windows version will be created later.

Versions

0.1 first release

Licence

The script is distributed under the GPLv3. The pdfannotations.py file is
LGPLv3.

Related projects


Releases: Xunius/Menotexport
