Ai For A Library

7 replies to this topic

#1 shvet1


    Gold Member

  • Members
  • 360 posts

Posted 01 April 2024 - 12:10 AM

Dear forum members,


I have a large library of standards, practices, articles, books, reports, and similar documents that I use every day. The problem is that the library has grown to 100+ GB and 40k+ files and keeps growing. It has become hard to find information on demand: I remember that a document exists, and even roughly what it looks like, but cannot recall exactly where it is. So I have to look through all the relevant files, which takes an enormous amount of time.


The question is: are there AI-enhanced tools that can help? I mean software or an application that can index the files, or that the files can be uploaded to, in order to run a context search. Something like Copilot or ChatGPT, but for a private library.


I hope the core idea is clear. I have spent a couple of weekends googling and have found nothing.


Please point me elsewhere if this is the wrong forum for such questions.

#2 MikeCH


    Brand New Member

  • Members
  • 4 posts

Posted 01 April 2024 - 03:23 AM

For my private library, I am using paperless-ngx. It runs in a Docker container on my NAS, currently with more than 3,000 documents. After uploading, the OCR task starts, which works surprisingly well for scanned documents. I have not tried it with handwritten documents. However, the software will most probably fail with my handwriting :)


Besides the text recognition, the documents can be tagged and classified according to your own needs. After some training, the software even proposes the tags very well. Besides the full-text search, there are many filters available to find the information again.


So far it has failed with only one document: a copy of the "CRC Handbook of Chemistry and Physics". With this large book I ran into a timeout. Maybe my hardware is not powerful enough.
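
For readers wondering what a full-text search over already extracted text involves, here is a minimal sketch of an inverted index in plain Python. It is not how paperless-ngx is implemented; the function names and the sample documents are made up for illustration, and the OCR or PDF text extraction step is assumed to have happened already.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it.

    `documents` is a dict of {name: extracted_text}. In practice the text
    would come from an OCR or PDF-extraction step, which is not shown here.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical sample documents, standing in for extracted text.
docs = {
    "pump_sizing.txt": "Centrifugal pump sizing and NPSH margin",
    "tank_vents.txt": "Sizing of vents for atmospheric storage tanks",
}
index = build_index(docs)
print(search(index, "sizing pump"))  # ['pump_sizing.txt']
```

This only matches exact words. Tags, filters, and anything resembling context search need more than this.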

#3 Pilesar


    Gold Member

  • Members
  • 1,401 posts

Posted 01 April 2024 - 07:34 AM

I use the 'folders - subfolders - meaningful file name' method along with the MS Windows search tool. My Engineering folder has about 500 subfolders, and I agree it is sometimes difficult to recall whether I have the relevant files. I have always been wary of trusting online tools to keep my data safe. I keep my local hard drives backed up routinely. I will look into paperless-ngx as a possible backup to my backups.
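
Where installing software is not an option but Python is available, the 'folders - subfolders - meaningful file name' method can be searched with a few lines of standard-library code. The sketch below is only an illustration; `find_files` and the folder names are hypothetical. It matches keywords against each file's path relative to the library root, so folder names count as well as file names.

```python
import os
import tempfile


def find_files(root, keywords):
    """Return paths under `root` whose relative path contains every keyword.

    Matching is case-insensitive and uses the folder names as well as the
    file name, since both carry meaning in this filing method.
    """
    wanted = [k.lower() for k in keywords]
    matches = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            relative = os.path.relpath(path, root).lower()
            if all(k in relative for k in wanted):
                matches.append(path)
    return sorted(matches)


# Demo on a throwaway folder tree (hypothetical names).
with tempfile.TemporaryDirectory() as root:
    for rel in ["Pumps/NPSH/API610_notes.pdf", "Tanks/API650_summary.pdf"]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    for hit in find_files(root, ["api", "pumps"]):
        print(os.path.relpath(hit, root))
```

This searches names only, not file contents, so it is only as good as the naming discipline behind it.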

#4 shvet1


    Gold Member

  • Members
  • 360 posts

Posted 03 April 2024 - 11:41 PM


Thank you. I spend most of my time on my corporate computer, where I do not have full access to the OS to install software as complex as Paperless. A traditional program or app would be more convenient, and I am curious why it was made so overcomplicated for such a simple task.

It looks like a toy for geeks.

#5 Pilesar


    Gold Member

  • Members
  • 1,401 posts

Posted 04 April 2024 - 12:09 AM

Wasn't there someone on these forums who recently complained about their trouble with organizing and retrieving the many thousands of files in their private data library and seeking artificial intelligence enhanced software solutions? That does not sound to me like a simple task that could be accomplished without complex software. I hope others offer their methods. Breizh seems to have a great system and I would like to know how he does it!

#6 shvet1


    Gold Member

  • Members
  • 360 posts

Posted 04 April 2024 - 12:27 AM

"That does not sound to me like a simple task that could be accomplished without complex software."

I am talking about the installation procedure.


"Breizh seems to have a great system"

Or a lot of time and effort.


I am just an engineer trying to take advantage of an overhyped tool that I keep hearing about everywhere. As usual, there is a lot of noise and no benefit.

Edited by shvet1, 04 April 2024 - 05:31 AM.

#7 breizh


    Gold Member

  • Admin
  • 6,374 posts

Posted 05 April 2024 - 12:28 AM


My files are organized in folders and subfolders: one folder for books and another for process documents (PDF and Excel sheets), with subfolders covering a lot of topics in chemical engineering. More than 30 years of data.

My worry is about losing those data.


#8 Pilesar


    Gold Member

  • Members
  • 1,401 posts

Posted 05 April 2024 - 08:50 AM

My data is on an 8 TB hard drive that is always connected to a USB port on my computer. I routinely back this up (using WinMerge software) to another 8 TB drive, which is also always connected by USB to the same computer. Less frequently (once a year or so) I reconcile the recent data with a third large hard drive that I normally keep disconnected from my computer. This limits my risk from hard drive failure and from malware. My data is still vulnerable to fire or theft since all my backups are physically close, but I judge that risk acceptable considering how unlikely it is.
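
The routine backup step above could also be scripted. The following is a minimal sketch of a one-way mirror using only the Python standard library. It is not what WinMerge does internally, and the function name and demo folders are hypothetical. It copies a file when it is missing from the backup or older there, and it never deletes anything from the backup.

```python
import os
import shutil
import tempfile


def mirror(source, backup):
    """Copy files from `source` to `backup` when missing or older there.

    One-way only: nothing is ever deleted from `backup`.
    Returns the sorted relative paths of the files that were copied.
    """
    copied = []
    for folder, _subfolders, files in os.walk(source):
        rel_folder = os.path.relpath(folder, source)
        target_folder = os.path.normpath(os.path.join(backup, rel_folder))
        os.makedirs(target_folder, exist_ok=True)
        for name in files:
            src = os.path.join(folder, name)
            dst = os.path.join(target_folder, name)
            if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
                shutil.copy2(src, dst)  # copy2 also preserves the modification time
                copied.append(os.path.normpath(os.path.join(rel_folder, name)))
    return sorted(copied)


# Demo on throwaway folders standing in for the two drives.
with tempfile.TemporaryDirectory() as work:
    source = os.path.join(work, "main_drive")
    backup = os.path.join(work, "backup_drive")
    os.makedirs(os.path.join(source, "Books"))
    with open(os.path.join(source, "Books", "notes.txt"), "w") as f:
        f.write("30 years of data")
    print(mirror(source, backup))  # first run copies the new file
    print(mirror(source, backup))  # second run finds nothing to copy
```

Because nothing is deleted from the backup, a file removed by mistake on the main drive survives there. The cost is that the backup slowly accumulates files that were removed on purpose.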
