UD.db - Universal Dummy Database Project

MetalDrop
Guru
Posts: 866
Joined: Sat Apr 09, 2016 10:19 pm
Been thanked: 189 times

UD.db - Universal Dummy Database Project

Post by MetalDrop »

- - - News - - -
Initial Alpha Version:
There are things I want to change and a lot of stuff I want to add; however, I'm going to be running low on personal time again soon and wanted to get a decent basic testing database up for the version 10 beta.

Approx. 30 MB download size | approx. 16 GB extracted
https://www.mediafire.com/file/v9zmdcb8 ... .1.7z/file

Notes:
- The database uses highly repetitive data for the best compression ratio.
- The database was made in 6.58 and has been tested to update well to 10b1.
- - EPIM makes a backup of the database during the update, and generally makes a normal backup not long after, so you should plan on having around 48 GB of free hard drive space (three copies of roughly 16 GB each).
- - - There is no v10 EPIM synchronizer yet, so using the current synchronizer will only get you a clean v9 database, which will still be duplicated and backed up when you open it in version 10.
- It may take a very long time to encrypt this database. Plan accordingly if you plan to test it encrypted.
- I had wanted to make the database twice, once in 6.58 and again in 9.10.6, to benchmark the process. However, it proved too difficult and time-consuming to recreate the database in 9.10.6, whereas the entire process was fairly smooth in 6.58. The Notes module in 9.10.6 gave me substantial difficulties, and because of that I never got around to trying to import data into the other modules.
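The free-space estimate in the notes above can be checked before extracting. This is only a minimal sketch: it assumes the roughly 16 GB extracted size and three copies (the database, the update backup, and the normal backup). The function names and the default path are made up for illustration and have nothing to do with EPIM itself.

```python
import shutil

GB = 1000 ** 3  # decimal gigabytes; an assumption, the post does not say GB vs GiB


def required_free_bytes(extracted_gb=16, copies=3):
    """Space needed for the database plus its backups.

    copies=3 is an assumption: the database itself, the backup EPIM
    makes during the update, and the normal backup made shortly after.
    """
    return extracted_gb * copies * GB


def has_enough_space(path=".", extracted_gb=16, copies=3):
    """Return True if the drive holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_free_bytes(extracted_gb, copies)


if __name__ == "__main__":
    needed = required_free_bytes() / GB
    print(f"Need about {needed:.0f} GB free; enough here: {has_enough_space('.')}")
```

Point `path` at the drive where you plan to extract the archive; the check is only as good as the 16 GB and three-copy assumptions.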
UD.db v0.1.jpg

- - - Old - - -
I want to create a high quality universal dummy database for beta testers and the EPIM Dev team.

I'm looking for two things from fellow users:

#1 Stats on your biggest and most well loved databases.

I want to make a dummy database that is realistically large, plus 20% for future-proofing, and not some random extreme for the sake of being extreme.

If you fancy statistics and can make a reasonable 10-year prediction about your database size, that would be great too.

#2 A brief description of the layout of your data, or any special details you can think of.

Examples:
Are all your notes in one tree?
Are they all root nodes? If not, how deep is your deepest note?
If they are not all in one tree, how many note trees do you have, and how many notes are in each one?
Do you have even a vague idea what the average word count per note is?
Are you using a special version of EPIM for a specific database, and if so, why?
Do you have data divided between multiple databases? If so, is it for performance reasons, and would you merge these databases if you could?


P.S. If you don't want to share information publicly, you can use this form:
https://forms.gle/bXHgQGENTGJxyFdz5


Thank you for your assistance.
Last edited by MetalDrop on Sun Oct 03, 2021 11:05 pm, edited 1 time in total.
EPIM Portable Pro Running/Tested On: Windows 11 Pro 64-bit US-ENG|i5-6400+Quadro P620|i7-7700K+1050ti|i7-8700K+970GTX|AMD 5600x+1080ti|16GB+RAM&NVMe SSDs
[I'm helpful and often reply to questions, however I am just a fellow user and not staff.]
a8907433
Guru
Posts: 1046
Joined: Fri Mar 12, 2010 11:57 pm
Been thanked: 169 times

Re: UD.db - Universal Dummy Database Project

Post by a8907433 »

Good idea, MetalDrop. I will deliver my data sometime at the weekend.
FireBrand2026
Guru
Posts: 142
Joined: Fri May 13, 2016 4:10 am
Been thanked: 9 times

Re: UD.db - Universal Dummy Database Project

Post by FireBrand2026 »

I have 5 main databases.

#1 is for:
Email: 18,000 local POP3 emails. I would not merge it with any other database; I don't like my emails and contacts lumped together with my notes, tasks, passwords, and calendar. I like being able to shut off my email and enjoy my personal days.

Interesting stats:
Largest attachment is 30 MB.
I have 14 folders, with a max depth of 2.
My largest folder is "pre-EPIM", which is an archive of all my POP mail from the email clients I used before EPIM.
254 contacts
260 notes.

#2 is for:
Shared documents. I also would not merge this into the others, because it is made to be slim for easy sharing; because of that I think its stats are fairly useless here.

#3 is for:
Diary / journal, whatever you want to call it. I would merge this into the others if I could encrypt it and lock it with a password without encrypting the rest of the database.

#4 is for:
General use that is not email, a journal, or shared with others openly, but is shared with family and friends.
Notes: 3,000 notes divided over 10 trees; the largest tree has 500 notes. Average word count is 2k, I'd reckon. The longest is 4k; there is a reason for this that I'll explain under database #5.
Calendar: 2,000 entries.
Tasks: 500 entries, we are very good at deleting tasks as they are finished to avoid bloat.

#4 is what I'd call our daily driver. It's kept at the current version of EPIM so we can enjoy all the new features, but we clear out data all the time to keep it working nicely.

#5 is for:
Big data. This database is where my family and I store all our large datasets, and where we archive things from #4 when we need to clean house. It is currently locked to 6.58. After a lot of testing I decided that the loss of speed and reliability was not worth the new features of newer EPIM versions for big data. No newer version of EPIM that I have tested has come close to 6.58, though honestly I stopped testing new versions with this database as they just kept getting worse.

This database has:
Approx. 80,000 notes; it is nearly 5 GB in size.
Hundreds of notes more than 100,000 words long.
A few notes that are over 2,000,000 words long. (Mostly fan fiction my daughter wanted to save.)
Its longest tree is 10,000 items long.
I'd wager its deepest nested item is probably around 5 levels deep.
There are loads of attachments of all sorts (docs, PDFs, photos, archives, spreadsheets, etc.).

General use works very smoothly; I can easily switch between the trees, open the big notes, etc.

Quick search for all fields stopped working a long time ago, but quick search titles still works fine, and we are good at keeping data sorted so we can find it by title.

I can't think of anything else that is broken other than the all-fields quick search, and I can kind of understand why it doesn't work: EPIM 6.58 is old and has memory limits as a 32-bit program. Nowadays, with 16~32 GB of RAM being common, if EPIM were remade to hold the database entirely in RAM for quick searches, I think it would be no problem to quick search a 5 GB database. And since modern NVMe drives can read 5 GB into RAM within seconds, it wouldn't even need to sit and hog RAM.
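As a rough sanity check of the "within seconds" estimate, read time is simply size divided by sustained throughput. The sketch below is idealised, and the throughput figures are illustrative assumptions, not measurements of any particular drive or of EPIM.

```python
def read_time_seconds(size_gb, throughput_gb_per_s):
    """Idealised time to read size_gb sequentially at a sustained rate."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / throughput_gb_per_s


# Hypothetical sustained sequential read rates in GB/s (assumed, not measured).
ASSUMED_RATES = {"SATA SSD": 0.5, "NVMe SSD": 3.0}

if __name__ == "__main__":
    for name, rate in ASSUMED_RATES.items():
        seconds = read_time_seconds(5, rate)
        print(f"{name}: {seconds:.1f} s for a 5 GB database")
```

Real load times would also depend on file system overhead and on how the database file is parsed, which this ignores.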

So, to answer the last part: I would gladly merge #4 and #5 if EPIM were able to handle it, but until there is a major update focused on getting modern-day EPIM to take advantage of modern-day hardware, I don't see that happening.
TumbleDoor
Guru
Posts: 138
Joined: Tue Jun 21, 2016 7:19 am
Been thanked: 15 times

Re: UD.db - Universal Dummy Database Project

Post by TumbleDoor »

Submitted my setup on the doc. As an afterthought / extra comment: you should try to make the database in v6.58 first; that way you can also use it to benchmark EPIM versions.
a8907433
Guru
Posts: 1046
Joined: Fri Mar 12, 2010 11:57 pm
Been thanked: 169 times

Re: UD.db - Universal Dummy Database Project

Post by a8907433 »

Here are my DB statistics:
EPIM DB management.jpg
I have only one DB. It is encrypted with a very long password, which is why I only have one: I don't want to type my password many times a day. I use it with EPIM Portable, synchronizing only with EPIM Cloud to my Windows tablet (Android is only for testing here), or transferring the DB on my USB flash drive.
I expect my DB to double in size in the next 10 years, mainly from emails (POP3) and pictures.

I use:
calendar - 70% as a diary; most of the tasks shown work as simple notes for a day, the rest as reminders
passwords - categorized into: banking, miscellaneous, and login data
contacts - personal and company; under personal I also save data associated with that person

MAIN USE: email program! I have now moved completely from Thunderbird to EPIM, with mails from the late 1990s onward (imported from Thunderbird). I try to save only the mails that are important to me, remove attachments, and keep all mails well categorized in folders.

Notes: I have 3 trees: one for my main notes, including everything that does not belong in the other two; one for software data (subscriptions, login data for software sites, registration data, for example all the data of my EPIM license itself, and so on); and one for pictures, well organized.
In my main notes tree I have fewer than 30 notes, some with subnotes (most of them fewer than 10), and many notes have tabs, up to 10. These notes include 1) general notes, 2) some notes concerning financial data, 3) notes with lists of the books I have read, 4) one note with subnotes describing chemical substances, including pictures of chemical formulas, 5) one note with subnotes holding all my scanned documents (PDF), and, perhaps most important, a diary/journal with a subnote for each decade and tabs for each year. All notes have pictures too and are linked with one another, with the picture tree, and with the calendar. Word count for each note is approx. < 1000 words.
Keldi
Guru
Posts: 377
Joined: Thu Aug 23, 2012 11:42 am
Has thanked: 26 times
Been thanked: 92 times

Re: UD.db - Universal Dummy Database Project

Post by Keldi »

 
My biggest database by item count:
 
db1.png
 
Most of my EPIM "notes" are actually saved as Appointments, to make them attached to a date in Calendar and be able to use filter on them in the table view.
Then some things, that might be considered notes, I keep in the Contacts (because of extra fields there).

That leaves the Notes module only with things that need more formatting than bold/italic/colours/..., or that need to be structured as a tree. I have about 10 trees there. Rarely does something go more than 5 levels deep. And mostly it's one-leaf notes (sometimes two leaves, with the second leaf holding a comment about what's on the first).
My exception for the number of leaves is Sticky notes. I keep from 2 to 4 notes Sticky at all times, all with 5 to 10 leaves. Some are just for temporarily dumping information that will be put in a proper place later. Some are 'read-only', for quickly looking up often-needed information. I'd really appreciate it if UD.db came with several Stickies preconfigured in different colours, and some with different colours of leaves. They have sometimes been overlooked during testing.
I selected all notes in the biggest tree: 174149 words / 360 notes = about 484, so close to 500 words per note on average. But that number doesn't tell much; as I said, all my 'simple format' notes are inside appointments. The average word count would be much lower if I counted them too.
Tasks are just tasks, passwords are just passwords, nothing special there.
Add to all that a lot of inter-linking and tags, and that's my main database.

I use EPIM more and more as the years go by. The Advanced search gave me these numbers for created items:
in 2006-2008 - less than 300 items,
in 2009 - 575 items (the first year I actively used EPIM)
...
in 2020 - 4270 items
in 2021 - 5615 items (already more than 2020, and only September just ended).

To be on the safe side I'd count 7k items per year for the future, multiply by 10, plus existing 23k items... My main database might be close to 100 000 items in ten years.
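The estimate above is a simple linear projection, and it can be written out as a short sketch. The 23k, 7k, and 10-year figures come from this post; the 20% headroom is the margin asked for in the first post. The function names are made up for illustration.

```python
def project_items(existing, per_year, years):
    """Linear projection: existing items plus a constant yearly intake."""
    return existing + per_year * years


def with_headroom(items, percent):
    """Add a safety margin using integer arithmetic to avoid float noise."""
    return items * (100 + percent) // 100


if __name__ == "__main__":
    projected = project_items(existing=23_000, per_year=7_000, years=10)
    print(projected)                     # 93000
    print(with_headroom(projected, 20))  # 111600
```

A constant 7k items per year is itself a guess; the yearly counts listed above have been rising, so actual growth could be higher.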


But when a note doesn't need to be synced to my phone, or to have complex formatting, or to be attached to a date in Calendar, it goes into another program (sorry, EPIM). I currently have there:
 
notes.png
 
As I might consider importing some of that into EPIM in the future, it throws off some of my calculations above, adding up to 50000 notes to the current number in EPIM. And I have no idea how large it can get in 10 years. Those notes vary in size from one small paragraph to about 50k words per note. The format ranges from plain text to text with limited formatting options (links/bold/italic/underlined/mono-spaced/strikeout/bullet & numbered lists).

----------
My biggest database by size is an 'archive' with file attachments:
 
db2.png
 
Task lists are named after a year. Each task has one or more attachments and is assigned to a contact (contacts aren't people but subjects/topics for several tasks). The notes part of a task holds the whole text of the attached doc/pdf/... or the OCR text of a scanned document (so it's from one to several printed A4 pages of text; 20-something was the max, and I never measured it in word count).
No clue what size this one will grow to. I've been adding documents from past years as well as recent ones, so I can't predict how it actually grows on a yearly basis.

----------
And finally, I have several smaller databases. But nothing special about them statistic-wise. I believe I could easily merge them into my main database without losing much in performance. But it works for me to keep some subject-specific items separately.