POJO File Synchronizer with Deletion Tracking.

Table of Contents

  1. Features
  2. Object Model
  3. Demo
  4. Issues
  5. Details
  6. TODO

Features

  • Core
    • Multiple folders can be synchronized
      • Delete operations are recorded and synchronized.
      • For files with identical hashes, drop the modified date of the newer file.
  • Additional
    • Folders can be added to an ignore list
  • Technical
    • FS/OS agnostic (Java FileChannel)

Object Model

DataRoot                a data root
\_ SyncBundle :         a bundle of directories on the FS to be synchronized.
   \_ SyncDirectory :   a directory on the FS.
      \_ SyncFile :     a file on the FS.

Flow

  • for each SyncBundle
    • for each SyncDirectory
      • check which files were created, updated, or deleted (CRUD),
      • propagate to other SyncDirectory of current SyncBundle.
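
The flow above can be sketched in a few lines. This is a minimal in-memory illustration, not the project's actual code: `FlowSketch`, `Dir`, and `syncBundle` are hypothetical names, directories are modelled as sets of relative paths, and only creates and deletes are handled (updates are left out for brevity).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory stand-ins; the real SyncDirectory works on the file system.
class FlowSketch {

    static class Dir {
        final Set<String> record = new HashSet<>(); // relative paths seen in the previous cycle
        final Set<String> state = new HashSet<>();  // relative paths present right now
    }

    /** One pass over a bundle: detect creates/deletes per directory, then propagate them. */
    static void syncBundle(List<Dir> bundle) {
        for (Dir source : bundle) {
            // created = in state but not on record; deleted = on record but not in state
            Set<String> created = new HashSet<>(source.state);
            created.removeAll(source.record);
            Set<String> deleted = new HashSet<>(source.record);
            deleted.removeAll(source.state);

            for (Dir target : bundle) {
                if (target == source) {
                    continue;
                }
                // Apply the change and put it on the target's record as well,
                // so the target does not report it back as its own change.
                target.state.addAll(created);
                target.record.addAll(created);
                target.state.removeAll(deleted);
                target.record.removeAll(deleted);
            }

            // The source's record now reflects its current state.
            source.record.clear();
            source.record.addAll(source.state);
        }
    }
}
```

With two directories where A has a new file and B has lost an old one, a single call to `syncBundle` leaves both directories with the new file only.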

Demo

[Image: demo]


Issues

Detection of Concurrent Changes

Ensync runs a core loop. The duration of one cycle is determined by the number of files and the pause between cycles.
Correctly detecting a create/delete operation on different instances of a file requires up to 5 cycles.

  * BAD
    * cycle 1 : A creates
    * cycle 2 : A deletes / B sync creates
    * cycle 3 : A sync creates
  * GOOD
    * cycle 1 : A creates
    * cycle 2 :           / B sync creates
    * cycle 3 : A ignores sync create
    * cycle 4 : A deletes
    * cycle 5 :           / B sync deletes

This means that as the number of files grows, we must wait longer and longer between modifications of the same file.

This was somewhat addressed by switching to FileChannel and locking all the files.
However, more tests must follow.
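
As an illustration of the locking idea, here is a minimal sketch using the standard `java.nio` API. `LockSketch` and `withLock` are hypothetical names and do not reflect how ensync organizes its locking; the sketch only shows the general pattern of holding an exclusive `FileLock` while touching a file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class LockSketch {

    /**
     * Runs the action while holding an exclusive lock on the file.
     * Returns false (without running the action) if another process holds the lock.
     */
    static boolean withLock(Path file, Runnable action) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE);
             FileLock lock = channel.tryLock()) { // null if the lock is held elsewhere
            if (lock == null) {
                return false;
            }
            action.run();
            return true;
        } // lock and channel are released here, in that order
    }
}
```

Whether such a lock actually blocks other programs from accessing the file is system-dependent, which is one reason further tests are needed.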

For now, the best practice is not to modify different instances of the same file before a full core loop has executed.

Lazy first Run (Design Choice)

If ensync initially runs on a non-empty directory, it will consider the existing files as "on record", and thus not "created".
Hence ensync will not push them to the other directories.
This avoids accidentally pushing a massive file set, but means that you have to copy the files manually the first time.
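
The effect of this choice can be shown with a small sketch. It assumes that "created" files are those present in the current state but missing from the record, and that a missing record is seeded from the current state; `LazyFirstRunSketch` and `created` are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

class LazyFirstRunSketch {

    /**
     * Returns the paths considered "created" in this cycle.
     *
     * @param previousRecord paths on record, or null if no record exists yet (first run)
     * @param state          paths currently present in the directory
     */
    static Set<String> created(Set<String> previousRecord, Set<String> state) {
        // First run: treat everything that already exists as "on record".
        Set<String> baseline = (previousRecord != null) ? previousRecord : state;
        Set<String> created = new HashSet<>(state);
        created.removeAll(baseline);
        return created;
    }
}
```

On the first run the result is empty no matter how many files exist, so nothing is pushed; on later runs only files missing from the record are reported.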


Details

Record

  • Used for tracking of file deletions.
  • Located in each SyncDirectory\record.ensync
  • Contains <last edited><separator><relative file path> for each file in the SyncDirectory.
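
A record line in this layout could be read and written as sketched below. The concrete separator and the representation of the last-edited value are not specified here, so the `;` separator and the epoch-millisecond timestamp are assumptions, and `RecordLineSketch`, `Entry`, `format`, and `parse` are hypothetical names.

```java
class RecordLineSketch {

    static final String SEPARATOR = ";"; // assumption: the real separator may differ

    static final class Entry {
        final long lastEdited;      // assumption: epoch milliseconds
        final String relativePath;  // path relative to the SyncDirectory

        Entry(long lastEdited, String relativePath) {
            this.lastEdited = lastEdited;
            this.relativePath = relativePath;
        }
    }

    /** Formats an entry as <last edited><separator><relative file path>. */
    static String format(Entry entry) {
        return entry.lastEdited + SEPARATOR + entry.relativePath;
    }

    /** Parses one record line, splitting only at the first separator. */
    static Entry parse(String line) {
        int index = line.indexOf(SEPARATOR);
        if (index < 0) {
            throw new IllegalArgumentException("Missing separator in record line: " + line);
        }
        long lastEdited = Long.parseLong(line.substring(0, index));
        String relativePath = line.substring(index + SEPARATOR.length());
        return new Entry(lastEdited, relativePath);
    }
}
```

Splitting at the first separator only keeps paths intact even if a file name happens to contain the separator character.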

Core Loop

Sync files across directories.

[Image: core loop]

Package Contents

Path                              Comment
doc                               Diagrams.
src.com.olexyn.ensync.artifacts   Data Model: Maps, Directories, Files.
src.com.olexyn.ensync.Main        Run from here.
src.com.olexyn.ensync.Flow        Flow of the synchronization.
src.com.olexyn.ensync.            Low level helper methods.

TODO

  • New routine:
    • look at burns-mail for how to create a REST-driven app
  • Add tests.
  • Reduce disk access.
  • Add error handling (e.g. if a web directory is not available).
  • Track files that were modified during the loop.
    • currently writeRecord just takes its input from find.
    • this means any change made while the loop is running is written to the Record.
    • created files are detected by comparing the Record (= old state) with the State (= new state).
    • because of this, a file created while the loop was running appears to have been there already.
    • thus the creation of that file is not replicated to the other directories.
    • to solve this, writeRecord should take the old State and explicitly add every operation that was performed by the loop itself (as opposed to files the user created while the loop was running).
  • File is created in DirB
    • Sync creates the file in DirA
    • Sync creates the file in DirB
      • this means the file in DirB is overwritten with cp for no reason.
      • implement a check to prevent this.
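
The writeRecord change proposed under "Track files that were modified during the loop" could look roughly like the sketch below. It is not the current implementation: `WriteRecordSketch`, `nextRecord`, and the set-based model of State and Record are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class WriteRecordSketch {

    /**
     * Builds the next Record from the State captured at loop start plus the
     * operations the loop itself performed, instead of re-scanning the directory.
     *
     * @param stateAtLoopStart paths present when the loop started
     * @param syncCreated      paths the loop created in this directory
     * @param syncDeleted      paths the loop deleted from this directory
     */
    static Set<String> nextRecord(Set<String> stateAtLoopStart,
                                  Set<String> syncCreated,
                                  Set<String> syncDeleted) {
        Set<String> record = new HashSet<>(stateAtLoopStart);
        record.addAll(syncCreated);
        record.removeAll(syncDeleted);
        return record;
    }
}
```

A file the user creates while the loop is running is in neither input, so it stays off the Record and is detected as "created" in the next cycle.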