Extract PDFs from data recovered by scalpel.
Ivan Olexyn d84c1e5ff3
Update README.
5 years ago


Table of Contents

  1. About
  2. Getting Started
  3. Description
  4. Package Contents
  5. Issues
  6. Screenshot

About

mucc is a tool for processing data recovered by scalpel. Its features include:

  1. Splitting PDF files into sub-files.
  2. Deleting duplicate files, which can be used independently.


Getting Started

  • Download and extract the JavaFX SDK.
  • Add <your path>/javafx-sdk-11/lib/ as a library to your project.
  • Add --module-path <your path>/javafx-sdk-11/lib --add-modules=javafx.controls,javafx.fxml to VM options.
  • Run the Main class.


Description

Retrieving Sub-Files

scalpel scans disk images for %PDF headers and %EOF footers. If max_filsize is set high, the recovered files often consist of several concatenated sub-files. mucc finds the nested %PDF and %EOF tags inside each recovered file and extracts the sub-files with byte-level precision.
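
The splitting step described above can be sketched as follows. This is a minimal illustration, not mucc's actual code: the class name PdfSplitter and its methods are hypothetical, and it assumes that each sub-file runs from a %PDF header through the last %EOF footer found before the next %PDF header (a single PDF can contain more than one footer).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of mucc: splits a carved blob into PDF sub-files.
final class PdfSplitter {
    private static final byte[] HEADER = "%PDF".getBytes(StandardCharsets.US_ASCII);
    // "%EOF" also matches inside the usual "%%EOF" trailer.
    private static final byte[] FOOTER = "%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index of the first occurrence of pattern at or after from, or -1. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i <= data.length - pattern.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    /** Splits data into sub-files, one per %PDF header that is followed by a footer. */
    static List<byte[]> split(byte[] data) {
        List<byte[]> parts = new ArrayList<>();
        int start = indexOf(data, HEADER, 0);
        while (start >= 0) {
            int next = indexOf(data, HEADER, start + HEADER.length);
            int limit = (next >= 0) ? next : data.length;
            // Assumption: the sub-file ends at the last footer before the next header.
            int end = -1;
            int pos = indexOf(data, FOOTER, start);
            while (pos >= 0 && pos + FOOTER.length <= limit) {
                end = pos + FOOTER.length;
                pos = indexOf(data, FOOTER, pos + 1);
            }
            if (end > start) {
                // Copy exactly the bytes from the header through the footer.
                parts.add(Arrays.copyOfRange(data, start, end));
            }
            start = next;
        }
        return parts;
    }
}
```

Calling PdfSplitter.split(Files.readAllBytes(path)) on a recovered file would return one byte array per sub-file. Bytes before the first header and after the last footer are dropped, and a header without any footer is skipped.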

Deleting Duplicates

mucc calculates the MD5 hash of each file and deletes the duplicates, i.e. files whose hashes are identical.
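
A hash-based duplicate pass could look roughly like the sketch below. The class DuplicateRemover and its methods are hypothetical names chosen for illustration, and the policy of keeping the first file (in sorted path order) of each group of identical hashes is an assumption, not something the source states.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of mucc: removes files with identical MD5 hashes.
final class DuplicateRemover {
    /** Returns the MD5 digest of content as a lowercase hex string. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Deletes regular files in dir whose hash was already seen; returns the deleted paths. */
    static List<Path> deleteDuplicates(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> stream = Files.list(dir)) {
            // Sorting makes the choice of which copy survives deterministic.
            files = stream.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Set<String> seen = new HashSet<>();
        List<Path> deleted = new ArrayList<>();
        for (Path file : files) {
            // Reads the whole file into memory; fine for a sketch, costly for large files.
            String hash = md5Hex(Files.readAllBytes(file));
            if (!seen.add(hash)) {
                Files.delete(file);
                deleted.add(file);
            }
        }
        return deleted;
    }
}
```

MD5 is adequate here because the goal is to spot identical content, not to resist deliberate collisions.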



Package Contents

Class         Description
Artifacts     Simple objects used by other classes.
Controller    JavaFX class containing application logic.
Execute       Issues shell commands.
layout.fxml   Contains layout data.
Main          Main JavaFX class. Run from here.
QuicksortMd5  Quicksort algorithm.
routines      Contains higher level routines called by Controller.
Tools         Simple tools used by other classes.
Write         Writes to /tmp. Used for data storage.
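
The table describes QuicksortMd5 only as a quicksort algorithm. One plausible use, which is an assumption and not confirmed by the source, is sorting MD5 hashes so that identical hashes end up next to each other and can be detected in a single pass. The sketch below illustrates that idea; the class HashQuicksort and its methods are hypothetical and do not reproduce mucc's code.

```java
// Hypothetical illustration, not mucc's QuicksortMd5 class.
final class HashQuicksort {
    /** Sorts hashes in place in ascending lexicographic order. */
    static void sort(String[] hashes) {
        sort(hashes, 0, hashes.length - 1);
    }

    private static void sort(String[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        // Middle element as pivot; partition with two indices moving inward.
        String pivot = a[lo + (hi - lo) / 2];
        int i = lo;
        int j = hi;
        while (i <= j) {
            while (a[i].compareTo(pivot) < 0) {
                i++;
            }
            while (a[j].compareTo(pivot) > 0) {
                j--;
            }
            if (i <= j) {
                String tmp = a[i];
                a[i] = a[j];
                a[j] = tmp;
                i++;
                j--;
            }
        }
        sort(a, lo, j);
        sort(a, i, hi);
    }

    /** Counts entries in a sorted array that equal their predecessor. */
    static int countDuplicates(String[] sorted) {
        int count = 0;
        for (int k = 1; k < sorted.length; k++) {
            if (sorted[k].equals(sorted[k - 1])) {
                count++;
            }
        }
        return count;
    }
}
```

After sorting, a duplicate is any entry equal to its predecessor, so finding duplicates needs only one linear scan instead of comparing every pair of hashes.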


Issues

  • %PDF tags are not parsed correctly if cat output contains multiple tabs.
  • Nested duplicates are not deleted on the first pass.
  • Code formatting, documentation and IDE warnings need attention.
  • scalpel integration is missing.
  • States require a progress indicator instead of "__".


Screenshot

UI