I have started a new open source project called Destral. It is a command-line utility to split and join files, much like Hacha and HJSplit.

The main advantages of Destral over Hacha and HJSplit are:

  • Multiplatform
    It is written in pure C, so it should build on any operating system with a C compiler.
    This single utility works the same on Linux, Windows, Mac, etc. Forget about using a different utility on each operating system: same usage, same flags, same everything.
  • Destral is able to split and join using the Hacha 3.0, Hacha 3.5 and HJSplit formats. To state it clearly: Destral does not use a new split and join algorithm, and it does not need Hacha 3.0, Hacha 3.5 or HJSplit to work; I have implemented the algorithms myself.
  • Destral is intelligent and uses sensible defaults.
    Most times you will not need to tell it which split and join algorithm to use; it will figure it out on its own, as sketched below.
    For instance, when you want to join several chunks into a file you just run destral -j myfile.0, or destral -j myfile.000, or destral -j myfile.001 (at the moment you need to provide the path to the first chunk, but this weekend I will make it intelligent enough to search for the first chunk if you pass, for instance, chunk #3).
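To give an idea of how that auto-detection can work, here is a minimal sketch in C. This is not Destral’s actual code; it only combines the chunk’s extension with the 0x3f marker bytes of the Hacha header (described further down), and it does not distinguish Hacha 3.0 from 3.5.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative sketch only, not Destral's actual detection code. */
enum format { FMT_UNKNOWN, FMT_HACHA, FMT_HJSPLIT };

enum format guess_format(const char *first_chunk)
{
    unsigned char magic[5];
    const char *ext = strrchr(first_chunk, '.');
    FILE *f;

    /* HJSplit/lxsplit chunks are raw slices named .001, .002, ... */
    if (ext != NULL && strcmp(ext, ".001") == 0)
        return FMT_HJSPLIT;

    /* A Hacha first chunk starts with five bytes set to 0x3f. */
    f = fopen(first_chunk, "rb");
    if (f == NULL)
        return FMT_UNKNOWN;
    if (fread(magic, 1, sizeof magic, f) == sizeof magic &&
        memcmp(magic, "\x3f\x3f\x3f\x3f\x3f", 5) == 0) {
        fclose(f);
        return FMT_HACHA;
    }
    fclose(f);
    return FMT_UNKNOWN;
}
```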

There is no release yet; if you are interested, you will need to access the code via Subversion. The only dependency besides a C compiler is CMake, but it is possible, and easy, to build it without CMake.

Current features:

  • Join Hacha 3.0, Hacha 3.5 and HJSplit/lxsplit files (no CRC check in Hacha files yet)
  • Multiplatform
  • It works and is very fast

Known bugs: there is an issue I just discovered with the name of the joined file under certain conditions; I will fix it soon.

Future features:

  • Fix bugs
  • Implement splitting of files, with sensible defaults: Destral will automagically select certain chunk sizes depending on the input file (it will be possible to override that using parameters); see the sketch after this list.
  • GUI
  • CRC reverse engineering (the Hacha developer does not answer my e-mails, so I have no information about the CRC algorithm he is using)
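Since splitting is not implemented yet, the following is only a hypothetical sketch of what those “sensible defaults” could look like. The thresholds are invented for illustration and are not Destral’s actual values.

```c
#include <stdint.h>

/* Hypothetical defaults; all thresholds are illustrative only. */
static uint64_t default_chunk_size(uint64_t file_size)
{
    const uint64_t MiB = 1024 * 1024;

    if (file_size <= 10 * MiB)
        return MiB;              /* small file: 1 MiB chunks    */
    if (file_size <= 700 * MiB)
        return 10 * MiB;         /* medium file: 10 MiB chunks  */
    return 100 * MiB;            /* large file: 100 MiB chunks  */
}
```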

In Spanish-speaking forums and websites a lot of people use Hacha (a win32-only app) to split a large file into several smaller chunks. English-speaking people prefer HJSplit, which has a Linux version called lxsplit.

On one hand, I cannot understand why people keep using these programs when they could just use a compressor (WinZip, WinRAR) with the compression ratio set to zero: it would be as fast as Hacha and HJSplit, and everybody already has WinZip and/or WinRAR. On the other hand, I cannot change people’s minds, and using wine to run Hacha is a pain in the ass on my 64-bit Kubuntu (32-bit chroot, yadda, yadda).

I have tried to contact the author of Hacha to no avail. I suspected the algorithm was easy but I like to play nice: I kindly requested information about the algorithm Hacha is using to split files. After some weeks without an answer, tonight I gave KHexEdit a try and you know what? I was right: the split & join algorithm in Hacha 3.5 is extremely simple.

There is a variable-length header which consists of:

  • 5 bytes set to 0x3f
  • 4 CRC bytes. If no CRC was computed, this field is 1 byte set to 0x07 followed by 3 bytes set to 0x00; if a CRC was computed, its 4 bytes go here. I have not discovered the CRC algorithm yet.
  • 5 bytes set to 0x3f
  • Variable number of bytes representing the filename of the large file (before splitting/after joining). This is plain ASCII, no Unicode involved.
  • 5 bytes set to 0x3f
  • Variable number of bytes representing an integer which is the size of the large file (before splitting/after joining). Let’s name it largeFileSize.
  • 5 bytes set to 0x3f
  • Variable number of bytes representing the size of each chunk except the first (the one which ends with ".0") and the last. Let’s call it chunkSize. The size of the first chunk is chunkSize + headerSize. The size of the last chunk is largeFileSize – (n-1)*chunkSize, where n is the total number of chunks. For example, a 1,000,000-byte file split with chunkSize = 300,000 yields four chunks of 300,000 + headerSize, 300,000, 300,000 and 100,000 bytes.
  • 5 bytes set to 0x3f

And that’s all you need to know if you want to implement the Hacha 3.5 algorithm. I will be doing that in the next few days and releasing this program under the GPL.
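As a concrete illustration, here is a minimal C sketch of a parser for this header. Note that 0x3f is the ASCII code for '?', so the delimiter is literally "?????". Two caveats: the post does not specify how the two integer fields are encoded (the sketch assumes ASCII decimal strings), and it assumes no field contains a 0x3f run of its own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a Hacha 3.5 header parser following the layout above.
 * Assumption: the size fields are ASCII decimal strings (the post
 * does not specify the encoding). */
struct hacha_header {
    unsigned char crc[4];        /* 0x07 0x00 0x00 0x00 means no CRC */
    char          filename[256]; /* name of the large file           */
    unsigned long file_size;     /* largeFileSize                    */
    unsigned long chunk_size;    /* chunkSize (middle chunks)        */
    long          header_size;   /* headerSize, consumed from ".0"   */
};

/* Consume exactly five 0x3f ('?') bytes. */
static int expect_delim(FILE *f)
{
    char buf[5];
    if (fread(buf, 1, 5, f) != 5 || memcmp(buf, "?????", 5) != 0)
        return -1;
    return 0;
}

/* Read bytes up to (and including) the next "?????" run; the run
 * itself is stripped from the result. Returns the field length. */
static int read_field(FILE *f, char *buf, int max)
{
    int n = 0, run = 0, c;
    while (run < 5 && (c = fgetc(f)) != EOF) {
        run = (c == 0x3f) ? run + 1 : 0;
        if (n < max)
            buf[n++] = (char)c;
    }
    n -= run;                    /* drop the trailing delimiter */
    if (n < 0)
        n = 0;
    buf[n] = '\0';
    return n;
}

int parse_hacha35_header(FILE *f, struct hacha_header *h)
{
    char field[64];

    if (expect_delim(f) != 0)                   /* leading "?????"  */
        return -1;
    if (fread(h->crc, 1, 4, f) != 4)            /* 4 CRC bytes      */
        return -1;
    if (expect_delim(f) != 0)
        return -1;
    if (read_field(f, h->filename, 255) == 0)   /* original name    */
        return -1;
    read_field(f, field, 63);                   /* largeFileSize    */
    h->file_size = strtoul(field, NULL, 10);
    read_field(f, field, 63);                   /* chunkSize        */
    h->chunk_size = strtoul(field, NULL, 10);
    h->header_size = ftell(f);                  /* end of header    */
    return 0;
}
```

With the header parsed, joining is just concatenating the data portions of the chunks in order, which is why the algorithm is so fast.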

Update I had not realized there was CRC information. The description above corresponds to the trivial case (no CRC), but I have yet to find out the CRC algorithm. Reversing CRC – Theory and Practice looks like a good starting point; a first experiment is sketched below.
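A sensible first experiment (and no more than that; the real algorithm is still unknown) is to compute a standard CRC-32, the zip/PNG variant with reflected polynomial 0xEDB88320, over the original file and compare it against the 4 bytes stored in the header:

```c
#include <stdint.h>
#include <stdio.h>

/* Standard CRC-32 (reflected, poly 0xEDB88320, init and final XOR
 * 0xFFFFFFFF), bitwise for brevity. If the result does not match
 * the header bytes, the next step is to vary the init, final-XOR
 * and reflection parameters, as the paper describes. */
uint32_t crc32_file(FILE *f)
{
    uint32_t crc = 0xFFFFFFFFu;
    int c, i;

    while ((c = fgetc(f)) != EOF) {
        crc ^= (uint32_t)c;
        for (i = 0; i < 8; i++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```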

Yesterday I released new versions of two of my open source projects: Javascript Browser Sniffer and XSPF for Ruby.

Version 0.5.1 of jsbrwsniff fixes the detection of Flash Player: it was not working with Flash Player >= 7.0 in Gecko browsers and, due to a typo, not working at all in Internet Explorer.

Version 0.3 of XSPF for Ruby is able to export XSPF playlists to M3U, SMIL, HTML and SoundBlox. These are minor features, but as a new dependency has been added (Ruby/XSLT), the version number has been bumped to 0.3 instead of 0.2.1. I am currently developing the XSPF generator.

After reading Penguin.swf, it looks like the reason Adobe does not open source the Flash Player is third-party codecs (On2, Sorenson, etc.).

I’d like to fire a shot into the air. Maybe I’ll catch a bird.

I guess Flash Player does not re-implement those codecs but uses them as external libraries. Therefore, a possible solution would be:

  1. Adobe releases the Flash Player source code that belongs to Adobe, not the third-party libraries. How is this going to benefit Adobe and Flash Player in general? A lot more people would hack in the Flash Player source, improving it.
  2. Adobe defines the API to access and use these third-party libraries, or even a general API for codec access. Nobody would modify those source files, because breaking that API would mean Flash no longer works with the third-party codecs. (A sketch of what such an API could look like follows this list.)
  3. Flash Player may be compiled without those third-party libraries. Whenever Flash Player tries to play a Flash movie that needs codec X, it searches the local computer for the codec. If codec X is not installed, Flash Player downloads it from Adobe as a binary, much like Microsoft’s Windows Media Player does. How is this going to benefit Adobe and Flash Player in general? Whenever Adobe decides On2 VP6 is old and wants to use On2 VP7, no new version of Flash Player is needed: just download the new codec. Using a small part of wine/darwine (just like mplayer does), this method would work on every platform.
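To make item 2 more concrete, here is a hypothetical sketch of such a codec API, with codecs loaded as shared libraries at runtime. Every name and signature here is invented for illustration; Adobe would obviously define its own.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical codec API; all names and signatures are invented. */
struct codec_api {
    const char *fourcc;                       /* e.g. "VP6F" */
    int (*decode_frame)(const unsigned char *in, size_t in_len,
                        unsigned char *out, size_t out_cap);
};

/* Each codec would ship as a shared library exporting one
 * well-known entry point; a missing codec triggers a download. */
struct codec_api *load_codec(const char *path)
{
    struct codec_api *(*get_api)(void);
    void *lib = dlopen(path, RTLD_NOW);

    if (lib == NULL)
        return NULL;    /* not installed: fetch it from Adobe */
    get_api = (struct codec_api *(*)(void))dlsym(lib, "flash_codec_api");
    return get_api != NULL ? get_api() : NULL;
}
```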

In short, what I’m proposing is a limited-capability, fully open source Flash Player, with third-party codecs downloaded from Adobe as needed. After a couple of movies, everybody would have the codecs they need (unless they choose not to install them, but that’s their choice).

Update There was a comment on Reddit saying wine only runs on Linux-x86 and OSX-x86. Wrong. Wine runs at least on Solaris (x86 and Sparc), Linux (x86, Sparc and PowerPC), OSX (PPC and x86) and FreeBSD (at least, x86).

Everybody is talking about Novell’s decision to move from KDE and Qt to Gnome and Gtk. Me too.

My point: Novell is stupid. Plain and simple. Very stupid.

Gtk is ugly to develop with, inconsistent, lacks a lot of functionality and is a complete joke for multi-platform development.

KDE is so superior to Gnome that the next version of Novell Desktop will be a joke. Kiosk in Gnome? No. Integration and consistency rather than a collection of non-cooperating Gtk tools? No. Lots of advanced software? No.

People say the reason behind the move from KDE to Gnome is the Qt license (you pay for commercial use). What a joke. Qt is so superior to Gtk that it pays for itself so quickly you will never regret buying it. A Qt license costs about half of what one developer is paid in a month. Your company will recover that money immediately.

Had Suse used Gtk instead of Qt, Novell would be firing twice as many people as they are firing now. And the move from KDE to Gnome is so stupid that they are shooting themselves in the foot.

Bye, bye, Novell: you had the best (Suse Linux, ZENworks and eDirectory) and you decided to commit suicide. You can thank Miguel de Icaza, Nat Friedman and those Ximian people. This reminds me of Netscape & Collabra.