Javascript compression with unicode characters

For some random reason I was making a base999 number compression function; I think it was because someone posted on sla.ckers about base 62. I wanted to see how far I could compress the numbers using a higher range of characters, and then it hit me: why not use it for JS compression? 🙂

If you convert each character to its character code, join the codes into one long number, then cut that number into sections and convert each section to a Unicode character, you can drastically reduce the number of characters. This only pays off if your code is long enough, because a decompression function has to be shipped along with it.
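The packing step described above can be sketched roughly as follows. This is not the Hackvertor implementation: the name `pack`, the 4-digit chunk size and the rule for shrinking a chunk are all assumptions made for illustration.

```javascript
// Hypothetical packer: the name and the chunk size are assumptions.
const MAX_DIGITS = 4; // 4 digits keeps every chunk value below 10000

function pack(source) {
  // Whitespace is copied through untouched; every other run is packed.
  return source.replace(/\S+/g, function (run) {
    // Join the character codes of the run into one long digit string.
    let digits = "";
    for (let i = 0; i < run.length; i++) {
      const code = run.charCodeAt(i);
      if (code < 33 || code > 126) {
        throw new RangeError("only the characters ! to ~ can be packed");
      }
      digits += code;
    }

    // Cut the digit string into chunks and turn each chunk into one character.
    let packed = "";
    for (let pos = 0; pos < digits.length; ) {
      let chunk = null;
      for (let size = Math.min(MAX_DIGITS, digits.length - pos); size > 0; size--) {
        const ch = String.fromCharCode(Number(digits.slice(pos, pos + size)));
        // Shrink the chunk if the next one would start with 0 (the leading
        // zero would be lost) or if the character is whitespace (the
        // unpacker leaves whitespace alone).
        if (digits[pos + size] !== "0" && !/\s/.test(ch)) {
          chunk = ch;
          pos += size;
          break;
        }
      }
      if (chunk === null) throw new Error("no usable chunk at digit " + pos);
      packed += chunk;
    }
    return packed;
  });
}
```

With these assumptions, `pack("alert(1)")` cuts the digit string 97108101114116404941 into 9710, 8101, 1141, 1640 and 4941, which gives five Unicode characters for eight ASCII ones.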

I’ve added the jspack tag to Hackvertor to demo the compression. Here is a sample of the code:-

eval("◮ᾥѵ٨ፍ".replace(/[^\s]/g,function(c){return c.charCodeAt()}).replace(/[3][2-9]|[4-9][0-9]|[1][0-1][0-9]|[1][2][0-6]/g,function(d){return String.fromCharCode(d)}))

The unpacking function first replaces every non-whitespace character with its character code. A very specific regexp then picks the codes 32 to 126 back out of the digits, which covers the range of characters from !-~. It has to be this specific because I only have one long number and the codes are not separated. I leave whitespace intact because it falls outside the !-~ range, and removing it can break the syntax wherever the code is missing a semi-colon. It’s possible to reduce the output further by packing these characters too.
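Here is the same unpacking logic split into its two steps, without the eval, as a sketch for experimenting. The function names `toDigits` and `unpack` are made up for this example. Splitting the digits works without separators because codes 32 to 99 have two digits and start with 3 to 9, while codes 100 to 126 have three digits and start with 1.

```javascript
// Step 1: replace every non-whitespace character with its decimal char code.
function toDigits(packed) {
  return packed.replace(/[^\s]/g, function (c) {
    return c.charCodeAt(0);
  });
}

// Step 2: read the digit stream back as codes 32..126 and rebuild the text.
function unpack(packed) {
  return toDigits(packed).replace(
    /3[2-9]|[4-9][0-9]|1[01][0-9]|12[0-6]/g,
    function (d) {
      return String.fromCharCode(Number(d));
    }
  );
}

// The five characters from the sample above, written as char codes.
const sample = String.fromCharCode(9710, 8101, 1141, 1640, 4941);
console.log(toDigits(sample)); // 97108101114116404941
console.log(unpack(sample));   // alert(1)
```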

So if you want to have some fun, try reducing the number of compressed characters and see if you can create a smaller decompression function. Below is an example of the jspack tag in action:-
[Image: JS pack]

Update…

OK, as Andrea pointed out, this isn’t actual compression. However, many systems, including Twitter, count each Unicode character as a single character no matter how many bytes it takes, which lets you fit a longer message into the same limit. So you can compress a 280-character message into 140. Sirdarckcat managed to get it down to the 50% ratio, and you can send encoded Twitter messages with Hackvertor, like this:-

[Image: Encoded twitter message]
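One way to reach a 50% character ratio is to put two ASCII characters into each 16-bit character. The sketch below only illustrates the idea and is not necessarily what sirdarckcat or Hackvertor do. The names `packPairs`, `unpackPairs` and `utf8Bytes` are hypothetical, and it assumes the message contains only char codes 1 to 127.

```javascript
// Hypothetical 2-into-1 packer: high byte = first char, low byte = second.
function packPairs(text) {
  let out = "";
  for (let i = 0; i < text.length; i += 2) {
    const hi = text.charCodeAt(i);
    // An odd-length message is padded with 0, which unpackPairs drops again.
    const lo = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    if (hi < 1 || hi > 127 || lo > 127 || (lo === 0 && i + 1 < text.length)) {
      throw new RangeError("only char codes 1 to 127 are supported");
    }
    out += String.fromCharCode((hi << 8) | lo);
  }
  return out;
}

function unpackPairs(packed) {
  let out = "";
  for (let i = 0; i < packed.length; i++) {
    const code = packed.charCodeAt(i);
    out += String.fromCharCode(code >> 8);
    if (code & 0xff) out += String.fromCharCode(code & 0xff);
  }
  return out;
}

// UTF-8 size, to compare the character count with the byte count.
function utf8Bytes(s) {
  return new TextEncoder().encode(s).length;
}

const message = "a".repeat(280);
const packed = packPairs(message);
console.log(message.length, packed.length);         // characters before and after
console.log(utf8Bytes(message), utf8Bytes(packed)); // bytes before and after
```

The character count halves while the UTF-8 byte count goes up, which is Andrea’s point. Some of the resulting characters are whitespace or unassigned code points, so a real service might alter or reject them.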

3 Responses to “Javascript compression with unicode characters”

  1. Andrea Giammarchi writes:

    man, you are confusing compression with obfuscation. Without an algorithm or a lookup table you could even increase the size rather than compress it. Your example:
    <code>
    original = "alert(1)";
    compressed = "◮ᾥѵ٨ፍ";
    function realLength(s) {
    var c,b=0,l=s.length;
    while(l){
    c=s.charCodeAt(--l);
    b+=(c<128)?1:((c<2048)?2:((c<65536)?3:4));
    };
    return b;
    };
    alert([
    realLength(original),
    realLength(compressed)
    ]);
    </code>
    It’s 8 bytes for the original string, 13 for the compressed one … is that compression? Is that efficient? Do not mix up the concepts, and keep up the good work!

    Regards

  2. Gareth Heyes writes:

    Yeah OK, character compression. Please join the contest and create a better one 😉

    http://sla.ckers.org/forum/read.php?24,29866,29875

  3. Free JavaScript Code writes:

    very cool & good script, thank you very much for sharing.