JavaScript compression with Unicode characters
Saturday, 15 August 2009
For some random reason I was making a base999 number compression function; I think it was because someone posted on sla.ckers about base 62. I wanted to see how far I could compress the numbers using a higher range of characters, and then it hit me: why not use it for JS compression? 🙂
You see, if you convert the characters to their character code numbers, join those into one long number, then cut out a section of that number at a time and convert each section to a Unicode character, you can drastically reduce the number of characters. This only pays off, of course, if your code contains enough characters, because a decompression function is required.
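A minimal sketch of the packing side, under stated assumptions: the names `pack`, `packRun`, `isSafeChunk` and `CHUNK` are hypothetical, and the 4-digit chunk size is inferred from the `alert(1)` sample rather than taken from Hackvertor's code. Two details are my own additions: a chunk must not start with 0 (the zero would be lost when the chunk becomes a number) and must not map to a whitespace character (the unpacker skips whitespace), so the sketch shortens a chunk when either would happen.

```javascript
var CHUNK = 4; // assumed chunk size: "alert(1)" packs into 4-digit pieces

// True when a digit chunk survives the trip through String.fromCharCode().
function isSafeChunk(piece) {
  return piece.charAt(0) !== "0" &&
    !/\s/.test(String.fromCharCode(Number(piece)));
}

// Split one run of digits into safe chunks of at most CHUNK digits each.
function packRun(run) {
  var len = run.length, reach = [], out = "", i, n;
  reach[len] = true;
  // Work backwards: reach[i] is true if run.slice(i) can be fully chunked.
  for (i = len - 1; i >= 0; i--) {
    reach[i] = false;
    for (n = Math.min(CHUNK, len - i); n >= 1 && !reach[i]; n--) {
      reach[i] = reach[i + n] && isSafeChunk(run.slice(i, i + n));
    }
  }
  if (!reach[0]) throw new Error("digit run cannot be chunked");
  // Walk forwards, taking the longest chunk that still leaves a solution.
  for (i = 0; i < len; i += n) {
    for (n = Math.min(CHUNK, len - i); n >= 1; n--) {
      if (reach[i + n] && isSafeChunk(run.slice(i, i + n))) break;
    }
    out += String.fromCharCode(Number(run.slice(i, i + n)));
  }
  return out;
}

function pack(source) {
  // Only whitespace and the printable range ! to ~ are supported.
  if (/[^\s\x21-\x7e]/.test(source)) {
    throw new Error("pack() only handles printable ASCII and whitespace");
  }
  // Every non-whitespace character becomes its decimal character code.
  var digits = source.replace(/\S/g, function (c) {
    return c.charCodeAt(0);
  });
  // Each run of digits is then re-cut into Unicode characters.
  return digits.replace(/\d+/g, packRun);
}

console.log(pack("alert(1)").length); // 5
```

The output still has to be embedded in a string literal and paired with the unpacking code, so a real packer would also need to escape quotes and backslashes; that step is left out here.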
I’ve added the jspack tag to Hackvertor to demo the compression. Here is a sample of code:-
eval("◮ᾥѵ٨ፍ".replace(/[^\s]/g,function(c){return c.charCodeAt()}).replace(/[3][2-9]|[4-9][0-9]|[1][0-1][0-9]|[1][2][0-6]/g,function(d){return String.fromCharCode(d)}))
The unpacking function simply gets the character codes, then the very specific regexp picks the original codes back out of the resulting digits, covering the characters from ! to ~. The regexp has to be this specific because I only have one long number and the codes in it are not separated. I leave spaces intact because they fall outside that range, and because removing them can break syntax where a semicolon is missing. It’s possible to reduce the output further by including these characters.
So if you want to have some fun, try reducing the range of characters that gets compressed and see if you can create a smaller decompression function. Below is an example of the jspack tag in action:-
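The two steps can be sketched as a standalone function. The name `unpack` is an assumption, `eval` is left out on purpose, and the packed string is written with escapes so that it survives any encoding; it is the five-character packed form of `alert(1)`.

```javascript
// Hypothetical wrapper around the two replace() calls described above.
function unpack(packed) {
  // Step 1: every non-whitespace character becomes its decimal code,
  // giving one long run of digits; whitespace is copied through as-is.
  var digits = packed.replace(/[^\s]/g, function (c) {
    return c.charCodeAt(0);
  });
  // Step 2: cut the digits back into codes 32-126. Codes 32-99 have two
  // digits and start with 3-9; codes 100-126 have three and start with 1,
  // so the first digit always says how many digits to read.
  return digits.replace(/3[2-9]|[4-9][0-9]|1[0-1][0-9]|12[0-6]/g, function (d) {
    return String.fromCharCode(d);
  });
}

var packed = "\u25EE\u1FA5\u0475\u0668\u134D";
console.log(unpack(packed)); // alert(1)
```

Here the five characters have the codes 9710, 8101, 1141, 1640 and 4941, which join into 97108101114116404941 and split back into 97, 108, 101, 114, 116, 40, 49, 41.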
JS pack
Update…
OK, as Andrea pointed out, this isn’t actual compression. However, many systems, including Twitter, treat each Unicode character as if it were only 1 byte, which allows a longer message. So you can compress a 280 character message into 140. Sirdarckcat managed to get it down to the 50% ratio, and you can send encoded Twitter messages with Hackvertor. Like this:-
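One way a 50% character ratio can be reached is to store two 7-bit ASCII codes in a single Unicode character. The sketch below is an assumption for illustration only, not Sirdarckcat's or Hackvertor's actual code; the names `packPairs` and `unpackPairs` and the 7-bit layout are hypothetical.

```javascript
// Hypothetical 2:1 packer: two ASCII characters per Unicode character.
function packPairs(text) {
  // Restrict input to codes 1-127 so that 0 can mark the padding slot.
  if (/[^\x01-\x7f]/.test(text)) {
    throw new Error("packPairs() only handles ASCII codes 1-127");
  }
  var out = "";
  for (var i = 0; i < text.length; i += 2) {
    var first = text.charCodeAt(i);
    // An odd-length message has no partner for its last character.
    var second = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
    // Two 7-bit values fit in 14 bits: at most 16383, below the surrogates.
    out += String.fromCharCode(first * 128 + second);
  }
  return out;
}

function unpackPairs(packed) {
  var out = "";
  for (var i = 0; i < packed.length; i++) {
    var value = packed.charCodeAt(i);
    out += String.fromCharCode(value >> 7);
    // A low half of 0 is padding, so nothing is appended for it.
    if (value & 127) out += String.fromCharCode(value & 127);
  }
  return out;
}

var message = new Array(141).join("ab"); // 280 characters
console.log(packPairs(message).length); // 140
```

Each packed character takes two or three bytes in UTF-8, so this trades bytes for character count, which is exactly Andrea's point.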
No. 1 — August 15th, 2009 at 11:47 am
Man, you are confusing compression with obfuscation. Without an algorithm or a lookup table you could even increase the size rather than compress it. Your example:
<code>
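var original = "alert(1)";
var compressed = "◮ᾥѵ٨ፍ";
// Count the bytes a string occupies when encoded as UTF-8.
function realLength(s) {
  var c, b = 0, l = s.length;
  while (l) {
    c = s.charCodeAt(--l);
    // charCodeAt returns UTF-16 code units, so the 4-byte branch is never taken.
    b += (c < 128) ? 1 : ((c < 2048) ? 2 : ((c < 65536) ? 3 : 4));
  }
  return b;
}
alert([
  realLength(original),
  realLength(compressed)
]);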
</code>
It’s 8 bytes for the original string, 13 for the compressed one … is that compression? Is that efficient? Do not mix up the concepts, and keep up the good work!
Regards
No. 2 — August 15th, 2009 at 12:48 pm
Yeah, OK, character compression. Please join the contest and create a better one 😉
http://sla.ckers.org/forum/read.php?24,29866,29875
No. 3 — August 21st, 2009 at 3:06 am
very cool & good script, thank you very much for sharing.