Javascript compression with unicode characters

By Gareth Heyes (@hackvertor)

Published 16 years 9 months ago • Last updated April 1, 2025 • ⏱️ 2 min read

For some random reason I was making a base999 number compression function; I think it was because someone posted on sla.ckers about base 62. I wanted to see how far I could compress the numbers using a higher range of characters, and then it hit me: why not use it for JS compression? :)

You see, if you convert the characters to their character code numbers, join those numbers together, then take a section of the resulting number and convert it to a Unicode character, you can drastically reduce the number of characters. This only pays off, of course, if your code is long enough to outweigh the decompression function that is required.
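The packing step described above can be sketched roughly as follows. This is not the Hackvertor implementation: `pack` and `packDigits` are hypothetical names, and the 4-digit chunk size, the leading-zero handling and the restriction to printable ASCII plus whitespace are assumptions made for illustration.

```javascript
// Sketch only: hypothetical helpers, not the code behind the Hackvertor tag.

// Re-slice a string of decimal digits into chunks and turn each chunk
// into the character with that char code.
function packDigits(digits) {
  let out = "";
  let i = 0;
  while (i < digits.length) {
    let taken = false;
    // Prefer 4-digit chunks (char codes up to 9999), shrinking when needed.
    for (let len = Math.min(4, digits.length - i); len >= 1; len--) {
      const ch = String.fromCharCode(Number(digits.slice(i, i + len)));
      // A chunk must not be followed by "0" (the next chunk would lose its
      // leading zero) and must not be whitespace (an unpacker that skips
      // whitespace would never read its char code back).
      if (digits[i + len] !== "0" && !/\s/.test(ch)) {
        out += ch;
        i += len;
        taken = true;
        break;
      }
    }
    if (!taken) throw new Error("cannot pack digits at offset " + i);
  }
  return out;
}

// Pack printable ASCII (codes 33-126); whitespace is copied through as-is.
function pack(source) {
  let out = "";
  let digits = "";
  for (const ch of source) {
    const code = ch.charCodeAt(0);
    if (/\s/.test(ch)) {
      out += packDigits(digits) + ch; // flush pending digits, keep whitespace
      digits = "";
    } else if (code >= 33 && code <= 126) {
      digits += code; // e.g. "a" contributes "97"
    } else {
      throw new Error("only printable ASCII and whitespace are supported");
    }
  }
  return out + packDigits(digits);
}

console.log(pack("alert(1)").length); // 5 characters instead of 8
```

Here `alert(1)` becomes the digit string `97108101114116404941`, which is sliced into `9710`, `8101`, `1141`, `1640` and `4941`, giving five characters in place of eight.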

I've added the jspack tag to Hackvertor to demo the compression. Here is a sample of code:-

eval("◮ᾥѵ٨ፍ".replace(/[^\s]/g,function(c){return c.charCodeAt()}).replace(/[3][2-9]|[4-9][0-9]|[1][0-1][0-9]|[1][2][0-6]/g,function(d){return String.fromCharCode(d)}))

The unpacking function simply gets the character codes, then a very specific regexp picks the codes for the characters !-~ back out of the resulting digits. It has to be that specific because I only have one long number and the codes are not separated. I leave spaces intact because they don't fall within those ranges, and because removing them can break syntax where a semicolon is missing. It's possible to reduce the output further by including these characters.
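To make the two steps easier to follow, here is the same unpacking logic split into a named function without the `eval`. The function name `unpack` and the sample input are assumptions for illustration; the two regexps are the ones from the sample.

```javascript
// Sketch only: `unpack` is a hypothetical name for the logic described above.
function unpack(packed) {
  // Step 1: replace every non-whitespace character with its decimal char
  // code, producing one long run of digits (whitespace is left in place).
  const digits = packed.replace(/[^\s]/g, (c) => String(c.charCodeAt(0)));

  // Step 2: read the digits back as char codes. Two-digit codes start with
  // 3-9 and three-digit codes start with 1, so the split is unambiguous.
  return digits.replace(
    /[3][2-9]|[4-9][0-9]|[1][0-1][0-9]|[1][2][0-6]/g,
    (d) => String.fromCharCode(Number(d))
  );
}

// Assumed sample input: five characters with char codes
// 9710, 8101, 1141, 1640 and 4941.
const sample = "\u25EE\u1FA5\u0475\u0668\u134D";
console.log(unpack(sample)); // alert(1)
```

The digit string `97108101114116404941` is read back as 97, 108, 101, 114, 116, 40, 49, 41, which spells `alert(1)`.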

So if you want to have some fun, try reducing the number of characters compressed and see if you can create a smaller decompression function. Below is an example of the jspack tag in action:- JS pack

Update...

OK, as Andrea pointed out, this isn't actual compression; however, many systems, including Twitter, think the Unicode characters are actually only 1 byte each, which lets you fit a longer message into the same limit. So you can compress a 280-character message into 140. Sirdarckcat managed to get it down to the 50% ratio, and you can send encoded Twitter messages with Hackvertor, like this:-
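Andrea's point is easy to check by comparing character counts with UTF-8 byte counts. The snippet below is a small sketch: `utf8Bytes` is a hypothetical helper, the packed string is an assumed example (it unpacks to `alert(1)`), and `TextEncoder` is assumed to be available as a global, as it is in current browsers and Node.js.

```javascript
// Sketch only: compare character count with UTF-8 byte count.
function utf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

const original = "alert(1)";
// Assumed example: char codes 9710, 8101, 1141, 1640, 4941.
const packed = "\u25EE\u1FA5\u0475\u0668\u134D";

console.log(original.length, utf8Bytes(original)); // 8 8
console.log(packed.length, utf8Bytes(packed));     // 5 13
```

The packed form has fewer characters but more bytes, so it only helps where the limit is counted in characters.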

Encoded twitter message
