The Spanner logo
    • Home
    • Blog
      • Blog home
      • RSS
    • Login
    • Home
    • Blog
      • Blog home
      • RSS
    • Login
    The Spanner logo

    The Spanner
    Web security blog

    Made by Gareth Heyes
    Follow me on Twitter: @garethheyes

    Javascript for hackers!

    Hackvertor logo
    Shazzer logo
    My Github account
    Recent posts
    Introducing Feedworm: A Privacy-First RSS Reader That Lives in DevToolsSpeedy RSVP extensionAutoVaderHackvertor history and tag finderShadow Repeater v1.2.3 releaseBurp Hackvertor v2.1.24 releaseHacking roomsXSSing TypeErrors in SafarivalueOf: Another way to get thisMaking the Unexploitable Exploitable with X-Mixed-Replace on FirefoxThe curious case of the evt parameterCSS-Only Tic Tac Toe ChallengeRewriting relative urls with the base tag in SafariBypassing DOMPurify with mXSSNew IE mutation vectorHow I smashed MentalJSMentalJS DOM bypassAnother XSS auditor bypassXSS Auditor bypassBypassing the IE XSS filterUnbreakable filterMentalJS bypassesmXSSJava SerializationBypassing the XSS filter using function reassignmentRPOSandboxed jQueryX-Domain scroll detection on IE using focusEpic fail IEnew operatorDecoding complex non-alphanumeric JavaScriptHacking FirefoxDOM ClobberingBypassing XSS AuditorThe evolution of codeNon-Alpha PHP in 6-7 charsetTweetable PHP-Non AlphaMentalJS for PHPOpera x domain with video tutorialSandboxing and parsing jQuery in 100ms

    Javascript compression with unicode characters

    By Gareth Heyes (@hackvertor)

    Published 16 years 9 months ago • Last updated April 1, 2025 • ⏱️ 2 min read

    ← Back to articles

    For some random reason I was making a base999 number compression function, I think it was because someone posted on sla.ckers about base 62. I wanted to see how far I could compress the numbers using a higher range of characters, then it hit me. Why not use it for js compression :)

    You see if you convert the characters to their character code number and then extract a section of the number and convert it to a unicode character you can drastically reduce the amount of characters, provided of course your code contains enough characters as a decompression function is required.

    I've added the three tag to Hackvertor to demo the compression. Here is a sample of code:-

    eval("◮ᾥѵ٨ፍ".replace(/[^\s]/g,function(c){return c.charCodeAt()}).replace(/[3][2-9]|[4-9][0-9]|[1][0-1][0-9]|[1][2][0-6]/g,function(d){return String.fromCharCode(d)}))

    The unpacking function simply gets the character codes, then the very specific regexp finds a range of characters from !-~ based on the character code number. This is because I only have one long number and they are not separated. I leave spaces intact because they don't fall between the ranges and also it can break syntax if they are missing a semi-colon. It's possible to reduce it further by including these characters.

    So if you want to have some fun, try reducing the amount of characters compressed and see if you can create a smaller decompression function. Below is an example of the jspack tag in action:- JS pack

    Update...

    Ok as Andrea pointed out this isn't actual compression however many systems including twitter think the unicode characters are actually only 1 byte which results in longer message. So you can compress a 280 character message into 140. Sirdarckcat manage to get it down to the 50% ratio, you can send encoded twitter messages with Hackvertor. Like this:-

    Encoded twitter message

    ← Back to articles