Javascript compression with unicode characters

For some random reason I was making a base999 number compression function; I think it was because someone posted on sla.ckers about base 62. I wanted to see how far I could compress the numbers using a higher range of characters, and then it hit me: why not use it for JS compression? :)

You see, if you convert the characters to their character codes, join the digits into one long number, and then convert each section of that number into a unicode character, you can drastically reduce the number of characters. This only pays off if your code is long enough to outweigh the decompression function that has to ship with it.

I’ve added the three tag to Hackvertor to demo the compression. Here is a sample of code:-

eval("◮ᾥѵ٨ፍ".replace(/[^\s]/g,function(c){return c.charCodeAt()}).replace(/[3][2-9]|[4-9][0-9]|[1][0-1][0-9]|[1][2][0-6]/g,function(d){return String.fromCharCode(d)}))

The unpacking function simply gets the character codes, then the very specific regexp picks the codes for the characters !-~ back out of the digits. It has to be that specific because I only have one long number and the codes are not separated. I leave spaces intact because they don’t fall between the ranges, and removing them can break syntax where a semi-colon is missing. It’s possible to reduce it further by including these characters.
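For the packing side, a minimal sketch could look like the following. It assumes the digits are cut into chunks of up to four, which is what the sample above appears to use (◮ is character code 9710, i.e. 97 for "a" followed by the first two digits of 108 for "l"). The names pack, chunkDigits and unpack are made up for illustration and are not Hackvertor's actual code; unpack is just the two replaces from the sample without the eval.

```javascript
// Hypothetical helper: cut a digit string into chunks of up to 4 digits
// and turn each chunk into the character with that code.
function chunkDigits(digits) {
  var out = '';
  var i = 0;
  while (i < digits.length) {
    var taken = 0;
    // Prefer the longest chunk and shrink until it is usable.
    for (var n = Math.min(4, digits.length - i); n >= 1; n--) {
      var value = Number(digits.substr(i, n));
      var ch = String.fromCharCode(value);
      var next = digits.charAt(i + n);
      // Whitespace is skipped by the unpacker, and a chunk that starts
      // with 0 would lose that zero, so the next digit must not be 0.
      if (!/\s/.test(ch) && next !== '0') {
        out += ch;
        taken = n;
        break;
      }
    }
    if (!taken) throw new Error('cannot pack digits at offset ' + i);
    i += taken;
  }
  return out;
}

// Hypothetical packer: whitespace is copied through, every other run of
// characters becomes one long string of character codes and is chunked.
function pack(code) {
  return code.replace(/[^\s]+/g, function (run) {
    var digits = '';
    for (var i = 0; i < run.length; i++) {
      var cc = run.charCodeAt(i);
      if (cc < 33 || cc > 126) throw new Error('only !-~ is supported');
      digits += cc;
    }
    return chunkDigits(digits);
  });
}

// Same two replaces as the sample above, without the eval.
function unpack(packed) {
  return packed
    .replace(/[^\s]/g, function (c) { return c.charCodeAt(0); })
    .replace(/[3][2-9]|[4-9][0-9]|[1][0-1][0-9]|[1][2][0-6]/g, function (d) {
      return String.fromCharCode(d);
    });
}
```

With this sketch pack('alert(1)') gives the same five characters as the sample string, and unpack turns them back into alert(1).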

So if you want to have some fun, try reducing the amount of characters compressed and see if you can create a smaller decompression function. Below is an example of the jspack tag in action:-
JS pack

Update…

Ok, as Andrea pointed out, this isn’t actual compression. However, many systems including twitter think the unicode characters are actually only 1 byte, which lets you fit in a longer message. So you can compress a 280 character message into 140. Sirdarckcat managed to get it down to the 50% ratio, and you can send encoded twitter messages with Hackvertor. Like this:-
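One way to reach a 50% ratio, assuming the message is plain ASCII, is to put two 7-bit characters into a single 16-bit unicode character. This is only a guess at the general idea, not Sirdarckcat's or Hackvertor's actual encoder, and squeeze and expand are made-up names.

```javascript
// Hypothetical encoder: two ASCII characters per unicode character.
function squeeze(msg) {
  var out = '';
  for (var i = 0; i < msg.length; i += 2) {
    var hi = msg.charCodeAt(i);
    // An odd-length message leaves the low half of the last character empty.
    var lo = i + 1 < msg.length ? msg.charCodeAt(i + 1) : 0;
    if (hi < 1 || hi > 127 || lo > 127) throw new Error('ASCII only');
    out += String.fromCharCode((hi << 8) | lo);
  }
  return out;
}

// Hypothetical decoder: split each character back into its two halves.
function expand(packed) {
  var out = '';
  for (var i = 0; i < packed.length; i++) {
    var cc = packed.charCodeAt(i);
    out += String.fromCharCode(cc >> 8);
    if (cc & 0xFF) out += String.fromCharCode(cc & 0xFF);
  }
  return out;
}
```

A 280 character ASCII message comes out of squeeze as 140 characters, and expand restores it.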

Encoded twitter message

Fresh prototypes on all browsers

So there’s a well known technique for getting Object prototypes that are not from the current window which results in a fresh prototype. You use iframes to copy the required prototype from the iframe.contentWindow BUT…It doesn’t work in all browsers and it’s pretty silly having to copy each object manually, why not just use the window? Well you can :D

So after a lot of code testing/rewriting here is how to do it:-

var iframe = document.createElement('iframe');
iframe.style.width = '1px';
iframe.style.height = '1px';
iframe.frameBorder = "0";
iframe.style.position = 'absolute';
iframe.style.left = '-100px';
iframe.style.top = '-100px';
document.body.appendChild(iframe);
var code = "(function(objConstructor){ return window.NameOfInstance= objConstructor();})(" + objConstructor+ ")";
if (window.opera) {
	iframe.contentWindow.Function(code)();
} else {
	iframe.contentWindow.document.write('<\script type="text/javascript">' + code + '<\/script>');
	iframe.contentWindow.document.close();
}
var obj = iframe.contentWindow.NameOfInstance;
if(!obj) {
	iframe.contentWindow.Function(code)();
	obj = iframe.contentWindow.NameOfInstance;
}

So here obj contains our Object instance within the context of the iframe window, that means any references to window inside your object only affect the iframe context. The reason for the if statements and different code is because Firefox, Safari, Opera and IE all act differently. Opera doesn’t pass the object straight away unless the Function constructor is used, Safari supports the Function constructor method and the document.write method but doesn’t return the object correctly when using document.write until it’s loaded.

The important part about this code is that you don’t need to use the onload event of the iframe as the object is returned instantly :)

Creating HTML listeners with JSReg and Hackvertor

JSReg has grown up a bit since I released the first version. You can now use it to monitor malicious javascript. I have a very basic example of this in Hackvertor, at the moment Hackvertor doesn’t support callbacks so it’s a bit of a hack but you will get the idea.

I use __defineSetter__ to monitor the fake document object. In JSReg the document object doesn’t exist; it becomes $document$, but you can supply your own object in order to create a listener. At the moment the code only works on Firefox. See below for the example:-

var parser = JSReg();
var result;
parser.setDebugObjects({result: function(code){
						result = code;
						}});
var html = '';
if (window.__defineSetter__) {
	var htmlLog = function(str) {
		html += str;
	}
	var obj = {
		$write$:htmlLog,
		$body$:htmlLog
	}
	obj.$body$.__defineSetter__('$innerHTML$',htmlLog);
	obj.__defineSetter__('$innerHTML$',htmlLog);
	parser.setDocument(obj);
}
try {
	parser.runCheck();
	parser.eval(code);
}
catch (e) {
	alert(e.description||e);
}
alert('Decoding javascript...');
if(html != '') {
	result += '\nHTML:'+html;
}
return result;

So “obj” is our fake document object, I just add the properties write and body. Then I use __defineSetter__ to monitor any assignments to innerHTML. You could monitor more of course and even extend the window object to monitor eval. So how does this work in practice? Well take a look below with some fake encoded malicious javascript:-
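The listener idea can be tried on its own without JSReg. The sketch below is a stripped-down illustration under some assumptions: fakeDocument and htmlLog are stand-in names, $body$ is a plain object here rather than the log function, and the last two statements only imitate what rewritten untrusted code would do.

```javascript
var html = '';

// Everything the "document" receives is appended to one log string.
function htmlLog(str) {
  html += str;
}

// Stand-in for the fake document handed to the sandbox.
var fakeDocument = { $write$: htmlLog, $body$: {} };

// Any assignment to $innerHTML$ calls htmlLog with the assigned value.
fakeDocument.$body$.__defineSetter__('$innerHTML$', htmlLog);
fakeDocument.__defineSetter__('$innerHTML$', htmlLog);

// What sandboxed code might end up doing after the rewrite.
fakeDocument.$write$('<b>written</b>');
fakeDocument.$body$.$innerHTML$ = '<i>assigned</i>';
```

After this runs, html holds both the written and the assigned markup in order, which is the part that gets appended to the result. __defineSetter__ is a legacy method; Object.defineProperty with a set function is the standard way to get the same effect.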

Encoded fake javascript malware

As you can see JSReg executes the javascript safely and then uses the fake document to monitor document.write which presents you with the HTML output. This is only a basic example of how it could be used, in future I plan to allow Hackvertor to provide more detailed examination of malicious javascript.

JSReg update

Big thanks!

I’ve done lots of updates to JSReg with some fantastic help from kangax, sirdarckcat, Thornmaker and mario.

Mario found some cool parsing bugs, sirdarckcat helped with some exploits that assigned to window :) and also provided some awesome code ideas and bugs. Thornmaker found that ternaries cause problems with my object detection. I’d also like to thank Achim, who helped me find a recursive regexp when he tested it on other browsers. Finally, kangax’s input has been great, providing me with some headaches trying to match RegExps that look like comments and many other parsing bugs. Thanks a lot guys! You’ve been awesome!

A lot has changed since my last post, it’s getting closer and closer to being used in real world applications and my new version of Hackvertor :) I didn’t expect to be able to parse as much code as it currently does and manage to keep the RegExps small. I try to match as little as possible as Javascript is a complex language.

How it works

In case you don’t know, JSReg is a Javascript sandbox with a difference. It uses Javascript itself to safely parse the code using regular expressions. This means that some features are removed from the Javascript language while in the sandbox, examples of these are access to the DOM like document.body etc. and Object methods like valueOf and toString. The goal is to produce safe Javascript from an untrusted source.

To see how it works check the following example:-

a='a';eval(\u0061+'\x6c\x65\x72\x74\x28\u0034\x32\u0029');

The code assigns the letter “a” to a variable of “a”. Then the eval function is used with a unicode escape which translates to the variable “a” then it’s concatenated with various escapes to produce alert(42).
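To see the decoding without running the eval, the same expression can be assigned to a variable instead. The name payload is only there for illustration.

```javascript
var a = 'a';

// Outside a string literal, \u0061 is just another way of writing the
// identifier a, so this concatenates the variable with the escaped string.
var payload = \u0061 + '\x6c\x65\x72\x74\x28\u0034\x32\u0029';
```

Here payload ends up as the string alert(42), which is what eval receives in the original.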

Here is the JSReg’d version:-

var $a$,$eval$;
$a$=globals.string('a');$eval$($a$+globals.string('\x6c\x65\x72\x74\x28\u0034\x32\u0029'));

So the rewriter identifies dangerous strings and converts them into safe strings. In this instance eval is renamed $eval$ which is a custom JSReg function that translates the content sent to it. All variables used are declared at the top which prevents them being assigned to the global window space. globals.string etc are a special JSReg object which defines a new prototyped version of String etc. to allow you to call whitelisted methods of the object.
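The point about declaring variables at the top can be shown in isolation. This sketch assumes the rewritten code runs inside a function scope, which the post doesn't spell out, and leaked and $kept$ are made-up names. The Function constructor is used only because its body runs in sloppy mode, where an undeclared assignment is allowed.

```javascript
// No declaration: the assignment creates a property on the global object.
var unsafe = new Function('leaked = 1; return leaked;');

// Declared first, the way the rewriter does it: the variable stays local.
var safe = new Function('var $kept$; $kept$ = 1; return $kept$;');

unsafe();
safe();

var leakedEscaped = typeof globalThis.leaked !== 'undefined';
var keptEscaped = typeof globalThis.$kept$ !== 'undefined';

// Tidy up the global created by the unsafe version.
delete globalThis.leaked;
```

Only the undeclared name shows up on the global object; the declared one never leaves its function.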

Interface

That’s the basic idea of how JSReg works. The interface contains six textareas which show the result of the JSReg evaluation. The first box is your code input and the second is the JSReg conversion of your input. globals.eval contains the result of an eval operation and the code which has been rewritten, and globals.function contains similar output to eval but with the Function code when calling new Function. The result box returns the evaluated result after the code has been converted, and the globals box at the bottom lists any global variables that might have escaped the sandbox.

Future and development

I always thought it was possible to use untrusted Javascript within Javascript itself; many other solutions had other languages as a requirement. I think JSReg is definitely getting there now after many failed attempts. I plan to integrate sirdarckcat’s HTML parser too in future, to allow safe access to the DOM. Best of all I’m giving away this code, you can use it freely on your web site :) So please get involved! Find an exploit or a parsing error and help produce a native Javascript sandbox which is free for everybody to use.

Try out JSReg

Hidden Firefox properties revisited

This is the first time I’ve looked at the Firefox source, really! :) I wanted to find all the hidden properties Firefox has in Javascript. It was first pointed out to me by DoctorDan on the slackers forums when he found that the RegExp literal had a -1 value for the source in Firefox 2. I then made it my mission to find others because I thought it would be cool.

They seem to be flags within the source (Ronald mentioned this to me at some point too), I’m not sure how they are used internally or within Javascript. In the source code they are given the name tinyid so that’s what I’ll refer to them from now on.

Here’s how to use them:-

(function(){ alert(arguments[-3]) })()

Functions:-
CALL_ARGUMENTS = -1, predefined arguments local variable
ARGS_LENGTH = -2, number of actual args, arity if inactive
ARGS_CALLEE = -3, reference from arguments to active funobj
FUN_ARITY = -4, number of formal parameters; desired argc
FUN_NAME = -5, function name, “” if anonymous
FUN_CALLER = -6, Function.prototype.caller, backward compat

RegExp:-
REGEXP_STATIC_INPUT = -1,
REGEXP_STATIC_MULTILINE = -2,
REGEXP_STATIC_LAST_MATCH = -3,
REGEXP_STATIC_LAST_PAREN = -4,
REGEXP_STATIC_LEFT_CONTEXT = -5,
REGEXP_STATIC_RIGHT_CONTEXT = -6

REGEXP_SOURCE = -1,
REGEXP_GLOBAL = -2,
REGEXP_IGNORE_CASE = -3,
REGEXP_LAST_INDEX = -4,
REGEXP_MULTILINE = -5,
REGEXP_STICKY = -6;

E4X:-
NAMESPACE_PREFIX = -1,
NAMESPACE_URI = -2

QNAME:-
QNAME_URI = -1,
QNAME_LOCALNAME = -2

As I find more I’ll add them here. I know strings use -1 for the length but I’ll wait till I find all of them for the string object.
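Going by the names and comments in the lists, several of these tinyids look like the internal counterparts of properties that are already public. The pairing below is a guess from those names, not something confirmed from the Firefox source, and demo is a made-up function. The negative indexes themselves only worked in the Firefox versions of the time, so the sketch sticks to the public properties.

```javascript
function demo(x, y) {
  return {
    argsLength: arguments.length, // ARGS_LENGTH: number of actual args
    arity: demo.length,           // FUN_ARITY: number of formal parameters
    name: demo.name               // FUN_NAME: function name
  };
}

var info = demo(1);

var re = /ab/gi;
var flags = {
  source: re.source,         // REGEXP_SOURCE
  global: re.global,         // REGEXP_GLOBAL
  ignoreCase: re.ignoreCase, // REGEXP_IGNORE_CASE
  lastIndex: re.lastIndex,   // REGEXP_LAST_INDEX
  multiline: re.multiline    // REGEXP_MULTILINE
};
```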

New beta of JSReg

I’ve been slowly developing JSReg over the last few months and I’ve dropped lots of code and redone it many times. This latest version is a code rewriter and will sandbox most javascript properties and the goal is to produce a complete locked down version (which can be improved upon later).

So far it’s going well, I think the experience of hacking sandboxes and hacking my own code has resulted in a better version. I’d like to thank Sirdarckcat (Eduardo Vela) for testing and giving me some great suggestions.

There are a couple of bugs and limitations. At the moment arrays don’t work because of a bug in the [] object syntax, which I hope to fix soon. It also eats new lines when using functions or other stuff; you can get round this for now by using ; after the function declaration. Only the alert function is supported at the minute but I plan to add more once the code is a bit more stable. Finally there is no DoS protection at the minute and you can probably throw objects into the global scope, although you shouldn’t be able to access other globals or modify them.

So can you break it? Execute code that isn’t intended, like Function, or maybe access global variables other than those with the allowed $_ prefix and suffix.

Here is a code sample that works fine:-

function x(){ var m=1; this.getM=function(){ return m; } }; y=new x; y.getM()

New JSReg!

CSP - Mozilla content security policy

This is my cup of tea, a whole new way to prevent XSS and related attacks. I’ve been looking at the specification and I like the core of the policy preventing external scripts, eval etc. But reading it I started to think of ways around it because it’s fun :)

Meta tag

The meta tag seems like a bad idea to me. If a site enforced the policy from an HTTP header, then an attacker-controlled meta tag could merge the site’s policy data with the attacker’s evil policy.

Code will not be created from strings

I’m not sure what this is meant to prevent, as the allowed section states that setTimeout and setInterval are allowed with functions as an argument. So you can do this:- setTimeout(function(){alert(1);/* any code */}); Or you can redefine existing functions. I’m not sure that preventing tainted javascript will work this way, as there are many ways to obfuscate and execute code.

Abusing the whitelist

Finally my other idea was injecting javascript onto itself using a HTML page. This assumes the CSP policy allows scripts to be executed from its own domain. The attack also relies on the fact that you can control the output of the entire page or the output is in quirks mode with any E4X breaking code. So the vectors would work like so:-

The script is commented out when the HTML is executed because it references itself as javascript.

alert(1)//

Here the script injects itself and the resulting javascript ignores the script tag as inline e4x:-

alert(1);;

Demos of the vectors are available here:-
CSP1 without E4X
CSP2 with E4X

Update…

I’ve updated the vectors and made the e4x one more realistic. Here is a Firefox 3.5 version which gets round the “whole program” error by splitting the HTML and inserting a Javascript statement:-

CSP3 with e4x FF 3.5

Of course these attacks are theoretical because I’ve not actually had the chance to test CSP, is there a beta? Anyway these vectors could easily be protected against by enforcing script content to have the correct headers and not allowing HTML data.

Minor Safari cross domain bug

I found this while writing Astalanumerator. Safari allows you to overwrite top and parent with native code and maybe other stuff (I haven’t tried). This allows you to define something on domain A and call it on domain B using the top and parent. I’d email Apple about it but the last time I reported XSS on the Apple store they ignored me.

You could use this in dom based XSS situations when you have control over a link. The attack would work like this:-

PHPIDS

But the remote site would include an iframe pointing to the target page and redefine parent/top as setTimeout or eval. You could also use “name” in this instance to provide an XSS payload.

Here is the POC for the cross domain in action, I use subdomains in this instance but any domain could be used:-

Safari poc

Asta la vista baby

A quick update to Astalanumerator, it is now much better. No crashes and a completely new interface. I use a tree menu to traverse objects which can go on forever if you wish. It uses two display windows now, one displays the tree menu and the other displays extra details about the object when clicking.

In addition it checks for window leaks on the object clicked. I use a few methods for this, but if anyone has any ideas for additional checks then please leave a comment. There are no property limits at all now because of the revised interface.

New astalanumerator

New PHPIDS vector

No new PHPIDS vectors for a while? So I thought I’d write a new one as I had 5 minutes spare while drinking my coffee. I used a new technique (as far as I’m aware) to make things easier :) A very old feature in IE is to allow events to be declared as vbscript using the language attribute. This has been used in some very old code but never in XSS, it’s definitely not on the cheatsheet.

Anyway here is the vector:-

<b/alt="1"onmouseover=InputBox+1 language=vbs>test</b>

POC

You have to roll over the bold “test” on the page to execute it, and allow scripted windows. The errors are related to the dom injections that are not valid because it’s a HTML injection. You could get round the scripted windows dialog by using other code but I only had 5 mins.

VBScript doesn’t require () to call functions and the plus converts 1 to a number (which it already is), this is used to bypass the need to use quotes within that particular attribute.

Note the XSS Filter in IE8 catches this vector.