The evolution of code
Wednesday, 6 February 2013
I’m going to try and define something that doesn’t exist and hopefully implement it in future. My long term goal is to be able to evolve code that understands itself and can mutate into new functionality. The trouble is randomized code is going to take a long long long time to evolve into something usable. My idea is to use inputs and outputs and “actions” and the length and speed of the program to evolve itself.
First we’ll call an evolvable program “eProgram”. ePrograms will mutate if their output or action is the same but the program changes to be faster or shorter. An I/O function is part of the eProgram that accepts input and produces output which we’ll call iFunction. An action function is a function that accepts input but has no output but produces an action which we’ll call aFunction. E.g. draw square. You cannot have a function that doesn’t produce an output and doesn’t produce an action because it will be impossible to verify if the mutated state contains the desired results. Each eProgram also have a malicious input which we’ll call mInput and a malicious action that we’ll call mAction. This will allow us to mutate malicious code and create variants that can be tested against to ensure our program is secure. Ideally every function within the eProgram should be able to produce every known input/output which makes me think that state machines will not be able to evolve because of the halting problem “Given a description of an arbitrary computer program, decide whether the program finishes running or continues to run forever”. It is not necessary to know all inputs/outputs of a function but desired since if we do we can ensure our evolved programs are also secure regardless how different they may become.
Our ePrograms can now evolve to produce smaller faster code by checking their actions or outputs and verify their security by checking the mInputs and variants and ensuring mActions don’t execute. It is possible that a eProgram could simply evolve it’s outputs by literally outputting it’s expected output without performing an operation. E.g. instead of doing a+b = 2 it could just return 2. It’s hard to verify this and so it’s important to include as many inputs/outputs as possible and make sure each action is verified correctly. Would literals be part of the evolution process? I’m not sure. Maybe not which would solve problem since it’s unlikely that the program will be able to mutate all it’s outputs literally without them.
Ok so our ePrograms can mutate themselves theoretically to be smaller and faster but how can a square evolve into a circle? We could merge ePrograms together to form a new eProgram that inherits all iFunctions, aFunctions, mInputs etc we could manipulate the literals of expected outputs to produce new functionality. I guess we could call this rule mutation but this area is very tricky since the rules could be mutated to cause problems. The other problem with merged programs is we do not have any way to measure their usefulness. Maybe human interaction is required here? Programs could be chosen based on a human choice. E.g. I like circles this eProgram mutation looks interesting. Somehow we need to define a desired result of a program merge maybe a merged action e.g. I want a square and circle. We also might have a desired result but not the code to produce it, somehow an eProgram would need to learn how to produce a desired result by mutating rules and ePrograms.
Lastly we could tap into existing code examples to learn about new code automatically. For example an eProgram might not know how to use canvas in JavaScript but it could spider stackoverflow for code examples until it achieves it’s desired result. Somehow we’d need to generate unknown outputs/actions to be verified against maybe existing code could be used here also? What I mean by generated outputs/actions is that an eProgram would need to know the end result such as a canvas object with a square in the middle in order to generate the same result. If those can be extracted it would mean we’d be able to evolve existing code samples into new programs or improve the speed and length and possibly even security.
Rough spec of an eProgram
eProgram = {
iFunctions:[{
input:[1,1],
output:2,
func: function(x,y) {
var result = x+y;
document.getElementById('x').innerHTML = result;
return result;
},
mInput: ['<img src=1 onerror=alert(1)>'],
mAction: function(callback) {
window.alert = function(){
callback();
}
}
}],
aFunctions:[{
action:function() {
var span = document.createElement('span');
span.id = 'x';
document.body.appendChild(span);
},
vAction:function() {
if(document.getElementById('x')) {
return true;
} else {
return false;
}
}
}]
};
To illustrate the concept in simple form I did a code example a while ago where randomization could reduce the size of the code based on a known output. Mutation minify uses random positions in the code to remove and verifies that it’s output doesn’t change and verifies an action that the code’s syntax is correct. What’s interesting about this basic example is the output of code can be different based on the random positions it decides to remove so even at a basic level we have a program deciding to take two paths. From an evolution standpoint we could compare these two decisions to find out which one is best and resolve the method needed to produce it.
Update..
Talking with Mario Heiderich about this and we came to the conclusion that in order to evolve ePrograms with new functionality without human interaction you’d have to manipulate the output of iFunctions and randomize the objects from aFunctions. This I will call iFunction manipulation and aFunction manipulation. As mentioned in the article it might be tricky to do something like this since you could produce an invalid output perhaps we could create a similar output to avoid this or somehow use an invalid output as a factor to determine fitness of a eProgram.