Carl the Zealot

May 30, 2008

Genetic Algorithms From Scratch: Harvesting the fruits of not-so-natural selection

Filed under: Code, Free Thought, Open Source, Science — Carl Myers @ 8:44 pm

There are people in this world who “don’t believe in evolution”. There are also people in this world who don’t “believe” the earth is round, or more than 6000 years old. In case any of you were wondering, I consider those two things to be equally well-proven by the evidence I have personally seen (that it to say – very well proven). There are some who would argue “have you personally seen this evidence? have you personally seen those experiments? Did you verify them yourself?”. Of course, that is a ridiculous and slippery slope. I’ve never seen my own heart, but I believe it’s there, dutifully pumping blood through my veins and arteries. Likewise, if some huge fraction of the scientific community accepts some result, I do as well, until evidence to the contrary is brought up.

Why this line of reasoning now? Where is Carl going with this? Well, it all started a few years ago, when I read an amazing article in Scientific American magazine, in the January 2003 issue. The article was called “Evolving Inventions”, and it was about creating software and hardware using genetic algorithms. It made me think “It’s hard enough to doubt evolution, genetics, and natural selection, when we find old bones and genetic markers and other highly-specialized evidence in a wide range of very specialized fields… but it’d be neigh impossible to doubt if you could see it work before your eyes.” The thing is, by definition, natural selection of humans (and most living things besides tiny bacteria) operates on geological scales, beyond the human imagination in most respects. But this article had the solution – Use a genetic algorithm to create complex things – hardware or software algorithms – from the simple, or even nothing. Surely, after seeing that before their eyes, nobody could doubt. Not to mention, icing on the cake, genetic algorithms proved to do a very good job of designing algorithms, according to the article. How potentially useful!

I learned several facts about “genetic algorithms” from this article. Here are the essential bits:

  • The three critical parts of evolution are mutation, sexual recombination, and selection – optimally all three must be present.
  • The most important genetic operation is sexual reproduction, or crossover,
    which mates pairs of the better organisms to sire offspring composed of genetic
    material from the two parents.
  • In addition to sexual reproduction, genetic programming copies about 9% of the “fittest” individuals in a population unaltered into the next generation, which generally ensures that the best organisms in each
    generation are at least as fit as those of the previous generation.
  • About 1 percent of the members of a new population undergo mutation

With that list of requirements, I felt ready to start thinking about my own implementation for running evolutionary experiments in programming. One phrase I really liked which the article used was “Think of it as a creative search through the space of all possible [objects].” In our case, we will be working on perl snippets that “do something”. So, we are trying to perform a very “creative” search through the space of all possible chunks of Perl code – a large space indeed. What we need to do to make this project useful is to abstract many of the specifics away.

What will we call the “object” we are selecting on? For convenience, I am going to call it a “fragment”. Additionally, I am going to say this fragment is a string of code which we can evaluate. The fragment can accept an object (in Perl, a hashref) which can have zero, one or more arguments in it. Additionally, the program can return a hashref with zero, one, or more arguments. Here is an example fragment which ignores its inputs and returns an output of the string “Hello, World!\n”.

my $fragment = 'my $output = "Hello, World!\n"; return { output => $output };';
my $code = sub { eval $fragment };
my $inputs = {};
my $results = $code->($inputs);
printf("Results: " . Data::Dumper($results) . "\n\n\n");

Here is an example of a fragment which takes the argument “input” and returns the result “input^2″:

my $fragment = 'my $input = shift->{'input'}; return { 'output' => $input * $input };'

With the “API” somewhat clearly defined now, we come to our three critical parts. How do we “implement” mutation, sexual recombination, and selection? First, let’s consider mutation. Obviously, mutation could mean doubling a line – having a statement execute twice – or removing a line. Mutation could mean making a copy of a variable, or changing an arithmetic expression to be slightly different. Mutation in the real world could mean the organism dies with 100% probability, which in this case would be represented by a syntax error. Since there is little point in keeping around a program with a syntax error, another mutation is unlikely to fix that error, we should make sure mutations always produce a fragment which “compiles”. Runtime errors are probably ok, they will just be selected against heavily.

The real problem posed by mutation is how do we exercise new constructs not represented in our original programs? For example, if we were selecting for a fragment which took two arguments,

{ 'a' => [int], 'b' => [int]}

, and returned the result

{ output => [a ^b] }

, but we started with the hello world program, we’d have a pretty long way to go. We’d somehow have to get the line

"my $a = $_->{'a'};"

, and a similar one for b. We’d somehow have to get the line

"my $result = $a ** $b;"

, but the ‘**’ (binary exponentiation operator) almost never shows up in practice. For now, my solution will be to implement a module which generates random mutations. mutations will affect a random line in the fragment, will have a chance of containing one or more variables mentioned on a previous line with probability proportional to it’s distance away in lines, will have a chance of defining a new variable, will have a chance of containing an arithmetic expression, will have a chance of calling a built-in keyword (shift, pop, push, first, reverse, splice, etc…) with reasonable arguments, will have a chance of creating a loop around the next N lines where N is randomly chosen such that the probability distribution of N is approximately 1/x (big N is less likely). The probabilities of everything in this module should be easy to tweak, and I’m sure I will tweak them. The following parameters should be configurable, at least, along with my “guess” for what the initial value will be:

  • Probability of any mutation happening – 0.01

If a mutation occurs:

  • Probability of a line being repeated – 0.15
  • Probability of a line being removed – 0.15
  • Probability of inserting a random generated line – 0.70

If a random line is generated:

  • Probability a built-in is called – 0.10
  • Probability a new variable is defined – 0.10
  • Probability of a new loop being added over the next N lines – 0.10
  • Probability one or more existing variables are referenced – 0.70

If previous variables are referenced:

  • Probability a variable is incremented or decremented – 0.10
  • Probability a variable is assigned to – 0.10
  • Probability an arithmetic expression is assigned to – 0.10

Any time a random expression is needed:

  • Probability expression is ‘1′ – 0.20
  • Probability expression is ‘0′ – 0.20
  • Probability expression is ‘undef’ – 0.20
  • Probability expression is a previously mentioned variable – 0.20
  • Probability expression is [expr1] [op] [expr2] where op is a random binary operator and expr1 and expr2 are randomly generated expressions – 0.10
  • Probability expression is [expr1] [op] or [op] [expr1] where op is a random unary operator and expr1 is a randomly generated expression – 0.05
  • Probability expression is [expr1] [op] [expr2] [op] [expr3] where [op] is a random trinary operator and expr1, expr2, expr3 are randomly generated expressions – 0.05

Ok, now that nasty mutation is out of the way, we have our next difficult task to design: Sexual Recombination. In humans, every human has two genes, and each offspring gets one of the two from each parent, making for only 4 combinations on each gene. The problem is, the basic unit of programming in Perl is the statement, but there is no clear way to make statements in two programs “line up” in a way that will make sense. If an offspring gets a line which references a variable from parent A, but not the line which declares such a variable, bam, syntax error – stillborn fragment. How do we combine two fragments in a meaningful way?

I’m afraid the simplest way is to “do it a lot” with small chunks, and hope the correct things randomly get selected over time. For example, instead of combining two programs into one program made up of equal parts of the two parents, the “best” parent will be copied, then a random chunk of the “second best” parent will be added to the copy, and the copy may or may not have a similar sized chunk of it removed to “make room”. For example:

my $fragmentA = 'my $args = shift; my $a = $args->{\'first\'}; my $b = $args->{\'second\'}; my $result = $a + $b; return { \'result\' => $result };';
my $fragmentB = 'my $args = shift; my $a = 5; my $b = $args->{\'first\'} + $a; return { \'first\' => $b, \'second\' => $a };';

Here you can see program A is a simple program which returns the sum of arguments “first” and “second”. Program B is a simple program which takes one argument and returns (”5+first”, 5). Now imagine they were combined, and statement 4 of program B was randomly put into program A before the last statement:

my $fragmentA = 'my $args = shift; my $a = $args->{\'first\'}; my $b = $args->{\'second\'}; my $result = $a + $b; return { \'first\' => $b, \'second\' => $a }; return { \'result\' => $result };';

The last statement is now ignored and the function of this fragment has totally changed. It now reverses the arguments passed to it – it is a simple swap() function! The chances of any random sexual combination like that happening “just right” is pretty small, so we will have to make sure this is tried a LARGE number of times.

For those keeping score, we now have “not-even-psuedo-code” solutions, as I like to call them, for two of our big three problems. The remaining problem is selection – how do I “select” fragments of code – how do I compare one to another? Each algorithm I am trying to develop is going to have to have certain tests to “guide” the selection index towards what we want – but there are some general aspects of a fragment which are universal too.

  • Like all good code – shorter is better. For two fragments which otherwise perform identically, but for which one fragment is shorter than the other in code length, the shorter one should win – but for two fragments where one significantly outperforms the other, the less efficient/correct one should always loose out, no matter how short it is.
  • If you try to evaluate code with a syntax error, it dies the same way a runtime error dies. It’d be nice to make a syntax error a “worse problem”, something we select against more strongly than code which sometimes gets runtime errors, but generally either of these conditions are something to select against.
  • Obviously, another “external test” we can do is track runtime. An algorithm which runs more quickly is preferred to one which is slow. As with length, correctness is far more important than runtime, but correctness and other factors being equal, faster runtime should win.
  • Simpler code is also better code – all other things being equal, code with fewer deeply nested scopes, fewer branch/goto statements, or fewer statements, should be preferred.
  • If an easy way can be found to detect memory footprint, smaller footprint should be selected for as well.

Obviously, besides external generic metrics to select against, we need some problem-specific metrics.

  • If the fragment is supposed to use arguments to accomplish its task, then the initial programs should read in the arguments and any program which doesn’t use/read all of the arguments should be selected against.
  • If the fragment is supposed to return certain results, then the initial programs should return undefined, but vivified arguments for those results, and any program which doesn’t return a hash with those arguments at least vivified should be selected against.
  • Any fragment which returns the correct “type” of data should be selected for. (I.E. a number where you expect a number)
  • Any fragment which returns something “close” to the correct answer should be selected for – for each expected output there should be some way to compare answers to expected and a weighted score should be possible which improves as the fragments return closer to the correct results.

Well, I mostly wanted to get my ideas written down before I started trying to implement this mess. Hopefully this post accomplishes that and organized my thoughts a bit. Now, to the implementation grind! =)

May 27, 2008

Coming Soon: Thoughts on Genetic Algorithms

Filed under: News — Carl Myers @ 2:14 am

I’ve always been interested in genetic algorithms – but how hard is it really to do something useful with that idea? I haven’t met the problem I couldn’t write at least a partial solution for with a couple hours of fiddling in Perl, so it will be interesting to see what I come up with. Stay tuned, folks!

The Future of Artistic Creations

Filed under: DRM, Future, Music, Open Source — Tags: , , , — Carl Myers @ 2:06 am

It’s the question on everyone’s minds and blogs these days. How are artists going to “cope” with the Internet? When is the RIAA going to stop suing grandmothers and twelve-year-old girls? How are artists going to make a living when the music they produce is practically free? Why would anyone even bother to make art in a world like that? As an amateur musician, and a professional software engineer, I think I have a pretty good idea of how the future of the music industry, and other art industries, might look. The effected industries include music and art primarily.

The coming change isn’t hard to see if you just look at the history of the music industry. The details that follow are largely taken from Cory Doctorow’s Microsoft Talk, which is available in that linked text format as well as a very entertaining video on Google Video. As Cory details in his talk, one early example of change the music industry had to cope with was player pianos.

The player piano was a digital recording and playback system.
Piano-roll companies bought sheet music and ripped the notes
printed on it into 0s and 1s on a long roll of computer tape,
which they sold by the thousands — the hundreds of thousands –
the millions. They did this without a penny’s compensation to the
publishers…

The publishers asked Congress to ban the piano roll and to create
a law that said that any new system for reproducing music should
be subject to a veto from their industry association. Lucky for
us, Congress realized what side of their bread had butter on it
and decided not to criminalize the dominant form of entertainment
in America.

But there was the problem of paying artists. The Constitution
sets out the purpose of American copyright: to promote the useful
arts and sciences. The composers had a credible story that they’d
do less composing if they weren’t paid for it, so Congress needed
a fix. Here’s what they came up with: anyone who paid a music
publisher two cents would have the right to make one piano roll
of any song that publisher published. The publisher couldn’t say
no, and no one had to hire a lawyer at $200 an hour to argue
about whether the payment should be two cents or a nickel.

This same initial fear was repeated when *gasp* evil dangerous radio became popular.

This story repeats itself throughout the technological century,
every ten or fifteen years. Radio was enabled by a voluntary
blanket license — the music companies got together and asked for
a consent decree so that they could offer all their music
for a flat fee. Cable TV took a compulsory: the only way cable
operators could get their hands on broadcasts was to pirate them
and shove them down the wire, and Congress saw fit to legalize
this practice rather than screw around with their constituents’
TVs.

Each time technology jumped further, a more significant change in business model was needed to cope.

Jack Valenti, the mouthpiece for the motion-picture industry,
told Congress in 1982 that the VCR was to the American film
industry “as the Boston Strangler is to a woman home alone.”

But the Supreme Court ruled against Hollywood in 1984, when it
determined that any device capable of a substantial
non-infringing use was legal. In other words, “We don’t buy this
Boston Strangler business: if your business model can’t survive
the emergence of this general-purpose tool, it’s time to get
another business-model or go broke.”

Hollywood found another business model, as the broadcasters had,
as the Vaudeville artists had, as the music publishers had, and
they made more art that paid more artists and reached a wider
audience.

This is why this blog’s opening question is so important. What is the “next business model”? Will customers really settle for “renting music” rather than owning it? I think not. Rental means “now you have it, now you don’t”, and without involving physical media (which is SO 1994) that means DRM. DRM is doomed to failure for all the reasons Cory lists in his talk, which I won’t repeat here.

So what will the future look like? For all intents and purposes, music will be free. That’s right – free. Not free as in beer, free as in no cost. In case you’ve been living under a rock, here’s an update: right now, someone could make a CD of anything they had already digitally recorded for about 10 cents. What about making that digital recording? Well, I bought an electronic pickup for my cello for $250. Now I can record my cello playing with a $600 computer, and edit the sound files with free and open-source software (like Audacity). Someone could hire a sound engineer (someone with a 2-year degree or 4 years of experience) to do studio quality recording with a couple thousand dollars of equipment. Compare this to 20 or 30 years ago, when you had to pay a couple thousand dollars per hour in a recording studio.

Today music publishers and their promoters spend tens of millions of dollars trying to convince people how great Brittany Spears’ newest album is. Maybe if we need that much convincing, she isn’t such a good singer after all. Maybe music should be a free market. Maybe musicians should put all of their music up for free – and do whatever it takes to get it heard (or, at least have the option to do so if they wish, as opposed to the current restrictive contracts they are all but forced to sign, giving away rights to their own music).

Maybe having large numbers of fans is better for a musician than having fans who already paid them 15$ for a copy of their latest album. Maybe if anyone could listen to a musician’s music for free, ten times as many people would listen than if they all had to pay for it. Then, just maybe, if only 10% of the people that listened to a musician’s music donated a bit, musicians would break even with the other model. But, just maybe, if 20% of the people that listened donated some money, musicians might even do better. And musicians still could go on tours, perform live, and keep a lot more of the proceeds from those concerts with no record label middle man taking their cut.

What this idea suggests is that the next business model may be a return to the *very* old ways. Long ago, artists made art, be it visual or auditory or others, because they loved to. It served no obvious purpose back when people were painting it on cave walls, or playing in primitive town squares, but they did it just the same. But buried deep in the moral code of people is the “golden rule”, do unto others as you would have them do unto you. And, for that reason if no other, when someone is truly moved by a street musician, they often donate money or praise, or both. There was no way to sell music before the Internet, before MP3s, before recordings, before sheet music. People were paid for performances – and sometimes, not even that. At first, most likely, musicians were only paid “as thanks”, after the fact, not unlike a street performer today.

Web comics provide some modern-day evidence for the viability of this model. Many web comic authors, as their primary and only career, produce free web comics from the daily to weekly range. They put their art on the web for all to see, free of charge, often under a Creative Commons license or similar terms. They subsist via many methods. One such source of income is the same way Google all but prints money – via ads – although their earnings from that are probably not significant. Most of their income comes from their customers, to whom they “give away free” all the fruits of their labors – how? They sell T-shirts, bumper stickers, prints, coffee mugs, and other paraphernalia on their website stores. Also, almost without fail – web comic authors have “click here to donate” links. Finally, web comic authors go to conventions – the “concert tour” of web comics, where they can draw sketches for their fans, meet them directly, and sell their wares as well.

Comic Merchandise? Ads? Donate Link?
Questionable Content Yes Yes Yes
Dresden Codak Yes Yes Yes
Dr. McNinja Yes Yes Yes
Penny Arcade Yes Yes Yes
VG Cats Yes Yes Yes
Ctrl-Alt-Del Yes Yes No
Xkcd Yes No No
Darths and Droids No No No

As you can see, there are plenty of examples here of folks making a career of “giving away” their art – they are supported by their loving fans, via donations, ads, and someone just from selling related merchandise. Also note that these comics are not “financed”. Arguments such as “without the record labels, nobody would make music – nobody could afford to” might have parallel arguments for free web comics. They aren’t quite as expensive as putting together an album, but there is still quite a bit of startup cost. A comic author needs a web server or web host – a website design – drawing equipment and a high-resultion scanner, or a nice tablet and related equipment. Some free software might be used, like GIMP, but many would prefer a license for Adobe Photoshop (easily $400-600). To get the word out, they might take ads on similar web comic sites which already have a following. These comics all started as hobbies initially, grew beyond that, then were finally big enough to support their creators and become their career. The fans decide when an artist is that good, not some record label. Nobody has to be “discovered” and no record exec gets to decide who is “good enough”. Clearly, web comics are perfectly successful without big contracts or funding.

Just in case some readers remain unconvinced of the reality of fan support of artists, I will now relate an event I personally witnessed. The names have been withheld to protect the affluent, but the comic name will be revealed to pay homage to its talented creator. At the 2007 Emerald City Comicon I was particularly excited to meet the author of my favorite comic, Questionable Content. Little did I know, my friends were even more excited to meet the author of a comic I had not yet been exposed to called Dresden Codak, by Aaron S. Diaz. When I say “excited”, what I actually mean is “full of dollars”. I called one friend who was not present and told him “Hey, the Dresden Codak guy is here, do you want me to buy you a print?” He replied, “I want you to buy every single item he has, until you run out of cash. Then I want you to take those items, and do whatever you want with them. Or give them back to him. I don’t care. Give him all your money, I’ll pay you back.” I called my other friend who I knew was a reader of the comic, who said “I’ll be right there.” A few hours later, about 45 minutes before the end of the convention, he arrived and made a beeline for the “Dresden Codak Guy’s table”, and immediately wrote him a check for a large amount. Over $100. He bought a $30 ticket to the con so he could find the guy and give him the check personally. When the guy offered my friend some merchandise he replied “No thanks…but can you show me the next comic? I must have it. Is it done yet?”. It was not done, but my friend insisted the guy take the check, and finish the comic soon. My friend didn’t even so much as give the guy his email address, he expected nothing in return, but hoped the comic creator would feel appreciated and continue his work.

This is the extent to which fans are capable of appreciating their artists. Not everyone can afford to give a huge wad of cash, and it’s certainly true that some people who might save up the money to buy 10 CDs might not give that same amount in donations to the same group if they gave their music away for free – when you think about how many more people might be exposed to said music, it is easy to see how artists might make money, even more money than they currently do after the record labels take “their cut”.

Imagine if you could stream all the music you wanted for free, listen to it, then download any you liked to put on your MP3 player? Imagine the traffic that website might get – that’s some significant ad revenue, even from folks that don’t donate. Imagine if next to your favorite song was a link to a T-shirt with a popular lyric from the song on it – wouldn’t you consider buying that shirt, knowing it supports the artist, and you would have a cool shirt? Imagine that there was a small donate link at the bottom of the page which let you use your Amazon, Google Checkout, or Paypal account to donate money directly to the artist. 100% goes right to the artist, no middle man. Wouldn’t you consider giving $1 or $2 for a song you liked? $10 for an album that “touches you”?

Here’s another idea – take Transgaming’s Cedega as an example – the program is based on open-source software, but also has proprietary bits. What it does is not particularly important, the short version is it lets you play video games for windows on a Linux machine. In exchange for $5 a month, you can download Cedega and use it on as many computers as you like. If you stop paying, you stop getting upgrades. But the real reason to keep paying is to vote – each month you pay for you get one vote in each poll. Each month there are several polls. “Should we add support for this game?” “Should we work on better DirectX 9 support?” “Should we improve support for Steam games?” If you want, you can pay more than the minimum $5/month and get additional votes. People could just buy a minimum subscription (3 months), then cancel and not pay again until they find an unsupported game and need to download a newer version…but instead, they keep regular subscriptions so they can always vote on what they want improved. This is another fine example of something people could fairly easily “steal” (have one friend subscribe and download it for everyone), but most choose to support it – in part because of the empowering feeling of voting for features – just like voting for the features of the next song might do. Now imagine there was a form you could fill out when you donated to an artist, which let you “vote” on what sort of song or album you want the artist to make next. Maybe you prefer their acoustic work? Or perhaps their most recent collaboration with another artist was great, and you want a repeat. Nothing stops an artist from being creative and artistic, but this way, their fans can vote with their dollars and tell the artist which of their works they like best. Right now all artists get is feedback at the album/single level (except from a few online sales and other non-dollar-related polling methods). Many artists could benefit from this extra feedback (and income).

There is one final question I haven’t addressed – “why would authors bother creating music without the ‘huge fame big dollars’ of record label deals?”. Well, why wouldn’t they? I create my music for free. As we have already discussed, there are plenty of examples of artists creating as an outlet, for fun, with no financial motives (at least, initially), only for their creations to grow so large they must quit their jobs to focus on their art. Most importantly, would it be such a bad idea if money wasn’t encouraging artists to create? If artists were on their own, free of the record labels, you wouldn’t have artists releasing their “yearly album” of rehashed crap nobody wants. There would be no money in it anymore. Artists would try to create something people actually wanted to listen to – because things people don’t want to listen to wouldn’t make them any money anyways.

There have been some discussions of the relatively new concept of music as a disposable product. In my opinion, this practice is the last-ditch effort of a dying business model. If they can’t figure out a way to sell customers their “standard product”, and make it worth the price they want for it, they will overwhelm customers with other “rebuy options”. Pay for the CD. Then pay for the MP3. Then pay for the ringtone. Then pay for the “ringback”. Then pay for the “high definition audio CD”, or whatever technology some day replaces Audio CDs. Maybe I’m wrong – maybe customers love to buy everything three times. Hey, remember that movie you bought on VHS ten years ago? You totally want to buy that again on DVD, then again on HD-DVD, then again on Blu-Ray, then again on Amazon Unbox, right? I didn’t think so.

May 18, 2008

Hello, World!

Filed under: News — Carl Myers @ 9:36 pm

So, I’ve been saying I was going to put up a blog for over a year now. Congratulations to me for finally doing it. No, this does NOT mean I actually have free time now. =)

Powered by WordPress