An Obsession with Everything Else

http://www.derrickschneider.com/atom.xml

Tuesday, January 02, 2007

Menu For Hope Raffle Program

As some of you know, I donated development work to the latest Menu For Hope raffle. I wrote the program that chooses winners for each raffle prize. Choosing a random name from a list does not pose a difficult challenge, but interpreting the higgledy-piggledy comments does. Ticket purchasers left a comment specifying the prizes they wanted to buy tickets for, and you can imagine that the human-entered text was all over the map. I looked at the list as the raffle progressed, and wrote a program that would attack based on what I saw as the most common types of comments.



I knew I'd have to clean up some of the data—I tested the program by proofreading its entire output to compare my interpretation of a line to my program's—but I was happy to see that my code correctly parsed about 90 per cent of the comments. If I were writing this again (or modifying it next year), I might add some code to handle the case where a donater types "UW 01" instead of "UW01." That extra space accounted for roughly half of the cleanup work I needed to do. I deliberately kept my code "stupid" since clever code often takes longer to debug, and I still would have had to proofread all the output.



Here are some of the lines the program correctly parsed:

 UW01, UW10, UW36, UE03, WB04
$40 on red. er, i mean, UW39 :)
Two UW17, one UW48 please
1 tkt each AP09, AP10, AP25, AP31, AP36, AP40, 4 tickets AP02



But it couldn't interpret:

2 each on UE13 UE03 UE04 UC08 UC07



For transparency's sake, here's the Java code of the most interesting classes. We haven't done the official run yet, but I've cleaned all the data, and the program does what I expect. I've had the vicarious thrill—Melissa and I exempted ourselves from the raffle—of seeing various sets of "winners" in each test run. I only know what a few of the prize codes actually mean, but I've seen some good friends get fantastic prizes in some of the runs. I'll either send the program to Pim to run, or we will just pick a moment after which the output will become the official results (I found some lines in the data that I could not fix, and she is asking for clarification from the donaters).



Main program


import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Program for "drawing" raffle prizes for Menu for Hope. Not fancy, mostly
* just procedural.
* Attack strategy:
* Pull in lines from CSV.
* Parse each line to determine bidder, bid-on prizes, tix for each.
* Divide into pools.
* Draw randomly from each pool for winner.
*/

public class MFHRaffle {


public static void main(String[] args) {
// divide args into buckets
// there's only one input arg (file) so we can size list in advance
List commandArgs = new ArrayList(args.length - 1);
String filename = null;
for (int i = 0; i < args.length; i++) {
if (args[i].startsWith("-")) {
commandArgs.add(args[i].substring(1,args[i].length()));
} else {
filename = args[i];
}
}
System.out.println("Using file: " + filename);

if (commandArgs.contains("testrandomdraw")) {
testRandomDraw();
}

boolean debug = false;
if (commandArgs.contains("debug")) {
debug = true;
}

MFHDataParser parser = null;
if (commandArgs.contains("csv")) {
parser = new CSVDataParser();
} else {
parser = new ExcelDataParser();
}
parser.setDebug(debug);

Map<String,List<String>> entries = parser.extractData(filename);
Map<String,Integer> prizeCounts = new HashMap<String,Integer>();
Map<String,String> prizeToWinner = new HashMap<String,String>();

//produce a sorted list
List<String> sortedPrizes = new ArrayList<String>(entries.keySet());
Collections.sort(sortedPrizes);

System.out.println("*******************************");
// for every entry in map, throw list to randomDraw
for (Iterator<String> prizeIt = sortedPrizes.iterator();
prizeIt.hasNext();) {
// drumroll please...
String prize = prizeIt.next();
List<String> bidders= entries.get(prize);
prizeCounts.put(prize,new Integer(bidders.size() * 10));

String winnerEmail = randomDraw(bidders);
prizeToWinner.put(prize,winnerEmail);
}

// tab-delimited output for fatemeh
System.out.println("********** TEXT ****************");
for (Iterator<String> prizeIt = sortedPrizes.iterator();
prizeIt.hasNext();) {
String prize = prizeIt.next();
System.out.println(prize+"\t$"+prizeCounts.get(prize)+"\t"+
parser.getNameForEmail(prizeToWinner.get(prize)) +"\t"+
prizeToWinner.get(prize));
}

// html markup for brett
System.out.println("*********************************");
System.out.println("********** HTML *****************");
System.out.println("<table rules=\"rows\" >");
for (Iterator<String> prizeIt = sortedPrizes.iterator();
prizeIt.hasNext();) {
String prize = prizeIt.next();
System.out.println("<tr><td>"+prize+"</td><td>$"+
prizeCounts.get(prize) + "</td><td>" +
parser.getNameForEmail(prizeToWinner.get(prize)) +
"</td><td>" + prizeToWinner.get(prize) +
"</td></tr>");
}
System.out.println("</table>");


}

/** Simple method for proving that the random draw algorithm is sufficiently
* random.
* This test method creates a list, draws a random item from it 10000 times,
* and records the result.
*/
private static void testRandomDraw() {
List<String> items = new ArrayList();
for (int i = 0; i< 100; i++) {
items.add(String.valueOf(i));
}

Map<String,Integer> counts = new HashMap();

// 100-item list. retrieve a random item 10000 times, and note result
for (int i = 0; i < 10000; i++) {
String retrieved = randomDraw(items);
if (!counts.containsKey(retrieved)) {
counts.put(retrieved,new Integer(1));
} else {
Integer curCount = counts.get(retrieved);
counts.put(retrieved, new Integer(curCount.intValue() + 1));
}
}

// print counts
for (Iterator<String> itemIt = items.iterator();itemIt.hasNext();) {
String next = itemIt.next();
System.out.println("Item: " + next + " Count: " + counts.get(next));
}
}

/** Makes a random draw from a List of strings */
private static String randomDraw(List<String> items) {
// precision loss, but we're not expecting any numbers greater than
// MAX_INT
int i = (int)Math.floor(Math.random() * (double)items.size());
return items.get(i);
}
}



Data parser base class

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A class for parsing data files for MFH. This is an abstract class since
* it provides common functionality for subclasses. Each subclass can deal
* with a given file type (excel or csv)
*/
public abstract class MFHDataParser {

private boolean debug=false;

private char[] delims = {',',' ','.',';'};

private Map<String,String> emailToName = new HashMap<String,String>();

protected void mapEmailToName(String email, String name) {
this.emailToName.put(email, name);
}

public String getNameForEmail(String email) {
return this.emailToName.get(email);
}

public abstract Map<String,List<String>> extractData(String filename);

/** Do our darndest to figure out what prizes a donator has mentioned on
* a given line. Note: Be sure to complain if we don't recognize a prize.
*
*/
protected List<String> extractPrizes(String prizes, int amount)
throws NoPrizeFoundException {

String ucPrizes = prizes.toUpperCase().trim(); //for consistency's sake
if (isDebug()) {
System.out.println("Incoming prize string " + ucPrizes);
}

// basic strategy
// find two numbers followed by a delim (, space, ., eol, ;)
// back up and find two letters before it
// then back up (not into an earlier code) and find a #
// can't use java's regex abilities because i need to divide into larger chunks

// put that many copies into the list
// verify that size of list = donation /10. bark if not
// List = one of each real raffle ticket (5xUW03 -> 5 entries in List)

List<String> prizeList = new ArrayList<String>();
List<Integer> prizeCounts = new ArrayList<Integer>();

if (ucPrizes.length() < 4 ) {
throw new NoPrizeFoundException(ucPrizes);
} else if (ucPrizes.length() == 4) {
// exact count. easy case, but verify that it's legit

String prizeCode =
ucPrizes.substring(findPrizeCodeInTextBlock(ucPrizes),4);
for (int i =1;i<= amount/10; i++) {
prizeList.add(prizeCode);
}
} else {
// in this case we need to walk through the list, divided it into
// chunks, and find the prize code in each
int chunkStart = 0;
for (int i = 0; i < ucPrizes.length();i++) {
if (i == ucPrizes.length() - 1 ||
isDelim(ucPrizes.charAt(i))) {
String curChunk = null;
if (isDelim(ucPrizes.charAt(i))) {
curChunk = ucPrizes.substring(chunkStart,i);
}

if (i == ucPrizes.length() - 1) {
curChunk = ucPrizes.substring(chunkStart,i+1);
}
if (curChunk.length() < 4) {
continue;
}
int prizeCodeOffset = findPrizeCodeInTextBlock(curChunk);
if (prizeCodeOffset == -1) {
continue; // not in this text block
}

String prizeCode =
curChunk.substring(prizeCodeOffset,prizeCodeOffset + 4);
prizeList.add(prizeCode);
int prizeCount = new Integer(
findPrizeQuantityInTextBlock(curChunk,prizeCodeOffset));
if (prizeCount == -1) {
prizeCounts.add(new Integer(1));
} else {
prizeCounts.add(new Integer(prizeCount));
}

chunkStart = i + 1;
}
}
}

// expand prize list as needed
if (prizeList.size() == amount /10) {
return prizeList; // if there are as many prizes as the amount
// would suggest, do one ticket each
} else if (prizeList.size() == 1) {
// create an expanded list that has one entry for each ticket
List<String> newPrizeList = new ArrayList<String>(amount/10);
for (int i = 0;i < (amount /10); i++) {
newPrizeList.add(prizeList.get(0));
}
return newPrizeList;
} else {
// we have a mix of amounts and quantities
List<String> newPrizeList = new ArrayList<String>(amount/10);
for (int i = 0; i < prizeList.size();i++) {
Integer count = prizeCounts.get(i);
for (int j = 0; j < count.intValue(); j++) {
newPrizeList.add(prizeList.get(i));
}
}
return newPrizeList;
}
}

protected int parseAmount(String amtString) {
return (int)(Double.parseDouble(amtString));
}

/** Takes a guess at the prize quantity in a given text block */
private int findPrizeQuantityInTextBlock(String chunk, int prizeOffset) {
// walk over the string looking for numbers, skipping the prize code
for (int i = 0; i < chunk.length(); i++) {
if (i >= prizeOffset && i < prizeOffset + 4) {
continue;
}

if (Character.isDigit(chunk.charAt(i))) {
if (i < chunk.length() - 1 &&
Character.isDigit(chunk.charAt(i+1))) {
// two-digit quantity
char[] digits = {chunk.charAt(i),chunk.charAt(i+1)};
return Integer.parseInt(new String(digits));
} else {
return Integer.parseInt(chunk.substring(i,i+1));
}
}
}

// one last check. some bidders wrote "TWO" instead of 2
if (chunk.indexOf("TWO") != -1) {
return 2;
}

return -1;
}

/** Returns the offset of something that looks like a prize code. */
private int findPrizeCodeInTextBlock(String chunk)
throws NoPrizeFoundException {

// look for 2 letters followed by 2 numbers => prize code
for (int i = 3; i < chunk.length(); i++) {
if (Character.isDigit(chunk.charAt(i)) &&
Character.isDigit(chunk.charAt(i - 1)) &&
Character.isLetter(chunk.charAt(i - 2)) &&
Character.isLetter(chunk.charAt(i - 3)) ) {
return i - 3;
}
}
return -1;
}

protected boolean isDebug() {
return this.debug;
}

public void setDebug(boolean debug) {
this.debug = debug;
}

private boolean isDelim(char c) {
for (int i = 0; i < this.delims.length; i++) {
if (c == delims[i]) {
return true;
}
}
return false;
}

}



Excel parser

import java.io.File;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import jxl.*;

/** Subclass of MFHDataParser that knows how to parse excel docs.
*/

public class ExcelDataParser extends MFHDataParser {

public Map<String,List<String>> extractData(String filename) {
Map<String,List<String>> retVal = new HashMap<String,List<String>>();
try {
Workbook wkbk = Workbook.getWorkbook(new File(filename));
Sheet sheet = wkbk.getSheet(0);
for (int i = 0; i < sheet.getRows(); i++) {
if (i == 0) { continue; }// skip headers
String name = sheet.getCell(0,i).getContents();
String email = sheet.getCell(1,i).getContents();
String date = sheet.getCell(2,i).getContents();
if (isDebug()) {
System.out.println("amount: " + sheet.getCell(3,i).getContents());
}
int amt = parseAmount(sheet.getCell(3,i).getContents());
String comment = sheet.getCell(4,i).getContents();
if (email == null || email.trim().equals("")) {
throw new IllegalArgumentException("No email found");
}
mapEmailToName(email,name);


if (comment == null || comment.trim().length() == 0) {
System.out.println("No comment on line " + (i+1));
System.out.println("");
continue;
}
List<String> prizes = extractPrizes(comment,amt);
if (isDebug()) {
System.out.print( "prizes for line " + (i+1) + " ");
for (Iterator<String> prizeIt = prizes.iterator();
prizeIt.hasNext();) {
System.out.print(prizeIt.next() + " " );
}
System.out.print("\n");
}

if (prizes.size() != amt / 10) {
System.out.println("Line " + (i+1) +
" does not have the right number of prizes for the amt");
System.out.println("");
continue;
}

// compress the lists down to MFHPair, which includes an email
// and a count. Insert into map, keyed by prize code
Collections.sort(prizes); // make sure they're in order
String curPrize = "";
int curCount = 0;
for (Iterator<String> prizeIt = prizes.iterator();
prizeIt.hasNext();) {
String prizeFromList = prizeIt.next();
List<String> bidders = null;
if (retVal.containsKey(prizeFromList)) {
bidders = retVal.get(prizeFromList);
} else {
bidders = new ArrayList<String>();
retVal.put(prizeFromList,bidders);
}
bidders.add(email);
}
System.out.println("");
}
return retVal;
}
catch (Exception e) {
System.err.println("There was a problem: " + e);
e.printStackTrace();
System.exit(1);
}
return null;
}
}

6 Comments:

At 4:36 PM, Blogger Kalyn Denny said...

Very interesting (not that I understand much of it, LOL.) You are great to do this for us and even better for posting it so people can feel certain that it's fair. Thanks.

 
At 11:45 PM, Anonymous Anonymous said...

woah.

"mama, he done be speakin' in tongues a'gin"

 
At 8:47 AM, Blogger Jennifer Maiser said...

I love that you posted this. Of course, I need a little more coffee before I can even pretend to understand it. I'm actually in awe that you were able to use the script to parse those non-standard statements. I would have made everyone write things in the *exact* same way to qualify. But I'm a control freak. And obviously nowhere as good as you with any sort of code.

 
At 1:25 PM, Blogger Derrick said...

Thanks, all. In truth, the code is pretty simple and works off some simple assumptions which I inferred from eyeballing the data. I knew I'd have to clean stuff up and, except for the UW 01 instead of UW01, problem, none of the fixes I made would have been worth the time I'd put into making the program consider them.

If I did one clever thing, I'd probably prioritize the delimeters. In other words, if the commenter includes a , or a ;, I can assume that spaces aren't delimiters. I probably would've caught more of the UW01 x 2 (or whatever) chunks in the spreadsheet.

But again, it took me about 2-3 hours to clean up the data, so no big.

 
At 3:58 PM, Blogger Chai said...

*seeing as I won nothing* RIGGED!

 
At 4:09 PM, Anonymous Anonymous said...

OwefeAccofe
Bd5f

 

Post a Comment

<< Home