How I cheated at Wordament, and won

All code for this article can be found here.

You can see the bot in action here.

What is Wordament, and why cheat?

Wordament is a word-searching game for Android, iPhone and Windows Phone (hah). The game displays a four by four grid of letters. The player’s job is to swipe through the letters to find words. Points are awarded for every word found, based on complexity. Every player plays the same board simultaneously and the game ranks players after each 90-second round.

I was always pretty bad at this game, making it to the top 35% on a good day. While playing I noticed that the players at the top of the scoreboard were either extremely good, or not legit. Many have scores and word counts seemingly too high to be achieved by a human being in 120 seconds. So I set out to beat these top players by writing a bot.

Why bother? I don’t care about getting the best score on Wordament, it just seemed like an interesting technical challenge.

Approach

Because I wanted to use this challenge to play with fun technologies my approach was a little convoluted. I wrote a web service that runs on the Microsoft Azure Websites platform (free tier). This service takes a Wordament board as input and gives all possible solutions to the board as output.

I then wrote an Android UI automation test. This test takes a screenshot of the board and finds letters on it. It sends the letters up to my Azure service to get the possible solutions. It then uses UI automation to swipe these solutions into the game board.

I call these components the ‘solver’, the ‘reader’ and the ‘player’.

The solver

Here we need a component that can find every solution given a 16-letter Wordament board. A ‘solution’ in this context is a word plus the tile positions on the board that it will need to swipe to input that word. For example if I give it the board

    LHAS
    DLAM
    INON
    CTAG

It might output something like

[
  {
    "word": "MONTANAS",
    "positions": [
      7,
      10,
      9,
      13,
      14,
      11,
      6,
      3
    ]
  },
  {
    "word": "AMANITA",
    "positions": [
      2,
      7,
      6,
      9,
      8,
      13,
      14
    ]
  },
  ... //etc

I would leave it up to the device-specific code to transform these board coordinates into screen coordinates.

Creating a dictionary

First we need a list of possible words. For this I Google’d ‘word list’ and ‘dictionary list’ and found a bunch of options to download. Towards the end of this project I found that every individual list I found had two deficiencies. None of them had the complete set of words for any given board, and they all had lots of words that you wouldn’t expect to find on a Wordament board.

I solved this by creating a quick program. It takes a directory of word lists and returns all the words which appear in at least three lists. This way every word in my final set is common enough to be found in many dictionaries (high likelihood of being a possible Wordament word). Also any common word missing from one word list is likely to be found in the others.

The data structure

Here’s how I wanted to search a Wordament board for a given word. For each letter on the board, find out if there are any words that start with it and each of the neighbouring letters. If there are, for each neighbouring letter, if any words in the current list finish with this letter then add the solution to the result. Repeat for every connected letter.

Here’s a pseudocode algorithm

foreach letter in board:
  do_search(letter, [])

do_search(current_word, result_set):
  if words.contains(current_word):
    result_set = result_set.add(current_word)

  foreach letter in current_word.neighbours:
    if words.with_prefix(current_word + letter):
      do_search(current_word + letter, result_set)

It happens that there’s a data structure purpose built to perform these kinds of prefix searches. It’s called the prefix tree, or trie (pronounced ‘tray’). I’ve written before about the prefix tree and how I implemented mine in F#. You can find the code here or just grab the binaries from Nuget. I won’t go into details about the prefix tree here. Suffice it to say it exposes exactly the interface I need to perform this prefix search. We’ll see this interface in action in this article.

Searching for solutions

To solve the board with the given 16 letters (see the reader section below for how we get them) we just need to pass them through the search algorithm. We also need to pass our in trie which has been pre-loaded with the word list, described above. Here’s the actual F# code.

// this is our Board module, with some types for the model
[<RequireQualifiedAccess>]
module Board

let cols = 4
let rows = 4
let cells = rows * cols

type Solution(word, positions) =
    member this.word = word
    member this.positions = positions

type Letter(index, character, neighbours) =
    member this.character = character
    member this.neighbours = neighbours
    member this.index = index

let generateNeighbours board currentIndex =
    seq { 
        if currentIndex > cols then
            if currentIndex % cols > 0 then yield currentIndex-(cols+1)
            yield currentIndex-cols
            if currentIndex % cols < (cols-1) then yield currentIndex-(cols-1)
        if currentIndex % cols > 0 then yield currentIndex-1
        if currentIndex % cols < (cols-1) then yield currentIndex+1
        if currentIndex < cells-cols then
            if currentIndex % cols > 0 then yield currentIndex+(cols-1)
            yield currentIndex + cols;
            if currentIndex % cols < (cols-1) then yield currentIndex+(cols+1)
    }

type Board(letters:string) =  
    let charList = List.ofArray(letters.ToCharArray())
    member this.letterGraph = 
        charList
        |> List.mapi (fun index character -> 
            new Letter(index, character, (List.ofSeq (generateNeighbours letters index))))

let depthFirstSearch (board:Board) (shortCircuitFunction:string -> bool) (selectorFunction:string -> bool) =
    let rec search (board:Board) (currNode:Letter) currString path searchedList =
        seq {
            let newString = currString + currNode.character.ToString()
            if not(shortCircuitFunction newString) then
                if selectorFunction newString then
                    yield Solution(newString, List.rev(path))
                for neighbourIndex in currNode.neighbours do
                    if not(path |> List.contains neighbourIndex) then 
                        let nextLetter = board.letterGraph.[neighbourIndex]
                        yield! search board nextLetter newString (nextLetter.index :: path) (currNode.index :: searchedList)
        }
    seq {
        for letter in board.letterGraph do
            yield! search board letter "" [letter.index] []
    }

// and the algorithm
let solve boardLetters trie =
    let board = new Board.Board(boardLetters)
    let solutions = Board.depthFirstSearch board (fun s -> not(Trie.isPrefix(trie, s))) (fun s -> Trie.isWord(trie, s))
                   |> Seq.filter(fun s -> s.word.Length > 2)
                   |> Seq.distinctBy(fun s -> s.word)
                   |> Seq.sortBy(fun s -> s.word)
                   |> Seq.sortByDescending(fun s -> s.word.Length)
    solutions

This is worth explaining.

We define a couple of types. Solution, which is just a word and a list of integers giving the positions of the letters. Letter, which is a node in the board which points to its neighbours (so we can represent the board as a graph). And Board, which takes a string representing the 16 letters and hooks them all together based on their locations on the grid (this is the generateNeighbours function).

Next we define a depth first search. This is a normal DFS algorithm, with the added feature that it takes a “short circuit” function. If this function returns true at any point during the search then the search will halt. This is so that when we search we can tell it to stop if the current word is not a valid prefix for any word in our trie.

The result of this search function is the list of all solutions, including the path required for each one.

The web service

I wrote a web service to host my solver code. It’s a simple ASP.NET MVC service with three controllers.

The first is named ReaderController. This is capable of taking a bitmap image (a screenshot of a Wordament board) from the body of a request and returns the letters on the board as text (more on the Reader later). I mostly used this for testing and debugging - it’s not used in the final application.

The next controller is called SolutionController. It takes 16 letters (the characters on a Wordament board). It sends them to my solver and returns the solutions for the board. Like ReaderController this was mostly for testing and debugging.

And the third controller is the poorly-named SolverController. This guy combines the behaviours of the previous two controllers. It reads a Wordament screenshot from the request and sends the letters as text to my solver. It then returns the solutions to the caller. This one was used by my Android program to find the solutions from a Wordament screenshot.

The reader

Getting the letters from a board took a bit of effort. Presumably to prevent cheating, Microsoft doesn’t put any kind of identifying metadata onto the objects that make up a board. My only option was to analyse a bitmap of the phone screen with the board on it.

I needed a simple OCR algorithm that would allow me to identify letters from bitmaps. After playing with some pretty crappy web services I decided to roll my own. My implementation is simple and inflexible, but it works well on Wordament boards. All the code we’ll discuss here is available in Reader.fs in my Bitbucket repository. This is written in F# but would be easy to port to any language.

First I needed to cut up a Wordament board so that I had individual bitmaps of each of the 16 letters. So I needed to search the bitmap for the orange letter tiles. I start by defining the start and end rows that I wanted to perform the search over. Here I just hard coded ratios of the screen size. I always used ratios and not absolute pixel values because I was testing this app on many devices with different resolutions.

    let findSquares (bitmap:System.Drawing.Bitmap) =
        let startRow = (int)(System.Math.Round((float bitmap.Height) * 0.14));
        let endRow = (int)(System.Math.Round((float bitmap.Height) * 0.7));
        getSquares bitmap startRow endRow

The idea here is that I wanted to divide up the board so that the only orange pixels inside the boundaries are letter tiles. This way I can use that distinctive orange colour to easily divide the letter tiles up. This divided up a Wordament board like this:

Board preparation, part 1.

Next is the getSquares function. Here I iterate through each row of pixels, looking for the first orange pixel on the row. When I find an orange pixel, I iterate across it and then down it until I run out of orange pixels. Now these boundaries make up a bounding rectangle for this tile. I then continue my search for the next tile by starting from the upper right of this rectangle and repeat. When I run out of room on this pixel row, I jump down a full tile’s height and start searching again from this row. Here’s the full code.

    let isOrange (color:Color) =
        color.R > (byte)220 
        && color.G > (byte)130 
        && color.G < (byte)170
        && color.B < (byte)29

        
    let getBounds (bitmap:Bitmap) col row =
        let colUpper = seq { col .. 1 .. bitmap.Width-1 }
                        |> Seq.find (fun p -> not(isOrange (bitmap.GetPixel(p, row))))
        let rowUpper = seq { row .. 1 .. bitmap.Height-1 }
                        |> Seq.find (fun p -> not(isOrange (bitmap.GetPixel(col, p))))
        new Rectangle(col, row, colUpper-col, rowUpper-row)
        
    let getSquares (bitmap:Bitmap) startRow endRow = 
        let mutable row = startRow
        let mutable col = 0
        let mutable rectHeight = 1
        let mutable result = []

        while row < endRow do
            while col < bitmap.Width do
                let color = bitmap.GetPixel(col, row)
                if isOrange color then
                    let rect = getBounds bitmap col row
                    col <- col + rect.Width
                    rectHeight <- rect.Height
                    result <- rect :: result
                else
                    col <- col + 1
            col <- 0
            row <- row + rectHeight
            rectHeight <- 1

        result |> List.rev

There are probably less stupid ways of finding bounding boxes but this works well ¯\(ツ)/¯ This divides the board into 16 square bitmaps, like so:

Board preparation, part 2.

To match each bitmap to a letter, I divided the bitmaps into quarters and assigned each quadrant a value based on the ratio of white to orange pixels. For example, here’s the letter ‘G’, with its so-called ‘letter vector’:

Board preparation, part 3.

How did I get these letter vector values? By painstakingly analysing a ton of screenshots of Wordament boards. In fact, if you look at my code you’ll see I’m actually missing Q and X. these letters appear so infrequently in Wordament that I wasn’t able to snap a screenshot to analyse them.

You may also notice I do not assign a value to the upper left quadrant. After implementing and testing this whole system, I found some letters were not matching when they should have been. I discovered that the digit in the upper left of the letter tile, the score, changes value on certain boards. This happens in special rounds, like where letters in corners are more valuable. This was enough to throw off my simple algorithm. I could have made my algorithm more permissive, but I found that the three other quadrants were enough to uniquely identify a letter. So I just threw that upper-right quadrant away.

So now I have a bitmap from the letter tile and I have a pre-defined vector to letter map. I can break my bitmap tile into quadrants. I can transform my quadrants into a vector of ratios of white to orange. And I can find the vector in my map that most closely matches the vector from my bitmaps. Here’s the code for doing this:

    let vDist (a:float,b,c) (aa,bb,cc) =
        sqrt ((a-aa)*(a-aa) +
              (b-bb)*(b-bb) +
              (c-cc)*(c-cc))

    let matchLetterVector (a,b,c) =
        letters |> List.minBy(fun v -> 
            let (aa, bb, cc, _) = v
            vDist (a,b,c) (aa,bb,cc))

    
    let toLetter (bitmap:Bitmap) (rect:Rectangle) =
        let w = rect.Width / 2;
        let h = rect.Height / 2;
        let size = new Size(w, h);
        let quads = seq {
            yield bitmap.Clone(new Rectangle(new Point(rect.X, rect.Y), size), bitmap.PixelFormat)
            yield bitmap.Clone(new Rectangle(new Point(rect.X + size.Width, rect.Y), size), bitmap.PixelFormat)
            yield bitmap.Clone(new Rectangle(new Point(rect.X, rect.Y + size.Height), size), bitmap.PixelFormat)
            yield bitmap.Clone(new Rectangle(new Point(rect.X + size.Width, rect.Y + size.Height), size), bitmap.PixelFormat)
        }
        let list = quads |> Seq.map (fun q -> whitePixelCount q) |> List.ofSeq
        let vector = (list.[1], list.[2], list.[3])
        let (a,b,c,letter) = matchLetterVector vector
        if (vDist (a,b,c) vector) > 0.02 then "*"
        else letter.ToString();

Now we just combine these two functions to break down the board into 16 bitmap letter tiles, and pass these tiles through the toLetter function. We wind up with 16 letters and return them. We can pass these letters into the solver defined above and out pops the solutions, ready to be input into the game.

The player

Not surprisingly the player was the most finicky and least fun part of this project, but it’s fairly quick to describe.

I created a simple do-nothing Android project with a single test class. The test class takes a screenshot of the game board and sends it up to my web service, which sends it to the reader and solver. When the test receives the solutions it inputs them into the application.

Screenshotting the board

Screenshotting a board in an Android test is straight-forward.

            String dir = Environment
                    .getExternalStorageDirectory()
                    .getAbsolutePath();

            String filename = "file.png";
            File file = new File(dir + File.separator + filename);
            Log.d("Wordament.bot", "Screenshotting board");
            boolean screenshotResult = device.takeScreenshot(file, 0.1f, 20);

            if (!screenshotResult) {
                Log.d("Wordament.bot", "Failed to take screenshot");
            }

Once we’ve taken the screenshot it needs to be prepared and sent to the web service. The important part here is how we compress the screenshot. Remember how we wrote quite a brittle reader? We need to make sure that the orange to white ratios in the image match our expectations. This means using a compression method that preserves colors and edges. JPEG is definitely out, but PNG is a good fit.

I also decided to scale the image down by a factor of two on each side. This preserves the ratios of white to orange but reduces the size of the image by a factor of four, which significantly improves performance (especially if you’re using this on a high-resolution device like a Pixel phone). It could probably scale perfectly fine to much smaller sizes but this was enough for me.

I used pretty standard JSON handling to manage the request and response.

    @NonNull
    private Solution[] getSolutionsFromWeb(File file) {

        try {
            String dir = Environment
                    .getExternalStorageDirectory()
                    .getAbsolutePath();

            File compressedFile = new File(dir + File.separator + "file2.png");

            Bitmap capturedImage = BitmapFactory.decodeFile(file.getPath());
            Bitmap bitmap = Bitmap.createScaledBitmap(capturedImage, capturedImage.getWidth()/4, capturedImage.getHeight()/4, true);
            FileOutputStream fos = new FileOutputStream(compressedFile);
            bitmap.compress(Bitmap.CompressFormat.PNG, 20, fos);
            fos.flush();
            fos.close();
            byte[] bytes = IOUtils.toByteArray(new FileInputStream(file));
            OkHttpClient client = new OkHttpClient();
            Request request = new Request.Builder()
                    .url("http://wordamentbot.azurewebsites.net/solver")
                    .method("POST", RequestBody.create(
                            okhttp3.MediaType.parse("image/png"),
                            bytes))
                    .build();

            Response response = client.newCall(request).execute();
            Gson gson = new Gson();
            return gson.fromJson(response.body().string(), Solution[].class);
        } catch (IOException e) {
            Log.d("Wordament.bot", "Problem contacting OCR service\r\n" + e.getMessage());
            return new Solution[0];
        }
    }

Now that we have the solutions we need to figure out how to input them. The first step is figuring out where is the game board on the screen and where are the letter tiles. Here are a couple of utility methods for achieving this. This uses standard Android UI automation test tools for querying and interacting with the Android object model.

    @NonNull
    private List<Point> getTileCoordinates(UiObject2 board) {
        List<UiObject2> tiles = board.getChildren();
        List<Point> centers = new ArrayList<>();
        for (int i = 0; i < tiles.size(); i++) {
            centers.add(tiles.get(i).getVisibleCenter());
        }
        return centers;
    }

    @NonNull
    private UiObject2 getGameBoard() {
        UiObject2 boardView = null;
        Log.d("Wordament.bot", "Waiting for device");
        do {
            device = UiDevice.getInstance(InstrumentationRegistry.getInstrumentation());
            try {
                List<UiObject2> objects = device.findObjects(By.clazz("android.widget.RelativeLayout"));
                for (UiObject2 view : objects) {

                    // the game board is a view with exactly 16 child views (the letter tiles)
                    // and takes up at least two thirds of the screen (so it's not the half-screen
                    // game board you see between rounds).
                    if (view.getChildCount() == 16 &&
                        view.getVisibleBounds().width() > ((float)device.getDisplayWidth())/1.5f) {
                        Log.d("Wordament.bot", "Found view with 16 subviews");
                        boardView = view;
                        break;
                    }
                }
            } catch (StaleObjectException e) {
                Log.d("Wordament.bot", "Caught StaleObjectException, probably changing screens");
            }
        } while (boardView == null); // if the board is not visible, go back and try again
        return boardView;
    }

The result of these methods is the coordinates of the UIObject2’s that represent the letters. So now we know which letters to swipe and where they are, it’s a matter of performing the swiping with automated testing tools:

                for (Solution solution: solutions) {
                    solution.solve(device, centers);
                    Log.d("Wordament.bot", "Swiped + " + solution.word);
                }
    class Solution {
        public String word;
        public int[] positions;

        // swipe this solution onto the given device with supplied letter tile coordinates
        protected void solve(UiDevice device, List<Point> tileCoords) {
            Point[] coords = new Point[this.positions.length];
            for (int i = 0; i < this.positions.length; i++) {
                int index = this.positions[i];
                coords[i] = tileCoords.get(index);
            }
            device.swipe(coords, coords.length);
        }
    }

The Android test framework supplies very simple API for us to swipe through the UI so there’s little work for us to do here.

The result, and some problems

So here’s a quick video of the result. The bot will get us to the top of the leaderboard on any game it’s capable of solving well.

The limitations

I didn’t go all the way with this project. There are certain boards that this bot won’t solve. There’s an example of such a board to the right.

You can see that this board has a letter combination. Ammending the bot to handle letter combos like this would be straight-forward, except for generating the letter vectors to identify every possible combination. Either a LOT of painstaking work or a smarter OCR system wouwld be required.

Another limitation is the special games that Wordament presents - some of my favourite games. These include games like “E in the corners” where every corner will have an ‘E’ and these tiles are worth much more points. Now, in most cases this bot will do well in such games as a result of being exhaustive, but as discussed in the word list section our dictionaries are not in fact completely exhaustive. If we are missing a significant number of ‘E’ results in our word list then we may miss out on too many points to win the game.

The good news is that in either of these cases the bot will still perform. Letters that are not recognised are returned as “*”, a letter that never occurs. So although that tile will never be used, the bot will continue to work around it.

These are not problems I intend to solve, as this bot was just a muck-around project for me. If you update the bot yourslef I’d love to see how, and am keen to hear any other feedback you have to offer. Thanks for reading!