Generate letters (or other random values) with given frequencies in C#

This example generates random letters with the frequencies they show in normal English text.

The following code shows how the example gets started.

// The letter frequencies. See:
// en.wikipedia.org/wiki/Letter_frequency
private float[] Frequencies =
{
    8.167f, 1.492f, 2.782f, 4.253f, 12.702f, 
    2.228f, 2.015f, 6.094f, 6.966f, 0.153f, 
    0.772f, 4.025f, 2.406f, 6.749f, 7.507f, 
    1.929f, 0.095f, 5.987f, 6.327f, 9.056f, 
    2.758f, 0.978f, 2.360f, 0.150f, 1.974f, 
    0.074f
};

// Random number generator.
private Random Rand = new Random();

// The ASCII value of A.
private int int_A = (int)'A';

// Make sure the frequencies add up to 100.
private void Form1_Load(object sender, EventArgs e)
{
    // Give any difference to E.
    // Sum() is a LINQ extension method (requires using System.Linq).
    float total = Frequencies.Sum();
    float diff = 100f - total;
    Frequencies[(int)'E' - int_A] += diff;
}

The Frequencies array holds the letters' percentage frequencies, in order from A to Z, as listed on Wikipedia. This example treats these numbers as percentages, so, for example, the letter A should appear roughly 8.167% of the time. If your values are on some other scale, such as fractions between 0 and 1, you can adjust the program accordingly (for example, by multiplying them by 100). If they are counts, for example the number of times each letter occurs in a particular piece of text, you can convert them into percentages by dividing each count by the total of all the counts and multiplying by 100.
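If your source data are raw counts or fractions, one way to put them on the same 0 to 100 scale the example uses is to divide each value by the total and multiply by 100. The following is a minimal sketch of that conversion, assuming the inputs are non-negative and not all zero. The class name FrequencyUtils and the method ToPercentages are hypothetical names chosen for illustration; they are not part of the original program.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original example.
public static class FrequencyUtils
{
    // Converts non-negative weights (raw counts, fractions, or percentages)
    // into percentages that add up to 100 (up to floating-point rounding).
    public static float[] ToPercentages(double[] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (values.Any(v => v < 0))
            throw new ArgumentException("Values must be non-negative.", nameof(values));

        double total = values.Sum();
        if (total <= 0)
            throw new ArgumentException("Values must not all be zero.", nameof(values));

        // Scale every value by the same factor so the total becomes 100.
        return values.Select(v => (float)(v / total * 100.0)).ToArray();
    }
}
```

For example, FrequencyUtils.ToPercentages(new double[] { 3, 1 }) returns { 75, 25 }. Unlike the Load event handler, which puts the whole rounding difference on E, this sketch spreads any adjustment proportionally over all of the values.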

The code also creates a Random object and stores the ASCII value of the letter A as an integer so later code can easily convert between letters and array indexes.

The form's Load event handler makes sure the values in the Frequencies array add up to 100%. They don't to begin with, because the values on the Wikipedia page where I got them don't: they total 99.999. To fix that, the code adds up the values, subtracts the total from 100, and adds the difference (0.001 in this case) to the letter E, raising E's frequency from 12.702 to about 12.703. If the total were greater than 100, the difference would be negative and E's frequency would decrease slightly instead.

When you click the Generate button, the following code executes.

// Generate random letters with the indicated frequencies.
private void btnGenerate_Click(object sender, EventArgs e)
{
    // Keep track of the number of each letter generated.
    int[] counts = new int[26];

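    // Get the number of letters, rejecting invalid input
    // (a count of 0 would also make the frequency calculation divide by 0).
    int num_letters;
    if (!int.TryParse(txtNumLetters.Text, out num_letters) || num_letters <= 0)
    {
        MessageBox.Show("Please enter a positive number of letters.");
        return;
    }

    // Generate the letters. A StringBuilder avoids the cost of
    // repeated string concatenation when num_letters is large.
    var result = new System.Text.StringBuilder();
    for (int i = 0; i < num_letters; i++)
    {
        // Generate a number between 0 (inclusive) and 100 (exclusive).
        double num = 100.0 * Rand.NextDouble();

        // See which letter this represents.
        for (int letter_num = 0; ; letter_num++)
        {
            // Subtract this letter's frequency from num.
            num -= Frequencies[letter_num];

            // If num <= 0, then this is the letter. Always stop at Z
            // (letter_num == 25) in case rounding leaves num slightly above 0.
            if ((num <= 0) || (letter_num == 25))
            {
                char ch = (char)(int_A + letter_num);
                result.Append(ch).Append(' ');
                counts[letter_num]++;
                break;
            }
        }
    }

    txtLetters.Text = result.ToString();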
    txtLetters.Select(0, 0);

    // Display the frequencies.
    lstFrequencies.Items.Clear();
    for (int i = 0; i < counts.Length; i++)
    {
        char ch = (char)(int_A + i);
        float frequency = (float)counts[i] / num_letters * 100;
        string str = string.Format("{0}\t{1,6}\t{2,6}\t{3,6}",
            ch.ToString(),
            frequency.ToString("0.000"),
            Frequencies[i].ToString("0.000"),
            (frequency - Frequencies[i]).ToString("0.000"));
        lstFrequencies.Items.Add(str);
    }
}

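For each letter it should generate, the program picks a random number between 0 (inclusive) and 100 (exclusive). It then loops over the values in the Frequencies array, subtracting each from the random value. As soon as the value drops to 0 or below, the program uses the letter whose frequency it just subtracted. In effect, the frequencies divide the range 0 to 100 into 26 consecutive intervals whose widths equal the letters' percentages: A covers 0 to 8.167, B covers 8.167 to 9.659, and so on. Assuming Random generates reasonably uniformly distributed numbers (and it's fairly good at that), the random number falls in each letter's interval with probability equal to that letter's percentage divided by 100, so each letter appears with roughly the frequency given in the Frequencies array. The extra letter_num == 25 test makes the loop stop at Z even if floating-point rounding leaves the value slightly above 0 after all 26 subtractions.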
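The subtraction loop examines up to 26 entries per letter, which is fine here. For a longer list of values, or to pick arbitrary items rather than letters, a common alternative is to compute running (cumulative) totals once and then binary-search them for each random number. The following is a minimal sketch of that alternative, assuming non-negative weights that need not add up to 100. The WeightedPicker class and its Next method are hypothetical names for illustration, not part of the original example.

```csharp
using System;
using System.Linq;

// Hypothetical helper: picks items with probability proportional to their weights.
public class WeightedPicker<T>
{
    private readonly T[] items;
    private readonly double[] cumulative; // cumulative[i] = weights[0] + ... + weights[i]
    private readonly Random rand;

    public WeightedPicker(T[] items, double[] weights, Random rand)
    {
        if (items.Length == 0 || items.Length != weights.Length)
            throw new ArgumentException("Need at least one item and one weight per item.");
        if (weights.Any(w => w < 0))
            throw new ArgumentException("Weights must be non-negative.");

        this.items = items;
        this.rand = rand;

        // Precompute the running totals once.
        cumulative = new double[weights.Length];
        double running = 0;
        for (int i = 0; i < weights.Length; i++)
        {
            running += weights[i];
            cumulative[i] = running;
        }
        if (running <= 0)
            throw new ArgumentException("At least one weight must be positive.");
    }

    public T Next()
    {
        // Uniform value in [0, total).
        double total = cumulative[cumulative.Length - 1];
        double r = rand.NextDouble() * total;

        // Binary search for the first index whose running total exceeds r.
        // Zero-weight items can never be that first index. If rounding makes
        // r reach total, the search ends at the last item, much like the
        // original loop's letter_num == 25 check.
        int lo = 0, hi = cumulative.Length - 1;
        while (lo < hi)
        {
            int mid = (lo + hi) / 2;
            if (cumulative[mid] > r) hi = mid;
            else lo = mid + 1;
        }
        return items[lo];
    }
}
```

To use it for this example, you could pass the letters A through Z as the items and the Frequencies values (converted to double) as the weights. Each call to Next then costs O(log n) comparisons instead of up to n subtractions, at the price of storing the cumulative totals.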
