Archive for the ‘language’ Category

Most common Persian words

Saturday, November 10th, 2007

As a second-generation Iranian American who has spent practically no time in Iran, I have found it difficult to learn the Persian language beyond mere kitchen talk. In an effort to improve my vocabulary, I sought out a list of the most common Persian words. I could not find such a list, so I searched for a Persian-language corpus that I could use to produce the list myself.

I came across the Hamshahri Persian Corpus and decided to use it. I ran a word count on the corpus to determine the most common words in the Persian language. I posted the results, sorted from most to least frequently used, here.

The list was rather long, so I’ve only included words that appeared in the corpus over 1000 times. I plan to start at the top of the list and make flashcards out of any words I don’t know or am unsure of. This should help me focus on words that are more commonly used. I hope you find it useful as well. I will post the Java code I used to parse the corpus if anybody is interested.
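A word count of this kind can be sketched in a few lines of Java. This is only an illustration, not the code actually used for the list: the class name `WordFrequency`, the helper methods, and the tokenizing rule are all assumptions, and it treats the corpus as a plain UTF-8 text file, which may not match the real layout of the Hamshahri files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: counts words in a UTF-8 text file and prints the frequent ones.
class WordFrequency {

    // Split each line on anything that is not a letter, a combining mark,
    // or the zero-width non-joiner (used inside some Persian words),
    // then count how often each token occurs.
    static Map<String, Long> countWords(Stream<String> lines) {
        return lines
            .flatMap(line -> Arrays.stream(line.split("[^\\p{L}\\p{M}\\u200C]+")))
            .filter(token -> !token.isEmpty())
            .collect(Collectors.groupingBy(token -> token, Collectors.counting()));
    }

    // Keep only words seen more than minCount times, most frequent first;
    // ties are broken by the word itself so the order is stable.
    static List<Map.Entry<String, Long>> mostCommon(Map<String, Long> counts, long minCount) {
        return counts.entrySet().stream()
            .filter(entry -> entry.getValue() > minCount)
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the corpus text file (an assumption of this sketch).
        try (Stream<String> lines = Files.lines(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Long> entry : mostCommon(countWords(lines), 1000)) {
                System.out.println(entry.getKey() + "\t" + entry.getValue());
            }
        }
    }
}
```

Splitting on non-letters is a crude tokenizer. It does not normalize spelling variants or separate attached affixes, so a more careful count would need Persian-specific preprocessing.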

If I ever find the time, my next goal is to try to find phrases, word combinations, and word patterns. If anybody is interested in helping out, please let me know. I’d also be interested in finding out about similar (non-commercial) efforts for other languages, particularly other Indo-European languages or other languages that use the Arabic script.

How high is your overview?

Wednesday, May 30th, 2007

I was going to write something today and I wanted to use the idiom “from a 20,000 foot view”, but I was questioning whether I got the number “20,000” right. So I did what any self-respecting geek would do: I Googled it. Well, as I expected, different people view things from different altitudes. I searched for two phrases, “from X feet” and “X foot view”, substituting multiples of 10,000 for X. Here is the number of results Google said it had for each number:

Feet       “from X feet”   “X foot view”
10,000     46,400          24,300
20,000     24,900          1,890
30,000     81,400          22,200
40,000     11,100          692
50,000     50,400          10,600
60,000     1,020           38
70,000     65              2
80,000     93              17
90,000     35              2
100,000    1,440           83

So it looks like people are generally looking at things from 30,000 feet (but keep in mind, my analysis is only based on a 10,000 foot view of the statistics).