
Most common Persian words

10 November 2007

As a second-generation Iranian American who has spent practically no time in Iran, I have found it difficult to learn Persian beyond basic kitchen talk. To improve my vocabulary, I looked for a list of the most common Persian words. I could not find one, so I searched for a Persian-language corpus from which I could produce the list myself.

I came across the Hamshahri Persian Corpus and decided to use it. I ran a word count on the corpus to determine the most common words in the Persian language, and I posted the results, sorted from most to least frequent, here.

The full list was rather long, so I’ve only included words that appeared in the corpus more than 1,000 times. I plan to start at the top of the list and make flashcards for any words I don’t know or am unsure of. This should help me focus on the words that are most commonly used. I hope you find it useful as well. I will post the Java code I used to parse the corpus if anybody is interested.
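For readers who want to try something similar, here is a minimal sketch of a word count with a frequency cutoff. It is not the code used to produce the list. The class name `WordFrequency`, its method names, and the assumption that the corpus has already been reduced to plain lines of text are all hypothetical, and the tokenization rule is a simplification.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and tokenization are illustrative assumptions,
// not the code that produced the published list.
final class WordFrequency {

    // Tokens are runs of letters, combining marks, and the zero-width
    // non-joiner (U+200C), which Persian uses inside some words.
    // Everything else (spaces, digits, punctuation) separates tokens.
    private static final String SEPARATORS = "[^\\p{L}\\p{M}\\u200C]+";

    // Counts how often each token occurs across all lines.
    static Map<String, Integer> countWords(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.split(SEPARATORS)) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Keeps words that occur more than minCount times, most frequent first.
    // Ties are broken by the word itself so the order is deterministic.
    static List<Map.Entry<String, Integer>> mostCommon(Map<String, Integer> counts, int minCount) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > minCount) {
                result.add(entry);
            }
        }
        result.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return result;
    }

    public static void main(String[] args) {
        // Tiny in-memory sample; a real run would read the corpus text instead.
        List<String> sample = Arrays.asList("کتاب خوب است", "این کتاب است");
        for (Map.Entry<String, Integer> entry : mostCommon(countWords(sample), 1)) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

With a real corpus, calling `mostCommon(counts, 1000)` would apply the same "more than 1,000 occurrences" cutoff described above. Spelling variants and affixes are counted as separate words in this sketch.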

If I ever find the time, my next goal is to find common phrases, word combinations, and word patterns. If anybody is interested in helping out, please let me know. I’d also be interested in finding out about similar (non-commercial) efforts for other languages, particularly other Indo-European languages or other languages written in the Arabic script.
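One simple starting point for finding word combinations is to count adjacent word pairs (bigrams). The sketch below is only an illustration of that idea under assumptions: the class name `BigramCounter` is hypothetical, input is taken to be plain lines of text, and pairs are formed within a line even when punctuation sits between the two words.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of bigram counting; names are illustrative only.
final class BigramCounter {

    // Same simplified tokenization assumption as a plain word count:
    // letters, combining marks, and the zero-width non-joiner (U+200C)
    // form tokens, and everything else separates them.
    private static final String SEPARATORS = "[^\\p{L}\\p{M}\\u200C]+";

    // Counts each pair of neighbouring tokens, keyed as "first second".
    // Pairs never span two lines.
    static Map<String, Integer> countBigrams(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String previous = null;
            for (String token : line.split(SEPARATORS)) {
                if (token.isEmpty()) {
                    continue;
                }
                if (previous != null) {
                    counts.merge(previous + " " + token, 1, Integer::sum);
                }
                previous = token;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Tiny in-memory sample; a real run would read the corpus text instead.
        List<String> sample = Arrays.asList("این کتاب خوب است", "این کتاب است");
        for (Map.Entry<String, Integer> entry : countBigrams(sample).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

Raw pair counts tend to be dominated by very frequent function words, so a fuller approach would likely need some filtering or scoring on top of this.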


3 Comments »

  • Shahram said:

    I came across your blog from one of your articles on DevX.com.
    I work as a Java/J2EE developer in Canada.
    It was exciting to find an Iranian with Java expertise and so many articles and books.
    I just want to say hello and wish you the best of luck.

  • Ansa said:

    Hey! Thanks for the list of common Parsi words. Exactly what I was looking for to help me learn!

  • Martin Roberts said:

    Dear Javid,

    When I was searching for a list of Persian words by frequency, I was most impressed to find yours.

    I am starting a course to teach both children and adults and will be using your list as a basis for teaching and learning Persian.
    (I have successfully used similar lists to teach English.)

    One of the things that I have done with your list is use it as the basis for an automated vocabulary assessment.

    This program intelligently selects about 20 words and, based on which of them the student knows, estimates the size of the student’s vocabulary.
    (There is some hefty maths involved in working out how to select the words to ask most efficiently.)

    This program can then be run periodically to determine how well the student is learning…

    For beginners, a list of 5,000 words is more than sufficient for this purpose. For more skilled (but not fluent) students, however, the list needs to be significantly larger to cater for rarer words.

    My skills are more in the line of maths and teaching, with only modest computer programming skills, and it seems that you are far more proficient with databases than I am. Would it be possible to re-run your program (if you still have it) to produce, say, the top 20,000 words?

    Kindest regards,

    Martin Roberts
    Tasmania, Australia.
