gogo_som

Georg Gogo BERNHARD on Using self organizing maps for categorizing arbitrary text.

'Using Self Organizing Maps to categorize arbitrary text'

1. Concept of Self Organizing Maps

1.1. What is a SOM?

Analysis of similarities, grouping similar things.

Clustering
Classification
Topology
Dimension Reduction

Artificial Intelligence

Unsupervised: unlike a conventional (supervised) neural network, no teaching input is needed for training.

1.2. Example use cases

Protein 3D structure
HIV classification


2. Document Vectors

2.1. What is similarity of texts

Word Distribution
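
One common way to turn "word distribution" into a single number is the cosine similarity of two word-count vectors. The talk does not say which measure is used, so treat this as a minimal sketch; the helper names `word_counts` and `cosine_similarity` are made up for the example.

```python
import math
from collections import Counter

def word_counts(text):
    """Count how often each whitespace-separated word occurs (case-insensitive)."""
    return Counter(text.upper().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors.

    1.0 means the same word distribution, 0.0 means no shared words.
    """
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

first = word_counts("the wizard gave the scarecrow a brain")
second = word_counts("the wizard gave the lion courage")
german = word_counts("der zauberer gab dem blechmann ein herz")

print(cosine_similarity(first, second))   # shared words give a value well above 0
print(cosine_similarity(first, german))   # 0.0, no shared words
```

Word order is ignored entirely; only how often each word occurs matters.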

2.2. Code example

Count how often a word is used

>>> f = open('wizoz10.txt','r')
>>> s = f.read()
>>> import dvg
>>> d = dvg.dvg(128)
>>> t = d.simplifyASCII(s)
>>> s[:100]
'L. Frank Baum\r\n\r\nChicago, April, 1900.\r\n\r\n\r\n\r\n THE WONDERFUL WIZARD OF OZ\r\n\r\n\r\n '
>>> t[:100]
'L FRANK BAUM CHICAGO APRIL THE WONDERFUL WIZARD OF OZ THE CYCLONE DOROTHY LIVED IN THE MIDST OF TH'
>>> w = t.split()
>>> l = {}
>>> for i in w :
...     try :
...         l[i] = l[i] + 1
...     except KeyError :
...         l[i] = 1
...
>>> l['DOROTHY']
346
>>> l['TOTO']
90
>>> l['THE']
2953
>>>
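
The `dvg` module is part of the author's own code, so the session above cannot be replayed without it. Judging only from the output shown, `simplifyASCII` appears to uppercase the text and collapse everything that is not a letter into single spaces. The sketch below is a hypothetical stand-in built on that guess (`simplify_ascii` is not the real function), with `collections.Counter` doing the counting.

```python
import re
from collections import Counter

def simplify_ascii(text):
    """Hypothetical stand-in for dvg.simplifyASCII: A-Z only, single spaces."""
    letters_only = re.sub(r"[^A-Za-z]+", " ", text)
    return letters_only.upper().strip()

sample = "L. Frank Baum\r\n\r\nChicago, April, 1900.\r\n\r\n THE WONDERFUL WIZARD OF OZ\r\n"
simplified = simplify_ascii(sample)
counts = Counter(simplified.split())

print(simplified)   # L FRANK BAUM CHICAGO APRIL THE WONDERFUL WIZARD OF OZ
print(counts["WIZARD"])   # 1
```

How the real function treats apostrophes or accented letters is not visible in the excerpt, so this version may differ there.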


2.3. Dimension Reduction

2.3.1. High Dimension

Almost infinite number of words per language

Almost infinite number of languages

Sparse Vector

2.3.2. Low Dimension

(Constant) Random Matrix Multiplication
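
A minimal sketch of the idea, assuming the "constant random matrix" has one row per word and a fixed number of columns. Seeding a generator with the word itself yields the same row every time, so the matrix never has to be stored. `DIM = 128` only mirrors the `128` passed to `dvg.dvg` earlier; whether that argument really is the target dimension is a guess, and `word_row` and `document_vector` are hypothetical names.

```python
import random
from collections import Counter

DIM = 128  # assumed target dimension

def word_row(word, dim=DIM):
    """Row of the constant random matrix that belongs to one word.

    random.Random(word) is seeded deterministically by the string, so the
    same word always produces the same row.
    """
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def document_vector(text, dim=DIM):
    """Multiply the sparse word-count vector of `text` by the random matrix."""
    vector = [0.0] * dim
    # Sorting fixes the summation order, so equal counts give identical floats.
    for word, count in sorted(Counter(text.upper().split()).items()):
        row = word_row(word, dim)
        for k in range(dim):
            vector[k] += count * row[k]
    return vector

v1 = document_vector("the wizard of oz")
v2 = document_vector("oz of wizard the")   # same words, different order

print(len(v1))    # 128
print(v1 == v2)   # True
```

However many distinct words a text contains, the result always has `DIM` entries, which is what makes the vectors usable as input for the map.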


3. Kohonen Maps

Teuvo Kohonen (born July 11, 1934)
Finnish academician

neighbourhood of the cells
eigenvector
winner

Elastic grid
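
For reference, the standard Kohonen update rule ties the winner and the neighbourhood together. The notation is the usual textbook one, not taken from the talk: $x(t)$ is the sample drawn at step $t$, $w_i(t)$ the vector of cell $i$, $\alpha(t)$ a learning rate that shrinks over time, and $h_{c,i}(t)$ a neighbourhood function that is largest at the winner $c$ and falls off with grid distance.

```latex
c = \arg\min_i \lVert x(t) - w_i(t) \rVert

w_i(t+1) = w_i(t) + \alpha(t)\, h_{c,i}(t)\, \bigl( x(t) - w_i(t) \bigr)
```

Because the neighbours of the winner are dragged along with it, the grid behaves like an elastic net stretched over the data.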

3.1. Examples

3.1.1. 2D grid unfolding

3.1.1.1. Square unfolding

3.1.1.2. Triangle unfolding

3.1.1.3. Topological defect

3.2. Code example

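import random

map.Initialize()                       # set up the cell vectors
while True :
    sample = random.choice(samples)    # pick a random training sample
    winner = map.getWinner(sample)     # cell whose vector is closest to the sample
    map.scaleNeighbors()               # move the winner and its neighbours towards the sample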
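
The loop above can be fleshed out into something that runs. This is a sketch under stated assumptions, not the GoSOM2 code: the class `SquareMap`, its method signatures, the Gaussian neighbourhood and the linear decay of rate and radius are all choices made for the example. It trains a 6 x 6 grid on random points in the unit square, which corresponds to the square-unfolding case.

```python
import math
import random

class SquareMap:
    """Tiny SOM on a size x size grid; every cell holds a 2-d vector."""

    def __init__(self, size, rng):
        self.size = size
        self.rng = rng
        self.cells = {}

    def initialize(self):
        # Start with all cell vectors crumpled near the centre of the unit square.
        for r in range(self.size):
            for c in range(self.size):
                self.cells[(r, c)] = [0.5 + self.rng.uniform(-0.05, 0.05)
                                      for _ in range(2)]

    def get_winner(self, sample):
        # The winner is the cell whose vector is closest to the sample.
        return min(self.cells, key=lambda rc: sum(
            (w - x) ** 2 for w, x in zip(self.cells[rc], sample)))

    def scale_neighbors(self, winner, sample, rate, radius):
        # Pull every cell towards the sample, weighted by grid distance to the winner.
        for (r, c), vector in self.cells.items():
            d2 = (r - winner[0]) ** 2 + (c - winner[1]) ** 2
            pull = rate * math.exp(-d2 / (2.0 * radius ** 2))
            for k in range(2):
                vector[k] += pull * (sample[k] - vector[k])

rng = random.Random(0)
samples = [[rng.random(), rng.random()] for _ in range(500)]
som = SquareMap(6, rng)
som.initialize()

steps = 3000
for step in range(steps):
    progress = step / steps
    rate = 0.5 * (1.0 - progress) + 0.01     # learning rate shrinks over time
    radius = 3.0 * (1.0 - progress) + 0.5    # neighbourhood shrinks over time
    sample = rng.choice(samples)
    winner = som.get_winner(sample)
    som.scale_neighbors(winner, sample, rate, radius)

xs = [v[0] for v in som.cells.values()]
print(min(xs), max(xs))   # the grid has spread out from its crumpled start
```

Unlike the endless `while True` loop, the sketch stops after a fixed number of steps so that rate and radius have a defined schedule.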


3.3. Live Demo

3.3.1. Gutenberg Texts

Texts swim on the map
Similar texts are grouped together
Different languages drift apart

4. Conclusion

4.1. Problems

4.1.1. Problems with large amounts of text

4.1.2. Problem with definition of similarity for text

Two-level document vector generation

First words per sentence, then sentences per text.
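
The talk only names the idea, so the following is one possible reading rather than the author's method. Level 1 builds a vector per sentence from its words; level 2 averages the length-normalised sentence vectors into one vector per text, so that long sentences do not dominate. All names (`word_vector`, `sentence_vector`, `text_vector`), the small `DIM` and the naive sentence splitting are assumptions made for the example.

```python
import math
import random
import re

DIM = 16  # small dimension, chosen only to keep the example readable

def word_vector(word, dim=DIM):
    """Reproducible pseudo-random vector for one word (seeded by the word)."""
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

def unit(vector):
    """Scale a vector to length 1; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else list(vector)

def sentence_vector(sentence, dim=DIM):
    """Level 1: sum the word vectors of one sentence, then normalise."""
    total = [0.0] * dim
    for word in re.findall(r"[A-Za-z]+", sentence.upper()):
        for k, value in enumerate(word_vector(word, dim)):
            total[k] += value
    return unit(total)

def text_vector(text, dim=DIM):
    """Level 2: average the sentence vectors of a whole text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return [0.0] * dim
    total = [0.0] * dim
    for sentence in sentences:
        for k, value in enumerate(sentence_vector(sentence, dim)):
            total[k] += value
    return [value / len(sentences) for value in total]

vector = text_vector("Dorothy lived in Kansas. Toto barked!")
print(len(vector))   # 16
```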

4.2. References

4.2.1. WEBSOM

4.2.2. Download link for GoSOM2 Plone Conference 2005 edition

http://gogo.bluedynamics.net/files/GoSOM2-ploneconf2005.tgz/download

5. Thanks

Posted by gogo on 2005-09-19 17:52
