GoSOM2

This is the lightning talk about GoSOM2, as seen live at the Plone Conference 2005 in Vienna.
You can download all files used for the talk as a zipped tar archive here.

'Using Self Organizing Maps to categorize arbitrary text' fig.01
1. Concept of Self Organizing Maps fig.02

1.1. What is a SOM?

Analysis of similarities, grouping similar things.

fig.03
Clustering
Classification
Topology
Dimension Reduction
fig.04
Artificial Intelligence

Unsupervised: unlike a supervised neural network, no teaching input is needed for training.
fig.05

1.2. Example use cases

Protein 3d structure

fig.06
HIV classification
fig.07

2. Document Vectors

2.1. What is similarity of texts

fig.08

Word Distribution
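
One way to make "similarity of word distribution" concrete is to treat each text as a vector of relative word frequencies and compare the vectors by the cosine of the angle between them. This is a minimal sketch, not the method used in the talk: the function names and the choice of cosine similarity are assumptions made for illustration.

```python
import math
from collections import Counter

def word_distribution(text):
    """Relative frequency of each upper-cased, whitespace-separated word."""
    words = text.upper().split()
    counts = Counter(words)
    total = float(len(words))
    return {word: n / total for word, n in counts.items()}

def cosine_similarity(p, q):
    """Cosine of the angle between two sparse word-frequency vectors."""
    dot = sum(p[w] * q[w] for w in p if w in q)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_p * norm_q)

a = word_distribution("the wizard of oz")
b = word_distribution("the wizard of oz")
c = word_distribution("der zauberer von oz")

print(cosine_similarity(a, b))  # identical word distributions
print(cosine_similarity(a, c))  # only one word in common
```

Texts in different languages share few words, so their similarity under this measure stays low.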

2.2. Code example

Count how often a word is used

>>> f = open('wizoz10.txt','r')
>>> s = f.read()
>>> import dvg
>>> d = dvg.dvg(128)
>>> t = d.simplifyASCII(s)
>>> s[:100]
'L. Frank Baum\r\n\r\nChicago, April, 1900.\r\n\r\n\r\n\r\n THE WONDERFUL WIZARD OF OZ\r\n\r\n\r\n '
>>> t[:100]
'L FRANK BAUM CHICAGO APRIL THE WONDERFUL WIZARD OF OZ THE CYCLONE DOROTHY LIVED IN THE MIDST OF TH'
>>> w = t.split()
>>> l = {}
>>> for i in w :
...     try :
...         l[i] = l[i] + 1
...     except KeyError :
...         l[i] = 1
...
>>> l['DOROTHY']
346
>>> l['TOTO']
90
>>> l['THE']
2953
>>>
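
The same count can be written more compactly with `collections.Counter`. The sketch below does not use the `dvg` module: `simplify_ascii` is a hypothetical stand-in that only imitates what the `t[:100]` output above suggests (letters kept, everything else turned into single spaces, upper-cased), and the sample string is a shortened copy of the file header shown in the session.

```python
import re
from collections import Counter

def simplify_ascii(text):
    # Hypothetical stand-in for dvg.simplifyASCII (an assumption, not its real code):
    # replace every run of non-letters by one space and upper-case the rest.
    return re.sub(r"[^A-Za-z]+", " ", text).upper().strip()

sample = "L. Frank Baum\r\n\r\nChicago, April, 1900.\r\n\r\n THE WONDERFUL WIZARD OF OZ"
simplified = simplify_ascii(sample)

# Counter does the try/except bookkeeping of the loop above internally.
counts = Counter(simplified.split())

print(simplified)
print(counts["THE"], counts["DOROTHY"])  # words never seen count as 0
```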

fig.09

2.3. Dimension Reduction

2.3.1. High Dimension

Almost infinite number of words per language

Almost infinite number of languages

Sparse Vector

fig.10
2.3.2. Low Dimension

(Constant) Random Matrix Multiplication
fig.11
Document Vector fig.12
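
Multiplying the sparse word-count vector by a constant random matrix can be sketched without ever storing the matrix: each word owns one fixed random row, and the document vector is the count-weighted sum of those rows. This is an illustration under assumptions, not the `dvg` implementation. Gaussian entries, the per-word seeding trick, and the names `random_row` and `document_vector` are all made up here, and the dimension 128 is borrowed from the `dvg.dvg(128)` call on the guess that it sets the vector size.

```python
import random
from collections import Counter

DIM = 128  # assumed target dimension

def random_row(word, dim=DIM):
    """Row of the (implicit) constant random matrix that belongs to `word`."""
    rng = random.Random(word)  # seeding with the word keeps the row constant
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def document_vector(text, dim=DIM):
    """Project the sparse word-count vector of `text` down to `dim` numbers."""
    counts = Counter(text.upper().split())
    vec = [0.0] * dim
    for word, n in counts.items():
        row = random_row(word, dim)
        for k in range(dim):
            vec[k] += n * row[k]  # count times matrix row
    return vec

v = document_vector("the wizard of oz")
print(len(v))
```

Because the matrix never changes, the same text always maps to the same vector, and texts with similar word counts land close together in the low-dimensional space.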
Teuvo Kohonen (born July 11, 1934)
Finnish academician
fig.13
cell
fig.14
neighbourhood of the cells fig.15
eigenvector fig.16
winner fig.17
Elastic grid fig.18
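
The terms above can be illustrated in a few lines: every cell of the grid holds a weight vector, the winner is the cell whose vector lies closest to a sample, and the neighbourhood decides how strongly the other cells follow the winner. The sketch assumes squared Euclidean distance and a Gaussian neighbourhood, neither of which is stated in the talk, and all names are hypothetical.

```python
import math

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def get_winner(weights, sample):
    """Grid position (row, col) of the cell whose weight vector is closest to the sample."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = squared_distance(w, sample)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best

def neighbourhood(cell, winner, radius):
    """Gaussian weight in (0, 1]: 1 at the winner, falling off with grid distance."""
    grid_dist2 = (cell[0] - winner[0]) ** 2 + (cell[1] - winner[1]) ** 2
    return math.exp(-grid_dist2 / (2.0 * radius ** 2))

# A 2x2 grid of cells, each holding a 2d weight vector.
weights = [[[0.0, 0.0], [0.0, 1.0]],
           [[1.0, 0.0], [1.0, 1.0]]]
winner = get_winner(weights, [0.9, 0.2])
print(winner, neighbourhood((0, 0), winner, 1.0))
```

Note that the winner is found in the space of the samples, while the neighbourhood is measured on the grid. That coupling is what makes the grid behave like an elastic net.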

3.1. Examples

3.1.1. 2d Grid unfolding

3.1.1.1. Square unfolding

fig.19

3.1.1.2. Triangle unfolding

fig.20
3.1.1.3. Topological defect fig.21

3.2. Code example

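import random

map.Initialize()                          # give every cell a starting weight vector
while True :
    sample = random.choice(samples)       # pick one training vector at random
    winner = map.getWinner(sample)        # cell whose weight vector is closest
    map.scaleNeighbors(winner, sample)    # pull winner and neighbours toward the sample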

fig.22
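
To make the loop above executable, the three map operations need bodies. The class below is a toy sketch with assumed internals: the method names mirror `Initialize`, `getWinner` and `scaleNeighbors`, but the Gaussian neighbourhood, the constant learning rate and radius, and the start near the centre of the unit square are illustrative choices, not the GoSOM2 code.

```python
import math
import random

class SOM:
    """Toy 2d map; method names follow the talk's loop, bodies are assumptions."""

    def __init__(self, rows, cols, dim, rate=0.1, radius=1.0, seed=0):
        self.rows, self.cols, self.dim = rows, cols, dim
        self.rate, self.radius = rate, radius
        self.rng = random.Random(seed)
        self.weights = {}

    def initialize(self):
        # Start all cells crumpled together near the centre of the unit cube.
        for r in range(self.rows):
            for c in range(self.cols):
                self.weights[(r, c)] = [0.5 + self.rng.uniform(-0.05, 0.05)
                                        for _ in range(self.dim)]

    def get_winner(self, sample):
        # Cell whose weight vector has the smallest squared distance to the sample.
        return min(self.weights, key=lambda cell: sum(
            (w - s) ** 2 for w, s in zip(self.weights[cell], sample)))

    def scale_neighbors(self, winner, sample):
        # Pull every cell toward the sample, weaker the further it sits from the winner.
        for cell, w in self.weights.items():
            d2 = (cell[0] - winner[0]) ** 2 + (cell[1] - winner[1]) ** 2
            pull = self.rate * math.exp(-d2 / (2.0 * self.radius ** 2))
            for k in range(self.dim):
                w[k] += pull * (sample[k] - w[k])

rng = random.Random(1)
samples = [[rng.random(), rng.random()] for _ in range(200)]  # points in the unit square

som = SOM(rows=5, cols=5, dim=2)
som.initialize()
for _ in range(2000):  # the talk loops forever; a fixed count ends the demo
    sample = rng.choice(samples)
    winner = som.get_winner(sample)
    som.scale_neighbors(winner, sample)

print(som.weights[(0, 0)], som.weights[(4, 4)])
```

Practical implementations usually shrink the learning rate and the radius over time so that the map first orders itself coarsely and then settles. Both are kept constant here to stay close to the five-line loop.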

3. Live Demo

3.1. Gutenberg Texts

Texts swim on the map
Similar texts are grouped together
Different languages drift apart

4. Conclusion

4.1. Problems

4.1.1. Problems with large amounts of text

4.1.2. Problem with definition of similarity for text

Two-level document vector generation

First words per sentence, then sentences per text.

4.2. References

4.2.1. WEBSOM

5. Thanks

fig.23
Created by gogo
Last modified 2006-03-03 19:00
 
