gogo_som
'Using Self Organizing Maps to categorize arbitrary text'
1. Concept of Self Organizing Maps
1.1. What is a SOM?
Analysis of similarities, grouping similar things.
Clustering
Classification
Topology
Dimension Reduction
Artificial Intelligence
Unsupervised learning: unlike a supervised neural network, no teaching input is needed for training.
1.2. Example use cases
Protein 3d structure
HIV classification
2. Document Vectors
2.1. What is similarity of texts?
Word Distribution
2.2. Code example
Count how often a word is used
>>> f = open('wizoz10.txt','r')
>>> s = f.read()
>>> import dvg
>>> d = dvg.dvg(128)
>>> t = d.simplifyASCII(s)
>>> s[:100]
'L. Frank Baum\r\n\r\nChicago, April, 1900.\r\n\r\n\r\n\r\n THE WONDERFUL WIZARD OF OZ\r\n\r\n\r\n '
>>> t[:100]
'L FRANK BAUM CHICAGO APRIL THE WONDERFUL WIZARD OF OZ THE CYCLONE DOROTHY LIVED IN THE MIDST OF TH'
>>> w = t.split()
>>> l = {}
>>> for i in w :
...     try :
...         l[i] = l[i] + 1
...     except KeyError :
...         l[i] = 1
...
>>> l['DOROTHY']
346
>>> l['TOTO']
90
>>> l['THE']
2953
>>>
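Word counts like these can be compared to get a rough similarity score for two texts. The following is a minimal sketch, not taken from GoSOM2: it assumes cosine similarity as the measure, and the helper names word_counts and cosine_similarity as well as the sample sentences are made up for illustration.

```python
import math
from collections import Counter


def word_counts(text):
    """Count how often each upper-cased, whitespace-separated word occurs."""
    return Counter(text.upper().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (hypothetical helper)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Illustrative sample sentences, not part of the original talk.
oz = word_counts("Dorothy lived in the midst of the great Kansas prairies")
oz2 = word_counts("Dorothy and Toto lived in the great prairies of Kansas")
other = word_counts("Der Zauberer wohnt in einer smaragdgruenen Stadt")

sim_close = cosine_similarity(oz, oz2)   # many shared words
sim_far = cosine_similarity(oz, other)   # only one shared word
```

Texts with a similar word distribution score close to 1, texts with few shared words score close to 0.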
2.3. Dimension Reduction
2.3.1. High Dimension
Almost infinite number of words per language
Almost infinite number of languages
Sparse Vector
2.3.2. Low Dimension
(Constant) Random Matrix Multiplication
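The reduction from a sparse word-count vector to a short fixed-length vector could look roughly like this. It is a sketch under assumptions, not the dvg implementation: the random matrix is never stored, each word's row is regenerated from a checksum of the word, and the names word_row, project and DIMENSIONS are hypothetical. The 128 passed to dvg.dvg above may be such a target size, but that is a guess.

```python
import random
import zlib

DIMENSIONS = 128  # assumed target size, chosen to mirror dvg.dvg(128)


def word_row(word, dimensions=DIMENSIONS):
    """Return the fixed pseudo-random matrix row belonging to one word.

    Seeding a generator with a checksum of the word keeps the matrix
    constant: the same word always yields the same row.
    """
    rng = random.Random(zlib.crc32(word.encode("utf-8")))
    return [rng.choice((-1.0, 1.0)) for _ in range(dimensions)]


def project(counts, dimensions=DIMENSIONS):
    """Multiply a sparse word-count vector by the implicit random matrix."""
    vector = [0.0] * dimensions
    for word, count in counts.items():
        row = word_row(word, dimensions)
        for k in range(dimensions):
            vector[k] += count * row[k]
    return vector


# Counts taken from the word-count example in section 2.2.
doc = project({"DOROTHY": 346, "TOTO": 90, "THE": 2953})
```

However many distinct words a text contains, the result always has DIMENSIONS entries, so texts in any language end up in the same low-dimensional space.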
3. Kohonen Maps
Teuvo Kohonen (born July 11, 1934)
Finnish academician
neighbourhood of the cells
eigenvector
winner
Elastic grid
3.1. Examples
3.1.1. 2d Grid unfolding
3.1.1.1. Square unfolding
3.1.1.2. Triangle unfolding
3.1.1.3. Topological defect
3.2. Code example
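import random

map.Initialize()
while True :
    # pick a random training sample
    sample = random.choice(samples)
    # find the cell that matches the sample best
    winner = map.getWinner(sample)
    # adapt the winner and the cells in its neighbourhood
    map.scaleNeighbors()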
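The loop leaves Initialize, getWinner and scaleNeighbors undefined. A self-contained sketch of the same steps follows. It is not the GoSOM2 code: the rectangular grid, the Gaussian neighbourhood, the linearly shrinking learning rate and radius, and the names train_som and get_winner are all assumptions made for illustration.

```python
import math
import random


def get_winner(grid, sample):
    """Return the grid position whose weights are closest to the sample."""
    return min(
        grid,
        key=lambda pos: sum((w - s) ** 2 for w, s in zip(grid[pos], sample)),
    )


def train_som(samples, width=8, height=8, steps=2000, seed=0):
    """Train a small rectangular map; returns {(x, y): weight vector}."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialize every cell with random weights.
    grid = {(x, y): [rng.random() for _ in range(dim)]
            for x in range(width) for y in range(height)}
    for step in range(steps):
        remaining = 1.0 - step / steps
        rate = 0.5 * remaining                                # learning rate shrinks
        radius = 1.0 + (max(width, height) / 2.0) * remaining  # so does the neighbourhood
        sample = rng.choice(samples)
        winner = get_winner(grid, sample)
        # Pull the winner and its grid neighbours towards the sample.
        for (x, y), weights in grid.items():
            grid_dist2 = (x - winner[0]) ** 2 + (y - winner[1]) ** 2
            influence = math.exp(-grid_dist2 / (2.0 * radius * radius))
            for k in range(dim):
                weights[k] += rate * influence * (sample[k] - weights[k])
    return grid


# Two made-up clusters of 2d samples.
samples = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0]]
som = train_som(samples)
```

After training, get_winner(som, sample) gives the map position of a sample. Because neighbouring cells are pulled along with the winner, the grid behaves like the elastic grid mentioned above.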
4. Live Demo
4.1. Gutenberg Texts
Texts swim on the map
Similar texts are grouped together
Different languages drift apart
5. Conclusion
5.1. Problems
5.1.1. Problems with large amounts of text
5.1.2. Problem with definition of similarity for text
Two level document vector generation
First words per sentence, then sentences per text.
5.2. References
5.2.1. WEBSOM
5.2.2. Download link for GoSOM2 Plone Conference 2005 edition
http://gogo.bluedynamics.net/files/GoSOM2-ploneconf2005.tgz/download
6. Thanks