For roughly the past week and a half we have been going over text mining, or, as I understand it, learning how to use computer programs to analyze a written work and draw some conclusions about it. The first thing that we were introduced to wasn’t really a text mining program so much as a word history app called Ngram. While the program doesn’t say anything about a particular document, it does tell us how the use of a word, or at least the popularity of its written form, has changed over time.
The first actual text mining program that we looked at was AntConc. AntConc is a program with a bunch of specialized versions that people may find handy for different digital history projects. Speaking impartially, I can say that it is a good, well-built program once you know how to use it, but if you asked whether I would suggest it to anyone, it would probably not be my first suggestion. It is not something that most people could open up and start using right after downloading it; most people will need to find a tutorial to do anything with it, unlike the third program we looked at.
The third program that we looked at was Voyant. The program itself is much friendlier to new users but probably has less overall functionality than AntConc. The biggest draw of Voyant for me personally is the word cloud option. While it doesn’t truly tell you what the body of text, which they call a corpus, is about, it does give you a decent idea and lets you know what parts to focus on. An example is the word cloud below.
The word cloud is from a paper I did about the band Deep Purple’s induction into the music hall of fame. Even if I hadn’t told you that, it isn’t too hard to make a guess in that direction: both hall and fame are frequently used words, and band, song, and rock would tell you that it was about music.
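To give a rough idea of what sits behind a word cloud, here is a small Python sketch that counts the most frequent words in a text. It is only my guess at the general idea, not how Voyant actually does it, and the stop word list and sample sentence are made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop word list; real tools ship much longer ones.
STOP_WORDS = {
    "the", "a", "an", "and", "of", "to", "in", "into", "was", "is",
    "it", "that", "for", "on", "with", "as", "by", "at", "every",
}

def top_words(text, n=5):
    """Return the n most frequent non-stop words with their counts."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sample text, not taken from the actual paper.
sample = (
    "The band was inducted into the hall of fame. "
    "The hall of fame honored the band for every rock song, "
    "and the band played a rock song at the hall."
)

print(top_words(sample, 5))
```

A word cloud then just draws the words with the biggest counts in the biggest type, which is why hall, fame, and band stand out in a paper like mine.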
The word cloud also has one other important feature that, while not used for text mining, can be very useful for an individual: you can see if you are using a lackluster filler word too often. Using the word cloud above as an example again, we can see like, didn’t, and having in there. In this case they are partially showing up due to the length of the paper, but in a longer paper they would be something I would want to go in and replace with a better word choice. Even in the paper itself, if I had used the program before I turned it in, I could have seen those words and gone back to see if I could change the wording in those areas to avoid using them and get a higher word count. While the word cloud is by far the least useful feature for a researcher or historian, the program’s ability to find phrases and show correlations between words is probably quite helpful for finding out whether a text corpus has the info they are looking for specifically.
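The filler word check could also be done without a word cloud at all. The sketch below is a minimal version of that idea in Python. The filler list is my own assumption (it just starts from the three words I noticed in my cloud), and the function name is made up for illustration.

```python
import re
from collections import Counter

# Hypothetical filler list; adjust it to your own writing habits.
FILLER_WORDS = {"like", "didn't", "having", "really", "very", "just"}

def filler_counts(text, fillers=FILLER_WORDS):
    """Count how often each filler word appears in the text."""
    # Normalize curly apostrophes so "didn’t" matches "didn't".
    normalized = text.lower().replace("’", "'")
    words = re.findall(r"[a-z']+", normalized)
    return Counter(w for w in words if w in fillers)

# Made-up draft sentence for the example.
draft = "I didn’t like having to wait, like, at all."

for word, count in filler_counts(draft).most_common():
    print(word, count)
```

Anything that shows up more than a couple of times would be a spot to go back and look for a better word choice.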
My overall experience with text mining programs is that I probably won’t find myself using AntConc or Ngram very often, if ever. I can see myself potentially using Voyant and its word cloud to check whether I’m using filler words too often, and as a way to lengthen my papers, which I generally find to be an issue for me.