© Lloyd
Your outline includes this:
Use topic modeling to find the concepts in documents. - Topic modeling software goes through a document, and does word counts. On the basis of that, it determines the relative importance of the various concepts in the document.
That's interesting. Can you demonstrate it?
I tried doing that manually before and found that it's a lot more work than you'd think, although much of it is pretty easy. Here's an easy way to do part of the job. Copy the text into a Word document. Remove the punctuation using the Find and Replace command. Then replace all spaces with paragraph marks, so that each word sits on its own line. Then copy all the text and paste it into Excel. All of the text should then be in the first column. With the text still highlighted, or after highlighting just that one column, click on Data, then Sort, then OK. All the words will then be in alphabetical order, with repeated words grouped together.
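For anyone who would rather skip Word and Excel, the same steps can be sketched in a few lines of Python. This is only a rough equivalent of the manual procedure, not a topic modeling program. The function name sorted_words and the sample sentence are made up for illustration, and lowercasing the words is my own assumption so that "The" and "the" end up grouped together.

```python
import string

def sorted_words(text):
    """Strip punctuation, split into words, and sort alphabetically."""
    # Like Find and Replace in Word: delete every ASCII punctuation character.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    # Like replacing spaces with paragraph marks: one word per list item.
    # Lowercasing is an assumption, so capitalized repeats group together.
    words = cleaned.lower().split()
    # Like Data > Sort in Excel.
    return sorted(words)

sample = "The cat sat. The dog sat, too!"
print(sorted_words(sample))
# prints ['cat', 'dog', 'sat', 'sat', 'the', 'the', 'too']
```

Note that string.punctuation only covers ASCII punctuation, so curly quotes and long dashes copied from a word processor would need to be handled separately.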
After that, things get difficult. You have to delete all of the duplicate words and all of the trivial words. Prepositions, conjunctions, articles, pronouns, numbers, etc. were obviously trivial, but beyond those I found it hard to decide which words were important.
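The mechanical part of that step, collapsing duplicates into counts and dropping the obviously trivial words, can be sketched in Python as follows. The stop-word list here is a tiny one I made up for illustration (real lists are much longer), and the name content_word_counts is hypothetical. This only counts words. It does not solve the hard part of deciding which of the remaining words matter, and it is not what a real topic modeling program does internally.

```python
import string
from collections import Counter

# A tiny hand-picked stop-word list, for illustration only (an assumption).
STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "to",
    "it", "is", "he", "she", "they",
}

def content_word_counts(text, stop_words=STOP_WORDS):
    """Count each word once per occurrence, skipping stop words and numbers."""
    # Remove ASCII punctuation, lowercase, and split on whitespace.
    cleaned = text.translate(str.maketrans("", "", string.punctuation))
    words = cleaned.lower().split()
    # Drop trivial words and tokens made up entirely of digits.
    kept = [w for w in words if w not in stop_words and not w.isdigit()]
    # Counter merges duplicates and keeps a tally for each distinct word.
    return Counter(kept)

counts = content_word_counts(
    "The cat and the dog sat on the mat. The cat sat in 2 chairs."
)
print(counts.most_common(3))
```

Sorting by count rather than alphabetically puts the most frequent remaining words at the top, which is at least a starting point for judging importance.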
So I'm very curious to see how well a topic modeling program can do.