Welcome folks! In this post we will do some text analysis on a text file in Python, such as counting the number of words, lines and special characters. The full source code of the application is shown below.
Get Started
To get started, create an app.py file and copy and paste the following code:
app.py
```python
# -*- coding: utf-8 -*-
import collections
import string
import sys

script_name = sys.argv[0]

res = {
    "total_lines": 0,
    "total_characters": 0,
    "total_words": 0,
    "unique_words": 0,
    "special_characters": 0,
}

try:
    textfile = sys.argv[1]
    with open(textfile, "r", encoding="utf-8") as f:
        data = f.read()

    # Text mode turns every line ending into "\n", so splitlines() counts
    # lines correctly on every OS, including a last line with no newline.
    res["total_lines"] = len(data.splitlines())

    # Characters, not counting spaces and line breaks.
    res["total_characters"] = len(data.replace(" ", "").replace("\n", ""))

    # Words are whitespace-separated tokens.
    counter = collections.Counter(data.split())
    res["total_words"] = sum(counter.values())
    res["unique_words"] = len(counter)

    # Special characters are the ASCII punctuation in string.punctuation.
    res["special_characters"] = sum(
        count for char, count in collections.Counter(data).items()
        if char in string.punctuation
    )
except IndexError:
    print("Usage: %s TEXTFILE" % script_name)
    sys.exit(1)
except IOError:
    print('"%s" cannot be opened.' % textfile)
    sys.exit(1)

print(res)
```
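If you want to try the counting logic without creating a file first, one option is to move it into a function that takes a string. The sketch below does that; `analyze_text` is a hypothetical name that is not part of the original script, lines are counted with `str.splitlines()`, and the character count leaves out spaces and line breaks, which is one reasonable reading of what the script intends.

```python
import collections
import string


def analyze_text(data: str) -> dict:
    """Compute the same statistics as app.py for an in-memory string.

    analyze_text is a hypothetical helper, not part of the original script.
    """
    words = collections.Counter(data.split())
    return {
        # One entry per line; a final line without "\n" still counts.
        "total_lines": len(data.splitlines()),
        # All characters except spaces and line breaks.
        "total_characters": len(data.replace(" ", "").replace("\n", "")),
        # Whitespace-separated tokens, and how many distinct ones there are.
        "total_words": sum(words.values()),
        "unique_words": len(words),
        # Characters found in string.punctuation (ASCII punctuation only).
        "special_characters": sum(1 for ch in data if ch in string.punctuation),
    }


if __name__ == "__main__":
    print(analyze_text("this is a text file\n"))
    # {'total_lines': 1, 'total_characters': 15, 'total_words': 5, 'unique_words': 5, 'special_characters': 0}
```

Note that words are split on whitespace only, so "Hello," and "Hello" count as two different unique words.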
Now, when you run the Python script, you also need to pass one additional command-line argument: the path of the text file to analyze.
python app.py input.txt
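The script reads the path directly from `sys.argv[1]`. As an alternative (this is a swapped-in approach, not the one used in app.py), the standard-library `argparse` module can declare the same single positional argument and prints a usage message by itself when it is missing. A minimal sketch, where `parse_args` is a hypothetical helper name:

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means argparse reads sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Count lines, characters, words and special characters in a text file."
    )
    # A single positional argument: the path of the file to analyze.
    parser.add_argument("textfile", help="path of the text file to analyze")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Demo with an explicit argument list; a script would call parse_args().
    args = parse_args(["input.txt"])
    print("Analyzing", args.textfile)
```

If the argument is missing, `argparse` prints the usage text and exits with status 2, so the manual `IndexError` handling is no longer needed.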
The input.txt file that I am analyzing is shown below:
input.txt
```
this is a text file
```
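Counting line-ending characters is not the same as counting lines. If a one-line file like this one is saved without a trailing newline, it contains zero "\n" characters but still has one line. Also, `open()` in text mode translates Windows "\r\n" endings to "\n" by default, so the data read from the file never contains "\r\n". A small sketch comparing the two approaches, assuming the file has no trailing newline:

```python
# The sample file's content, assumed here to be saved without a trailing newline.
data = "this is a text file"

# Counting line-ending characters misses a last line that has no "\n".
newline_count = data.count("\n")

# str.splitlines() returns one item per line, with or without a final newline.
line_count = len(data.splitlines())

print(newline_count, line_count)  # 0 1
```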
As you can see, the script reports all the information about the text file, i.e. the number of lines, characters, words, unique words and special characters. All of this information is collected in a Python dictionary, res, which is printed at the end of the script.
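Printing the dictionary shows Python's repr, which uses single quotes and is not valid JSON. If you need real JSON output, for example to pass the results to another program, the standard-library `json` module can serialize the dictionary. A minimal sketch; the values below are illustrative and match what the counting logic gives for the single line "this is a text file":

```python
import json

# Illustrative statistics in the same shape as the script's res dictionary.
res = {
    "total_lines": 1,
    "total_characters": 15,
    "total_words": 5,
    "unique_words": 5,
    "special_characters": 0,
}

# json.dumps produces a JSON string; indent=2 pretty-prints it.
print(json.dumps(res, indent=2))
```

To save the result instead of printing it, `json.dump(res, f, indent=2)` writes the same text to an open file object `f`.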