Creating CLI tool in Python - part 3

Photo by: Frank V. / Unsplash

In the previous post, I created the first command for the assistant, called blog, and added a title argument to it. The title argument is validated, and most of the business logic is covered with unit tests.
In this post, I'm going to create a new argument called image and validate it. The business logic will be covered with unit tests.
The second goal is to do a bit of refactoring and make the CLI tool more user-friendly by switching from arguments to options.

Overview

When I started to implement the image option logic, I realized that I needed to change the business logic and add a project folder option, also with validation. After this change, the whole flow looks as follows:

To start the application, the user has to type the following command on the command line:

assistant blog --title "Title of the blog" --image "https://unsplash.com/photos/9FvZfRKKfH8"

As seen in the gif above, the project is starting to look like a real tool. The main change is that the application now uses click options; in my opinion, this makes interaction with the tool more human-readable.

Project structure

I made some minor changes to the structure: for better readability, I moved the blog command logic into a separate module, and I introduced new modules called file_handler and web_scraper.

assistant/
|-- README.md
|-- assistant.py
|-- blog_command.py
|-- file_handler.py
|-- install-dev.sh
|-- logger.py
|-- setup.py
|-- str_helper.py
|-- test_str_helper.py
|-- test_validator.py
|-- test_web_scraper.py
|-- validator.py
|-- web_scraper.py

assistant.py

In the previous post, I implemented the blog command and the title argument. At that point, passing the title as a click argument seemed fine, but when I added an image argument I realized that interacting with the application did not feel right. So I decided to switch to click options.
Here is an example of the command before the change:

assistant blog "Title of the blog" "https://unsplash.com/photos/9FvZfRKKfH8"

And this is after:

assistant blog --title "Title of the blog" --image "https://unsplash.com/photos/9FvZfRKKfH8"

The same command can be written in a short version:

assistant blog -t "Title of the blog" -i "https://unsplash.com/photos/9FvZfRKKfH8"

So let's check what has changed in the assistant.py file.
At the import level there are some changes. I have moved the logger and validator imports into the blog_command module, and assistant.py now imports blog_command instead. I also need str_helper, to check whether the user has passed a project path or not.

import click
import os
import blog_command
from str_helper import is_null_or_whitespace

The program entry point is still the cli() function; no changes were made here.

@click.group()
@click.option(
    '-v',
    '--verbose',
    is_flag=True,
    help='Will print verbose messages about processes.'
)
@pass_config
def cli(config, verbose):
    config.verbose = verbose

The blog command's business logic now lives in the blog_command module. Another thing to notice is that I have added two parameters: image and project-path.
The project-path option is optional; if it is not provided, the current working directory is used. For that, I import the os module and use its getcwd() function.

@cli.command()
@click.option(
    '-t',
    '--title',
    required=True,
    type=str,
    help='The title of blog post.'
)
@click.option(
    '-i',
    '--image',
    required=True,
    type=str,
    help='The Unsplash image url.'
)
@click.option(
    '-p',
    '--project-path',
    required=False,
    type=click.Path(),
    help='The full path to project folder. Default: current working directory.',
    default=os.getcwd()
)
@pass_config
def blog(config, title, image, project_path):
    """Use this command to start a new blog post."""
    blog_command.handle(config, title, image, project_path)

blog_command.py

As you can see below, the blog command is getting quite heavy, mainly because I have added some extra informational logging. The main idea of this command is to validate the input first, then fetch the image data and download the image. What annoys me right now is that the user gets no feedback on how much of the image has been downloaded. For now that's fine, but I will come up with something later, once the "happy path" is implemented.

from slugify import slugify
from os import path
import logger
import validator
import web_scraper
import file_handler

def handle(config, title, img_url, project_path):
    try:
        logger.info(config.verbose, 'Starting project path validation.')
        path_validation_result = validator.validate_project_path(project_path)
        logger.success(path_validation_result)

        logger.info(config.verbose, 'Starting title validation.')
        title_validation_result = validator.validate_tile(title)
        logger.success(title_validation_result)

        logger.info(config.verbose, 'Starting image url validation.')
        img_validation_result = validator.validate_img(img_url)
        logger.success(img_validation_result)

        logger.info(config.verbose, 'Requesting image data.')
        file_name = '.'.join((slugify(title), 'jpg'))
        image = web_scraper.get_image_author(img_url, file_name)
        logger.info(config.verbose, 'Image url: %s' % image.url)
        logger.info(config.verbose, 'Image file name: %s' % image.file_name)
        logger.info(config.verbose, 'Image author: %s' % image.author_name)
        logger.info(config.verbose, 'Image author profile: %s' % image.author_profile)
        logger.success('Successfully acquired image data.')

        logger.info(config.verbose, 'Starting image download.')
        full_file_path = path.join(file_handler.find_sub_folder(project_path, '/src/images'), file_name)
        web_scraper.download_img(img_url, full_file_path)
        logger.success('Image "%s" downloaded successfully to "%s".' % (img_url, full_file_path))

    except ValueError as er:
        # Validation failures are raised as ValueError by the validator module.
        logger.error('Validation Error: {}'.format(er))
    except Exception as ex:
        logger.error(format(ex))

file_handler.py

The file_handler module is new in the project. It will contain all the file- and directory-related logic. For now, there is only one search method; maybe I will add more later.
All this method does is collect the subdirectories of the parent directory and then find a specific one. This is needed because I have to make sure that the project folder contains an "images" folder. Since my blog project's directory structure is nested, I decided it is more comfortable if it is enough for the user to be inside the project folder, without having to navigate to a particular subdirectory.

import os

def find_sub_folder(parent_path, sub_path):
    """Find a sub directory from parent folder.

    Returns:
        If the folder contains subdirectory then the full path to a subdirectory is returned.
        Else None is returned.
    """

    folders = []

    # r=root, d=directories, f = files
    for r, d, f in os.walk(parent_path):
        for folder in d:
            folders.append(os.path.join(r, folder))

    result = [x for x in folders if sub_path in x]

    if not result:
        return None

    return result[0]
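To see how this search behaves, here is a minimal sketch that builds a throwaway project tree with tempfile and runs the same walk-and-filter idea against it. The folder name my-blog and the demo() helper are made up for illustration, and the function is a compact variant that returns on the first match instead of collecting every folder first. The sub path is built with os.sep, which is an assumption added on top of the hard-coded '/src/images' string so that the substring check is not tied to one path separator.

```python
import os
import tempfile


def find_sub_folder(parent_path, sub_path):
    """Return the first directory under parent_path whose full path contains sub_path, else None."""
    for root, dirs, _files in os.walk(parent_path):
        for folder in dirs:
            candidate = os.path.join(root, folder)
            if sub_path in candidate:
                return candidate
    return None


def demo():
    # Hypothetical layout for the demo: <tmp>/my-blog/src/images
    with tempfile.TemporaryDirectory() as tmp:
        project = os.path.join(tmp, 'my-blog')
        images = os.path.join(project, 'src', 'images')
        os.makedirs(images)

        # '/src/images' on Linux and macOS, '\src\images' on Windows
        wanted = os.path.join(os.sep + 'src', 'images')
        found = find_sub_folder(project, wanted)

        # A folder that does not exist in the tree yields None
        missing = find_sub_folder(project, os.path.join(os.sep + 'src', 'videos'))
        return found == images, missing


if __name__ == '__main__':
    print(demo())
```

Because the check is a plain substring match, a deeper folder such as src/images/thumbs would also match; os.walk visits parents before children, so the shallower src/images folder is found first.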

validator.py

This module got an update: I added two validation methods, one for the image and the other for the project path. The validate_img method is straightforward. First, I make sure the value is not empty; if it is, an exception is raised. Then I make sure it is a valid URL, using a regex. I did not want to add Unsplash to the regex, because later there may be another image provider, or I may add images from some other place. So, to support only Unsplash images for now, I added an if check.

def validate_img(img):
    """Validate blog image.
    - required
    - starts with https
    - is an Unsplash link

    returns:
        Validation success message.
    """
    if is_null_or_whitespace(img):
        raise ValueError('Blog image is required, currently supporting only Unsplash.')

    regex = re.compile(
        r'^(?:http|ftp)s?://'  # http://, https://, ftp:// or ftps://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain...
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # ...or ip
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)

    result = re.match(regex, img)

    if result is None:
        raise ValueError('Invalid blog image url.')

    if "unsplash.com/photos/" not in img:
        raise ValueError('Invalid blog image url, currently supporting only Unsplash images.')

    return 'Validation Success: Image "%s" is valid.' % img
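As a point of comparison, the same three checks can be sketched with urllib.parse instead of a regex. This is a swapped-in technique, not the validator used in this project, and the function name looks_like_unsplash_photo is hypothetical. The idea is to compare the parsed host name rather than search the whole string, because a plain substring check would also accept a URL that only mentions unsplash.com/photos/ in its query string.

```python
from urllib.parse import urlparse


def looks_like_unsplash_photo(url):
    """Return True when url is an http(s) link to unsplash.com/photos/<id>."""
    if not url or not url.strip():
        return False

    parts = urlparse(url.strip())
    if parts.scheme not in ('http', 'https'):
        return False

    # Compare the parsed host, not the raw string
    host = (parts.hostname or '').lower()
    if host != 'unsplash.com' and not host.endswith('.unsplash.com'):
        return False

    # Require something after /photos/ so a bare listing page is rejected
    return parts.path.startswith('/photos/') and len(parts.path) > len('/photos/')


if __name__ == '__main__':
    print(looks_like_unsplash_photo('https://unsplash.com/photos/9FvZfRKKfH8'))
    print(looks_like_unsplash_photo('https://example.com/?next=unsplash.com/photos/abc'))
```

A boolean helper like this could sit behind the existing ValueError messages, so the rest of the command would not need to change.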

The validate_project_path method just makes sure that the project-path parameter is not empty and that the project folder contains an "images" folder.

def validate_project_path(path):
    """Validate project path.
    -required
    -should contain a 'images' folder

    returns:
        Validation success message.
    """

    if is_null_or_whitespace(path):
        raise ValueError('Path to blog project is required.')

    if not find_sub_folder(path, '/src/images'):
        raise ValueError('Blog project does not contain folder "images".')

    return 'Validation Success: Project path "%s" is valid.' % path

web_scraper.py

To download images and get image information from Unsplash, I'm using urllib and the BeautifulSoup library.

from urllib import request, parse
from bs4 import BeautifulSoup

I have defined an Image class, to keep image data in one place after receiving it.

class Image:

    def __init__(self, file_name, url, author_name, author_profile):
        self.file_name = file_name
        self.url = url
        self.author_name = author_name
        self.author_profile = author_profile

In the download_img method I just build the download URL and save the image to the provided path.

def download_img(imageUrl, filePath):
    """Download image from Unsplash and save it in provided location"""

    downloadEndPoint = imageUrl + '/download?force=true'
    request.urlretrieve(downloadEndPoint, filePath)
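One possible way to add the download feedback that is missing from the blog command is the reporthook parameter of urlretrieve, which is called with the block number, the block size and the total size as the file is transferred. The sketch below is only an illustration of that idea, not part of the project: the names format_progress, print_progress and download_img_with_progress are hypothetical, and the message format is made up.

```python
from urllib import request


def format_progress(block_num, block_size, total_size):
    """Build a progress message from the values urlretrieve passes to its reporthook."""
    if total_size <= 0:
        # The total size is unknown, so only the byte count can be reported
        return 'Downloaded %d bytes' % (block_num * block_size)

    # The last block is usually partial, so cap the value at 100
    percent = min(100, block_num * block_size * 100 // total_size)
    return 'Downloaded %d%%' % percent


def print_progress(block_num, block_size, total_size):
    """reporthook callback: rewrite the same console line on every block."""
    print('\r' + format_progress(block_num, block_size, total_size), end='', flush=True)


def download_img_with_progress(image_url, file_path):
    """Same download as download_img, with the hook attached (performs a network request)."""
    download_end_point = image_url + '/download?force=true'
    request.urlretrieve(download_end_point, file_path, reporthook=print_progress)
    print()  # finish the progress line
```

Keeping the message formatting in its own function means it can be unit tested without touching the network.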

In get_image_author I define a selector class, then make the request and store the response in a variable so I can decode it and parse it with BeautifulSoup. The rest just selects the correct data from the response.

def get_image_author(imageUrl, file_name):
    """Request image author data from page.

    Returns Image object with filled data.
    """

    selector = '_3XzpS _1ByhS _4kjHg _1O9Y0 _3l__V _1CBrG xLon9'
    response = request.urlopen(imageUrl)

    if response.code != 200:
        raise Exception('Failed to make request to "%s"' % imageUrl)

    data = response.read()
    html = data.decode('UTF-8')
    soup = BeautifulSoup(html, "html.parser")
    anchor = soup.find('a', class_=selector)
    username = anchor['href'].lstrip('/')
    author = anchor.contents[0]
    parsed_uri = parse.urlparse(imageUrl)
    author_profile = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    image = Image(file_name, imageUrl, author, (author_profile + username))

    return image
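The last few lines of get_image_author build the author profile link from two pieces: the scheme and host of the image URL, and the relative href taken from the anchor. Here is that step in isolation, as a small sketch. The helper name build_author_profile and the href value '/@some_author' are made up for illustration; the real value comes from the scraped page.

```python
from urllib import parse


def build_author_profile(image_url, href):
    """Combine the scheme and host of image_url with the author's relative href."""
    parsed_uri = parse.urlparse(image_url)
    base = '{uri.scheme}://{uri.netloc}/'.format(uri=parsed_uri)
    # Strip the leading slash so the result does not contain '//' after the host
    return base + href.lstrip('/')


if __name__ == '__main__':
    # Hypothetical href, for illustration only
    print(build_author_profile('https://unsplash.com/photos/9FvZfRKKfH8', '/@some_author'))
```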

Summary

In this post, I added two more options to the blog command: image and project-path. I also created the image download logic and added more unit tests.
In the next post, I'm going to create a blog post starter file and fill it with some initial data.
As always, the source code of this post is available on GitHub.

Available resources