Introduction to Data Science for Screen Reader Users

Introduction to Python

In this chapter, we cover the minimum Python programming knowledge needed to practice data science, which we take up later. A great deal of information about Python programming is available on the Internet, so if you want to go deeper or see a different perspective, please use the link collection at the end and Internet searches for reference.

Algorithms and Programming

First, for those who are learning programming for the first time, here is an overview. An algorithm is a method or procedure for solving a specific problem, and programming is the work of making an algorithm executable on a computer. The text files written in the course of programming are called source code.

Any algorithm consists of a combination of three structures: “Sequential,” “Conditional,” and “Repetitive” for the flow of processing. This flow of processing is called a control structure. The roles of each control structure are as follows:

Sequential:
- Perform each process in order
Conditional:
- Execute different processing depending on certain conditions
Repetitive:
- Repeat the processing while certain conditions are met

To visualize the control structures, let’s take one of the many algorithms hidden in everyday life: the procedure for “opening the lid of a plastic bottle.” The flow of this procedure can be written down as follows, with the corresponding control structure noted at the end of each step.

Turn the lid counterclockwise【Sequential】
Check if the lid comes off【Conditional】
If the lid comes off, end the process【Sequential】
If the lid does not come off, go back to the first step and turn the lid again【Repetitive】

That was a very simplified explanation, but did it give you a rough picture? The actions we casually perform in our daily lives contain many algorithms, each a combination of sequential, conditional, and repetitive processing. Programming lets us automate many such procedures on computers, and it is no exaggeration to say that this underlies the conveniences of life today.

In the following sections, we introduce, in order, the basic knowledge needed to implement the “opening the lid of a plastic bottle” algorithm in the programming language Python.

Basic Knowledge and Practice of Python Programming

A great deal of information about Python programming can be found by searching the Internet, but here we focus on the items considered minimally necessary for first-time programmers. None of it is difficult, so let’s learn hands-on. Learning by actually typing in the sample programs yourself is called “copying,” and it is considered one of the efficient ways to learn programming.

For the basic operation methods for programming in Python on Google Colab using a screen reader, please refer to the previous chapter’s Experience Programming with Colab.

The print function

Python provides a built-in function called print (more on functions later) that you can use to output messages and intermediate results of a program. Let’s write a program that outputs the message “Hello”. Note that strings you want to handle as data must be enclosed in quotation marks, such as the double quotation marks " " used here, to distinguish them from the text of the program itself.

print("Hello")

Comments

You can write a comment in a program by starting it with a hash sign (#); Python ignores everything from the # to the end of the line. We recommend writing comments as needed so that you can recall the meaning of the program later. In the sample code in this chapter, comments are used to show, among other things, the expected output of each print call.

print("Hello") # "Hello" will be displayed

Reading and Writing Data

When implementing an algorithm, you need a mechanism to read and write information (data) that changes as the process progresses. In Python programming, mechanisms such as variables, lists, and dictionaries are mainly used.

A variable is a named container for the data you are working with (for example, whether the lid is off or not), and it is what you read from and write to. A list groups multiple values together in order. A dictionary manages multiple values under labels called keys. Let’s look at some concrete examples.

Variables

Numbers and strings can be assigned to variables using =. You can then read out the stored value by writing the same variable name.

x = 1
print(x) # 1

x = "hello"
print(x) # hello
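
As a small additional sketch, the lines below show a variable being overwritten as the process progresses, using the lid example from above. The variable names lid_is_off and turns are only illustrative and do not come from the original samples.

```python
lid_is_off = False   # the lid starts out closed
print(lid_is_off)    # False

lid_is_off = True    # overwrite the stored value
print(lid_is_off)    # True

turns = 3
turns = turns + 1    # read the current value, add 1, and store the result
print(turns)         # 4
```

The right-hand side of = is evaluated first, which is why turns = turns + 1 increases the stored value by one.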

Lists

Lists can store multiple values or variables together.

a = [1, 2, 3]
print(a) # [1, 2, 3]
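
The following sketch, added for illustration, shows a few common list operations: reading one item by its position, adding an item, and counting the items.

```python
a = [1, 2, 3]

print(a[0])    # 1 (positions are counted from 0)

a.append(4)    # add a value to the end of the list
print(a)       # [1, 2, 3, 4]

print(len(a))  # 4 (the number of items in the list)
```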

Dictionaries

Dictionaries can store multiple values or variables with named keys.

d = {"cat": "neko", "dog": "inu"}
print(d) # {'cat': 'neko', 'dog': 'inu'}
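
As an illustrative sketch, the lines below read a value from a dictionary by its key and then add a new key. The "bird" entry is an example added here and is not part of the original sample.

```python
d = {"cat": "neko", "dog": "inu"}

print(d["cat"])     # neko (look up the value stored under the key "cat")

d["bird"] = "tori"  # add a new key and value
print(d)            # {'cat': 'neko', 'dog': 'inu', 'bird': 'tori'}
```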

Arithmetic Operations

You can perform arithmetic on values and variables.

x = 1
y = 2

print("x plus y is", x + y)        # Addition: x plus y is 3
print("x minus y is", x - y)       # Subtraction: x minus y is -1
print("x times y is", x * y)       # Multiplication: x times y is 2
print("x divided by y is", x / y)  # Division: x divided by y is 0.5
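
Beyond the four basic operations, Python has a few more arithmetic operators that often appear in data processing. The sketch below is an addition for illustration; the variable names quotient, remainder, and power are hypothetical.

```python
x = 7
y = 2

quotient = x // y   # integer division: the fractional part is dropped
remainder = x % y   # remainder of the division
power = x ** y      # exponentiation: x to the power of y

print(quotient)     # 3
print(remainder)    # 1
print(power)        # 49
print(x / y)        # 3.5 (ordinary division always gives a decimal number)
```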

Functions

A function, like a mathematical function, is a process that performs some calculation or operation based on given values (arguments) and returns the result to the caller. If there are similar sets of operations, you can write a very concise program by using functions appropriately.

def add(a, b):  # function definition (named add so that it does not replace Python's built-in sum function)
  return a + b  # value the function returns

answer = add(1, 2)  # function call
print(answer) # 3
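
To illustrate how a function keeps a program concise when similar operations are repeated, here is a minimal sketch. The function average_of_two is a hypothetical example, not something Python provides.

```python
def average_of_two(a, b):  # define the calculation once
  return (a + b) / 2

# Reuse the same function with different arguments
first = average_of_two(1, 2)
second = average_of_two(10, 20)

print(first)   # 1.5
print(second)  # 15.0
```

Without the function, the expression (a + b) / 2 would have to be written out again for every pair of values.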

Conditional branching with if statements

The if statement allows you to branch the process depending on the value of a variable, etc.

x = 1

if x == 1:   # When the value of the variable x is 1, the next line is executed
  print("x is 1")
else:        # When the value of the variable x is not 1, the next line is executed
  print("x is not 1")
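
When there are more than two cases, elif (short for "else if") can be placed between if and else. The following is a small sketch added for illustration; the variable message is hypothetical.

```python
x = 0

if x > 0:      # checked first
  message = "x is positive"
elif x == 0:   # checked only when the first condition is not met
  message = "x is zero"
else:          # runs when none of the conditions above are met
  message = "x is negative"

print(message)  # x is zero
```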

Repetitive processing with for statements

The for statement repeats processing once for each item in a collection such as a list or a dictionary.

for i in [1, 2, 3]:  # Each value after 'in' is assigned to the variable i in turn, and the next line is executed once per value
  print(i) # Displays 1, 2, 3 in order
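
The sketch below, added for illustration, shows two other common forms of the for statement: looping over the keys of a dictionary, and looping a fixed number of times with the built-in range function. The variable total is hypothetical.

```python
d = {"cat": "neko", "dog": "inu"}

for key in d:         # a dictionary yields its keys one at a time
  print(key, d[key])  # Displays "cat neko", then "dog inu"

total = 0
for i in range(4):    # range(4) produces 0, 1, 2, 3
  total = total + i

print(total)  # 6
```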

Implementing the algorithm

Let’s implement the program for opening the lid of a plastic bottle, which we took as an example of an algorithm, using what we have learned so far.

count = 0  # A variable to count the number of times the lid has been turned

for i in range(10):  # Repetitive processing
  count = count + 1    # Count up one turn of the lid
  # Output the progress. An f-string (a string prefixed with f) inserts the value of a variable written inside { }.
  print(f"The lid has been turned {count} times")

  if count == 4:      # Check if the lid is open. This time, it will open in 4 turns.
    break  # Exit the loop

print("The lid has been removed")  # End message
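
As an alternative sketch, the same procedure can be written with the while statement, which repeats processing as long as a condition holds. The while statement is not covered elsewhere in this chapter, and the function name open_lid and the parameter turns_needed are hypothetical names chosen for this illustration.

```python
def open_lid(turns_needed):
  count = 0           # number of times the lid has been turned
  lid_is_off = False  # state of the lid

  while not lid_is_off:           # repeat while the lid is still on
    count = count + 1             # turn the lid once
    print(f"The lid has been turned {count} times")

    if count >= turns_needed:     # check if the lid comes off
      lid_is_off = True

  print("The lid has been removed")
  return count

turns = open_lid(4)  # this lid opens after 4 turns
```

This version follows the step list more directly: turn, check, and go back to turning until the lid comes off.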

Using external libraries

One of the major reasons for Python’s popularity is its abundance of external libraries, which let you reuse programs written by people around the world. For example, let’s try numpy, an external library that provides tools for numerical computation.

import numpy

average = numpy.mean([1, 2, 3])  # average
print(average) # 2.0
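
In practice, numpy is often imported under the short name np and used through its array type. The following is a minimal sketch, assuming numpy is installed (as it is on Google Colab); the variable name values is hypothetical.

```python
import numpy as np  # "as np" gives the library a shorter name

values = np.array([1, 2, 3])  # convert a list into a numpy array

print(values.mean())  # 2.0
print(values.max())   # 3
print(values * 2)     # [2 4 6] (the operation is applied to every element)
```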

External libraries commonly used in data science

External libraries commonly used in data science include the following:

numpy (an external library for numerical operations)
scipy (an external library that further extends numpy)
pandas (an external library for data processing)
scikit-learn (an external library for machine learning)

Google Colab, which we use in this material, comes with many data science libraries pre-installed, including the ones above. To use them in an environment where they are not installed, you need to install them with a tool called pip. For details, please refer to the list of links provided at the end.