This chapter covers the minimum Python programming knowledge needed to practice the data science introduced later. Plenty of information about Python programming is available on the Internet, so if you want to go deeper or see a different perspective, use the link collection at the end and search the Internet for more.
First, for those who are learning programming for the first time, here is an overview of programming. Programming is the work of making algorithms, that is, methods or procedures for solving specific problems, executable on a computer. The text files written in the course of programming are called source code.
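Any algorithm can be expressed as a combination of three structures for the flow of processing: “Sequential,” “Conditional,” and “Repetitive.” These are called control structures. The roles of each control structure are as follows:
Sequential: executes steps one after another, in the order written.
Conditional: chooses which step to execute depending on whether a condition holds.
Repetitive: repeats the same steps a set number of times or while a condition holds.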
To visualize the control structures, let’s take “opening the lid of a plastic bottle” as an example from the many algorithms hidden in everyday life. The flow of this process can be written down as follows, with the corresponding control structure noted at the end of each step.
That was a very simplified explanation, but did it give you a rough picture? The actions we casually perform in daily life contain many algorithms, each built from combinations of sequential, conditional, and repetitive processing. Programming lets computers automate a wide range of such procedures, and it is no exaggeration to say that this underpins the convenience of our lives today.
Next, we introduce, in order, the basic knowledge needed to implement the “opening the lid of a plastic bottle” algorithm in the programming language Python.
A great deal of information about Python programming can be found by searching the Internet, but here we focus on the items considered minimally necessary for first-time programmers. None of it is difficult, so let’s learn by doing and type the code ourselves. Learning by actually typing out sample programs is called “copying,” and it is considered one of the efficient ways to learn programming.
For the basic operation methods for programming in Python on Google Colab using a screen reader, please refer to the previous chapter’s Experience Programming with Colab.
Python’s built-in print function (more on functions later) lets you output messages and intermediate results of a program. Let’s write a program that outputs the message “Hello”. Note that a string you want to handle as data must be enclosed in double quotation marks (" ") to distinguish it from the text of the program itself. Single quotation marks (' ') work as well.
print("Hello")
You can write a comment in a program by putting text after a hash sign (#). Python ignores everything from the # to the end of the line. It is recommended to write comments as needed so that you can remember the meaning of the program later. In the sample code on this page, comments describe the output expected from each print call.
print("Hello") # "Hello" will be displayed
When implementing an algorithm, you need a mechanism to read and write information (data) that changes as the process progresses. In Python programming, mechanisms such as variables, lists, and dictionaries are mainly used.
A variable is a container for the data you are dealing with (for example, the state of whether the lid is off or not) and is what the program reads from and writes to. A list is a type of variable that groups multiple values in order. A dictionary is a type of variable that manages multiple values under labels called keys. Let’s look at some concrete examples.
Various numbers and strings can be assigned to variables using =. By referring to the same variable name later, you can read out the stored value.
x = 1
print(x) # 1
x = "hello"
print(x) # hello
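As a small sketch of the lid state mentioned above, the example below keeps the state in variables and updates them as the process progresses. The names lid_is_open and turns are made up for this illustration.

```python
lid_is_open = False   # hypothetical variable: the lid starts closed
turns = 0             # hypothetical variable: number of turns so far

turns = turns + 1     # read the old value, add 1, and store the result back
lid_is_open = True    # overwrite the state once the lid comes off

print(turns)          # 1
print(lid_is_open)    # True
```

True and False are Python’s built-in values for yes/no states, which makes them a natural fit for “is the lid off or not.”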
Lists can store multiple values or variables together.
a = [1, 2, 3]
print(a) # [1, 2, 3]
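Once values are in a list, you will usually want to read or change individual elements. The following sketch shows a few common operations. Note that positions are counted from 0.

```python
a = [1, 2, 3]
print(a[0])    # 1 (the first element is at position 0)
print(len(a))  # 3 (number of elements)

a.append(4)    # add a value to the end of the list
print(a)       # [1, 2, 3, 4]

a[1] = 20      # replace the element at position 1
print(a)       # [1, 20, 3, 4]
```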
Dictionaries can store multiple values or variables with named keys.
d = {"cat": "neko", "dog": "inu"}
print(d) # {'cat': 'neko', 'dog': 'inu'}
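Values in a dictionary are read and written through their keys. The sketch below reuses the dictionary d from above. The "bird" entry is added only for illustration.

```python
d = {"cat": "neko", "dog": "inu"}
print(d["cat"])     # neko (look up the value stored under the key "cat")

d["bird"] = "tori"  # add a new key and value
print(d)            # {'cat': 'neko', 'dog': 'inu', 'bird': 'tori'}

print("dog" in d)   # True (check whether a key exists)
print(len(d))       # 3 (number of key-value pairs)
```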
You can perform arithmetic on values and variables.
x = 1
y = 2
print("x plus y is", x + y) # Addition
print("x minus y is", x - y) # Subtraction
print("x times y is", x * y) # Multiplication
print("x divided by y is", x / y) # Division
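Beyond the four basic operations, Python has a few more arithmetic operators that come up often. The following is a short sketch with different example values (x = 7, y = 2) so that the results are easier to tell apart.

```python
x = 7
y = 2

ratio = x / y         # 3.5  (division always gives a decimal number)
quotient = x // y     # 3    (division rounded down to a whole number)
remainder = x % y     # 1    (what is left over after the division)
power = x ** y        # 49   (x to the power of y)
combined = x + y * 3  # 13   (multiplication is done before addition)

print(ratio, quotient, remainder, power, combined)  # 3.5 3 1 49 13
```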
A function, like a mathematical function, is a process that performs some calculation or operation based on given values (arguments) and returns the result to the caller. If there are similar sets of operations, you can write a very concise program by using functions appropriately.
def add(a, b):  # function definition (named add so it does not hide Python's built-in sum)
    return a + b  # value the function returns
answer = add(1, 2)  # function call
print(answer)  # 3
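To illustrate how a function keeps a program concise when the same operation is needed several times, here is a minimal sketch. The function to_fahrenheit is a made-up example, not something used elsewhere in this chapter.

```python
def to_fahrenheit(celsius):
    # Convert a temperature from Celsius to Fahrenheit
    return celsius * 9 / 5 + 32

# The same calculation is reused with different arguments
print(to_fahrenheit(0))    # 32.0
print(to_fahrenheit(100))  # 212.0
```

Without the function, the formula would have to be written out again for every temperature.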
The if statement lets you branch processing depending on a condition, such as the value of a variable. The lines that belong to each branch are indented.
x = 1
if x == 1: # When the value of the variable x is 1, the next line is executed
    print("x is 1")
else: # When the value of the variable x is not 1, the next line is executed
    print("x is not 1")
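When there are more than two cases, elif (short for “else if”) adds further conditions between if and else. The sketch below wraps the branches in a hypothetical function, describe_number, so that it can be tried with several values.

```python
def describe_number(x):
    if x > 0:       # checked first
        return "positive"
    elif x == 0:    # checked only when the first condition is false
        return "zero"
    else:           # runs when none of the conditions above hold
        return "negative"

print(describe_number(5))   # positive
print(describe_number(0))   # zero
print(describe_number(-3))  # negative
```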
The for statement lets you repeat processing for each element of a list or dictionary.
for i in [1, 2, 3]: # Each value after 'in' is stored in the variable i in turn, and the indented line is executed once per value
    print(i) # Displays 1, 2, 3 in order
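The same pattern works for dictionaries and for a plain run of numbers. The following sketch shows both. Looping over a dictionary gives its keys, and range produces consecutive integers.

```python
d = {"cat": "neko", "dog": "inu"}
for key in d:            # looping over a dictionary yields its keys
    print(key, d[key])   # Displays "cat neko", then "dog inu"

total = 0
for i in range(1, 4):    # range(1, 4) produces 1, 2, 3 (4 is not included)
    total = total + i
print(total)             # 6
```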
Let’s implement the program for opening the lid of a plastic bottle, which we took as an example of an algorithm, using what we have learned so far.
count = 0 # A variable to count the number of times the lid has been turned
for i in range(10): # Repetitive processing
    count = count + 1 # Count up one turn of the lid
    print(f"The lid has been turned {count} times") # Output of the progress. Writing a variable in a string using a mechanism called f-string.
    if count == 4: # Check if the lid is open. This time, it will open in 4 turns.
        break # Exit the loop
print("The lid has been removed") # End message
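As one possible variation, the same algorithm can be wrapped in a function so that the number of turns needed becomes an argument. This is only a sketch. The name open_lid and its parameters turns_needed and max_turns are assumptions made for this illustration, and a while loop is used in place of for to show another way of writing repetition.

```python
def open_lid(turns_needed, max_turns=10):
    # Return how many turns it took to open the lid,
    # or None if it did not open within max_turns turns
    count = 0
    while count < max_turns:       # Repetitive processing
        count = count + 1          # Sequential processing: one more turn
        if count == turns_needed:  # Conditional processing: is the lid open?
            return count
    return None

print(open_lid(4))   # 4
print(open_lid(12))  # None (the lid did not open within 10 turns)
```

Unlike the program above, this version tells the caller when the lid never opened instead of always printing the end message.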
One of the major reasons why Python is popular is its abundance of external libraries, which let you reuse parts of the vast body of programs written by people around the world. For example, let’s try using NumPy (imported under the name numpy), an external library that provides a wide range of numerical computing features.
import numpy
average = numpy.mean([1, 2, 3]) # average
print(average) # 2.0
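NumPy code is very often written with the short alias np and with NumPy’s own array type instead of plain lists. The sketch below assumes NumPy is installed (as it is on Google Colab) and shows a few calculations on an array. The variable names are chosen only for illustration.

```python
import numpy as np  # "np" is the conventional short name for numpy

data = np.array([1, 2, 3, 4])  # convert a list into a NumPy array

print(data.mean())  # 2.5 (average)
print(data.sum())   # 10  (total)
print(data.max())   # 4   (largest value)

doubled = data * 2  # arithmetic is applied to every element at once
print(doubled)      # [2 4 6 8]
```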
External libraries commonly used in data science include the following:
Google Colab, which we use in this content, has several data science-oriented external libraries pre-installed, including the ones above. However, if you want to use them in an environment where they are not installed, you need to install them using a mechanism called pip. Please refer to the list of links provided at the end for more details and try researching them.
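Before installing anything, it can help to check whether a library is already available. The following is a minimal sketch using Python’s standard importlib module. The helper name is_installed is made up for this example. If a library turns out to be missing, the usual command is pip install followed by the library name, run in a terminal (in a Colab cell, the command is typically prefixed with !).

```python
import importlib.util

def is_installed(name):
    # find_spec returns None when no module with this name can be found
    return importlib.util.find_spec(name) is not None

print(is_installed("json"))                           # True (part of Python itself)
print(is_installed("a_library_that_does_not_exist"))  # False
```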
In this chapter, we learned the essential Python programming knowledge needed to practice data science. We gained an understanding of fundamental programming concepts such as algorithms, programming, and control structures (sequential, conditional, and repetitive).
We also acquired basic knowledge of Python syntax, variables, lists, dictionaries, arithmetic operations, functions, if statements, for loops, and more. Additionally, we learned how to implement algorithms in Python and deepened our understanding through concrete examples of algorithms.
Furthermore, we explored the usage of external libraries, which is a powerful feature of Python. We introduced commonly used external libraries in data science. It is common to encounter challenges at the beginning, but it is helpful to search for error messages on the internet to find solutions or ask knowledgeable individuals for help. By gradually expanding our capabilities through these approaches, we can overcome obstacles and make progress in our programming journey.