Perelman School of Medicine at the University of Pennsylvania

Introduction to Python for Medical Students

A Practical Primer

Michael Yao

10/30/23

Objectives

  • Download and install Python 3 on your computer

  • Be able to take input data and generate output results using Python

    • Be able to import modules (libraries) and load data

    • Know what an array and dataframe are

    • Understand what a function is and how common functions work

    • Learn data wrangling basics

Installing Python

  • Download Python at python.org/downloads

  • Download Visual Studio Code at code.visualstudio.com/download

    • Visual Studio Code (VS Code) is an Integrated Development Environment (IDE), which provides a visual and interactive interface to make coding in Python easier

    • Python is the programming language that we can use to explore datasets

Visual Studio Code (VS Code)

Packages

  • Much of Python's power comes from its open-source packages, which anyone can install for free

  • pip install package_name installs a package (you only need to do this once per computer)

    • For this session, open the terminal and run pip install numpy and pip install pandas

  • import numpy as np and import pandas as pd load these packages into your .py program
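A quick way to confirm that both installs worked is to import the packages and print their versions. This is only a minimal sketch; the version numbers you see depend on what pip installed on your computer.

```python
# Import the two packages used in this session under their usual short names.
import numpy as np
import pandas as pd

# Each package stores its version as a string. If either import fails with
# an ImportError, the matching "pip install" step has not been run yet.
print("numpy version:", np.__version__)
print("pandas version:", pd.__version__)
```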

High Yield Basic Concepts

  • Assignment: assign a value to a variable name (e.g. x = 10 )

    • Note: variable names can’t contain spaces. Use snake_case or camelCase instead

  • Functions: your “action verbs”, which take input arguments and return an output (e.g. mean())

  • Comments: notes that help you and others understand your code; Python does not execute them (e.g.  # This is a comment! )
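The three concepts above can be seen together in a few lines. The variable names and values here are made up for illustration; max() and len() are built-in Python functions.

```python
# Assignment: store values under snake_case names (hypothetical example values).
heart_rate_bpm = 72
patient_ages = [34, 58, 41]

# Functions: each takes an input argument and returns an output.
oldest_age = max(patient_ages)
num_patients = len(patient_ages)

# Comments like these are skipped when the program runs.
print(oldest_age)    # prints 58
print(num_patients)  # prints 3
```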

A Little More About Functions

x = [2,4,6]
print(sum(x))
12

This is how to write your own function:

  • syntax: def function_name(arguments):

  • arguments: what you pass in as inputs

x = [2,4,6]
def mean(arr):
    # arr is an argument
    return sum(arr) / len(arr)
print(mean(x))
4.0
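A function can also take more than one argument. The sketch below is a hypothetical example (the function name bmi and its argument names are not from the slides) that shows the same def syntax with two inputs.

```python
def bmi(weight_kg, height_m):
    # Body mass index: weight in kilograms divided by height in meters squared.
    return weight_kg / height_m ** 2

# Arguments are matched to the parameter names in order.
result = bmi(70, 1.75)
print(round(result, 1))  # prints 22.9
```

Note that the body of the function must be indented; Python uses the indentation to know which lines belong to the function.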

Arrays

  • Basic data structure in NumPy for storing a sequence of numbers

  • Function np.array() converts a list into an array

  • x = [2,4,6]  # This is a list.
    x = np.array([2,4,6]) # This is an array.
  • Indexing with [] retrieves elements of an array by position, counting from 0

  • print(x[0])
    print(x[1])
    print(x[2])
    2
    4
    6
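One practical reason to convert a list into an array is that arithmetic works element by element. The sketch below, using the same numbers as above, also shows negative indexing and slicing; the names x_list and x_array are chosen here for illustration.

```python
import numpy as np

x_list = [2, 4, 6]
x_array = np.array(x_list)

# Multiplying a list repeats it; multiplying an array scales each element.
repeated = x_list * 2
doubled = x_array * 2
print(repeated)          # prints [2, 4, 6, 2, 4, 6]
print(doubled.tolist())  # prints [4, 8, 12]

# Negative positions count from the end, and start:stop selects a range
# that includes start but excludes stop.
print(x_array[-1])   # prints 6
print(x_array[0:2])  # prints [2 4]
```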

Useful Vector Functions

  • len(): returns the number of elements in the array

  • print(len(x))
    3
  • np.mean(): mean of the elements in the array

  • print(np.mean(x))
    4.0
  • Be careful with NaN (missing) values: a single NaN makes the mean NaN

  • x2 = [2, 4, 6, np.nan]
    print(np.mean(x2))
    nan
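If the missing values should simply be skipped, NumPy provides np.nanmean(), which ignores NaN entries. A minimal sketch using the same numbers:

```python
import numpy as np

x2 = np.array([2, 4, 6, np.nan])

# np.mean propagates the missing value; np.nanmean skips it.
print(np.mean(x2))     # prints nan
print(np.nanmean(x2))  # prints 4.0

# np.isnan marks each missing entry, so summing the marks counts them.
num_missing = np.isnan(x2).sum()
print(num_missing)     # prints 1
```

Whether skipping missing values is appropriate depends on why they are missing, so it is worth counting them first.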

DataFrames

  • Tidy data principles

    • Each row is an observation

    • Each column is a variable

    • Each cell contains one value

  • How do data frames relate to vectors?

    • Imagine a data frame as a bunch of vertical vectors next to each other

  • pd.DataFrame() creates a data frame
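One common way to call pd.DataFrame() is with a dictionary, where each key becomes a column name and each list becomes that column. The table below is a hypothetical example with made-up values.

```python
import pandas as pd

# Each key is a variable (column); each position in the lists is one
# observation (row). All values are invented for illustration.
vitals = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "heart_rate": [72, 88, 65],
    "temperature_c": [36.8, 37.9, 36.5],
})

print(vitals)
print(vitals.shape)  # prints (3, 3), meaning 3 rows and 3 columns
```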

Importing Data as DataFrames

  • Common formats include .csv, .tsv, and .txt

    • Read with pandas function: pd.read_csv()

    • Remember to import the pandas package!
      import pandas as pd

  • Software-specific formats include:

    • Excel (.xls, .xlsx)

      • Read with pandas function: pd.read_excel()
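The sketch below shows pd.read_csv() in action. So that it runs without any file on disk, it reads from a short piece of text wrapped in io.StringIO; the column names and values are made up. With a real file you would pass its path instead, for example pd.read_csv("my_file.csv"), where my_file.csv is a placeholder name.

```python
import io
import pandas as pd

# Stand-in for the contents of a .csv file (hypothetical data).
# The last row leaves systolic_bp empty on purpose.
csv_text = "patient_id,age,systolic_bp\n1,34,118\n2,58,135\n3,41,\n"

# read_csv accepts a file path or a file-like object such as StringIO.
my_data = pd.read_csv(io.StringIO(csv_text))
print(my_data)

# Empty cells are loaded as NaN, so they can be counted per column.
print(my_data.isna().sum())
```

For tab-separated files (.tsv), the same function can be used with the extra argument sep="\t".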

Exploring Your Data

  • Use print(my_data.columns) to print out the features in your dataset.

  • Use print(my_data) to view the size of the dataset and some example rows.

  • Use the matplotlib library to plot data of interest.

  • import matplotlib.pyplot as plt
    plt.figure()
    plt.plot(my_data)
    plt.show()
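Beyond printing the whole table, a few DataFrame tools give a quick overview. The sketch below builds a small hypothetical my_data (values made up) and uses shape, head(), and describe().

```python
import pandas as pd

# Hypothetical dataset standing in for data you have already imported.
my_data = pd.DataFrame({
    "age": [34, 58, 41, 29],
    "systolic_bp": [118, 135, 122, 110],
})

# shape gives (number of rows, number of columns).
print(my_data.shape)  # prints (4, 2)

# head(n) returns the first n rows.
first_rows = my_data.head(2)
print(first_rows)

# describe() summarizes each numeric column (count, mean, min, max, ...).
summary = my_data.describe()
print(summary)
```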

Accessing Your Data

  • Use the [] operator to select specific columns

  • # Here's an example dataset previously imported
    iris
      Sepal.Length Sepal.Width Petal.Length Petal.Width Factor
            1          5.1         3.5          1.4      0.2
            2          4.9         3.0          1.4      0.7
            3          4.7         3.2          1.3      1.3
            4          4.6         3.1          1.5      1.0
            5          5.0         3.6          1.4      0.9
            6          5.4         3.9          1.7      0.4
                
    # Select the "Factor" attribute
    iris["Factor"]
    0.2 0.7 1.3 1.0 0.9 0.4
    # Select both the "Factor" and "Petal.Length" attributes
    iris[["Factor", "Petal.Length"]]
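The selections above can be tried on a small table. This sketch rebuilds only the "Factor" and "Petal.Length" columns from the values shown on the slide (it does not load the real dataset) and adds one assumed extra step: keeping only the rows that meet a condition.

```python
import pandas as pd

# Two columns copied from the example rows above; the rest are omitted.
iris = pd.DataFrame({
    "Petal.Length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    "Factor": [0.2, 0.7, 1.3, 1.0, 0.9, 0.4],
})

# One name in single brackets returns one column (a pandas Series).
factor = iris["Factor"]
print(factor.tolist())  # prints [0.2, 0.7, 1.3, 1.0, 0.9, 0.4]

# A list of names in double brackets returns a smaller DataFrame.
subset = iris[["Factor", "Petal.Length"]]
print(subset.shape)     # prints (6, 2)

# A condition inside the brackets keeps only the rows where it is True.
high_factor = iris[iris["Factor"] > 0.8]
print(len(high_factor))  # prints 3
```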

My code's not working...

  • Google and Stack Overflow are your best friends

  • When asking a question online, make sure your code is as simple as possible. Only include the minimum required lines

  • ChatGPT may be helpful

  • Stop by Office Hours and ask for help!

Acknowledgements

Many thanks to Lathan Liou for providing the template for this presentation!

Useful Highest-Yield Resources