Pandas and DataFrames

In this lesson, we will explore data analysis using pandas.

  • The College Board discusses ideas such as:
    • Tools. "the ability to process data depends on users' capabilities and their tools"
    • Combining Data. "combine county data sets"
    • Status on Data. "determining the artist with the greatest attendance during a particular month"
    • Data poses challenges. "the need to clean data", "incomplete data"
  • From the pandas overview: when working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you explore, clean, and process your data. In pandas, a data table is called a DataFrame.
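The "Combining Data" and "Status on Data" ideas above can be sketched in pandas roughly as follows. This is a minimal illustration, not College Board material: the two venue tables, artist names, and attendance numbers are made up and stand in for the data sets being combined. `pd.concat` stacks the rows of both tables, and `groupby(...).sum()` followed by `idxmax()` picks the artist with the highest total attendance in one month.

```python
import pandas as pd

# Hypothetical attendance records from two venues (made-up data for illustration)
venue_a = pd.DataFrame({
    "artist": ["Ava", "Ben", "Cruz"],
    "month": ["June", "June", "July"],
    "attendance": [1200, 900, 1500],
})
venue_b = pd.DataFrame({
    "artist": ["Ava", "Ben", "Dee"],
    "month": ["June", "June", "June"],
    "attendance": [800, 1400, 1000],
})

# Combining data: stack the rows of both data sets into one DataFrame
combined = pd.concat([venue_a, venue_b], ignore_index=True)

# Keep only one month, then total the attendance for each artist
june = combined[combined["month"] == "June"]
totals = june.groupby("artist")["attendance"].sum()

# The index label with the largest total is the top artist for that month
top_artist = totals.idxmax()
print(totals)
print("Top artist in June:", top_artist)
```

Summing per artist before calling `idxmax()` matters here: the same artist appears in both source tables, so picking the single largest row would miss that Ben's two June shows together outdraw everyone else.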

DataFrame

'''pandas stores data sets in DataFrames: two-dimensional tables with labeled rows and columns'''
import pandas as pd
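A quick way to see what a DataFrame is: build one from a Python dict, where each key becomes a column label and each list holds that column's values. The student names and grades below are hypothetical, used only to show the structure.

```python
import pandas as pd

# Each dict key becomes a column; each list holds that column's values
# (hypothetical student data for illustration)
df = pd.DataFrame({
    "name": ["Ana", "Raj", "Lee"],
    "grade": [92, 85, 78],
})

print(df)
print(df.shape)            # (number of rows, number of columns)
print(list(df.columns))    # the column labels
print(df["grade"].mean())  # a single column can be summarized directly
```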

Cleaning Data

When looking at a data set, check to see what data needs to be cleaned. Examples include:

  • Missing Data Points
  • Invalid Data
  • Inaccurate Data
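The three problem types listed above can each be handled with a standard pandas call. In this sketch the raw records are hypothetical, and treating grades outside 0 to 100 as inaccurate is an assumption chosen for illustration; a real data set needs its own validity rules.

```python
import pandas as pd

# Hypothetical raw records that contain all three kinds of problems
raw = pd.DataFrame({
    "name": ["Ana", "Raj", None, "Lee", "Kim"],
    "grade": ["92", "eighty", "88", "105", "71"],
})

# Invalid data: convert text to numbers; unparseable values ("eighty") become NaN
raw["grade"] = pd.to_numeric(raw["grade"], errors="coerce")

# Inaccurate data: assume a valid grade lies between 0 and 100 (sketch assumption);
# .where() keeps values that pass the check and replaces the rest with NaN
raw["grade"] = raw["grade"].where(raw["grade"].between(0, 100))

# Missing data points: drop rows without a name or a usable grade
clean = raw.dropna(subset=["name", "grade"]).reset_index(drop=True)
print(clean)
```

Dropping rows is only one option; `fillna()` can substitute a default instead, which keeps the row count but can skew statistics such as the mean.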

Run the following code to see what needs to be cleaned:

import pandas as pd  # import here too, so this cell also runs on its own

# reads the JSON file and converts it to a pandas DataFrame
df = pd.read_json('grade.json')

print(df)
# What part of the data set needs to be cleaned?
# Everything that is not the grade
# From PBL learning, what is a good time to clean data? Hint: remember "garbage in, garbage out".
# Clean data whenever you are working so that it does not stack up.
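One way to act on "garbage in, garbage out" is to check and clean records at the moment they are loaded. The contents of grade.json are not shown in this lesson, so the sketch below substitutes a small hypothetical JSON string wrapped in `io.StringIO` (which `pd.read_json` accepts as a file-like object). The `load_grades` helper and its data are illustrative assumptions, not the lesson's actual file or code.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for grade.json; the real file's contents are unknown
GRADE_JSON = StringIO("""[
  {"name": "Ana", "grade": 92},
  {"name": "Raj", "grade": null},
  {"name": "Lee", "grade": 78}
]""")


def load_grades(source):
    """Hypothetical helper: read grade records, report gaps, and drop incomplete rows."""
    # orient="records" matches a JSON array of objects, one object per row
    df = pd.read_json(source, orient="records")

    # Show how many values are missing in each column before any analysis
    print("Missing values per column:")
    print(df.isna().sum())

    # Keep only rows that have a grade, so later steps never see incomplete data
    return df.dropna(subset=["grade"]).reset_index(drop=True)


grades = load_grades(GRADE_JSON)
print(grades)
```

To use it on the real data, `load_grades('grade.json')` passes the file path instead of the in-memory string.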