Pandas Cheat Sheet Github



This is a guide to many pandas tutorials, geared mainly for new users.

Internal guides: pandas' own 10 Minutes to pandas; more complex recipes are in the Cookbook; and there is a handy pandas cheat sheet.

Community guides: pandas Cookbook by Julia Evans.

Quick and Dirty Pandas Cheat Sheet. I don't regularly write new scripts in pandas. This repo is to help me when I need to tweak a script I haven't changed in a while.

Table of Contents

  • unique - find unique rows
  • Working with time series data

unique - find unique rows

Find the unique values in a column of the dataset. The sample dataset:

   Company  Person   Sales
0  GOOG     Sam        200
1  GOOG     Charlie    120
2  MSFT     Amy        340
3  MSFT     Vanessa    124
4  FB       Carl       243
5  FB       Sarah      350
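
The code cell that produced this table is not shown, so here is a minimal sketch of how it might look. It assumes the frame is built from a plain dict (the variable name `df` is my choice) and that `unique` was called on the Company column.

```python
import pandas as pd

# Sample data copied from the table above; building it from a dict is an assumption.
df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# Series.unique() returns the distinct values in order of first appearance.
unique_companies = df["Company"].unique()
print(unique_companies)
```

Note that `unique` is a Series method, so it works on one column at a time.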

nunique - find number of unique rows

A more direct way to get the count than finding the unique array and then taking its length. By default it leaves missing values out of the count (dropna=True).
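
A short sketch of the comparison, assuming the same Company values as in the sample dataset (the variable names are mine):

```python
import pandas as pd

# Company column from the sample dataset.
companies = pd.Series(["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"], name="Company")

# Count the distinct values directly.
n_companies = companies.nunique()

# The longer route: build the unique array, then take its length.
n_companies_long = len(companies.unique())

print(n_companies, n_companies_long)
```

Both routes agree here because the column has no missing values.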

value_counts - find unique values and number of occurrences
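
No example survives for this entry, so the following is only a sketch under the assumption that it was applied to the Company column of the sample dataset:

```python
import pandas as pd

# Company column from the sample dataset.
companies = pd.Series(["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"], name="Company")

# value_counts() returns a Series indexed by the distinct values,
# holding how many times each one occurs.
counts = companies.value_counts()
print(counts)
```

Each company appears twice in the sample data, so every count is 2.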

apply - batch process column values

Calling apply() is similar to calling map() in Python: it applies an operation to every record of a selected column. For instance, it can compute the squared sales as a new column, sq_sales.
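
A minimal sketch of the squared-sales computation, assuming a lambda is passed to `apply` (the original cell is not shown, so the exact form is a guess):

```python
import pandas as pd

# Sample dataset, rebuilt from the earlier table.
df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# apply() calls the function once per value in the Sales column.
df["sq_sales"] = df["Sales"].apply(lambda sales: sales ** 2)
print(df)
```

For simple arithmetic like this, `df["Sales"] ** 2` gives the same result without a Python-level function call per row.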

   Company  Person   Sales  sq_sales
0  GOOG     Sam        200     40000
1  GOOG     Charlie    120     14400
2  MSFT     Amy        340    115600
3  MSFT     Vanessa    124     15376
4  FB       Carl       243     59049
5  FB       Sarah      350    122500


We can also define a function and call it within the apply() method. The function can take the values of one or more columns and use them to calculate a new column.
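
One way this might look, as a sketch: the helper name `cube_sales` is hypothetical, and combining Sales with sq_sales is my assumption about how cu_sales was derived (cubing Sales directly gives the same numbers).

```python
import pandas as pd

# Sample dataset with the sq_sales column already computed.
df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})
df["sq_sales"] = df["Sales"] ** 2


def cube_sales(row):
    """Hypothetical helper: combine two columns of one row into a new value."""
    return row["Sales"] * row["sq_sales"]


# axis=1 passes each row to the function, so it can read several columns.
df["cu_sales"] = df.apply(cube_sales, axis=1)
print(df)
```

With `axis=1` the function receives one row at a time as a Series, which is what lets it use more than one column.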

   Company  Person   Sales  sq_sales  cu_sales
0  GOOG     Sam        200     40000   8000000
1  GOOG     Charlie    120     14400   1728000
2  MSFT     Amy        340    115600  39304000
3  MSFT     Vanessa    124     15376   1906624
4  FB       Carl       243     59049  14348907
5  FB       Sarah      350    122500  42875000


   Company  Person   Sales  sq_sales  cu_sales
1  GOOG     Charlie    120     14400   1728000
3  MSFT     Vanessa    124     15376   1906624
0  GOOG     Sam        200     40000   8000000
4  FB       Carl       243     59049  14348907
2  MSFT     Amy        340    115600  39304000
5  FB       Sarah      350    122500  42875000

Note how the index remains attached to the original rows.

   Company  Person   Sales  sq_sales  cu_sales
4  FB       Carl       243     59049  14348907
5  FB       Sarah      350    122500  42875000
1  GOOG     Charlie    120     14400   1728000
0  GOOG     Sam        200     40000   8000000
3  MSFT     Vanessa    124     15376   1906624
2  MSFT     Amy        340    115600  39304000
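
The two row orders above are consistent with sorting by Sales, and then by Company followed by Sales. The original cells are missing, so the sketch below assumes `sort_values` was used; the derived columns are left out for brevity.

```python
import pandas as pd

# Sample dataset, rebuilt from the earlier table.
df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# Sort by a single column, ascending by default.
by_sales = df.sort_values("Sales")

# Sort by Company first, then by Sales within each company.
by_company_sales = df.sort_values(["Company", "Sales"])

# The original index labels travel with their rows.
print(by_sales.index.tolist())
print(by_company_sales.index.tolist())
```

If a fresh 0..n-1 index is wanted after sorting, `reset_index(drop=True)` can be chained on.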

isnull - finding null values throughout the DataFrame

   Company  Person  Sales  sq_sales  cu_sales
0  False    False   False  False     False
1  False    False   False  False     False
2  False    False   False  False     False
3  False    False   False  False     False
4  False    False   False  False     False
5  False    False   False  False     False
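
A sketch of the call that would produce a mask like this, using a trimmed-down copy of the sample dataset (the per-column count at the end is an extra I added, not something shown in the original):

```python
import pandas as pd

# Trimmed-down sample dataset; it contains no missing values.
df = pd.DataFrame({
    "Company": ["GOOG", "GOOG", "MSFT", "MSFT", "FB", "FB"],
    "Person": ["Sam", "Charlie", "Amy", "Vanessa", "Carl", "Sarah"],
    "Sales": [200, 120, 340, 124, 243, 350],
})

# isnull() returns a frame of the same shape holding True where a value is missing.
null_mask = df.isnull()

# Summing the mask counts the missing values per column.
null_counts = null_mask.sum()
print(null_counts)
```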

Working with time series data

This section explains how to specify the data types of columns while reading data, and how to define column converters that make certain data types easier to handle.

   Unnamed: 0  Registration Date        Country  Organization                       Current customer?  What would you like to learn?
0           0  11/08/2019 06:09 PM EST  Jamaica  The University of the West Indies  NaN                NaN
1           1  11/08/2019 06:09 PM EST  Japan    iLand6 Co.,Ltd.                    no                 I am interested ArcGIS.
2           2  11/08/2019 05:56 PM EST  Canada   Safe Software Inc                  yes                data science workflos
3           3  11/08/2019 05:51 PM EST  Canada   Le Groupe GeoInfo Inc              yes                general information
4           4  11/08/2019 05:26 PM EST  Canada   Safe Software Inc.                 NaN                NaN

The Registration Date column should be of type datetime, and the Current customer? column should be of type bool. However, are they?

Inspecting the column dtypes shows that they are not: both come back as generic object columns. Let us re-read the data, this time stating what the data types should be.
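
The original reading code is not shown, so the following is a sketch under several assumptions: the data is rebuilt inline from the rows above rather than read from the real file, the helper `yes_no_to_bool` is hypothetical, empty cells are treated as False (which matches the re-read table further down), and every timestamp ends in " EST".

```python
import io
import pandas as pd

# Hypothetical stand-in for the registration file, rebuilt from the rows above.
csv_text = """Registration Date,Country,Organization,Current customer?,What would you like to learn?
11/08/2019 06:09 PM EST,Jamaica,The University of the West Indies,,
11/08/2019 06:09 PM EST,Japan,"iLand6 Co.,Ltd.",no,I am interested ArcGIS.
11/08/2019 05:56 PM EST,Canada,Safe Software Inc,yes,data science workflos
11/08/2019 05:51 PM EST,Canada,Le Groupe GeoInfo Inc,yes,general information
11/08/2019 05:26 PM EST,Canada,Safe Software Inc.,,
"""


def yes_no_to_bool(value):
    """Hypothetical converter: 'yes' becomes True; 'no' and empty cells become False."""
    return str(value).strip().lower() == "yes"


# A plain read leaves both columns as text.
naive = pd.read_csv(io.StringIO(csv_text))
print(naive.dtypes)

# Re-read with a converter for the yes/no column.
typed = pd.read_csv(
    io.StringIO(csv_text),
    converters={"Current customer?": yes_no_to_bool},
)

# Drop the time-zone abbreviation (assumed to always be " EST"), then parse the rest.
typed["Registration Date"] = pd.to_datetime(
    typed["Registration Date"].str.replace(" EST", "", regex=False),
    format="%m/%d/%Y %I:%M %p",
)
print(typed.dtypes)
```

Parsing the dates after the read, instead of inside a converter, keeps the column as a proper datetime dtype. Note that dropping " EST" produces naive local timestamps; attaching a real time zone would be a separate step.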

Plotting time series

Now that Registration Date is a datetime column, we can plot the number of registrants over time. Before that, we need to set it as the index.

Registration Date    Country  Organization                       Current customer?  What would you like to learn?
2019-11-08 18:09:00  Jamaica  The University of the West Indies  False              NaN
2019-11-08 18:09:00  Japan    iLand6 Co.,Ltd.                    False              I am interested ArcGIS.
2019-11-08 17:56:00  Canada   Safe Software Inc                  True               data science workflos
2019-11-08 17:51:00  Canada   Le Groupe GeoInfo Inc              True               general information
2019-11-08 17:26:00  Canada   Safe Software Inc.                 False              NaN
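
A sketch of the indexing step, assuming `set_index` was used; the frame below is a cut-down copy of the table with only two of its columns.

```python
import pandas as pd

# Cut-down copy of the registration data with the dates already parsed.
df = pd.DataFrame({
    "Registration Date": pd.to_datetime([
        "2019-11-08 18:09:00",
        "2019-11-08 18:09:00",
        "2019-11-08 17:56:00",
        "2019-11-08 17:51:00",
        "2019-11-08 17:26:00",
    ]),
    "Country": ["Jamaica", "Japan", "Canada", "Canada", "Canada"],
})

# Move the datetime column into the index; it stops being a regular column.
df = df.set_index("Registration Date")
print(df)
```

With a DatetimeIndex in place, time-based operations such as sorting by date or resampling become available on the frame.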

Add a counter column to the dataframe


Note: it is important to count up only after sorting by date. Otherwise the numbers are going to be all over the place.
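
A minimal sketch of the sort-then-count order, assuming the counter is a simple running number (the column name `counter` is my choice, and the frame is a cut-down copy of the registration data):

```python
import pandas as pd

# Cut-down registration data, indexed by date, newest first as in the table.
df = pd.DataFrame(
    {"Country": ["Jamaica", "Japan", "Canada", "Canada", "Canada"]},
    index=pd.to_datetime([
        "2019-11-08 18:09:00",
        "2019-11-08 18:09:00",
        "2019-11-08 17:56:00",
        "2019-11-08 17:51:00",
        "2019-11-08 17:26:00",
    ]),
)
df.index.name = "Registration Date"

# Sort chronologically first, so the counter grows with time.
df = df.sort_index()

# Running count of registrants: 1 for the earliest row, len(df) for the latest.
df["counter"] = range(1, len(df) + 1)
print(df)
```

Calling `df["counter"].plot()` would then draw the cumulative number of registrants over time, assuming matplotlib is installed.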