Spring 2019 Data 100/200A Midterm 1 Reference Sheet Pandas and Matplotlib
df is a DataFrame; s is a Series. Function
df[col]
df[[col1, col2]]
s.loc[rows] / df.loc[rows,
cols]
s.iloc[rows] / df.iloc[rows,
cols]
s.isnull() / df.isnull()
s.fillna(value) /
df.fillna(value)
df.drop(labels, axis)
df.rename(index=None,
columns=None)
df.sort_values(by,
ascending=True)
s.sort_values(ascending=True)
s.unique()
s.value_counts()
pd.merge(left, right,
how=inner, on=a)
left.merge(right,
left_on=col1, right_on=col2)
df.set_index(col)
df.reset_index(col)
Description
Returns the column labeled col from df as a Series.
Returns a DataFrame containing the columns labeled col1 and
col2.
Returns a Series/DataFrame with rows (and columns) selected by
their index values.
Returns a Series/DataFrame with rows (and columns) selected by their positions.
Returns boolean Series/DataFrame identifying missing values Returns a Series/DataFrame where missing values are replaced by
value
Returns a DataFrame without the rows or columns named labels
along axis (either 0 or 1)
Returns a DataFrame with renamed columns from a dictionary
index and/or columns
Returns a DataFrame where rows are sorted by the values in
columns by
Returns a sorted Series.
Returns a NumPy array of the unique values
Returns the number of times each unique value appears in a Series
Returns a DataFrame joining DataFrames left and right on the column labeled a; the join is of type inner
Returns a DataFrame joining DataFrames left and right on columns labeled col1 and col2.
Returns a DataFrame that uses the values in the column labeled col as the row index.
Returns a DataFrame that has row index 0, 1, etc., and adds the current index as a column.
Groups, Strings, & Plots
grouped = df.groupby(by) where by can be a column label or a list of labels.
Function
grouped.count()
grouped.size()
grouped.mean()/grouped.min()/grouped.max() grouped.first()/grouped.last()
Description
Return a Series containing the size of each group, excluding missing values
Return a Series containing size of each group, including missing values
Return a Series/DataFrame containing mean/min/max of each group for each column, excluding missing values
Return a Series/DataFrame containing first/last element of each group for each column
s is a series of strings.
Function
s.str.len() s.str.lower()/s.str.upper()
s.str.replace(pat, repl)
s.str.contains(pat)
s.str.extract(pat)
x and y are sequences of values.
Function
plt.plot(x, y)
plt.scatter(x, y)
plt.hist(x, bins=None)
plt.bar(x, height)
.
Regular Expressions
Description
Returns a Series containing length of each string
Returns a Series containing lowercase/uppercase version of each string
Returns a Series after replacing occurences of substrings matching regular expression pat with string repl
Returns a boolean Series indicating whether a substring matching the regular expression pat is contained in each string
Returns a Series of the first subsequence of each string that matches the regular expression pat. If pat contains one group, then only the substring matching the group is extracted
Description
Creates a line plot of x against y
Creates a scatter plot of x against y
Creates a histogram of x; bins can be an integer or a sequence Creates a bar plot of categories x and corresponding heights height
List of all metacharacters: . ^ $ * + ? ] [ | ( ) { }
Operator .
|
d, w, s D, W,
S
*
?
+ *?, +? {m}
{m, n}
^, $ []
()
[^ ] Function
Description
Matches any character except n
Escapes metacharacters
Matches expression on either side of expression; has lowest priority of any operator
Predefined character group of digits (0-9), alphanumerics (a-z, A-Z, 0-9, and underscore), or whitespace, respectively
Inverse sets of d, w, s, respectively
Matches preceding character/group zero or more times
Matches preceding character/group zero or one times
Matches preceding character/group one or more times
Applies non-greedy matching to * and +, respectively
Matches preceding character/group exactly m times
Matches preceding character/group at least m times and at most n times; if either m or n are omitted, set lower/upper bounds to 0 and , respectively
Matches the beginning and end of the line, respectively
Matching group used to match any of the specified characters or range (e.g. [abcde]) [a-e]) Capturing group used to create a sub-expression
Invert matching group; e.g. [^a-c] matches all characters except a, b, c
Description
Returns a match if zero or more characters at beginning of string matches pattern, else None
Returns a match if zero or more characters anywhere in string matches pattern, else None
Returns a list of all non-overlapping matches of pattern in string (if none, returns empty list)
Returns string after replacing all occurrences of pattern with repl
re.match(pattern,
string)
re.search(pattern,
string)
re.findall(pattern,
string)
re.sub(pattern, repl,
string)
. SQL
For a table x with columns labeled a and g, here are two example SELECT statements: SELECT a, a+1 AS b FROM x WHERE b > 2 ORDER BY -a;
SELECT g, max(a) FROM x GROUP BY g HAVING min(a) > 1;
Syntax
FROM s INNER JOIN t ON
cond
FROM sJOINtONcond
FROM s LEFT JOIN t ON
cond
FROM s,t
FROM (SELECT )
WHERE a IN (SELECT
)
ORDER BY RANDOM LIMIT
n
CASE WHEN pred THEN
cons ELSE alt END
Description
Inner join of tables s and t using cond to filter rows Same as above.
Left outer join of tables s and t using cond to filter rows
Cross join of tables s and t: all pairs of a row from s and one from t Select rows from a temporary table defined by a select statement
Select rows for which the value in column a is among the values in a one- column temporary table defined by a select statement
Draw a simple random sample of n rows
Evaluates to cons if pred is true and alt otherwise; Multiple WHEN/THEN pairs can be included, and ELSE is optional
Reviews
There are no reviews yet.