1. Introduction to Time Series in Python

A time series is a sequence of observations in which each value is attached to a time period.

A common topic in time series analysis is determining the stability of financial markets and the efficiency of portfolios.

Properties

All time periods must be equal and clearly defined, which results in a constant frequency.
Frequency: how often the values of the data set are recorded. If the intervals are not identical, we are dealing with missing data.
Time dependency: the values for every period are affected by outside factors and by the values of past periods.
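The constant-frequency property can be checked mechanically. The following is a minimal sketch, assuming daily data and using only the standard library; the helper name `find_missing_periods` and the sample dates are hypothetical and only serve as illustration.

```python
from datetime import date, timedelta


def find_missing_periods(timestamps, step):
    """Return the periods absent from a series that should advance by a constant step."""
    present = set(timestamps)
    missing = []
    current = min(timestamps)
    last = max(timestamps)
    # Walk from the first to the last observation one step at a time.
    while current <= last:
        if current not in present:
            missing.append(current)
        current += step
    return missing


# Hypothetical daily observations in which Jan 3 was never recorded.
observed = [date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 4), date(2021, 1, 5)]
gaps = find_missing_periods(observed, timedelta(days=1))
print(gaps)
```

An empty result means the intervals are identical; any returned period is a gap that has to be treated as missing data before the series is analyzed.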
SAS: Statistical Analysis System
Variables in SAS

SAS has only two types of variables: character and numeric.

Tips:
Use $ after a character variable.
Use . to represent missing values.
Names are case-insensitive.

Invalid names:
1_begins_with_a_number
contains blanks
contains-invalid-characters%

Example 1: Basic Steps in SAS Programs

DATA: read, write, and manipulate the data and perform calculations. After the DATA step, SAS stores the data in its own special form, called a SAS data set.
PROC: process SAS data sets for analysis. proc contents shows the descriptor portion of a SAS data set; proc print shows the data portion in a table.
RUN

/* Example 1: Read the dataset and print */
data Studgrade; /* data step: assign a name for the dataset and enter the data */
input StudID Midterm Final Grade $; /* variables in the dataset */
datalines;
101 98 86 A
102 49 60 C
103 98 80 A
104 90 98 A+
105 60 80 B+
106 .
Lec01: Sep 9 - Thursday
Lec02: Sep 13 - Monday
Lec03: Sep 16 - Thursday
Lec04: Sep 20 - Monday
Lec05: Sep 23 - Thursday
Lec06: Sep 27 - Monday
Lec07: Oct 4 - Monday
Lec08: Oct 7 - Thursday
Likelihood function, Minimal sufficient, Completeness
Lec09: Oct 14 - Thursday
Complete Sufficient Examples
Ancillary and Basu’s Theorem
Lec10: Oct 18 - Monday
Chapter 7 - Point Estimation
Point estimation: use a specific value of a function of the sample to estimate an unknown parameter.
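As a small illustration of point estimation, the sketch below uses the sample mean as the function of the sample and treats its value as the estimate of an unknown population mean. The helper name `point_estimate_mean` and the data are hypothetical; the sample mean is only one possible choice of estimator.

```python
def point_estimate_mean(sample):
    """Return the sample mean, a single number used to estimate the population mean."""
    return sum(sample) / len(sample)


# Hypothetical observations drawn from a population whose mean is unknown.
sample = [1.0, 2.0, 3.0, 4.0]
estimate = point_estimate_mean(sample)
print(estimate)
```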
0. Introduction on Spark and Scala

Spark

Spark: a unified analytics engine for large-scale data processing.

Fast: runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
The DAG (directed acyclic graph) engine optimizes workflows.

Apache Spark consists of the Spark Core engine, Spark SQL, Spark Streaming, MLlib, GraphX, and SparkR.

Spark Core: provides built-in in-memory computing and can reference datasets stored in external storage systems. It uses the RDD data structure to speed up data sharing. In distributed processing systems like MapReduce, data is shared through permanent storage such as HDFS or S3, which may be slow because of the serialization and deserialization in the I/O steps.
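The idea of chaining transformations in memory and only computing when a result is requested can be sketched with a toy class. This is plain Python, not Spark's API: `LazyDataset` and its methods are hypothetical names that loosely imitate how an RDD records a chain of operations (a simple, linear case of a DAG) and passes intermediate results along in memory instead of writing them to storage.

```python
class LazyDataset:
    """Toy stand-in for an RDD: records transformations and runs them only on collect()."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Record the transformation; nothing is computed yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Record the filter; nothing is computed yet.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Build one chained iterator so intermediate results stay in memory.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


numbers = LazyDataset(range(10))
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = even_squares.collect()
print(result)
```

Calling `map` and `filter` only extends the recorded plan; the work happens in `collect`, which is where a real engine would get the chance to optimize the whole workflow at once.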
Day 1 - Sep 2
10:00 - 12:00 Boarding the ship!
12:00 - 14:00 Brunch
14:00 - 17:30 SkyWalk
We went to the SkyWalk and the slides, both newly opened this year!
Day 2 - Sep 3
ISLR 4.4 - LDA on Credit, ROC, AUC
Types of Errors