2013-02-01-Lab
1 Lab: Obtain and Explore Data
- Set up a GitHub account
- Find a data set or external API
- Superficially examine it
- Summarize findings
- Submit the assignment via GitHub
2 Why GitHub?
- git is the standard version-control tool in industry
- GitHub provides the best tools for sharing and commenting on code
- This assignment will not have code; it is just practice submitting
3 Set up a GitHub account
- Create a GitHub account, making sure to use your .edu address
- Use GitHub/Edu to request a free micro plan: it lets us use private repositories
- Set up a GitHub SSH key
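A minimal sketch of generating the key pair with OpenSSH's ssh-keygen, assuming an ed25519 key (older OpenSSH releases may need -t rsa -b 4096 instead) and an empty passphrase. The scratch directory KEYDIR and the address you@berkeley.edu are hypothetical, used only for the demo; in practice you would accept the default ~/.ssh location and use your own address. The printed public key is what goes into GitHub's SSH key settings.

```shell
#!/bin/sh
# Sketch: create an SSH key pair for GitHub.
# Assumptions: ed25519 key type, empty passphrase (-N ""), hypothetical
# comment you@berkeley.edu, and a scratch directory so nothing in ~/.ssh
# is overwritten.
set -e
KEYDIR=$(mktemp -d)
ssh-keygen -t ed25519 -C "you@berkeley.edu" -N "" -f "$KEYDIR/id_ed25519" -q

# The public half (.pub) is the part you paste into GitHub.
cat "$KEYDIR/id_ed25519.pub"

# After the key is registered on GitHub, this checks authentication:
#   ssh -T git@github.com
```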
4 Set up a git repository on the ischool server
- On the server ischool.berkeley.edu, run
$ git clone git://github.com/jblomo/datamining290.git
- On the server, in the datamining290 directory, run
$ git remote rename origin jblomo
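The two steps can be tried offline. This sketch assumes a local bare repository (upstream.git) stands in for git://github.com/jblomo/datamining290.git, and builds it with a single commit first. Renaming the remote to jblomo frees the name origin for your own private repository in the next step.

```shell
#!/bin/sh
# Offline sketch of section 4. Assumption: a local bare repository
# (upstream.git) stands in for git://github.com/jblomo/datamining290.git.
set -e
WORK=$(mktemp -d)
cd "$WORK"

# Build the stand-in upstream: a master branch holding one commit.
git init -q --bare upstream.git
git -C upstream.git symbolic-ref HEAD refs/heads/master
git init -q seed
cd seed
echo "course notes" > README
git add README
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Initial commit"
git push -q "$WORK/upstream.git" HEAD:refs/heads/master
cd "$WORK"

# The lab steps: clone, then rename the default remote origin to jblomo.
git clone -q "$WORK/upstream.git" datamining290
cd datamining290
git remote rename origin jblomo
git remote -v   # jblomo is now listed for both fetch and push
```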
5 Connect it to GitHub
- After you receive your free micro account on GitHub, create a private repository called datamining290
- It will provide you with an SSH git path; let's call it PATH
- You must use the SSH PATH, which starts with git@ (the read-only git:// URL cannot be pushed to)
- On the server, in the datamining290 directory, run
$ git remote add origin PATH
$ git push origin master
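The same two commands, run offline as a sketch. Assumptions: two local bare repositories stand in for GitHub, instructor.git for jblomo's repository and private.git for your private datamining290. The path is held in a variable named PRIVATE_URL rather than PATH, because assigning PATH in a shell would overwrite the command search path; with a real repository it would hold an SSH path of the form git@github.com:USERNAME/datamining290.git, where USERNAME is a placeholder.

```shell
#!/bin/sh
# Offline sketch of section 5. Assumptions: instructor.git stands in for
# jblomo's GitHub repository and private.git for your private one.
set -e
WORK=$(mktemp -d)
cd "$WORK"
git init -q --bare instructor.git
git -C instructor.git symbolic-ref HEAD refs/heads/master
git init -q --bare private.git

# Seed the instructor repository with one commit on master.
git init -q seed
cd seed
echo "course notes" > README
git add README
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Initial commit"
git push -q "$WORK/instructor.git" HEAD:refs/heads/master
cd "$WORK"

# Section 4 recap: clone and rename the instructor remote.
git clone -q "$WORK/instructor.git" datamining290
cd datamining290
git remote rename origin jblomo

# Section 5: point origin at your private repository and push master.
PRIVATE_URL="$WORK/private.git"   # real use: git@github.com:USERNAME/datamining290.git
git remote add origin "$PRIVATE_URL"
git push -q origin master
```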
6 Share with us
- Hopefully you now have a private copy of my repository
- Add Shreyas and me (users: seekshreyas, jblomo) as collaborators on your private repository
7 Obtain Data
- Look through the links in slides for interesting data sets, or find your own
- Or find a service API, like the NYTimes API
- Explore the available data to answer the following questions
8 Questions
- What are the types of data available to you?
- For data sets: how many records are in the data set?
- For API: what are the limits on fetching data?
- Provide an "interesting" record; explain its properties and why it is interesting
- What are 3 questions you could answer using your data?
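For a data set delivered as a CSV file, standard shell tools give a quick first look and a record count. A minimal sketch, assuming one header row and one record per line (quoted fields that contain newlines would throw the count off); records.csv and its contents are hypothetical stand-ins created for the demo.

```shell
#!/bin/sh
# Sketch: superficially examine a CSV data set.
# Assumptions: one header row, one record per line, and a hypothetical
# records.csv created here in place of a real data set.
set -e
cd "$(mktemp -d)"
cat > records.csv <<'EOF'
id,name,city
1,Alice,Berkeley
2,Bob,Oakland
3,Carol,Berkeley
EOF

head -n 3 records.csv   # eyeball the header and the first records

# Count records, skipping the header; tr strips the padding some wc versions add.
RECORDS=$(tail -n +2 records.csv | wc -l | tr -d ' ')
echo "records: $RECORDS"

# Frequency of each value in the third column (city), a quick feel for the data.
cut -d, -f3 records.csv | tail -n +2 | sort | uniq -c
```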
9 Submit Homework
- On the ischool server, create a branch called hw-obtain-data
- Create a text file to write the solution in; a simple editor to use is pico
- git add the file
- git commit the change
- git push origin hw-obtain-data to put it on GitHub
- On GitHub, submit a "pull request" from the hw-obtain-data branch to your master branch
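The submission steps, run offline as a sketch. Assumptions: a local bare repository (origin.git) stands in for your private GitHub repository, answers.txt is a hypothetical file name, and printf stands in for editing the file in pico. git checkout -b creates the branch and switches to it in one step. The pull request itself is still opened on the GitHub website.

```shell
#!/bin/sh
# Offline sketch of section 9. Assumptions: origin.git stands in for your
# private GitHub repository; answers.txt is a hypothetical file name.
set -e
WORK=$(mktemp -d)
cd "$WORK"
git init -q --bare origin.git
git init -q datamining290
cd datamining290
git remote add origin "$WORK/origin.git"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "Initial commit"
# master already exists on GitHub after section 5; mirror that here.
git push -q origin HEAD:refs/heads/master

# Create the homework branch and switch to it.
git checkout -q -b hw-obtain-data

# Write the solution (normally: pico answers.txt), then add, commit, push.
printf 'Answers to the section 8 questions\n' > answers.txt
git add answers.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Obtain data homework"
git push -q origin hw-obtain-data
```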
9.1 Pull request notes
- Pull requests show your updates in a way that lets me comment on them and get notifications
- This is the first time I've tried this for class, so you're on the cutting edge. Hopefully it will work; give me feedback if it does not
10 Going Forward
- Other homework assignments will involve completing code
- General workflow:
  - Start a new branch
  - Add the required files
  - Push to GitHub
  - Submit a pull request