Criminal Activity
Hello fellow miscreants! Er…I mean data analysts! Welcome back to my weekly newsletter detailing all things basketball and jump training! (All things in basketball shorter than about 6’ 5” tall that is.) This week’s special keyword is: DATA. That’s right, be the fifth person to send me an email with this keyword and you will win an incredible undisclosed prize! (Those of you who join my March Madness bracket groups every year know that all of my prizes are undisclosed.)
It turns out that without data, you can’t really be a data analyst. So, in my attempt to live up to the self-proclaimed title of data analyst, I have been off collecting data! This past week’s reward was some great progress towards a functional web scraper that I can use to get some NCAA D1 Men’s basketball data.
Hitting the Wall
For those of you who have never tried automatically pulling data off of websites, the internet is not always kind to people who don’t hire scores of undergraduate researchers to copy and paste thousands of pages of data into databases. When you write a program to collect data for you, depending on what you are trying to retrieve, it can send a large number of requests to a server which hosts the website. This can impact the ability of us living, breathing humans to access a website.
Imagine a line to access your favorite website, but for every person in line there are hundreds of bots also in line. It’s going to take some time before you make it to the front of the queue.
As a result, many servers will have policies in place that block requests from addresses that are making a large number of requests in a short time. That is exactly the wall that I hit this week. You can see one such error message below:
For those who don’t speak HTML, you can just ignore the things that look like mathematical sandwiches, e.g. <li></li>. Basically, the message that mattered for me was as follows:
You have written a bot that accessed too many files too quickly. -That Blasted Cloud Service Provider
Jumping Over the Wall
After hitting the wall, I had to take a deep breath, look myself in the mirror, and ask myself if I was man enough to try and jump over the wall. I may only have a 30” vertical, but I am betting this wall is 29” and I am going to jump over it!
In reality, it is going to be a simple fix this week: I just need to make some changes to my Python code to make sure that I don’t make too many requests too fast.
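For the curious, here is a minimal sketch of what that kind of change could look like. It is not the actual scraper code. The Throttle class, the fetch_all helper, and the idea of passing in a list of URLs are all hypothetical names made up for illustration, and the default of 20 requests per 60 seconds is taken from the terms of use discussed in the caveat below. The idea is simply to pause before each request so that consecutive requests are spaced at least a fixed interval apart.

```python
import time
import urllib.request


class Throttle:
    """Space out calls so they happen at most max_calls times per period.

    Hypothetical helper for illustration; clock and sleep are injectable
    so the spacing logic can be checked without actually waiting.
    """

    def __init__(self, max_calls=20, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        # Minimum number of seconds between two consecutive calls.
        self.min_interval = period / max_calls
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def fetch_all(urls, throttle):
    """Download each URL in turn, pausing between requests as needed."""
    pages = []
    for url in urls:
        throttle.wait()
        with urllib.request.urlopen(url, timeout=30) as response:
            pages.append(response.read())
    return pages
```

With the defaults, requests end up at least 3 seconds apart, which sits right at the stated limit. Something like Throttle(max_calls=15) would leave a little headroom by spacing requests 4 seconds apart instead.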
A Caveat
For those who may be thinking, “Cameron, clearly these people don’t want you scraping their data,” I would respond by saying, “I would normally agree with you, but I have read their terms of use, and they are very explicit about not allowing automated scraping IF that scraping adversely impacts the performance of their website.” Lucky for me, they are also explicit about what they consider to cause adverse performance for their site: more than 20 requests in less than a minute.
This is not the case with all websites. Others will just straight up deny your program access to the site if they can see that it comes from an automated scraper. There are ways around that, but I don’t personally feel comfortable gathering data that I know other people are not okay with me obtaining via web scraping.
I am really hopeful that in the coming weeks I can have an awesome dataset to pore over and analyze!
Training Update
This week of vertical jump training was a downturn. My shoulders and shins were both feeling great to start the week! I translated that to mean that I had permission to try and increase the workload on my shoulders. For Monday, that meant practicing my power clean form.
The feedback I received on my power clean form was that it’s pretty good but that my back is too rounded. I also learned by the following day that it was too much for my shoulders.
Wednesday brought on front squats, which I don’t think were too hard on my AC joint but are just generally difficult. The final lift day, Friday, featured power snatches. I knew right away that it seemed too difficult for my shoulders, but decided I would give it a try with just the bar. I regretted it immediately, with the shoulder ninja returning to give some stabs in my shoulder. So, instead, I tried practicing the “first pull” portion of power snatches. After reviewing my video, I found a similar problem to my power clean: a severely rounded back.
Jump day finally came on Saturday. I knew we would be in a rural town, so I tried to arrange ahead of time to be able to access a church gym. Alas, a communication snafu left me waiting outside of the church with no keys on the way. I will just say, the outside conditions looked like this:
I decided to do the best I could and jumped in the covered pavilion area at the local park.
Happy jumping! It’s weeks like this that I remember the following:
“Indisputable truths:
50% of jump sessions will be below average.
50% of lifts will be below average.” - Isaiah Rivera, pro dunker, 50”+ vertical jump