19 – 21 Days of Data / Round 2

Friday – Sunday July 16 – 18, 2021


The Game Plan I Followed

After gaining a deeper understanding of the filesystem, I had some questions to clarify with my good friend Carlos. He gave me a good tour of the things I had learned, and now everything makes a lot more sense. I decided that I would:

  1. install pyenv => I am following 3 resources (Real Python, Shinichi Okada, and the GitHub repo of the pyenv-installer).
  2. install pyenv-virtualenv
  3. install pipenv
  4. install jupyter, and get started

Dependencies for pyenv
First, my system is macOS Big Sur and my shell is zsh with Oh My Zsh. For pyenv, I have some dependencies to install.

  1. Xcode Command Line Tools: I had to go searching for how to check whether these were installed. The best I found was Daniel Kehoe’s article on mac.install.guide.
  2. Homebrew: I already have this installed. Once it is installed, you must enter this into your shell:
    ~ brew install openssl readline sqlite3 xz zlib
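
For the first dependency, here is a minimal sketch of the check, assuming the standard xcode-select tool that ships with macOS. The function name check_xcode_clt is just a label I made up for illustration.

```shell
# check_xcode_clt is a hypothetical helper name, not part of any tool.
# `xcode-select -p` prints the active developer directory and exits
# non-zero when the Command Line Tools are not installed.
check_xcode_clt() {
  if ! command -v xcode-select >/dev/null 2>&1; then
    echo "xcode-select not found (is this macOS?)"
    return 1
  fi
  if xcode-select -p >/dev/null 2>&1; then
    echo "Command Line Tools found at: $(xcode-select -p)"
  else
    echo "Command Line Tools missing; run: xcode-select --install"
    return 1
  fi
}

check_xcode_clt || true
```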

Installation for pyenv
Now I can move on to installing pyenv (this is what I did):

~ curl https://pyenv.run | bash

But (alternatively) you can also do this through Homebrew:

~ brew install pyenv

Setup for pyenv
Once you have pyenv installed, you need to know which shell you are using. If you don’t know, you can enter this into your shell:

~ ps -p$$ – there are variations of this such as ps -p$$ -o comm="" and ps -p$$ -ocommand=

I have /bin/zsh, so I need to run the next lines in the shell to append to ~/.zshrc:

~ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
~ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
~ echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n eval "$(pyenv init -)"\nfi' >> ~/.zshrc

The last line enables shims and autocompletion.

Finally, restart your shell with exec $SHELL and then check that everything is working.

Check for pyenv
You can take a look at the versions of Python available to install with pyenv install --list. The list is long and includes many “flavors” of Python, but at the top are the regular vanilla versions. You can run pyenv install <version> for any version you like. Then you will be able to see which versions you have in pyenv with pyenv versions (watch that plural). My output is:

system
* 3.7.11 (set by /Users/claytonlouden/.pyenv/version)
3.8.11
3.9.6

The “*” shows which version is the active global version. You can set that with pyenv global <version>. Make certain that your shell is actually running the pyenv global version. I had to do some debugging because python -V did not report my pyenv global version. It turned out that my ~/.zshrc did not contain all three of the commands outlined above: the last one was not correct, so the shell was not picking up the global pyenv version.

To be clear, python -V should report the same version as the starred one in the list. As a consequence, you do not need to type python3 anymore. You can see that python3 -V is the same as python -V, which is the same as the starred Python in the versions list.
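
As a rough sketch, the comparison above can be scripted. The helper name same_python_version is my own invention; pyenv version-name and pyenv which are existing pyenv subcommands. The block does nothing if pyenv is not on the PATH.

```shell
# same_python_version is a hypothetical helper: it strips the leading
# "Python " from the output of `python -V` and compares the rest with a
# pyenv version name.
same_python_version() {
  python_output="$1"   # e.g. "Python 3.7.11"
  pyenv_name="$2"      # e.g. "3.7.11"
  [ "${python_output#Python }" = "$pyenv_name" ]
}

if command -v pyenv >/dev/null 2>&1; then
  if same_python_version "$(python -V 2>&1)" "$(pyenv version-name)"; then
    echo "python matches the active pyenv version"
  else
    echo "mismatch: check the pyenv init line in ~/.zshrc"
    pyenv which python || true   # shows the executable pyenv resolves to
  fi
fi
```

If the two disagree, the shell is probably finding another python earlier in the PATH than the pyenv shims directory.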

At the moment I have a small kink in the fabric with my PATH directories. It seems that pyenv has duplicated the shims directory in my PATH 10+ times. I don’t think this will cause an immediate problem: the shell uses the first match it finds, so the repeated entries are redundant rather than harmful. In theory they add a little time when the shell searches the directories, though not enough to notice.
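
A minimal sketch for cleaning this up, assuming the duplicates are exact repeats of the same entry and that pyenv lives in the default $HOME/.pyenv location. The helper name dedupe_path is hypothetical. It only changes the PATH of the current session, not what ~/.zshrc adds on the next start.

```shell
# dedupe_path is a hypothetical helper: it keeps the first occurrence of
# each colon-separated entry and drops later repeats, preserving order.
dedupe_path() {
  printf '%s' "$1" |
    awk -v RS=: -v ORS=: '!seen[$0]++' |
    sed 's/:$//'
}

# Count how many PATH entries contain the pyenv shims directory.
printf '%s\n' "$PATH" | tr ':' '\n' | grep -c "\.pyenv/shims" || true

# Apply the cleanup to the current session only.
PATH="$(dedupe_path "$PATH")"
export PATH
```

In zsh specifically, typeset -U path is a built-in way to keep the PATH entries unique, so that one line in ~/.zshrc may be enough on its own.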


Reading Corner

I have reviewed the 2nd chapter of Feature Engineering by Alice Zheng. My main takeaway is the log transform as a way to reduce the effect of long-tailed distributions: it compresses the long tail at the high end of the values and, in turn, expands the low end into a longer head. It helps transform the plot into a more Gaussian-like distribution, as demonstrated in the following figure:

A good example of where we might want to use this is the play count for a song on Spotify. The book suggests binarizing the play count to simplify the question of whether a user likes a song. In other words, if a user listened to a song, it doesn’t matter whether they listened to it 10 or 20 times; any play count at all would indicate that the person liked the song. So playing a song 5, 10, or 25 times should register a binary value of 1 for a positive like.

In my opinion, it might be better to apply the log transform to get a more nuanced metric, or to set an arbitrary threshold on the number of plays to give the user time to decide if they like the song. I know that the threshold for myself is definitely higher than 1 play. I might want to hear a song twice to know if I like it or not. I know it has sometimes taken me up to 5 listens before I realize I can’t stand a song. So, I would not want an overly simplistic metric to be applied.

I would say that I rarely need 5 listens to decide whether I like or dislike a song, so any number from 3 to 5 might be a safe arbitrary threshold to binarize at. Another way is to log transform, which dampens excessive play counts. We want to dampen these because someone’s obsession or a mechanical repeat of a song doesn’t give us any valuable information. But the fact that someone played a song 10 times and not 3 might demonstrate a degree of positivity that is significant.
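
To make the comparison concrete, here is a small sketch using awk from the shell. The helper names, the play counts, and the threshold of 3 are all made up for illustration, and adding 1 before taking the log is my assumption so that a count of zero stays defined.

```shell
# log_feature and binarize are hypothetical helper names.
# log_feature prints log10(count + 1). awk's log() is the natural log,
# so dividing by log(10) changes the base to 10.
log_feature() {
  awk -v c="$1" 'BEGIN { printf "%.3f\n", log(c + 1) / log(10) }'
}

# binarize prints 1 when count >= threshold, otherwise 0.
binarize() {
  awk -v c="$1" -v t="$2" 'BEGIN { print ((c + 0 >= t + 0) ? 1 : 0) }'
}

# Compare the two features for a few made-up play counts, using an
# arbitrary threshold of 3 plays.
for plays in 1 3 10 25 1000; do
  printf '%5s plays  binarized=%s  log=%s\n' \
    "$plays" "$(binarize "$plays" 3)" "$(log_feature "$plays")"
done
```

With the threshold at 3, both 10 plays and 1000 plays binarize to 1, while the log feature still separates them (roughly 1.04 versus 3.00) without letting the 1000 dominate.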

 
