
Data Science in Your Business (Notes from Week of April 15 – 21, 2019) May 8, 2019

Posted by Anthony in Automation, experience, finance, Founders, Hiring, questions, Stacks, Strategy, training, Uncategorized.

It feels appropriate that the week of Google I/O lined up with my notes where data was the primary focus, especially as Google was pushing ease of technology (centered around giving them access to more data). There were some excellent memes/pictures going around contrasting Facebook asking (hah) for data with Google, which already has a ton of it but mentions what is coming.

Kaggle, a data science competition site, held a conference centered around hiring and careers in data, with guest speakers from some of the most interesting companies working with data, including Google. Listening to career-based or hiring podcasts related to the field gives insight into how corporations or orgs weigh people vs skills. The other side of this is a discussion of how data science teams can deliver value to the business. What can be done with the data? Is it helpful? Which metrics are measurable and important?

A few episodes went into the business and general application of data research: how personality and music are interrelated, or how satellite imagery from NASA can provide various live solutions – and to what extent they can be designed to be used. A few non-data-science-specific podcasts dealt with FinTech, HealthTech and marketplace tuning. How do startups fight against incumbents in various marketplaces? Are their offerings sustainable or do they break the model of what we have seen?

Hopefully my notes provide some incentive to go back and listen to one or all of the podcasts. Or connect via Twitter to talk more!

  • Validating D/S with QuantHub, Matt Cowell CEO (BDB 4/2/19)
    • Also with Nathan Black, Chief Data Scientist at QuantHub
    • Talking data science – math with business and IT skillsets
    • Companies are manually doing tech assessments with candidates / roles – programmer-based primarily
      • QuantHub looks at a comprehensive, scientific approach to assessment of the stack of what may be necessary
    • NLP of resume, and then Bayes’ updating for input and results from that
      • Assessment platform at the core and using it for hiring (natural use-case) but then benchmarking organizational skills
      • Aggregators of content and matchmaking (say, data analyst up to data engineer – wrangling, SQL improvement)
    • Assessments done and individuals won’t be charged – overall value in helping talent
      • Building the training side in the next quarter
    • How do companies engage QuantHub? 5min to get running – align the incentives for using it / relationship.
      • What are the challenges? What are the skillsets? What do you mean that you want them to do?
      • Requirements changing by different statistical methods (along with computing power, designing algorithms vs latest research)
      • Knowing/vetting data scientists as having to do the role / job – can you mirror the actual job requirements? (try vs buy, potentially)
    • Cloud computing or hardware innovation as ‘cool’ in a world of software – highly critical, depending on certain organizations
      • Some orgs NEED the data improvement there (Kubernetes, Docker, Cloud, Spark vs Power Excel user)
    • Matt as a product strategy guy – book “Monetizing Innovation”
      • How do you determine what the market and customers want
    • Nathan’s book – “Make It Stick” on how you can improve learning methods
  • James Martin (Staffing Lead, D/S at Google) – Getting Noticed in D/S (Kaggle CareerCon 4/17/2019)
    • Looking at field – ML Engineer to Quant/Statistician to Product Analyst to Data Analyst
    • Research tips: Open source projects (understanding current trends, gain experience, make connections)
      • Job descriptions (take time to research the differences, tailor approach)
      • Market research (professional networking, connect dots between companies you’d consider)
    • Resume tips: Concise (focus on telling a story on the experiences to highlight outcomes)
      • Factual (if listing skills or strengths, use examples to support them)
      • Related experience (highlight specific projects related to area you’re applying)
    • Networking tips: Professional profile (be detailed but concise about the skills you use and experiences)
      • Targeted outreach (connect after a conference, target approach for conversation)
      • Conferences (meet/greet if possible, follow up via email, LinkedIn, twitter)
  • Gidi (Gideon) Nave (@gidin), Assistant Professor of Marketing (Marketing Matters)
    • Cambridge Analytica before Cambridge – music research and how it relates to certain traits
      • Extroversion and openness were 2 big ones that they could pull from 5 traits (MUSIC)
      • MUSIC: Unpretentious, Sophistication were 2 of them
    • Could pull personality cues from 20 second, unreleased clips based on scores of 1 to 7, also
      • More agreeable people had higher scores in general
    • Personality on 5 (OCEAN – Openness, Conscientiousness, Extroversion, Agreeableness, Neuroticism)
      • Questioned whether they could use music to test for the personality (as opposed to the other direction)
      • Personality is established at a young age, so can music likes on Facebook give you a personality side – as mentioned, it did ~2 better than others
  • Fintech for Startups and Incumbents (a16z 4/7/2019)
    • With GP Alex Rampell (@arampell), CEO/cofounder of TrialPay, and partner Frank Chen
    • Assembling a risk pool (good and okay drivers subsidizing the bad drivers, or healthcare – same)
      • No economic model for skipping a segment – psychology for half price insurance (say, going to gym)
      • Half the number of customers – taking the ‘good’ ones, profitable ones
      • Insurance has mandatory loss ratios for different industries
    • HealthIQ – mechanism for exploitation on ‘health’ – in FinTech, it was SoFi on HENRYs
      • Positive vs adverse selection – debt settlement company ads, for instance – negotiate on your behalf to settle
      • Healthier people as living longer than non-healthy people – left them more profitable for proving being better
        • Gives them good customers (adverse selection for ‘quick, no blood test, 1 min’)
    • SoFi as stealing customers from the normal distribution – better marketing message “you’re getting ripped off, come to us”
    • Branch as investment – collect as much data as possible and look for correlations – small, mini-loans
      • Induction as pattern is a willingness to pay (credit is remembered) – went and got data from your phone
      • How many apps did you have? Did you look like you went to work? Are you gambling?
        • Counterintuitive potentially: battery goes dead leaned default, gambling app meant more likely to pay, etc…
    • Earnin – phone in pocket for 8 hours, last paycheck and RTS data confirming – will give money that you have earned but don’t have yet
      • Can tip interest or not – can you encourage people for positive community and people driving safely
      • Nurture better behavior – helping to turn customers into correctly priced customer (vs bank that doesn’t want them)
    • Vouch as company that failed but your social network had to vouch for you – Person X is okay, so you can even put up $
    • Tiffany & Co. was for a long time owned by Avon (the Avon lady) – but its brand was massive and one of the most renowned jewelers
      • Could make sense for acquiring more customers, though
    • Killing Geico – take 20% of customers but only take the good ones
      • Selling negative gross widgets, for instance – probabilistic ones (and the bad ones aren’t needed)
    • Turndown traffic strategy – Chase turns down a lot of people for problems (can’t profitably do $400 loan, for instance)
      • Here’s a friend after they rejected them (but see traffic) – Chase will tell you to go to a startup for better underwriting
      • Amazon got right – HP book, for instance – had ad for B&N right next to Amazon (bought) – would make $1 on the ad click at 100% profit
        • Used this to reduce the price on their site and wasn’t sharing it
    • Rapid fire: “Always invest super early” – 9 weeks to decision vs 1 day – can’t get good deals at length
      • Best things aren’t cheap – they’re often expensive – better strategy can be plowing in late (“Can’t believe we’re putting this much $”)
      • Gating item for entrance into a space or into different models – cost of capital and distribution as often the unique thing
        • Geico could easily add additional traffic to start-ups
      • M&A strategy early? Encouraged and used Facebook – buy existential threat (surrender 1% of market cap to buy Instagram)
        • Facebook spent 7% of market cap for WhatsApp, Oculus, etc…
        • Buy the guys that failed trying – courage to build something new -> take them and put him in charge for person that was successful (at big co)
          • Trying to build this thing for ~10 years vs start-up that built something in 1 year (put this one in charge)
          • Ex: Amtrak buys Tesla – the worst thing is “You work for us”, but you want products to push distribution and talent for understanding
        • Only difference was distribution and the possibility to do that
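
The risk-pool notes above (good and okay drivers subsidizing the bad ones, and a startup taking only the good customers) can be put into rough numbers. This is a minimal sketch: the segment shares and claim costs are hypothetical values I picked for illustration, not figures from the episode.

```python
# Hypothetical risk pool: segment -> (share of customers in %, expected yearly claims in $).
# These numbers are made up for illustration only.
POOL = {
    "good": (20, 400),
    "okay": (60, 1000),
    "bad": (20, 2200),
}


def average_cost(pool):
    """Break-even flat premium: the customer-weighted average claims cost."""
    total_share = sum(share for share, _ in pool.values())
    total_cost = sum(share * cost for share, cost in pool.values())
    return total_cost / total_share


# One price for everyone: the good drivers pay far more than they cost.
flat_premium = average_cost(POOL)
good_subsidy = flat_premium - POOL["good"][1]

# A startup takes only the "good" 20%; the incumbent is left with the rest.
remaining = {name: seg for name, seg in POOL.items() if name != "good"}
premium_after_exit = average_cost(remaining)

print(f"flat premium with everyone pooled: ${flat_premium:.0f}")
print(f"each good driver overpays by:      ${good_subsidy:.0f}")
print(f"break-even premium once they leave: ${premium_after_exit:.0f}")
```

With these made-up numbers, a half-price offer to the good segment would still sit above that segment's expected claims, while the incumbent's break-even premium rises for everyone who stays.
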
  • Jennifer Dulski, author of Purposeful, Head of Groups & Community at Facebook (Wharton XM)
    • Talked about their group initiatives at Facebook – communities policing themselves as well as methods to flag content
    • Mentioned an example of an employee who came up to her and asked whether she had done a good job – she just wanted a bonus or something $
      • Taught her about incentives and why people do what they do / good to know the motivators
      • What drives people?
  • Anth Georgiades, CEO of Zumper (Bay Area Ventures, Wharton XM)
    • Purchase / hire of Padmapper in 2016 that added quite a bit of Canada business (size of California, real estate-wise)
    • How to match both sides for a marketplace – suppliers vs customers
      • Chicken and egg – focus on one, improve other and repeat
  • Word2vec (Data Skeptic 2/1/2019)
    • Produces word embeddings – autoencoders as NN for something like compression to retrieve output successfully
      • m down to n via mathematical representation (n < m)
      • Language compression for vector rep
    • Running the algorithm training on Google’s full internet, Facebook’s news article, Wikipedia, etc… to achieve similar words/spaces
      • Not super adaptive – nonsense place for words it hasn’t seen
    • Real world application – king for word2vec and subtract male – then add in female and you get queen
      • 300 dimensional space, semantics of that example
      • Bad example: training on entirety of internet results in something like doctor – male + female = nurse (the data is not gender neutral)
    • Feature engineering for bag of words, good example for transfer learning, also (train model on text and then use parts of it on smaller area)
      • Very large corpora for NLP but can use pre-trained models of word2vec and use it in other models
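
The king/queen arithmetic from the word2vec notes above can be sketched with plain vector math. The vectors here are tiny hand-made toys (3 dimensions with axes I chose, not the 300-dimensional trained embeddings), so this only shows the mechanics: subtract one word vector, add another, then pick the nearest remaining word by cosine similarity.

```python
import math

# Toy 3-d "embeddings" (hypothetical values, NOT trained word2vec output).
# Hand-picked axes for illustration: [royalty, maleness, femaleness].
VECTORS = {
    "king": [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.3, 0.2],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors=VECTORS):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [word for word in vectors if word not in (a, b, c)]
    return max(candidates, key=lambda word: cosine(vectors[word], target))


print(analogy("king", "man", "woman"))
```

With real pre-trained vectors the same arithmetic applies, but the answer depends entirely on the training corpus, which is how the doctor/nurse result in the notes comes about.
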
  • Sean Law (@seanmylaw), D/S Research and Dev at TD Ameritrade (DataFramed #59 4/1/2019)
    • Colleagues thinking he tends to ask lots of interesting, hard ?s – hopefully with answers
    • If he’s a hard worker, then he’ll do great – being in industry for 3 months time – has to juggle effective time spend
    • Molecular dynamics is short time scale and lots of computing power – parallel computing before and now the growth / usage of GPUs within days
    • Hypothetical example for alternative data solutions – driving to work and listening to NPR where NASA had a new dataset that was sat imagery
      • Pollution ORA dataset for air quality – area of high commodity necessity with pollution joining
    • If building ML as a binary classifier – but don’t know where the data is (do we have to collect? 3rd party API? Internal?)
      • How much effort to get it usable in the pipeline? Then, what’s the reasonable accuracy level – better than 50-50?
      • Some signal in the noise
    • Exploring chat/voice – query account balance, stock price, news articles via Alexa/Echo
      • Headless / device-agnostic option – audio to parsing of text, understanding what customer wants (NLP) and then what it means
      • Following PoC and into production
      • PoCs can miss: scalability (unless claim is to get scalability), model accuracy (not best model immediately), real-world applications (use case in mind)
    • Interpretability standpoint – regularization, L1 and linear – constraining coefficient can be very useful (background noise from video, for instance)
      • Time-series pattern-matching as non-traditional
    • Calls to action – data failures of things that didn’t work or negative results
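
On the "better than 50-50?" question from the binary classifier notes above: 50-50 is only the bar when the two classes are balanced. A minimal sketch, using made-up labels and predictions, that compares a model's accuracy against the always-guess-the-majority baseline:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Hypothetical labels and model output (7 negatives, 3 positives).
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0, 0, 0]

model_acc = accuracy(y_true, y_pred)
baseline_acc = majority_baseline(y_true)

print(f"model accuracy:    {model_acc:.2f}")
print(f"majority baseline: {baseline_acc:.2f}")
print("some signal in the noise" if model_acc > baseline_acc else "no better than guessing")
```

Here the baseline is already 70%, so a model has to clear that, not 50%, before a proof of concept can claim any signal.
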

Week 5 Quick Review & Week 6 Start October 14, 2015

Posted by Anthony in Daily fantasy football, DFS, Draftkings, experience, FanDuel, NFL, Stacks, Week 5, Week 6.

So, I’m going to keep this brief. I got way too ballsy this past week. Too many tournaments, too much risk, and not enough lineups to increase variation. I got slammed with wrong choices in having Demaryius / Julio in almost all line-ups. I got hurt by Lacy’s lack of pass-catching (and too much emphasis on home success for GB) & JC’s injury. TE & D/ST picks were bad, as well. Charles Clay didn’t do anything & Bennett was targeted 11 times but caught only 4 in what played out as an underperform. Jaguars got shellacked and the Giants, for whatever reason, created minimal pressure Sunday night. Well – that’s a lot of bad in spots that statistically should have been consistently high-floor.

So, the good? Rivers & Bell saved my ass Monday night so I didn’t get blanked. I wasn’t on ABrown due to Vick’s lack of rapport with him, which was good in a fade-case. Allen Robinson performed with Blake Bortles (who I was on in a few leagues). Rivers had a few good games. Brady/Edelman worked out after the 4th qtr touchdown but Gronk disappointed for how expensive he was. Dion Lewis was productive. I was not on Devonta Freeman and he continued to have a nice game. I also faded a good Doug Martin spot (suspect weather, minimal passing potentially). I figured that I would have scored with a few lineups in the high 140’s but I didn’t. Only one lineup hit 165+ and cashed on DK.


Week 6
I will try to do better – buckled down and read through more of Jonathan Bales’ series Fantasy Football for Smart People. Staying consistent with bankroll management and being diligent with a process weekly will be vital in success. To compile the statistics, I’ll likely use a trial for DailyFantasyNerd or Fantasy Labs to ensure I have the data in one spot. I could create a page for myself, but that will be fine-tuning what I want to use consistently.

So, let’s start with defenses. I read through Bill Barnwell of Grantland’s NFL Statistical Temperature. Without looking at this week’s schedule, I pared down defenses that I would be interested in playing, depending on home/away, weather, opponent, in no particular order.

Denver, Arizona, GB, NYJ (w/out a D TD so far) are the top tier. TEN, DET, NE, CIN, SEA, CAR would be the next, likely. PHI, MIN and ATL have been making plays but could be inconsistent depending on game flow.

RB’s – Forte, Foster, TJ Yeldon look good so far. Ryan Mathews has been productive. Do we continue to ride Devonta?
WR’s – Hopkins is just gobbling up targets for ppr leagues. AJ Green could be interesting – I feel like Dalton alternates between his receivers/tight ends. Just focuses in on them. Eifert was last week so Green could be this week. We cannot forget about TASER, Bryan Mears’ new statistic – anticipatory of red zone regression for touchdowns (potentially). Golden Tate, Amari Cooper, Demaryius Thomas & Keenan Allen headline it. Of those, I’d think Cooper & Golden Tate are most likely to score (Denver needs to get NEAR the red zone first and Keenan may be left out since Antonio is back). Let’s flip a coin between Hurns/Robinson again or play them both – that’s worked before. Snead on a Thursday night could be a fade position, or minimal play because of the cheap cost still. We all THINK it will be high-scoring… but will it on a Thursday with Julio in pain? I’d like to think that it won’t be and be in better position Sunday.

TE’s – I’ll have some action with Gronk & Gates. Chargers will have to throw against GB. Eifert played incredibly on Sunday (thanks to that for my main season-long league). Barnidge is apparently a) a real-life football player and b) target monster. Charles Clay is near the top of TASER as well, but with Tyrod potentially out, I may want to avoid that Bills line-up altogether.

These are my initial thoughts – I’ll see today and tomorrow what I can put together and post going forward.

Good luck to the ALDS teams today in their Game 5’s!

Happy to say my Red Wings in NHL are an impressive 3-0-0 with a +7 differential to go with the Broncos 5-0. Keep it up!

Week 4 Results – DK Fail & FanDuel Success October 7, 2015

Posted by Anthony in Daily fantasy football, DFS, Draftkings, experience, FanDuel, gym, NFL, Stacks, week 4.

Yup, that’s right! FanDuel – there wasn’t a contest that I didn’t place in, mainly because my lineups did very well in all of the 50/50’s, DoubleUps and free-rolls. However, on DraftKings, where I played a majority of tournaments, I was very close to min. cash or not in the money at all.

For week 3, I was gone/without service all weekend, so the few lineups I had in, I minimally cashed/lost. 2 weeks in a row on DK that I have lost a few $ here and there (not to mention the fun runs I took at LoL Championships as well as Wed-Fri spread of MLB games – not wise when teams were already clinched or didn’t care).

  • FanDuel results:
    5 entries played – 4 wins – 2 tickets (survivor advance, barely)
    I don’t remember where I read it, but there was an analysis done on larger 50/50’s where, assuming you’re on average better than a majority of players, you have a greater success rate in larger pools. So far, that’s proved to be correct.

    • In $350k Double-Up (with nearly 75k entries), I placed 5842 with a score of 123.74 ($5 entry for $10 winnings).
      Dalton/Green stack with LMurray, Karlos, Demaryius, Amari, MBennett, MBryant/Falcons D stack (high score)
    • FPFC Qualifier Double-Up (152 of 521) with score of 111.94 ($10 entry for $20 winnings)
      Carr/Crabtree stack w/ Karlos, JCharles, JuJones (ouch), JaJones, Barnidge (savior), Hauschka/Seattle D stack
    • Excl Football Guys Contest didn’t go that well (777 of 3119) with 110.84 ($2 entry $2 win)
      same as FPFC except swapped JamesJones/Barnidge for Amari & Martellus
    • FBGFC Qualifier Double-Up tournament (32 of 1097) with 130.06 ($10 entry $20 win & ticket later in season)
      Cam, Forte, Karlos, Julio, Amari, Moncrief, Martellus, Josh Lambo (nice 14), Falcons D (yup)
    • $250k Sun NFL Survivor Tourney (42406 of 57471) with 88.54 ($5 entry, ticket won, advanced)
      Dalton/Green stack with Randle/LMurray(ouch comb 13), demaryius, stevieJ (ugh), ebron (injured), mcmanus/Denver D stack (decent scores)
    • FantasyPros $2000 contest for saved lineups – placed 65 of 951 for $20 credit on FD (devonta and andy dalton)
  • So I found that I like stacking kickers with the defenses so far. Relatively positive correlation and it has worked with a select few kickers/defense – good field position relations, typically.
  • DraftKings results:
    • As I mentioned, this didn’t go as well. Likely because I played in too many tournaments.
    • In the double-ups I played, I cashed in all 3. So for those keeping count, I was 6 for 6.
    • Same lineup for 3 double-ups score of 136.74 (62 of 217), (99 of 340), (423 of 1135) for 3x $1 entries, $2 each of winnings
      Carr/Crabtree stack, Gore, Karlos, Cobb, Deandre (2nd half!!), JuJones, Bennett (yup), Broncos D
    • I played all-day Sunday teams and had a ~112 score, while I played early Sunday match lineups and had those go for 147 and 144 for winners.
    • 2x $.25 Arcade & $2k First Down ($1 entry, $2 win) earlies for $.50 winnings on $.25 entries (144.66 score)
      Tyrod/Karlos/Clay/Bills D stack w/ Forte, AJ Green, T.Y., Deandre (savior), AllenRob
    • Had a ThursSun Line-up that did terribly (103.96) for $3 entry and 0 winnings
      Flacco/SmithSr/MaxxWill/Ravens stack, Leveon, Ivory (rb successes), ABrown, Rishard, Jarvis (wr duds)
    • Played $0.25 Quarter Arcade (Sun only) for $1 entry, $2 winnings (147.56 score)
      Rodgers, Devonta (great), Hyde, Keenan, Amari, JuJones, Fleener (good), Karlos, Raiders D
    • Daily Dollar ($50k) $1 entry for $2 win (136.74 score for 9220 of 52596)
      Carr/Crabtree stack, Gore, Karlos, Cobb, Deandre, JuJones, Bennett, Broncos D
    • 130.16 score didn’t work for $1 $150k First Down or $0.25 Arcade
      Rodgers/Packers, Devonta, Gore, Evans, JuJones, Moncrief, Olsen, Karlos
    • Another Flacco / Ravens stack for Thurs-Sun produced the 112, which didn’t go well in any of the 3 entries (-$5).
  • So we’ll see. Seems consistent with stacks where I’ll go 2 for 4 and hope that the stacks hit bigger. This week, they didn’t do as much because of the chalk not performing as well. The few good calls at TE and Devonta saved those stacks to at least cash.
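
The point above about larger 50/50's favoring an above-average player can be checked with a quick simulation. This is a rough sketch under my own assumptions, not the analysis I read: every score is drawn from a normal distribution, my lineup gets a hypothetical `edge` added to its average, and the top half of the field cashes. One possible explanation it illustrates is that the cash line bounces around less in a big field, so a real edge shows up more reliably.

```python
import random


def cash_rate(pool_size, edge, trials, seed=0):
    """Estimate how often a player with `edge` extra mean points finishes in the top half.

    Assumptions: opponents score ~ Normal(0, 1), our player ~ Normal(edge, 1),
    and the top `pool_size // 2` entries cash.
    """
    rng = random.Random(seed)
    paid_places = pool_size // 2
    cashes = 0
    for _ in range(trials):
        my_score = rng.gauss(edge, 1.0)
        # Count how many of the other entries outscore us.
        beaten_by = sum(rng.gauss(0.0, 1.0) > my_score for _ in range(pool_size - 1))
        if beaten_by < paid_places:
            cashes += 1
    return cashes / trials


# Hypothetical edge of half a standard deviation over the field.
small = cash_rate(pool_size=2, edge=0.5, trials=5000)    # head-to-head
large = cash_rate(pool_size=100, edge=0.5, trials=5000)  # larger 50/50

print(f"cash rate, 2-entry contest:   {small:.3f}")
print(f"cash rate, 100-entry contest: {large:.3f}")
```

The gap depends on the edge I plugged in; with no edge at all, both rates should hover around 50%.
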

We’ll see how the next week goes – I may end up trying more double-ups to at least build bankroll.

To next week!
